Python quirks (lshift.net)
111 points by luu on July 12, 2013 | hide | past | favorite | 56 comments


I'm surprised using a mutable value as a default for a keyword argument wasn't mentioned. This certainly tripped me up early in learning Python:

    def fn(x, my_dict={}):
        my_dict[x] = x * 2
        return my_dict

    >>> fn(1)
    {1: 2}
    >>> fn(2)
    {1: 2, 2: 4}
(I would have expected the second call to return {2: 4} when I was learning python)


This was discussed pretty recently in this thread on "Python Newbie Mistakes, Part 1".

Default values for functions in Python are instantiated when the function is defined, not when it’s called.

Thread: https://news.ycombinator.com/item?id=5999772

Direct link: http://blog.amir.rachum.com/post/54770419679/python-common-n...

It looks like the author posted a "part 2" to HN as well, but it never made the front page.

http://blog.amir.rachum.com/post/55024295793/python-common-n...
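The "instantiated when the function is defined" point can be seen directly: CPython stores the evaluated defaults on the function object (as `__defaults__` in Python 3, `func_defaults` in Python 2), and the same dict is reused on every call. A minimal sketch:

```python
def fn(x, my_dict={}):
    my_dict[x] = x * 2
    return my_dict

fn(1)
fn(2)

# The one default dict lives on the function object and has
# accumulated entries from both calls.
print(fn.__defaults__)  # ({1: 2, 2: 4},)
```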


This is pretty bizarre, and seems like it would cause a memory leak if you don't know what you're actually doing. You're allocating a my_dict object on the heap, in a field of the function object, so my_dict never goes out of scope and never gets garbage collected until the function fn itself does, right?

Whereas if you did this instead:

    def fn(x):
        my_dict={}
        my_dict[x] = x * 2
        return my_dict
You'd get what you'd expect--a new my_dict object gets created, returned, and then goes out of scope every time fn is called, so it would get garbage collected once there are no more references to that return value. (I think...)

(I don't know that much about how memory allocation and GC works in Python yet, just trying to learn!)
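Your reading is right, and (assuming CPython's reference counting) it can be checked with weakref. Plain dicts don't support weak references, so a dict subclass stands in here purely for illustration:

```python
import gc
import weakref

class TrackedDict(dict):
    """dict subclass so we can take a weak reference to it."""

def fn(x, my_dict=TrackedDict()):
    my_dict[x] = x * 2
    return my_dict

# The default dict is kept alive by the function object itself.
ref = weakref.ref(fn.__defaults__[0])
assert ref() is not None

# Deleting the function drops the last reference to the default,
# and it gets collected (immediately, under CPython's refcounting).
del fn
gc.collect()
assert ref() is None
```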


this is not at all bizarre, you just need to know what dynamic means. the best answer i could come up with is "definition is execution". for an enlightening moment, see this piece of code:

    def a():
        print "a called"
        return []

    def fn(x=a()):
        x.append(1)
        print x

    fn()
    fn()
    fn()
i suggest typing this directly into the interpreter instead of a script for better effect.
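The same experiment can be made self-checking (Python 3 style, recording calls in a list instead of printing): a() runs exactly once, at definition time, and the one list it returned is mutated by every call.

```python
calls = []

def a():
    calls.append("a called")
    return []

def fn(x=a()):
    x.append(1)
    return x

# a() ran exactly once -- when fn was *defined*, not when it was called.
assert calls == ["a called"]
assert fn() == [1]
assert fn() == [1, 1]
assert fn() == [1, 1, 1]
```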


You got me thinking about how python differs from lisp in this respect, and the subtle way that a functional API has helped this specific situation.

While lisp suffers from the same "definition is execution" gotcha, the effects are far rarer in practice because () is immutable and interned while [] is mutable and usually generated afresh each time it's executed.

  $ python
  >>> [] is []
  False

  $ sbcl
  * (eq () ())
  t
Since [] is mutable, appends of [] can do superficially 'the right thing'.

  >>> a = []
  >>> a.append(4)
  >>> a
  [4]

  * (setq a ())
  * (nconc a '(34))
  (34)
  * a
  ()
But there's a reason lisp does seemingly the wrong thing. According to the spec, nconc skips empty arguments (http://www.lispworks.com/documentation/HyperSpec/Body/f_ncon...) Reading between the lines, I'm assuming this makes sense from a perspective in lisp where we communicate even with 'destructive' operations through their return value. This is more apparent when you consider nreverse:

  * (setq a '(1 2 3 4))
  * (nreverse a)
  (4 3 2 1)
  * a
  (1)
Destructive operations can reuse their input, but they're not required to maintain bindings. There is no precise equivalent of python's .reverse(). Instead, a common idiom is:

  * (setq a (nreverse a))
It seems like a weird design decision, but one upshot of it besides encouraging a more functional style is that this optional-arg gotcha loses a lot of its power in lisps. It's very rare to define a default param of a non-empty list, and empty lists can't be modified without assigning to them.


Great explanation!


Basically. The idiomatic solution for this in python is:

  def fn(x, my_dict=None):
      if my_dict is None:
          my_dict = {}
      ...


You can use a sentinel if you want to allow None as an argument. More on that, along with a way to use mutable defaults productively:

http://effbot.org/zone/default-values.htm
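A sketch of the sentinel pattern that page describes: a private object no caller could plausibly pass lets None remain a legitimate argument. The name `_missing` is just an illustration:

```python
_missing = object()  # hypothetical private sentinel

def fn(x, my_dict=_missing):
    if my_dict is _missing:
        my_dict = {}        # fresh dict per call
    if my_dict is not None:
        my_dict[x] = x * 2
    return my_dict

assert fn(1) == {1: 2}
assert fn(2) == {2: 4}       # no leftover state between calls
assert fn(3, None) is None   # None passes through untouched
```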


I never did it myself, but this behavior is very useful if you want something to be used across invocations, like a cache.

To limit its size, you can use the ideas on http://stackoverflow.com/questions/2437617/limiting-the-size...
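A sketch of that cache idiom, deliberately exploiting the shared default (the function here is illustrative, and unbounded; the linked answers cover size limits):

```python
def square(x, _cache={}):
    """Compute x*x, remembering results across calls in the default dict."""
    if x not in _cache:
        _cache[x] = x * x
    return _cache[x]

assert square(3) == 9
assert square(3) == 9               # second call served from the cache
assert 3 in square.__defaults__[0]  # the cache lives on the function
```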


In other words, defaults are static (as in C) and initialized when the function is defined, not when it's called.


It is perhaps more correct to say "when the function is instantiated". Functions in python aren't exactly "defined". Functions are objects same as everything else. The "def" keyword is how you call the constructor for a function object. The kwarg default values are evaluated once at object construction time and saved.

The "def" statement may construct different distinct function objects from a single definition. Each distinct function object has different default value objects, but each function object reuses its own default value objects each time it's called.

Consider this:

    def outer_fn():
        def inner_fn(foo={}):
            # The id() function returns an internal
            # object identifier.  Different objects 
            # have different ids.
            print id(foo)
        return inner_fn
    
    inner_fn_1 = outer_fn()
    inner_fn_2 = outer_fn()
    inner_fn_1()
    inner_fn_1()
    inner_fn_2()
    inner_fn_2()
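Rewritten to return the ids instead of printing them, the claim is checkable: each inner_fn reuses its own default across its own calls, but the two function objects have distinct defaults.

```python
def outer_fn():
    def inner_fn(foo={}):
        return id(foo)
    return inner_fn

inner_fn_1 = outer_fn()
inner_fn_2 = outer_fn()

# Each function object reuses its own default dict...
assert inner_fn_1() == inner_fn_1()
assert inner_fn_2() == inner_fn_2()
# ...but the two function objects hold different dicts.
assert inner_fn_1() != inner_fn_2()
```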


You are absolutely correct, of course. I was using a C-ism.


So it's instantiating the dict in the declaration and recycling it between invocations? I wouldn't have expected that either.


"def" in Python is an executable statement, not a declaration- it creates a function object that gets bound to the specified name in the local namespace. The defaults are evaluated when the function is created.


Rather the dict is bound during "compile" time, and not at each individual invocation (which would be much better, IMO). And during each invocation, it doesn't get re-bound.


it can't be, that wouldn't make sense. "def" is no different than "for" or "if".


It's useful for memoization though and is where I've typically used it.


> Tuples constructor

Confusion is avoided by understanding that comma ',' not paren '()' is the tuple constructor.
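The comma-not-parens point is easy to verify (the empty tuple being the one case where bare parens suffice):

```python
t = 1,                      # the comma makes the tuple
assert isinstance(t, tuple)
assert (1) == 1             # parens alone are just grouping
assert (1,) == t
assert () == tuple()        # the empty tuple is the one exception
```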

> Python doesn’t have multiline comments. Instead, multiline strings are used.

No they are NOT! Don't use strings for comments dammit.


PEP 257 recommends using triple-quoted strings for multiline docstrings:

http://www.python.org/dev/peps/pep-0257/#multi-line-docstrin...


A docstring is not a comment. A docstring is actually a string that gets saved on the object, whereas a comment is not used at all.
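The difference is observable at runtime: the docstring survives as `__doc__`, while the comment leaves no trace.

```python
def documented():
    """I am a docstring, stored on the function."""
    # I am a comment, discarded by the parser.
    return None

assert documented.__doc__ == "I am a docstring, stored on the function."
```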


Right, and comments and documentation serve different purposes. But appropriate docstrings will greatly reduce the need for comments.


> No they are NOT! Don't use strings for comments dammit.

You're gonna have to back that up with a reason. I'm tired of pedantry for pedantry's sake.


Docstrings are (generally) available at runtime; comments aren't: they are discarded early in parsing. They are completely different beasts (in Python as in other languages).


I understand that they are different technically, but given that the performance implications are going to be nil for most people and docstrings have a similar function as comments, is it really necessary to be on such a high horse over something so minor?

I'll answer my own question: no.


Strings are not intended to act as comments. Docstrings are intended to act as docstrings. If you need inline comments beyond what is in your docstrings, there is no reason not to use Python's comment character as designed.


I love it: https://twitter.com/gvanrossum/status/112670605505077248

No offense, but your arrogant "knows it all" attitude is what annoys a lot of people about this community.


It's tangential to the original discussion, but that attitude nearly turned me off of Python entirely when I was first starting. I'd pop onto IRC or a discussion community and ask a question about something, explaining that I was new and the most common responses were:

1. Read the docs and figure it out yourself.

2. Why are you doing X? Only an intellectually feeble person would do X -- normal people do Y.

3. That's a waste of time and I'm not going to tell you how to do it because X, Y, and Z.


Sorry to hear that man. I hope you kept up with it. I find it to be so enjoyable and productive to work with.


The section titled "Inconsistent get interface" compares get() to getattr(), which are two unrelated functions. getattr is the same as an attribute lookup on an object (person.name), while get() is a method defined by some mapping types that returns a stored value (person['name'], person.get('name')).

In the provided example he calls get on an empty dictionary for the key 1, then calls getattr of 'a' on an int. Finally he calls it again with an optional default argument of None.

The difference is made apparent by the example:

  In [13]: test = {'values': 1}

  In [14]: getattr(test, 'values')
  Out[14]: <function values>

  In [15]: test.get('values')
  Out[15]: 1


Many of these "quirks" are just odd expectations on the author's part that were not fulfilled, but he does describe a few interesting odds and ends. And I agree completely about modules, importing, eggs, namespaces, and all that mess. One of the most refreshing things about learning Clojure after using Python for a while was the relative absence of confusion surrounding these issues, aided by Leiningen.


I think these "quirks" hit on legitimate inconsistencies and confusing interpretations. Most of his expectations seem pretty commonplace.


One of my favorite tomfooleries in Python is this:

>>> True, False = False, True

It doesn't have much practical effect, since most logical tests don't use the True and False constants directly. But it's a good way to perplex the unwary.


In Python 3.x, this gives me a "SyntaxError: assignment to keyword" error.


They changed the language definition for Python 3 to make True and False keywords, unlike in previous versions.
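In Python 3 this can be demonstrated from running code by compiling the statement as a string, so the SyntaxError doesn't take down the enclosing module at parse time:

```python
# Assigning to True is rejected by the Python 3 compiler outright.
try:
    compile("True, False = False, True", "<example>", "exec")
    raised = False
except SyntaxError:
    raised = True

assert raised
```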


    import random

    if random.randint(0, 1):
        (True, False) = (False, True)


(Some dialects of) Smalltalk had `true become: false`, which did not merely rename `true` and `false`: every reference to `true` in the image was replaced with a reference to `false`.


rev_null, you are dead. Make a new account.


I don't consider the sort example to be a quirk at all, though some of the other examples are reasonable. It's more a definition of what methods are supposed to do in the first place: a behavior that is applied on the instance of the object. Unless you're doing functional programming (or implementing certain specialized design patterns), you wouldn't expect dog.move() to return a new dog, so why should list.sort() return a new list?


I always figured it was because "sort" is a verb. Over time I've come to feel that functions named with a verb phrase ought to be procedure-like, in that they have side effects and return either nothing (which in python means None) or some kind of error indicator (and in python you'd generally use exceptions instead).

There are exceptions (e.g., the "get" prefix, used for getters, which tend to return values and have no side-effects), and, as with any rule of thumb, better to break it than create something ugly. But as rules go, I've found this one a pretty useful one to stick to.


I think this is where Ruby shines with the ! to indicate destructive.


It would be, if only it were true. Consider Array#delete_if or Array#pop, for example, although there are many others too.

Matz has written about this - https://www.ruby-forum.com/topic/176830#773946 - and said "The bang (!) does not mean "destructive" nor lack of it mean non destructive either. The bang sign means "the bang version is more dangerous than its non bang counterpart; handle with care".

This is one of the most commonly misunderstood things about Ruby in my experience (enough so that some library developers do apply a ! == destructive naming system) and would certainly make an equivalent "Ruby quirks" list IMHO! :-)


I generally agree with you but I think the recommendation would be that list.sort() should return the original list in its new, sorted state, not a new list.


It's just a coding convention in python.

All in-place operations in python are supposed to return None, as a way of explicitly signaling that the operation was done on the existing object. It's not always followed outside the standard library, but it's very rare to run across an exception.

It's counter-intuitive at first, especially in this case, but it is consistent.
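The convention in one place, alongside the sorted() builtin that exists precisely because list.sort() follows it:

```python
lst = [3, 1, 2]

# The in-place operation mutates the list and returns None...
assert lst.sort() is None
assert lst == [1, 2, 3]

# ...while the non-mutating counterpart returns a new list.
original = [3, 1, 2]
assert sorted(original) == [1, 2, 3]
assert original == [3, 1, 2]
```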


it is consistent but it's arguably bad design. it seriously hurts composability.


Do you mean composability, or just having to use another line?


The one that always confuses me is that multiple "for"s in the same comprehension work the opposite way from how you expect:

    >>> [[(a, b) for a in [1, 2, 3]] for b in [4, 5, 6]]
    [[(1, 4), (2, 4), (3, 4)], [(1, 5), (2, 5), (3, 5)], [(1, 6), (2, 6), (3, 6)]]
    >>> [(a, b) for a in [1, 2, 3] for b in [4, 5, 6]]
    [(1, 4), (1, 5), (1, 6), (2, 4), (2, 5), (2, 6), (3, 4), (3, 5), (3, 6)]
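One way to remember the ordering: the for clauses read left to right exactly as nested for statements read top to bottom.

```python
# The single comprehension unrolls to nested loops, outermost first:
result = []
for a in [1, 2, 3]:
    for b in [4, 5, 6]:
        result.append((a, b))

assert result == [(a, b) for a in [1, 2, 3] for b in [4, 5, 6]]
```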


It makes sense to me; it is actually quite similar to summation notation.


The second one makes more sense to me. My brain reads it as:

  for a in [1, 2, 3]:
    for b in [4, 5, 6]:


I don't because the statement itself flips compared to for style, so it seems like the whole thing should flip. i.e.

    for XXXXX:
      for YYYYY:
        ZZZZZ
it would kind-of make sense to me if this translated into a for comprehension as

    [ZZZZZ for YYYYY for XXXXX]
but instead it translates to

    [ZZZZZ for XXXXX for YYYYY]
which seems decidedly middle-endian.


When I learned python I was surprised by the absence of a length() member on collections. But, apparently this was far from an accident. http://effbot.org/pyfaq/why-does-python-use-methods-for-some...
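The len() builtin delegates to the `__len__` special method, so any container can participate; the Bag class here is just an illustration:

```python
class Bag:
    """Illustrative container implementing the length protocol."""
    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        # len(bag) calls this method under the hood.
        return len(self._items)

assert len(Bag([1, 2, 3])) == 3
```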


You could always use .__len__()


Ellipsis and slice objects are used for slicing, e.g.:

  class MyContainer(object):
      def __getitem__(self, key):
          return key
  >>> c = MyContainer()
  >>> c[1]
  1
  >>> c[1:]
  slice(1, None, None)
  >>> c[1:2, 1:2:3, ...]
  (slice(1, 2, None), slice(1, 2, 3), Ellipsis)
NumPy uses this to allow advanced slicing of multidimensional arrays:

http://docs.scipy.org/doc/numpy/reference/arrays.indexing.ht...


Some of these quirks are explained incorrectly. I find the real explanations interesting, so please allow me to provide them.

> 999+1 is not 1000

The examples given in this section do not actually show what the poster thinks! Python's interning of small integers is not relevant. It only applies to integers in the range -5 through 256.

The real explanation for why "1000 is 1000" evaluates to True has to do with the Python compiler. When the expression is compiled as a single statement in the interactive interpreter, the compiler notices that the constant 1000 appears more than once, so it can reuse the same object.

But if you provide the value in more than one statement, the compiler is unable to do this:

  >>> a = 256
  >>> b = 256
  >>> a is b
  True
  >>> a = 257
  >>> b = 257
  >>> a is b
  False
Note that this behavior exists because each statement in the interactive interpreter is compiled separately. If you place the code within a function instead, the entire function is compiled at once, and the object is reused once more:

  >>> def f():
  ...     a = 257
  ...     b = 257
  ...     return a is b
  ... 
  >>> f()
  True
> Ellipsis?

> Apparently Ellipsis is always “bigger” than anything, as opposite to None, which is always “smaller” than anything.

Not so. Ellipsis follows Python 2's default rules for the comparison of unrelated types.

  >>> Ellipsis < ()
  True
The default is documented as comparing objects "consistently but arbitrarily"[1]. The actual rules are:

1) None is the smallest object.

2) Followed by numbers.

3) Followed by all other objects. Objects of distinct types are compared by the lexical ordering of their type names.

This can easily lead to senseless orderings, when two types define an ordered relationship between themselves, but another type happens to have a name that is lexically between them, as with str, tuple, and unicode:

  >>> "def" < (1,)
  True
  >>> (1,) < u"abc"
  True
  >>> u"abc" < "def"
  True
The Ellipsis constant is an instance of a type named "ellipsis", and so it is smaller than instances of most of the other non-numeric builtin types, except for dict.

The actual use of Ellipsis has nothing to do with recursive containers printing an ellipsis in their repr. It's part of a wacky special syntax that exists for NumPy's benefit:

  >>> d = {}
  >>> d[...] = None
  >>> d
  {Ellipsis: None}
[1] http://docs.python.org/2/reference/expressions.html#not-in



print in 2.x evaluates and prints its arguments one at a time, left to right, rather than evaluating them all before printing anything.

Given

    def p(x):
        print x
        return x
then

    print 'a', p('b')
is not equivalent to

    a1 = 'a'
    a2 = p('b')
    print a1, a2


Article is from 2009.



