Python quirks (lshift.net)
111 points by luu on July 12, 2013 | hide | past | favorite | 56 comments


I'm surprised using a mutable value as a default for a keyword argument wasn't mentioned. This certainly tripped me up early in learning Python:

    def fn(x, my_dict={}):
        my_dict[x] = x * 2
        return my_dict

    >>> fn(1)
    {1: 2}
    >>> fn(2)
    {1: 2, 2: 4}
(I would have expected the second call to return {2: 4} when I was learning python)


This was discussed pretty recently in this thread on "Python Newbie Mistakes, Part 1".

Default values for functions in Python are instantiated when the function is defined, not when it’s called.

Thread: https://news.ycombinator.com/item?id=5999772

Direct link: http://blog.amir.rachum.com/post/54770419679/python-common-n...

It looks like the author posted a "part 2" to HN as well, but it never made the front page.

http://blog.amir.rachum.com/post/55024295793/python-common-n...
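The "instantiated when the function is defined" point can be seen directly: CPython stores the evaluated defaults on the function object (as `__defaults__` in Python 3, `func_defaults` in Python 2), and the same dict is reused on every call. A minimal sketch:

```python
def fn(x, my_dict={}):
    my_dict[x] = x * 2
    return my_dict

fn(1)
fn(2)

# The one default dict lives on the function object and has
# accumulated entries from both calls.
print(fn.__defaults__)  # ({1: 2, 2: 4},)
```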


This is pretty bizarre, and seems like it would cause a memory leak if you don't know what you're actually doing. You're allocating a my_dict object on the heap, in a field of the function object, so my_dict never goes out of scope and never gets garbage collected until the function fn itself does, right?

Whereas if you did this instead:

    def fn(x):
        my_dict={}
        my_dict[x] = x * 2
        return my_dict
You'd get what you'd expect--a new my_dict object gets created, returned, and then goes out of scope every time fn is called, so it would get garbage collected once there are no more references to that return value. (I think...)

(I don't know that much about how memory allocation and GC works in Python yet, just trying to learn!)
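Your reading is right, and (assuming CPython's reference counting) it can be checked with weakref. Plain dicts don't support weak references, so a dict subclass stands in here purely for illustration:

```python
import gc
import weakref

class TrackedDict(dict):
    """dict subclass so we can take a weak reference to it."""

def fn(x, my_dict=TrackedDict()):
    my_dict[x] = x * 2
    return my_dict

# The default dict is kept alive by the function object itself.
ref = weakref.ref(fn.__defaults__[0])
assert ref() is not None

# Deleting the function drops the last reference to the default,
# and it gets collected (immediately, under CPython's refcounting).
del fn
gc.collect()
assert ref() is None
```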


this is not at all bizarre, you just need to know what dynamic means. the best answer i could come up with is "definition is execution". for an enlightening moment, see this piece of code:

    def a():
        print "a called"
        return []

    def fn(x=a()):
        x.append(1)
        print x

    fn()
    fn()
    fn()
i suggest typing this directly into the interpreter instead of a script for better effect.
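The same experiment can be made self-checking (Python 3 style, recording calls in a list instead of printing): a() runs exactly once, at definition time, and the one list it returned is mutated by every call.

```python
calls = []

def a():
    calls.append("a called")
    return []

def fn(x=a()):
    x.append(1)
    return x

# a() ran exactly once -- when fn was *defined*, not when it was called.
assert calls == ["a called"]
assert fn() == [1]
assert fn() == [1, 1]
assert fn() == [1, 1, 1]
```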


You got me thinking about how python differs from lisp in this respect, and the subtle way that a functional API has helped this specific situation.

While lisp suffers from the same "definition is execution" gotcha, the effects are far rarer in practice because () is immutable and interned while [] is mutable and usually generated afresh each time it's executed.

  $ python
  >>> [] is []
  False

  $ sbcl
  * (eq () ())
  t
Since [] is mutable, appends of [] can do superficially 'the right thing'.

  >>> a = []
  >>> a.append(4)
  >>> a
  [4]

  * (setq a ())
  * (nconc a '(34))
  (34)
  * a
  ()
But there's a reason lisp does seemingly the wrong thing. According to the spec, nconc skips empty arguments (http://www.lispworks.com/documentation/HyperSpec/Body/f_ncon...) Reading between the lines, I'm assuming this makes sense from a perspective in lisp where we communicate even with 'destructive' operations through their return value. This is more apparent when you consider nreverse:

  * (setq a '(1 2 3 4))
  * (nreverse a)
  (4 3 2 1)
  * a
  (1)
Destructive operations can reuse their input, but they're not required to maintain bindings. There is no precise equivalent of python's .reverse(). Instead, a common idiom is:

  * (setq a (nreverse a))
It seems like a weird design decision, but one upshot of it besides encouraging a more functional style is that this optional-arg gotcha loses a lot of its power in lisps. It's very rare to define a default param of a non-empty list, and empty lists can't be modified without assigning to them.


Great explanation!


Basically. The idiomatic solution for this in python is:

  def fn(x, my_dict=None):
      if my_dict is None:
          my_dict = {}
      ...


You can use a sentinel if you want to allow None as an argument. More on that, along with a way to use mutable defaults productively:

http://effbot.org/zone/default-values.htm
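A sketch of the sentinel pattern that page describes: a private object no caller could plausibly pass lets None remain a legitimate argument. The name `_missing` is just an illustration:

```python
_missing = object()  # hypothetical private sentinel

def fn(x, my_dict=_missing):
    if my_dict is _missing:
        my_dict = {}        # fresh dict per call
    if my_dict is not None:
        my_dict[x] = x * 2
    return my_dict

assert fn(1) == {1: 2}
assert fn(2) == {2: 4}       # no leftover state between calls
assert fn(3, None) is None   # None passes through untouched
```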


I never did it myself, but this behavior is very useful if you want something to be used across invocations, like a cache.

To limit its size, you can use the ideas on http://stackoverflow.com/questions/2437617/limiting-the-size...
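A sketch of that cache idiom, deliberately exploiting the shared default (the function here is illustrative, and unbounded; the linked answers cover size limits):

```python
def square(x, _cache={}):
    """Compute x*x, remembering results across calls in the default dict."""
    if x not in _cache:
        _cache[x] = x * x
    return _cache[x]

assert square(3) == 9
assert square(3) == 9               # second call served from the cache
assert 3 in square.__defaults__[0]  # the cache lives on the function
```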


In other words, defaults are static (as in C) and initialized when the function is defined, not when it's called.


It is perhaps more correct to say "when the function is instantiated". Functions in python aren't exactly "defined". Functions are objects same as everything else. The "def" keyword is how you call the constructor for a function object. The kwarg default values are evaluated once at object construction time and saved.

The "def" statement may construct different distinct function objects from a single definition. Each distinct function object has different default value objects, but each function object reuses its own default value objects each time it's called.

Consider this:

    def outer_fn():
        def inner_fn(foo={}):
            # The id() function returns an internal
            # object identifier.  Different objects 
            # have different ids.
            print id(foo)
        return inner_fn
    
    inner_fn_1 = outer_fn()
    inner_fn_2 = outer_fn()
    inner_fn_1()
    inner_fn_1()
    inner_fn_2()
    inner_fn_2()
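Rewritten to return the ids instead of printing them, the claim is checkable: each inner_fn reuses its own default across its own calls, but the two function objects have distinct defaults.

```python
def outer_fn():
    def inner_fn(foo={}):
        return id(foo)
    return inner_fn

inner_fn_1 = outer_fn()
inner_fn_2 = outer_fn()

# Each function object reuses its own default dict...
assert inner_fn_1() == inner_fn_1()
assert inner_fn_2() == inner_fn_2()
# ...but the two function objects hold different dicts.
assert inner_fn_1() != inner_fn_2()
```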


You are absolutely correct, of course. I was using a C-ism.


So it's instantiating the dict in the declaration and recycling it between invocations? I wouldn't have expected that either.


"def" in Python is an executable statement, not a declaration- it creates a function object that gets bound to the specified name in the local namespace. The defaults are evaluated when the function is created.


Rather the dict is bound during "compile" time, and not at each individual invocation (which would be much better, IMO). And during each invocation, it doesn't get re-bound.


it can't be, that wouldn't make sense. "def" is no different than "for" or "if".


It's useful for memoization though and is where I've typically used it.


> Tuples constructor

Confusion is avoided by understanding that comma ',' not paren '()' is the tuple constructor.
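The comma-not-parens point is easy to verify (the empty tuple being the one case where bare parens suffice):

```python
t = 1,                      # the comma makes the tuple
assert isinstance(t, tuple)
assert (1) == 1             # parens alone are just grouping
assert (1,) == t
assert () == tuple()        # the empty tuple is the one exception
```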

> Python doesn’t have multiline comments. Instead, multiline strings are used.

No they are NOT! Don't use strings for comments dammit.


PEP 257 recommends using triple-quoted strings for multiline docstrings:

http://www.python.org/dev/peps/pep-0257/#multi-line-docstrin...


A docstring is not a comment. A docstring is actually a string that gets saved on the object, whereas a comment is not used at all.
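The difference is observable at runtime: the docstring survives as `__doc__`, while the comment leaves no trace.

```python
def documented():
    """I am a docstring, stored on the function."""
    # I am a comment, discarded by the parser.
    return None

assert documented.__doc__ == "I am a docstring, stored on the function."
```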


Right, and comments and documentation serve different purposes. But appropriate docstrings will greatly reduce the need for comments.


> No they are NOT! Don't use strings for comments dammit.

You're gonna have to back that up with a reason. I'm tired of pedantry for pedantry's sake.


Docstrings are (generally) available at runtime; comments aren't: they are discarded early in parsing. They are completely different beasts (in Python as in other languages).


I understand that they are different technically, but given that the performance implications are going to be nil for most people and docstrings have a similar function as comments, is it really necessary to be on such a high horse over something so minor?

I'll answer my own question: no.


Strings are not intended to act as comments. Docstrings are intended to act as docstrings. If you need inline comments beyond what is in your docstrings, there is no reason not to use Python's comment character as designed.


I love it: https://twitter.com/gvanrossum/status/112670605505077248

No offense, but your arrogant "knows it all" attitude is what annoys a lot of people about this community.


It's tangential to the original discussion, but that attitude nearly turned me off of Python entirely when I was first starting. I'd pop onto IRC or a discussion community and ask a question about something, explaining that I was new and the most common responses were:

1. Read the docs and figure it out yourself.

2. Why are you doing X? Only an intellectually feeble person would do X -- normal people do Y.

3. That's a waste of time and I'm not going to tell you how to do it because X, Y, and Z.


Sorry to hear that man. I hope you kept up with it. I find it to be so enjoyable and productive to work with.


The section titled "Inconsistent get interface" compares get() to getattr(), which are two unrelated functions. getattr is the same as an attribute lookup on an object (person.name), while get() is a method defined by some mapping types that returns a stored value (person['name'], person.get('name')).

In the provided example he calls get on an empty dictionary for the key 1, then calls getattr of 'a' on an int. Finally he calls it again with an optional default argument of None.

The difference is made apparent by the example:

  In [13]: test = {'values': 1}

  In [14]: getattr(test, 'values')
  Out[14]: <function values>

  In [15]: test.get('values')
  Out[15]: 1


Many of these "quirks" are just odd expectations on the author's part that were not fulfilled, but he does describe a few interesting odds and ends. And I agree completely about modules, importing, eggs, namespaces, and all that mess. One of the most refreshing things about learning Clojure after using Python for a while was the relative absence of confusion surrounding these issues, aided by Leiningen.


I think these "quirks" hit on legitimate inconsistencies and confusing interpretations. Most of his expectations seem pretty commonplace.


One of my favorite tomfooleries in Python is this:

>>> True, False = False, True

It doesn't have much practical effect, since most logical tests don't use the True and False constants directly. But it's a good way to perplex the unwary.


In Python 3.x, this gives me a "SyntaxError: assignment to keyword" error.


They changed the language definition for Python 3 to make True and False keywords, unlike in previous versions.
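In Python 3 this can be demonstrated from running code by compiling the statement as a string, so the SyntaxError doesn't take down the enclosing module at parse time:

```python
# Assigning to True is rejected by the Python 3 compiler outright.
try:
    compile("True, False = False, True", "<example>", "exec")
    raised = False
except SyntaxError:
    raised = True

assert raised
```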


    import random

    if random.randint(0, 1):
        (True, False) = (False, True)


(Some dialects of) Smalltalk had `true become: false`, which did not merely rename `true` and `false`: every reference to `true` in the image was replaced with a reference to `false`.


rev_null, you are dead. Make a new account.


I don't consider the sort example to be a quirk at all, though some of the other examples are reasonable. It's more a definition of what methods are supposed to do in the first place: a behavior that is applied on the instance of the object. Unless you're doing functional programming (or implementing certain specialized design patterns), you wouldn't expect dog.move() to return a new dog, so why should list.sort() return a new list?


I always figured it was because "sort" is a verb. Over time I've come to feel that functions named with a verb phrase ought to be procedure-like, in that they have side effects and return either nothing (which in python means None) or some kind of error indicator (and in python you'd generally use exceptions instead).

There are exceptions (e.g., the "get" prefix, used for getters, which tend to return values and have no side-effects), and, as with any rule of thumb, better to break it than create something ugly. But as rules go, I've found this one a pretty useful one to stick to.


I think this is where Ruby shines with the ! to indicate destructive.


It would be, if only it were true. Consider Array#delete_if or Array#pop, for example, although there are many others too.

Matz has written about this - https://www.ruby-forum.com/topic/176830#773946 - and said "The bang (!) does not mean "destructive" nor lack of it mean non destructive either. The bang sign means "the bang version is more dangerous than its non bang counterpart; handle with care".

This is one of the most commonly misunderstood things about Ruby in my experience (enough so that some library developers do apply a ! == destructive naming system) and would certainly make an equivalent "Ruby quirks" list IMHO! :-)


I generally agree with you but I think the recommendation would be that list.sort() should return the original list in its new, sorted state, not a new list.


It's just a coding convention in python.

All in-place operations in python are supposed to return None, as a way of explicitly signaling that the operation was done on the existing object. It's not always followed outside the standard library, but it's very rare to run across an exception.

It's counter-intuitive at first, especially in this case, but it is consistent.
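The convention in one place, alongside the sorted() builtin that exists precisely because list.sort() follows it:

```python
lst = [3, 1, 2]

# The in-place operation mutates the list and returns None...
assert lst.sort() is None
assert lst == [1, 2, 3]

# ...while the non-mutating counterpart returns a new list.
original = [3, 1, 2]
assert sorted(original) == [1, 2, 3]
assert original == [3, 1, 2]
```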


it is consistent but it's arguably bad design. it seriously hurts composability.


Do you mean composability, or just having to use another line?


The one that always confuses me is that multiple "for"s in the same comprehension work the opposite way from how you expect:

    >>> [[(a, b) for a in [1, 2, 3]] for b in [4, 5, 6]]
    [[(1, 4), (2, 4), (3, 4)], [(1, 5), (2, 5), (3, 5)], [(1, 6), (2, 6), (3, 6)]]
    >>> [(a, b) for a in [1, 2, 3] for b in [4, 5, 6]]
    [(1, 4), (1, 5), (1, 6), (2, 4), (2, 5), (2, 6), (3, 4), (3, 5), (3, 6)]
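One way to remember the ordering: the for clauses read left to right exactly as nested for statements read top to bottom.

```python
# The single comprehension unrolls to nested loops, outermost first:
result = []
for a in [1, 2, 3]:
    for b in [4, 5, 6]:
        result.append((a, b))

assert result == [(a, b) for a in [1, 2, 3] for b in [4, 5, 6]]
```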


It makes sense to me; it is actually quite similar to summation notation.


The second one makes more sense to me. My brain reads it as:

  for a in [1, 2, 3]:
    for b in [4, 5, 6]:


I don't because the statement itself flips compared to for style, so it seems like the whole thing should flip. i.e.

    for XXXXX:
      for YYYYY:
        ZZZZZ
it would kind-of make sense to me if this translated into a for comprehension as

    [ZZZZZ for YYYYY for XXXXX]
but instead it translates to

    [ZZZZZ for XXXXX for YYYYY]
which seems decidedly middle-endian.


When I learned python I was surprised by the absence of a length() member on collections. But, apparently this was far from an accident. http://effbot.org/pyfaq/why-does-python-use-methods-for-some...
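The len() builtin delegates to the `__len__` special method, so any container can participate; the Bag class here is just an illustration:

```python
class Bag:
    """Illustrative container implementing the length protocol."""
    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        # len(bag) calls this method under the hood.
        return len(self._items)

assert len(Bag([1, 2, 3])) == 3
```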


You could always use .__len__()


Ellipsis and slice objects are used for slicing, e.g.:

  class MyContainer(object):
      def __getitem__(self, key):
          return key
  >>> c = MyContainer()
  >>> c[1]
  1
  >>> c[1:]
  slice(1, None, None)
  >>> c[1:2, 1:2:3, ...]
  (slice(1, 2, None), slice(1, 2, 3), Ellipsis)
NumPy uses this to allow advanced slicing of multidimensional arrays:

http://docs.scipy.org/doc/numpy/reference/arrays.indexing.ht...


Some of these quirks are explained incorrectly. I find the real explanations interesting, so please allow me to provide them.

> 999+1 is not 1000

The examples given in this section do not actually show what the poster thinks! Python's interning of small integers is not relevant. It only applies to integers in the range -5 through 256.

The real explanation for why "1000 is 1000" evaluates to True has to do with the Python compiler. When the expression is compiled as a single statement in the interactive interpreter, the compiler notices that the constant 1000 appears more than once, so it can reuse the same object.

But if you provide the value in more than one statement, the compiler is unable to do this:

  >>> a = 256
  >>> b = 256
  >>> a is b
  True
  >>> a = 257
  >>> b = 257
  >>> a is b
  False
Note that this behavior exists because each statement in the interactive interpreter is compiled separately. If you place the code within a function instead, the entire function is compiled at once, and the object is reused once more:

  >>> def f():
  ...     a = 257
  ...     b = 257
  ...     return a is b
  ... 
  >>> f()
  True
> Ellipsis?

> Apparently Ellipsis is always “bigger” than anything, as opposite to None, which is always “smaller” than anything.

Not so. Ellipsis follows Python 2's default rules for the comparison of unrelated types.

  >>> Ellipsis < ()
  True
The default is documented as comparing objects "consistently but arbitrarily"[1]. The actual rules are:

1) None is the smallest object.

2) Followed by numbers.

3) Followed by all other objects. Objects of distinct types are compared by the lexical ordering of their type names.

This can easily lead to senseless orderings, when two types define an ordered relationship between themselves, but another type happens to have a name that is lexically between them, as with str, tuple, and unicode:

  >>> "def" < (1,)
  True
  >>> (1,) < u"abc"
  True
  >>> u"abc" < "def"
  True
The Ellipsis constant is an instance of a type named "ellipsis", and so it is smaller than instances of most of the other non-numeric builtin types, except for dict.

The actual use of Ellipsis has nothing to do with recursive containers printing an ellipsis in their repr. It's part of a wacky special syntax that exists for NumPy's benefit:

  >>> d = {}
  >>> d[...] = None
  >>> d
  {Ellipsis: None}
[1] http://docs.python.org/2/reference/expressions.html#not-in



print in 2.x evaluates and prints its arguments one at a time, left to right, rather than evaluating them all before printing anything.

Given

    def p(x):
        print x
        return x
then

    print 'a', p('b')
is not equivalent to

    a1 = 'a'
    a2 = p('b')
    print a1, a2


Article is from 2009.



