Released: Python 2.6.8, 2.7.3, 3.1.5, and 3.2.3

ComputerGuru · on April 11, 2012

I'm both very much surprised and disappointed that this feature is disabled by default and needs to be explicitly turned on.

Honestly, if the spec doesn't guarantee that iteration will be in order, no one has a right to complain if it actually isn't with this release.

In the C++ world I come from, the Spec is treated as being holy. You don't write code that invokes undefined behavior or relies on unspecified behavior. If you do, you're on your own and no one will touch your code with a ten foot pole if they can help it. (Unless, of course, there really is no other way due to your compiler, etc. etc. in which case you litter the comments with warnings)

In this particular case, it's a simple matter of choosing the right data structure to suit your needs, or sorting contents before accessing them if you truly require in-order iteration.

ywrkdl · on April 12, 2012

>In the C++ world I come from, the Spec is treated as being holy.

That's ridiculous.

>You don't write code that invokes undefined behavior or relies on unspecified behavior.

This happens all the time, and not only is code written that relies on unspecified behavior, vendors maintain backward compatibility for programs that do this.

http://www.joelonsoftware.com/articles/APIWar.html

> If you do, you're on your own and no one will touch your code with a ten foot pole if they can help it.

Not really. If you do this and you're a huge success, you're the center of the universe. If you do this and no one notices, you're fucked.

ComputerGuru · on April 12, 2012

Software APIs are one thing, the C/C++ spec is another. The latter undergoes revision after revision through hundreds of publicly released drafts and after years and years of planning and experimentation. The former is subject to the whims and caprices of whomever's software you're using.

durin42 · on April 12, 2012

As one who complained directly this being off-by-default to some python-dev folks, one small reason it's off-by-default in the bugfix releases is because it's likely to break people's tests if they use doctest. Pretty much everyone I've talked to sees this as an argument against doctest on at least some level.

IIRC there were also larger concerns about poorly implemented systems that depended on hash() being stable for persistence reasons. I don't know the details of that, as I appreciate the nod to backwards compatibility and not breaking people on a bugfix release.

obtu · on April 12, 2012

I was looking for something in the announcement that warns people that the flag will be on in the next major release. “Told you so (in specific detail)” is a good argument to oppose to users who didn't fix their doctests.

eevee · on April 11, 2012

JSON objects are explicitly unordered, but you would be amazed by how many gripes I've seen about JSON libraries or systems that don't preserve the order.

chubot · on April 11, 2012

Wow this is great. As I remember basically all language implementations like Ruby, PHP, Perl, V8, etc. were vulnerable to this problem. Is Python the first to provide a fix?

haberman · on April 11, 2012

Lua authors think the problem is overstated, but are fixing it anyway for "propaganda purposes": http://article.gmane.org/gmane.comp.lang.lua.general/87667

emboss · on April 12, 2012

I find their attitude worrying - how can something be overstated if it's essentially possible for anyone to take down servers as easy as that? It's possible, so it must be fixed, that's what basically any text book on security tries to convey: there's no such thing as "mostly secure" - either it's done right or there's no need to do anything at all. Ignorance won't help in making the web a safer (more secure) place.

rplnt · on April 12, 2012

It should be fixed in the web frameworks people are using. Some rolled out fixes after the bug disclosure (which came surprisingly late considering it was well known in theory and perl fixed it years ago (in 2003)). The fix is simple -- don't allow users to pass thousands of arguments/options or basically any user input which is later put into dictionary.

tedunangst · on April 12, 2012

Yes, that's exactly his point. There are many other issues as well that can DoS the average system.

And they are fixing it, so why say "it must be fixed"?

pfraze · on April 12, 2012

That guy is smart

LoonyPandora · on April 11, 2012

Perl has not been vulnerable to this kind of attack for 9 years (Perl 5.8.1)

[1] http://perldoc.perl.org/perlsec.html#Algorithmic-Complexity-...

axiak · on April 11, 2012

I believe perl fixed this issue years ago (and this recent wave of discovery was based on a paper which was sent to perl maintainers long ago). Ruby fixed this issue in 1.9 before it was a big deal. I am not aware if Ruby 1.8 has backported a fix yet.

1: http://www.cs.rice.edu/~scrosby/hash/CrosbyWallach_UsenixSec...

jakubw · on April 11, 2012

Ruby seems to have fixed it in December, PHP in January and Node was trying to get a fix in for v8 but it's been rejected AFAIR and then they patched their version. Perl hasn't actually been vulnerable for a couple of years.

sigzero · on April 12, 2012

I thought Perl was not? At least 5.14 isn't.

gdg92989 · on April 12, 2012

This easily could have been the title of an article bashing python fragmentation

yk · on April 12, 2012

This was also my first thought. But with changes as big as Python 3, there is probably no way around this.

gregbair · on April 12, 2012

How does that explain the four "current" versions, then?

jgeralnik · on April 12, 2012

There aren't 4 current versions, there are 2. The other 2 are old versions that still get security updates.

davvid · on April 12, 2012

Security releases are for old distros/users.

obtu · on April 12, 2012

Security updates are always backported, this is how Linux distros operate.

kibwen · on April 12, 2012

Does anyone know if this behavior will be enabled by default in 3.3?

DasIch · on April 12, 2012

It will be.

hristov · on April 12, 2012

I am surprised they get into quadratic algorithmic complexities. If they use the most efficient data structures (e.g., balanced binary trees) their algorithms for storage and retrieval should be no higher than O(log(n)). Thus if everything collides at the same entry in a hash table, you should still be able to do stores and reads for O(log(n)) time. And if that is the case, it would be very difficult to execute an attack that would cause significant disturbances even if the bad guys could beat the hash function.

tedunangst · on April 12, 2012

I think you're assuming that python hash tables use linked list buckets. They don't. They use open addressing, and resize the table as necessary. There's no "data structure" that could be switched to a binary tree.

hristov · on April 12, 2012

Yes, that is what I was assuming.

mikeash · on April 12, 2012

I believe that using binary trees for hash chaining gives you better worst-case complexity but often worse performance in the common case. When your hashes are reasonably well spread out and the buckets are chained at most a couple of items deep, the overhead of maintaining a binary tree versus a linked list or using open addressing can be a considerable hit.

bmm6o · on April 12, 2012

Does anyone have a link to the commit that fixed this? I'd be interesting in seeing the before and after.

ken · on April 12, 2012

It looks like most of it is in http://hg.python.org/cpython/rev/f4b7ecf8a5f8

bmm6o · on April 12, 2012

For anyone else that's curious, it's a pretty simple change:

    23.1 --- a/Modules/datetimemodule.c
    23.2 +++ b/Modules/datetimemodule.c
    23.3 @@ -2566,10 +2566,12 @@ generic_hash(unsigned char *data, int len)
    23.4      register long x;
    23.5  
    23.6      p = (unsigned char *) data;
    23.7 -    x = *p << 7;
    23.8 +    x = _Py_HashSecret.prefix;
    23.9 +    x ^= *p << 7;
   23.10      while (--len >= 0)
   23.11          x = (1000003*x) ^ *p++;
   23.12      x ^= len;
   23.13 +    x ^= _Py_HashSecret.suffix;
   23.14      if (x == -1)
   23.15          x = -2;

The hash computation is initialized with a global random value, and a second one is xored in at the end. (-1 isn't allowed as a hash, since it's the sentinel value that indicates the hash hasn't been computed yet.)

obtu · on April 12, 2012

That's a lot of work to make os.urandom portable across all their platforms.

pyre · on April 12, 2012

I'm disappointed that they didn't fix pydoc to handle Partial objects correctly (i.e. show them as methods and display their __doc__ text rather than show them as variable of class Partial).

lawnchair_larry · on April 12, 2012

Wow, they are only fixing this just now? Terrible. And yeah, off by default is a failure. You should be secure by default.

jbarham · on April 12, 2012

This post by Russ Cox about random hash functions is timely: http://research.swtch.com/randhash

BarkMore · on April 12, 2012

The Russ Cox post is unrelated. See the last paragraph in his post.