Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Released: Python 2.6.8, 2.7.3, 3.1.5, and 3.2.3 (python.org)
76 points by johns on April 11, 2012 | hide | past | favorite | 36 comments


I'm both very much surprised and disappointed that this feature is disabled by default and needs to be explicitly turned on.

Honestly, if the spec doesn't guarantee that iteration will be in order, no one has a right to complain if it actually isn't with this release.

In the C++ world I come from, the Spec is treated as being holy. You don't write code that invokes undefined behavior or relies on unspecified behavior. If you do, you're on your own and no one will touch your code with a ten foot pole if they can help it. (Unless, of course, there really is no other way due to your compiler, etc. etc. in which case you litter the comments with warnings)

In this particular case, it's a simple matter of choosing the right data structure to suit your needs, or sorting contents before accessing them if you truly require in-order iteration.


>In the C++ world I come from, the Spec is treated as being holy.

That's ridiculous.

>You don't write code that invokes undefined behavior or relies on unspecified behavior.

This happens all the time, and not only is code written that relies on unspecified behavior, vendors maintain backward compatibility for programs that do this.

http://www.joelonsoftware.com/articles/APIWar.html

> If you do, you're on your own and no one will touch your code with a ten foot pole if they can help it.

Not really. If you do this and you're a huge success, you're the center of the universe. If you do this and no one notices, you're fucked.


Software APIs are one thing, the C/C++ spec is another. The latter undergoes revision after revision through hundreds of publicly released drafts and after years and years of planning and experimentation. The former is subject to the whims and caprices of whomever's software you're using.


As one who complained directly this being off-by-default to some python-dev folks, one small reason it's off-by-default in the bugfix releases is because it's likely to break people's tests if they use doctest. Pretty much everyone I've talked to sees this as an argument against doctest on at least some level.

IIRC there were also larger concerns about poorly implemented systems that depended on hash() being stable for persistence reasons. I don't know the details of that, as I appreciate the nod to backwards compatibility and not breaking people on a bugfix release.


I was looking for something in the announcement that warns people that the flag will be on in the next major release. “Told you so (in specific detail)” is a good argument to oppose to users who didn't fix their doctests.


JSON objects are explicitly unordered, but you would be amazed by how many gripes I've seen about JSON libraries or systems that don't preserve the order.


Wow this is great. As I remember basically all language implementations like Ruby, PHP, Perl, V8, etc. were vulnerable to this problem. Is Python the first to provide a fix?


Lua authors think the problem is overstated, but are fixing it anyway for "propaganda purposes": http://article.gmane.org/gmane.comp.lang.lua.general/87667


I find their attitude worrying - how can something be overstated if it's essentially possible for anyone to take down servers as easy as that? It's possible, so it must be fixed, that's what basically any text book on security tries to convey: there's no such thing as "mostly secure" - either it's done right or there's no need to do anything at all. Ignorance won't help in making the web a safer (more secure) place.


It should be fixed in the web frameworks people are using. Some rolled out fixes after the bug disclosure (which came surprisingly late considering it was well known in theory and perl fixed it years ago (in 2003)). The fix is simple -- don't allow users to pass thousands of arguments/options or basically any user input which is later put into dictionary.


Yes, that's exactly his point. There are many other issues as well that can DoS the average system.

And they are fixing it, so why say "it must be fixed"?


That guy is smart


Perl has not been vulnerable to this kind of attack for 9 years (Perl 5.8.1)

[1] http://perldoc.perl.org/perlsec.html#Algorithmic-Complexity-...


I believe perl fixed this issue years ago (and this recent wave of discovery was based on a paper which was sent to perl maintainers long ago). Ruby fixed this issue in 1.9 before it was a big deal. I am not aware if Ruby 1.8 has backported a fix yet.

1: http://www.cs.rice.edu/~scrosby/hash/CrosbyWallach_UsenixSec...


Ruby seems to have fixed it in December, PHP in January and Node was trying to get a fix in for v8 but it's been rejected AFAIR and then they patched their version. Perl hasn't actually been vulnerable for a couple of years.


I thought Perl was not? At least 5.14 isn't.


This easily could have been the title of an article bashing python fragmentation


This was also my first thought. But with changes as big as Python 3, there is probably no way around this.


How does that explain the four "current" versions, then?


There aren't 4 current versions, there are 2. The other 2 are old versions that still get security updates.


Security releases are for old distros/users.


Security updates are always backported, this is how Linux distros operate.


Does anyone know if this behavior will be enabled by default in 3.3?


It will be.


I am surprised they get into quadratic algorithmic complexities. If they use the most efficient data structures (e.g., balanced binary trees) their algorithms for storage and retrieval should be no higher than O(log(n)). Thus if everything collides at the same entry in a hash table, you should still be able to do stores and reads for O(log(n)) time. And if that is the case, it would be very difficult to execute an attack that would cause significant disturbances even if the bad guys could beat the hash function.


I think you're assuming that python hash tables use linked list buckets. They don't. They use open addressing, and resize the table as necessary. There's no "data structure" that could be switched to a binary tree.


Yes, that is what I was assuming.


I believe that using binary trees for hash chaining gives you better worst-case complexity but often worse performance in the common case. When your hashes are reasonably well spread out and the buckets are chained at most a couple of items deep, the overhead of maintaining a binary tree versus a linked list or using open addressing can be a considerable hit.


Does anyone have a link to the commit that fixed this? I'd be interesting in seeing the before and after.


It looks like most of it is in http://hg.python.org/cpython/rev/f4b7ecf8a5f8


For anyone else that's curious, it's a pretty simple change:

    23.1 --- a/Modules/datetimemodule.c
    23.2 +++ b/Modules/datetimemodule.c
    23.3 @@ -2566,10 +2566,12 @@ generic_hash(unsigned char *data, int len)
    23.4      register long x;
    23.5  
    23.6      p = (unsigned char *) data;
    23.7 -    x = *p << 7;
    23.8 +    x = _Py_HashSecret.prefix;
    23.9 +    x ^= *p << 7;
   23.10      while (--len >= 0)
   23.11          x = (1000003*x) ^ *p++;
   23.12      x ^= len;
   23.13 +    x ^= _Py_HashSecret.suffix;
   23.14      if (x == -1)
   23.15          x = -2;
The hash computation is initialized with a global random value, and a second one is xored in at the end. (-1 isn't allowed as a hash, since it's the sentinel value that indicates the hash hasn't been computed yet.)


That's a lot of work to make os.urandom portable across all their platforms.


I'm disappointed that they didn't fix pydoc to handle Partial objects correctly (i.e. show them as methods and display their __doc__ text rather than show them as variable of class Partial).


Wow, they are only fixing this just now? Terrible. And yeah, off by default is a failure. You should be secure by default.


This post by Russ Cox about random hash functions is timely: http://research.swtch.com/randhash


The Russ Cox post is unrelated. See the last paragraph in his post.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: