I'm both very much surprised and disappointed that this feature is disabled by default and needs to be explicitly turned on.
Honestly, if the spec doesn't guarantee that iteration will be in order, no one has a right to complain if it actually isn't with this release.
In the C++ world I come from, the Spec is treated as being holy. You don't write code that invokes undefined behavior or relies on unspecified behavior. If you do, you're on your own and no one will touch your code with a ten foot pole if they can help it. (Unless, of course, there really is no other way due to your compiler, etc. etc. in which case you litter the comments with warnings)
In this particular case, it's a simple matter of choosing the right data structure to suit your needs, or sorting contents before accessing them if you truly require in-order iteration.
>In the C++ world I come from, the Spec is treated as being holy.
That's ridiculous.
>You don't write code that invokes undefined behavior or relies on unspecified behavior.
This happens all the time, and not only is code written that relies on unspecified behavior, vendors maintain backward compatibility for programs that do this.
Software APIs are one thing, the C/C++ spec is another. The latter undergoes revision after revision through hundreds of publicly released drafts and after years and years of planning and experimentation. The former is subject to the whims and caprices of whomever's software you're using.
As one who complained directly this being off-by-default to some python-dev folks, one small reason it's off-by-default in the bugfix releases is because it's likely to break people's tests if they use doctest. Pretty much everyone I've talked to sees this as an argument against doctest on at least some level.
IIRC there were also larger concerns about poorly implemented systems that depended on hash() being stable for persistence reasons. I don't know the details of that, as I appreciate the nod to backwards compatibility and not breaking people on a bugfix release.
I was looking for something in the announcement that warns people that the flag will be on in the next major release. “Told you so (in specific detail)” is a good argument to oppose to users who didn't fix their doctests.
JSON objects are explicitly unordered, but you would be amazed by how many gripes I've seen about JSON libraries or systems that don't preserve the order.
Wow this is great. As I remember basically all language implementations like Ruby, PHP, Perl, V8, etc. were vulnerable to this problem. Is Python the first to provide a fix?
I find their attitude worrying - how can something be overstated if it's essentially possible for anyone to take down servers as easy as that? It's possible, so it must be fixed, that's what basically any text book on security tries to convey: there's no such thing as "mostly secure" - either it's done right or there's no need to do anything at all. Ignorance won't help in making the web a safer (more secure) place.
It should be fixed in the web frameworks people are using. Some rolled out fixes after the bug disclosure (which came surprisingly late considering it was well known in theory and perl fixed it years ago (in 2003)). The fix is simple -- don't allow users to pass thousands of arguments/options or basically any user input which is later put into dictionary.
I believe perl fixed this issue years ago (and this recent wave of discovery was based on a paper which was sent to perl maintainers long ago). Ruby fixed this issue in 1.9 before it was a big deal. I am not aware if Ruby 1.8 has backported a fix yet.
Ruby seems to have fixed it in December, PHP in January and Node was trying to get a fix in for v8 but it's been rejected AFAIR and then they patched their version. Perl hasn't actually been vulnerable for a couple of years.
I am surprised they get into quadratic algorithmic complexities. If they use the most efficient data structures (e.g., balanced binary trees) their algorithms for storage and retrieval should be no higher than O(log(n)). Thus if everything collides at the same entry in a hash table, you should still be able to do stores and reads for O(log(n)) time. And if that is the case, it would be very difficult to execute an attack that would cause significant disturbances even if the bad guys could beat the hash function.
I think you're assuming that python hash tables use linked list buckets. They don't. They use open addressing, and resize the table as necessary. There's no "data structure" that could be switched to a binary tree.
I believe that using binary trees for hash chaining gives you better worst-case complexity but often worse performance in the common case. When your hashes are reasonably well spread out and the buckets are chained at most a couple of items deep, the overhead of maintaining a binary tree versus a linked list or using open addressing can be a considerable hit.
For anyone else that's curious, it's a pretty simple change:
23.1 --- a/Modules/datetimemodule.c
23.2 +++ b/Modules/datetimemodule.c
23.3 @@ -2566,10 +2566,12 @@ generic_hash(unsigned char *data, int len)
23.4 register long x;
23.5
23.6 p = (unsigned char *) data;
23.7 - x = *p << 7;
23.8 + x = _Py_HashSecret.prefix;
23.9 + x ^= *p << 7;
23.10 while (--len >= 0)
23.11 x = (1000003*x) ^ *p++;
23.12 x ^= len;
23.13 + x ^= _Py_HashSecret.suffix;
23.14 if (x == -1)
23.15 x = -2;
The hash computation is initialized with a global random value, and a second one is xored in at the end. (-1 isn't allowed as a hash, since it's the sentinel value that indicates the hash hasn't been computed yet.)
I'm disappointed that they didn't fix pydoc to handle Partial objects correctly (i.e. show them as methods and display their __doc__ text rather than show them as variable of class Partial).
Honestly, if the spec doesn't guarantee that iteration will be in order, no one has a right to complain if it actually isn't with this release.
In the C++ world I come from, the Spec is treated as being holy. You don't write code that invokes undefined behavior or relies on unspecified behavior. If you do, you're on your own and no one will touch your code with a ten foot pole if they can help it. (Unless, of course, there really is no other way due to your compiler, etc. etc. in which case you litter the comments with warnings)
In this particular case, it's a simple matter of choosing the right data structure to suit your needs, or sorting contents before accessing them if you truly require in-order iteration.