
> we had an unusual discussion about a Python oddity

There are so many discussions along the lines of "X language is so weird about how it handles numbers!" and it's just IEEE 754 floats.



The oddity here is not the float itself; it's that Python provided a default hash implementation for floats.


Yeah IEEE 754 floating point numbers should probably not be hashable, and the weird (but standard-defined) behaviour with respect to NaN equality is one good reason for this.
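For anyone who hasn't run into the NaN behaviour mentioned here, a minimal sketch of how it shows up with dict keys. The variable names are just for illustration, and the identity shortcut described in the comments is how CPython behaves as far as I know:

```python
nan = float("nan")
other_nan = float("nan")  # a second, distinct NaN object

# IEEE 754: NaN compares unequal to everything, itself included.
print(nan == nan)  # False

table = {nan: "stored"}

# Lookup with the very same object succeeds, because the dict checks
# identity before it falls back to ==.
print(table[nan])  # stored

# Lookup with a different NaN object fails: not identical, and == is False.
print(other_nan in table)  # False
```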


Python supports arithmetic on mixed numeric types, so it makes sense that floats and ints should have hash functions that behave consistently with each other. I don't write a lot of Python, but having used other scripting languages it wouldn't surprise me if numeric types get mixed up by accident often enough. You probably want int(2) and float(2) to be considered the same key in a dictionary to avoid surprises.

See: https://docs.python.org/3/library/stdtypes.html#hashing-of-n...
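A quick sketch of what that looks like in practice (the dict name and values are made up for the example):

```python
# Equal numbers must hash equally, whatever their type.
print(2 == 2.0)              # True
print(hash(2) == hash(2.0))  # True

counts = {2: "int key"}
counts[2.0] = "float key"    # same key as far as the dict is concerned

print(len(counts))           # 1
print(counts[2])             # float key

# The original key object is kept; only the value is replaced.
first_key = next(iter(counts))
print(type(first_key).__name__)  # int
```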


> You probably want int(2) and float(2) to be considered the same key in a dictionary to avoid surprises

There are a variety of problems here:

- floats can't represent whole numbers exactly as they get further from zero.

- error accumulates in floating point arithmetic, so calculated keys may not match constant keys

- Do you really want 2.0 and 2.00001 to refer to different objects?
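Roughly what those three points look like in Python, as a sketch assuming the usual 64-bit IEEE 754 doubles (the `prices` dict is hypothetical):

```python
# 1. Past 2**53, not every whole number has its own float.
big = 2**53
print(float(big) == float(big + 1))  # True: both convert to the same float

# 2. A computed key may not match a constant key.
computed = 0.1 + 0.2
print(computed == 0.3)               # False

prices = {0.3: "constant key"}
print(computed in prices)            # False

# 3. Nearby values are still distinct keys.
nearby = {2.0: "two", 2.00001: "almost two"}
print(len(nearby))                   # 2
```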


>floats can't represent whole numbers exactly as they get further from zero.

>error accumulates in floating point arithmetic, so calculated keys may not match constant keys

Yes, but the parent used a toy example. It's a programming fundamental that you shouldn't be converting magic numbers to floats or doing math with floats without considering inaccuracy. These problems apply everywhere programmers use floats - they must understand them, regardless of whether they are using them as hash keys or for any other purpose.

>Do you really want 2.0 and 2.00001 to refer to different objects?

Yes, very much so. They are different values. (Assuming you are using 2.00001 as shorthand for "a float close to 2")

The advantage of common primitives being hashable is theoretically very high. In Swift (where NaN hashes to a different value every time, since that makes sense), hashable primitives make hashability composable, so anything composed of hashable parts is hashable. And hashability is important not just for storage in maps, but for all sorts of things (e.g. set membership, comparison for diffs, etc).
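The same composability argument carries over to Python, which is what the thread is about. A small sketch, with a hypothetical `Point` type, of a hashable value built from hashable floats:

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen + default eq=True generates __hash__ from the fields
class Point:
    x: float
    y: float


seen = {Point(1.5, 2.0)}

# Set membership works because each float field is hashable.
print(Point(1.5, 2.0) in seen)  # True
print(Point(1.5, 2.5) in seen)  # False

# Tuples compose the same way.
labels = {(1.5, "a"): "first"}
print(labels[(1.5, "a")])       # first
```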

The upsides of floats being hashable are potentially much greater than the downsides, which mostly stem from a programmer not understanding floats, not from floats being hashable.


> It's a programming fundamental that you shouldn't be converting magic numbers to floats or doing math with floats without considering inaccuracy.

This is all well and good in a language that makes you declare distinct floating point types, but in Python things are maybe not so clear cut - the language uses arbitrary precision integers by default, and it's not always crystal clear when they are going to be converted to a (lossy) floating point representation.

Yeah, the programmer should probably be aware of the ins and outs here, but Python folks often aren't all that in the weeds with the bits and the bytes.
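A few places where an int quietly turns into a float, as a sketch (not an exhaustive list; the variable names are only for illustration):

```python
# True division always returns a float, even for exact results.
print(type(4 / 2).__name__)    # float
print(type(4 // 2).__name__)   # int

# A negative exponent also produces a float.
print(type(2 ** -1).__name__)  # float

# Mixing an int with a float converts the int.
print(type(10**20 + 1.0).__name__)  # float

# The conversion is lossy for large ints.
n = 10**17 + 1
via_float = int(n / 1)   # goes through a float
via_int = n // 1         # stays an arbitrary precision int
print(via_float == n)    # False
print(via_int == n)      # True
```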



