They use int instead of long, which means that if you start grouping objects together in any magnitude you will get collisions fairly quickly.
Any large HashTable in Java starts to yield the problem of duplicate keys. It's just a weird situation: you can trust something 99.999% of the time, but you can't ever fully trust it, so over time you're guaranteed to have something wrong.
This problem exhibits itself again under the hood in the JNI API, where you have to identify objects in another domain, e.g. C++.
It's not a 'quirk'; it's basically a big mistake.
The ability to uniquely identify objects is so important in so many ways.
Sometimes I wish that every few versions of Java, they would skip the backward compatibility and make some needed changes.
hashCode() is not used for identity; it is explicitly not intended for this and never was. The whole point of this article is that the modern default implementation does not even rely on object identity anymore (it used to). Java has object identity, of course. Most good equals() implementations rely on it to figure out whether the object is being compared to itself before running the more expensive field-by-field comparison.
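For context, the equals() idiom being described looks roughly like this (a minimal sketch; the Point class and its fields are made up for illustration):

```java
import java.util.Objects;

// Hypothetical value class showing the usual equals() pattern:
// a reference-identity (==) fast path before the field-by-field check.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;            // identity fast path
        if (!(o instanceof Point)) return false;
        Point other = (Point) o;
        return x == other.x && y == other.y;   // field-by-field comparison
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);             // kept consistent with equals()
    }
}
```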
Collisions are a good thing, actually, if you are implementing a hash table. Otherwise you end up with one bucket per object, which does not scale very well. The reason hashCode() can be overridden is so you can have some control over these collisions if you need to.
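As a concrete illustration, the JDK's own String.hashCode produces colliding keys that a HashMap still handles correctly, because equals() disambiguates within the bucket (a small sketch, not from the article):

```java
import java.util.HashMap;
import java.util.Map;

// "Aa" and "BB" are a well-known String.hashCode collision:
// 'A'*31 + 'a' == 'B'*31 + 'B' == 2112.
Map<String, Integer> map = new HashMap<>();
map.put("Aa", 1);
map.put("BB", 2);
// Both keys land in the same bucket, yet lookups stay correct
// because equals() is checked after the hash match.
```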
It should be. If every object had a 64-bit id that was guaranteed to be unique, it would make so many things so much easier and more practical. That we're 25 years into this and there is still ambiguity is not helpful. I'll bet they could do that and still have nice entropy in the ids and good hashtable performance.
> Any large HashTable in Java starts to yield the problem of duplicate keys. It's just a weird situation: you can trust something 99.999% of the time, but you can't ever fully trust it, so over time you're guaranteed to have something wrong.
hashCode() is a prehash function whose outputs need to be mapped further onto the (typically much smaller) number of buckets in a hash table of a given size (which depends on the number of objects currently in the table). Those "duplicate keys" are not a problem; they're how hash tables work in any language. Objects' hash codes are used to find the relevant bucket, then that bucket is examined properly using equals(). HashMap and Hashtable are backed by arrays, which have a max size of Integer.MAX_VALUE (minus some change) in the JVM anyway, so they'd need to be indexed by an int regardless. I hope this helps overcome the trust issues you have with Java data structures.
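To make the "prehash" point concrete, this is roughly the reduction HashMap performs internally (a simplified sketch modeled on OpenJDK's bit-spreading plus the power-of-two index mask, not the literal source):

```java
// Reduce a 32-bit hashCode() to a bucket index. The high bits are
// folded into the low bits first, then masked by the table size
// (always a power of two), so different hash codes routinely share a
// bucket — that's expected, and equals() settles ties inside it.
static int bucketIndex(Object key, int tableSize) {
    int h = key.hashCode();
    h ^= (h >>> 16);             // fold high bits down
    return (tableSize - 1) & h;  // equivalent to h mod tableSize
}
```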
I understand that hashtables effectively work from 'hashes', which implies collisions, etc.
I'm so used to using the term 'hashtable' that I forgot it implies a specific implementation; I should have used the term 'Map' or 'Key/Value' table. I'm resigned to having used the terms interchangeably too often.
The notion of 'hashes' which can produce 'collisions' creates a bunch of unnecessary concerns and complications given the ultimate objective of a hashtable, i.e. as a key-value store.
If every object had a guaranteed unique global id which we could use as a key, then this would provide a lot of clarity and avoid problems. Of course, the word 'hash' doesn't really even belong in the context of the higher-level abstraction of a key-value store, as it's implementation-specific.
Unfortunately, Java uses the word 'identity' in System.identityHashCode, which is really confusing. It's not really an 'identity'. It's misleading, and I bet tons of Java devs are unaware (or forget). There's actually a bug filed about it [1]
A few years ago I had to spend a day down this rabbit hole, as many devs have to, and it's just unnecessary. A better name, I think, would really help.
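To see the distinction in action (a small sketch; note that System.identityHashCode is still only a hash, and the JVM does not guarantee it is unique per object):

```java
// hashCode() can be overridden, so two equal objects report the same
// value; System.identityHashCode ignores the override and reflects
// the objects themselves. Crucially, it is NOT a guaranteed-unique id.
String a = new String("hi");
String b = new String("hi");
boolean sameHash = a.hashCode() == b.hashCode();              // true: content-based
boolean sameIdentityHash =
    System.identityHashCode(a) == System.identityHashCode(b); // usually false, but allowed to be true
```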