> Maybe std::unordered_set might be what you want?
Modern video games are typically both GPU-bound and CPU-bound. But more importantly, latency matters far more than throughput.
Imagine you have 1000+ different std::unordered_set objects in your game, being used and accessed every frame. Most of the time, your game is using around 30% of the CPU. But on one frame, you get unlucky and 900 of your std::unordered_set objects run out of space and have to rehash (reallocating their bucket arrays) at the same time. Suddenly your frame rate drops from 30fps to 3fps and then back up again. This is totally unacceptable in a video game: gamers hate it, and they have a name for it, "stuttering".
For that reason, most video games allocate large blocks of memory upfront and use their own custom allocators, usually very different from the Doug Lea malloc() that, iirc, `new` is still a wrapper for. (I'm aware of std::allocator, but that's a whole other topic...)
Basically if you think about how, in some first person shooter, the oldest bodies and bullet decals start disappearing when new enemies appear, the whole engine is based around that philosophy.
That is only half of the story. The full story is using a preallocated object pool, and reusing entity objects without ever having to dynamically allocate new instances on demand.
Can you elaborate on why it is slow? Shouldn't it be faster than `std::set`, which uses a red-black tree as the underlying data structure and thus gives O(log n) time complexity? `std::unordered_set`, on the other hand, uses a hash function to `index` into an array and retrieve the element, which is essentially O(1).
This is where we get into O(1) != fast territory. Algorithmic complexity has a weak relationship to CPU performance, not a strong one.
If you want to find something in a set, storing it as a flat array and doing a linear scan will beat a std::unordered_set up to a shockingly large number of items, due to how CPUs work.
In particular, it's the pointer-chasing aspect of std::unordered_set that becomes a problem (an issue shared with _most_ hash set implementations). Remember, an unordered_set is not an array of items, it's an array of buckets of items (this is how hash collisions are handled). Worse still, those buckets are usually linked lists. They typically can't be speculated effectively and can't be prefetched effectively, so you become memory-latency bound during an un-cached lookup. And memory latency is just shy of absolutely terrible. If you're expecting L1/L2/L3 cache hits on lookups then you're probably not dealing with very large sizes, and you're going to get much better cache density with the flat array than the array-of-buckets.
There are alternative hashsets that are flat and avoid this, but they are less common and as far as I know no standard implementation on any language uses such a hash set. There's a good talk about such a dense, flat hash set here: https://www.youtube.com/watch?v=ncHmEUmJZf4
Some languages have hash tables (and hence sets, as they tend to be implemented with them) that use open addressing, in a way that doesn't end up being bad cachewise unless there are unreasonably many collisions for the same hash code.
It is also not uncommon for mature implementations to optimize the case of few elements in the hash table by using a linear lookup. In some cases that optimization is also used for the data stored in small hash tables.
Pointer-chasing hash tables were fine when caches were nonexistent or small (i.e. the '90s), but nowadays open addressing is just superior.
I am unfamiliar with the std-library details, but the solution is to use hash tables with linear probing, possibly with Robin Hood hashing.
Linear probing means that if h(key) has a collision, you store the value at h(key)+1. If that also collides, you store it at h(key)+2, and so on (wrapping around if necessary). As a result, most accesses cost one main-memory fetch, and after that it's all prefetchers linearly scanning for the data.
Robin Hood hashing means that the "poor" steal from the "rich". "Poor" values are the ones that have to be probed a long way: maybe 10, 20, or 100 slots away from their preferred spot. A "rich" value is one that has been placed close to its ideal spot.
The idea is that you average out your linear-probing distances, so that instead of some values probing 100 slots and others 0, all of your hash values tend towards... roughly 3 probes or so. (I forget the exact math.)
It's as simple as a check on insertion: if values[h(foo)+rehash_count].rehash_count < current_rehash_count (the resident is "richer" than the incoming value), you insert foo into that location. Then you take THAT displaced value and hash it forward.
---------
It seems like hash tables with linear probing and Robin Hood hashing are popular these days for L1-cache friendliness (everyone at offset 5 from their ideal slot is way more cache friendly than most at offset 0 but some at offset 100). But I'm not aware of any standard-library implementation of the technique.
Uh, isn't this just subscripting?
> But, as stated above, we don’t care about the order.
Maybe std::unordered_set might be what you want?