Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

People like to talk about C++ giving complete control over memory layout, but there's one very important cache-relevant technique that I've not seen done transparently: "column based storage".

That is, if you have class foo { int x, y, z; }, and make an array or vector of them, then they will normally be laid out in that order. For locality, you might want to have all X be together - ie, three separate arrays of X, Y and Z.



That is called a "struct of arrays" (vs an "array of structs").

https://en.wikipedia.org/wiki/AOS_and_SOA

I don't know of any language that transparently supports it other than Jai, which isn't available yet.

https://github.com/BSVino/JaiPrimer/blob/master/JaiPrimer.md...


Haskell's "unboxed" vectors do this on a library level so `Vector (a, b)` is stored as `V_2 {-# UNPACK #-} !Int !(Vector a) !(Vector b)`. But to implement it for new types you need some boilerplate.

Alternatively you could use template haskell or cpp to generate the boilerplate, though. The default instances use cpp https://github.com/haskell/vector/blob/master/internal/unbox...


Jovial 73 has built in support for SoA arrays.


C++ does give you the power to lay your data members how you would like. But in this case you've just said that you want those three members to be laid out sequentially in memory. If you want to use column storage because you need to apply a specific transformation on the data then lay it out that way.

In the vast majority of cases having classes be automatically laid out in column based storage would be a detriment and not an advantage. You would actually be fighting against the cache in those cases.

For example I've seen rendering engines go from storing data in AoS to SoA, then back, then back again, then back again all depending on the hardware and tech stack available.


There are libraries to do it almost transparently : https://github.com/crosetto/SoAvsAoS


To be fair, the same is true of Rust. The issue is that with column-based storage you can't have pointers/references to individual structs within the vector. You need to use indexes.


This is just the old array of structs vs struct of arrays trade-off, right? This is true of pretty much any programming language's low level data structures.


Tell me to me. I try for several months to build a way around this on rust (ie: Have a generic interface to both columnar and row-oriented layout and "just" switch to optimized operations later).

I think is impossible. The closest thing is use a NDArray and pick a winner/default layout... that is row-oriented, despite my intention to be columnar first, and later develop an alternate, complete rewrite, for support columnar.


The thing is, in modern C++ you can fake a wrapper to access it like a pointer or reference.

Of course it will not be high performance, but it can be done. (E .g. Eigen library.)


Why wouldn't an implementation along these lines be performant?

  template<typename... Ts>
  class SoA : public tuple<vector<Ts>...> {
          // ...
          template<size_t... Is>
          tuple<Ts&...> subscript(size_t i, index_sequence<Is...>) {
                  return {get<Is>(*this)[i]...};
          }
  public:
          // ...
          auto operator[](size_t i) {
                  return subscript(i, index_sequence_for<Ts...>{});
          }
  };


You can however do this pretty easily in your class definition in a way that is transparent to callers. The "transparent to callers" part is pretty common to any OO model, while the "pretty easily in your class definition" is I think what most people say when they mean "complete control" -- you get much higher degrees of control than you do with, say, Java, though it will be effort beyond the default case.

I use this very approach in a code base I"m working on right now. Some object members are stored in the object and some are not but they all look like class members to the callers.


On the other hand, if you do random accesses to an index but need x, y, and z, then a struct of arrays as you propose will incur three cache misses instead of one for an array of structs. It really depends on the application which is preferable.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: