I really wish that there was a column-oriented storage native to Postgres. My im...

anarazel · on Sept 15, 2017

FWIW, I think currently column vs. row-store is not in the critical path of making postgres more suitable analytical query processing. The CPU bottlenecks are just elsewhere - and we (various postgres devs) are working on them. The storage density parts of column stores can currently partially be achieved by using a filesystem that does compression - far from great, but not disastrous either.

elvinyung · on Sept 15, 2017

I thought the bigger advantage was that you could do sequential-only I/O on projections?

Or are you saying that you're only CPU-bottlenecked?

anarazel · on Sept 15, 2017

> I thought the bigger advantage was that you could do sequential-only I/O on projections?

Not quite following - a column store will usually not have more sequential IO than a row-store. Often enough to the contrary, because you have to combine column[-groups], for some queries. What you get is: Higher compression ratios, better IO & cache access patterns for filter-heavy queries, easier to vectorize computations. Especially if you either filter heavily or aggregate only a few rows, you can do a lot less overall IO in total, but the sequential-ness doesn't really improve.

> Or are you saying that you're only CPU-bottlenecked?

Oftentimes, yes. You might be storage space constrained, but storage speeds for individual sequential-IO type queries are usually fast enough. Parallelism helps with that (if you can push down enough work, a lot of it added in 9.6 & 10), plain old code optimizations (better hash-tables, new expression evaluation framework, both in 10), as does JITing parts of the query processing (WIP, patches posted for 11).

tnndk · on Sept 15, 2017

There is the cstore foreign data wrapper, https://github.com/citusdata/cstore_fdw, but it is still an extension. Hopefully it will go into mainline at some point.

gdulli · on Sept 15, 2017

There sort of was, but unfortunately Amazon bought it and renamed it to Redshift and now you have to rent it.

elvinyung · on Sept 15, 2017

I thought Redshift was built completely from scratch, and only looks like Postgres at the wire protocol level?

Also, it would be really nice if I could do both my transactions and my analytics in the same box. Then ETL is basically just maintaining a materialized view.

scapecast · on Sept 15, 2017

Amazon built Redshift by using technology from Paraccel. Paraccel was built on Postgres.

See a Quora answer I wrote:

https://www.quora.com/Amazon-redshift-uses-Actians-ParaAccel...