Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I really wish that there was a column-oriented storage native to Postgres. My impression is that people are forced to abandon Postgres for analytical processing way too early, at relatively small scales.


FWIW, I think currently column vs. row-store is not in the critical path of making postgres more suitable analytical query processing. The CPU bottlenecks are just elsewhere - and we (various postgres devs) are working on them. The storage density parts of column stores can currently partially be achieved by using a filesystem that does compression - far from great, but not disastrous either.


I thought the bigger advantage was that you could do sequential-only I/O on projections?

Or are you saying that you're only CPU-bottlenecked?


> I thought the bigger advantage was that you could do sequential-only I/O on projections?

Not quite following - a column store will usually not have more sequential IO than a row-store. Often enough to the contrary, because you have to combine column[-groups], for some queries. What you get is: Higher compression ratios, better IO & cache access patterns for filter-heavy queries, easier to vectorize computations. Especially if you either filter heavily or aggregate only a few rows, you can do a lot less overall IO in total, but the sequential-ness doesn't really improve.

> Or are you saying that you're only CPU-bottlenecked?

Oftentimes, yes. You might be storage space constrained, but storage speeds for individual sequential-IO type queries are usually fast enough. Parallelism helps with that (if you can push down enough work, a lot of it added in 9.6 & 10), plain old code optimizations (better hash-tables, new expression evaluation framework, both in 10), as does JITing parts of the query processing (WIP, patches posted for 11).


There is the cstore foreign data wrapper, https://github.com/citusdata/cstore_fdw, but it is still an extension. Hopefully it will go into mainline at some point.


There sort of was, but unfortunately Amazon bought it and renamed it to Redshift and now you have to rent it.


I thought Redshift was built completely from scratch, and only looks like Postgres at the wire protocol level?

Also, it would be really nice if I could do both my transactions and my analytics in the same box. Then ETL is basically just maintaining a materialized view.


Amazon built Redshift by using technology from Paraccel. Paraccel was built on Postgres.

See a Quora answer I wrote:

https://www.quora.com/Amazon-redshift-uses-Actians-ParaAccel...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: