Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
A list of PostgreSQL libraries, tools and resources (github.com/dhamaniasad)
184 points by lauriswtf on Feb 15, 2016 | hide | past | favorite | 28 comments


What is the best way to ETL data from one Postgres cluster to another (by ETL I mean things like data sanitization and denormalization)?

a) I think most people end up doing some kind of batch synchronization, but I'm interested in streaming solutions.

b) A lot of folks use trigger based replication, but triggers have to be on the primary/master node, and not just on the replicas.

c) Another common solution is to force the database client to write to a message broker, but that opens the door to data discrepancies and synchronization issues.

d) In theory I think the best way is to do something like bottledwater-pg or pg_kafka [1] [2] [3], but I'm not sure how battle hardened these are. I think logical replication of the WAL is the right approach, but there is still not much tooling around this.

[1] https://github.com/confluentinc/bottledwater-pg

[2] https://github.com/xstevens/pg_kafka

[3] https://github.com/xstevens/decoderbufs

PS: There are a bunch of interesting MySQL solutions out there, such as Zendesk's Maxwell:

https://github.com/zendesk/maxwell


Google "Postgres BDR". It is new, but it allows you to replicate changes on master to another master. With some configuration it can even change replicated data.


Thank you. I believe that bidirectional replication will solve the ETL problem, if it can be configured to stream from a replica in the source/oltp cluster (to minimize network IO on the source cluster master db) to the master of the destination/olap cluster. The olap master can mirror the oltp data, but also use triggers to denormalize oltp tables.

One thing I'd like to see is the ability to sanitize/munge data from the source cluster before it is sent to the destination cluster, for masking sensitive data in the source cluster (PCI, HIPAA, etc).


Bucardo does this and the setup it a lot easier than most tools. I'm currently using it to migrate a live postgres database to another location but it can perform ETL as well.

EDIT: doesn't really address your point b) however.


With regards to psql2csv: The default psql can already do this very nicely. Just use \copy.

\copy (select whatever from whatever) to 'yourlocalfile.csv' with (format 'csv')

and if you want column headers, add a "header true" inside of the with clause.

Generally, \copy works just like COPY[1] but it does so from the remote server to the local machine, whereas file names given to COPY are relative to the server.

Yes. A dedicated tool might feel easier initially, but once you know how \copy works, you can always get a CSV file from whatever database you're connected to and no matter what machine you're on.

[1]: http://www.postgresql.org/docs/current/static/sql-copy.html


psql2csv is just a wrapper around what you're describing. The point is that you don't have to write the "boilerplate" when all you need is a single query. Plus, it works nice with stdin and stdout, making it handy for queries stored in a file, or for piping the CSV to another program.


I found this talk by Christophe Pettus [1] very informative. The title is somewhat misleading as most of the talk has little to do with Python, but it's a good introduction to more advanced Postgres concepts. Also available in PDF form [2].

[1] https://www.youtube.com/watch?v=0uCxLCmzaG4

[2] http://thebuild.com/presentations/pycon-2014-pppp.pdf


Does anyone know of sample PostgreSQL databases? Something akin to Northwind (https://northwinddatabase.codeplex.com/)?


MusicBrainz is powered by Postgres. You can download a snapshot of their database here: https://musicbrainz.org/doc/MusicBrainz_Database

It's a few gigabytes of data, compressed.


I just found this list of sample databases -- https://wiki.postgresql.org/wiki/Sample_Databases

I'm not sure how good they are though.


Not sure if this is what you mean but PostgreSQL 9.5 introduced TABLESAMPLE. https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9....


Sorry, no, I'm meaning example databases with 'fake' or sample data.


Some time ago I converted Northwind to PostgreSQL for testing. I should still have that somewhere. If you'd like an SQL dump of that, drop me a line: jakob@eggerapps.at


Can't it just be a 'list' of Postgres stuff? Do we really expect that it's generated by an algorithm if someone doesn't write 'curated'? If it's a good list, I'm going to find it interesting no matter how it was created.


We took 'awesome' and 'curated' out the title, as per usual.


Thanks for your awesome curation of this site :-)


Yes. Thanks for your great work on this site.Keep up the good work.


'Curated' implies ongoing maintenance, and not just a brain-dump.


Good work. I am a big fan of PostgreSQL. It has made life easier for me.


Amazon RDS doesn't make the list of "hosted Postgres"?


Is there a PostgreSQL that anyone here can recommend?


I have read every book on Postgres, they are literaly all crap, boring, tedious and simplistic. On contrast, the on-line docs on the postgres web site, are superb, matching the very high quality of Postgres itself. It is possible to write interesting books with personality even for a dry topic such as Sql eg Itzik Ben-Gan for sql server for general sql Joe Celko opinionated but entertaining, Stephane Faroult one if the best ever sql books "The Art of SQL" , Use the Index luke is great - even Chris Date's academicy books are more interesting than any single postgres book which is a shame since Postgres is truly amazing application.


You can try Postgres "zero install" with Postgres.app for OSX, otherwise docker. http://www.win-vector.com/blog/2016/02/databases-in-containe...


Thank you! But I meant book, a PostgreSQL book; but somehow I didn't type that :(


The PostgreSQL manual [1] is pretty good IMO. Have you looked at that?

[1] http://www.postgresql.org/docs/manuals


The PostresSQL manual & documentation are one of my favourite things about PostresSQL, especially when I was first starting out :).


Here are some books: http://www.postgresql.org/docs/books/

I can't vouch for the quality of most, but I know that Schönig's books usually are good.


It isn't PostgreSQL, but I like "SQL for smarties."




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: