Hierarchies with Postgres (2013)

jlg23 · on Jan 10, 2016

The ltree[1] extension that is shipped with PostgreSQL does that pretty well.

[1] https://www.postgresql.org/docs/current/static/ltree.html

radiowave · on Jan 10, 2016

Right, and there's a huge number of index-assisted ways in which in can be queried, but ltree has some restrictions on what nodes can be called, which can be a problem if you're trying to represent hierarchies of nodes that already exist.

Most of the operations described in the article are performed by using the && (overlaps) operator, which can make use of a GIN index (though the article doesn't mention this), so probably the article's approach will work OK.

netghost · on Jan 10, 2016

One other factor, I think ltree is based around strung keys, while the approach we took used integers representing priority ordering . So an integer array worked quite nicely.

Ltree seems like it would be good for representing a named categorical hierarchy.

I'd love to see how ltree performs though.

cabirum · on Jan 10, 2016

Is it still supported and tested? The last entry of the extension's homepage is from 2006 [1], I assume this is when the original authors stopped it's development.

[1]: http://www.sai.msu.su/~megera/postgres/gist/

radiowave · on Jan 10, 2016

It's supplied as standard with Postgres. If they weren't willing to accept a bug report against it, they wouldn't be shipping it.

What testing it receives I don't know, but I'm quite prepared to suppose that nobody's found a bug in it since 2006.

agnivade · on Jan 13, 2016

Maybe its just me, but I feel that we maybe stretching the limits of Postgres to cover all of the use cases.

When you need a custom extension to cover your use case, maybe you should use a different tool altogether. I have nothing against Postgres though, its a fantastic DB and I use it everyday.

But there are better tools out there in the market which can be used for these sort of data structures.

peedy · on Jan 10, 2016

Is there a reason for recursive queries not being the first choice?

I had versioning problem to solve recently, and recursive CTE was my first choice (did take some time to wrap my head around but was also performant enough), https://github.com/recipehub/recipehub-service/blob/master/d...

netghost · on Jan 10, 2016

Hey, author here. I'm surprised this got posted to HN. Anecdotally, when I first looked into ways to handle this, we tested recursive CTEs, and they didn't perform as well for our workload.

The indexed arrays work well for doing lots of queries of descendants, though are harder to manage.

One nice thing is that this approach worked well with ActiveRecord, while at the time CTE support was a bit awkward.

I'd love to see some people benchmark other approaches.

rpedela · on Jan 10, 2016

With JSON support complete in 9.5, wouldn't that be a great way to handle hierarchies? The only downside I can think of is the 1 GB field size limit.

saurik · on Jan 10, 2016

Some people, maybe you, have been confusing the updates that allow you to more easily manipulate jsonb data values, including being able to change values inside of them easily with SQL, with the ability to do in-place updates of parts of the field in the database. If you have a field in a row that is over the database page size (something around 8kB), it will get compressed to TOAST, and in any case any change to the content of that field will rewrite the row. If you have a field that is even megabytes large, much less a gigabyte large, that you are intending to update, you are in for a world of hurt. You also will not be able to index inside the value if you do this: usually you want to be able to efficiently say "I want parts of my tree that look liked this"; PostgreSQL is not going to be able to provide you any help, and is only going to be able to use an index to know "ok, your megatree row might match your query", and then it will have to load the entire value into memory so it can manipulate it with SQL and possibly return a sub-result.

netghost · on Jan 10, 2016

I think it depends on how your data is represented. If your data only exists in a single hierarchy, and you don't need to index other values for lookup, a JSON object might work fine.

orf · on Jan 10, 2016

Why would the 1GB field size limit be an issue?

rpedela · on Jan 10, 2016

If your hierarchy exceeds 1GB then you need to break it up somehow into multiple fields, rows, or tables depending on the specific use case.

jacobr1 · on Jan 10, 2016

If your hierarchy for a given node exceeds 1GB Then you almost certainly are using the wrong tool and should consider some kind of graph db.

rpedela · on Jan 10, 2016

Hierarchy != graph (although it can)

There are always better tools but that doesn't mean that it is "wrong". For example, S3 is generally a better tool than Postgres for storing files but there are several legitimate use cases where storing the files in the DB is better than S3.

petepete · on Jan 10, 2016

SQL Antipatterns by Bill Karwin has an chapter that covers this problem in depth.

http://shop.oreilly.com/product/mobile/9781934356555.do

duncanawoods · on Jan 10, 2016

Can you give a summary please - does he say it is an anti-pattern?

petepete · on Jan 10, 2016

If I remember correctly (the book is on my desk at work) he says that adjacency-list hierarchies, without the support of certain DB-specific functionality, can be an antipattern.

For example, if you're using a database without window functions, obtaining the entire tree in one query is impossible. Also, deleting nodes without removing the subtree is complicated compared to other approaches like Path Enumeration, Nested Sets and Closure Tables.

In the application I'm currently writing (with PostgreSQL 9.5) I opted to use the adjacency list approach, mainly due to its simplicity; nothing other than the parent_id field has to be maintained.

netghost · on Jan 10, 2016

How are you using window functions for querying the hierarchy?

petepete · on Jan 12, 2016

Sorry, my post wasn't very clear. I meant I am using a recursive query to get the hierarchy and a window function to visualise it.

netghost · on Jan 10, 2016

There is also SQL For Smarties of you want other approaches.

ape4 · on Jan 10, 2016

materialized path seems really bad for add/delete/inserts.

noir_lord · on Jan 10, 2016

Every approach has trade offs I guess.

This one looks simpler than using Nested Sets which is a plus, if you have a seldom changing tree with lots of reads then this approach is probably a good one.

netghost · on Jan 10, 2016

Adds and deletes are pretty cheap. Inserts can be expensive if you need to re-number the following nodes.

Depends on if you're concerned with ordering as well as parentage.