Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

How do you sort the result? Do you use ts_rank?

What is the size of your database on disk?



I use ts_rank combined with many other factors (as I also rank by time, by message type, etc).

The size of the database is a few dozen gigabytes by now, but that isn’t relevant with tsvector, only the row count has an effect on search speed.


Thanks for following up :-)

I was asking because ranking can be slow in PostgreSQL. PostgreSQL can use a GIN or GiST index for filtering, but not for ranking, because the index doesn't contain the positional information needed for ranking.

This is not an issue when your query is highly selective and returns a low number of matching rows. But when the query returns a large number of matching rows, PostgreSQL has to fetch the ts_vector from heap for each matching row, and this can be really slow.

People are working on this but it's not in PostgreSQL core yet: https://github.com/postgrespro/rum.

This is why I'm a bit surprised by the numbers you shared: fulltext search on 270 million rows in below 65ms on commodity hardware (sub 8€/mo).

A few questions, if I may:

- What is the average number of rows returned by your queries? Is there a LIMIT?

- Is the ts_vector stored in the table?

- Do you use a GIN or GiST index on the ts_vector?

Cheers.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: