
The first paragraph,

> Over the last few years, the software development community’s love affair with the popular open-source relational database has reached a bit of a fever pitch. This Hacker News thread covering a piece titled “PostgreSQL is the worlds’ best database”, busting at the seams with fawning sycophants lavishing unconditional praise, is a perfect example of this phenomenon.

is exactly the kind of gratuitous, over-the-top statement that makes me immediately lose respect for the author.

And skimming the rest of the article, most of the "points" are just complaints about the tradeoffs made in various design decisions like MVCC, heap tables with indexes on the side, etc. The author is basically complaining "it's not MySQL".

Don't waste your time with this article.



This comment is very far from the mark: it would be closer to the truth to say the article is mostly complaining about lack of tools to help ameliorate problems caused by the architecture.

The author clearly likes Postgres and ends the piece by saying he expects all the problems he talks about to be solved in time.


Ok, what do you say about this one?

> #9: Ridiculous No-Planner-Hints Dogma

One of these "query shifts" that the author mentions happened with a production database where I work. It was down for two days. The query planner used to like using index X but at some point decided it didn't want to use that and decided it wanted to do a table scan inside a loop instead. Meaning: one day a certain query was working fine, the next day the same query never finishes. In my opinion this is unacceptable.

What's your take?


(I'm not very familiar with Postgres, but this is common among RDBMSs.) What changed is size and/or statistics. Also, if the query optimizer supports something like parameter sniffing, that may have happened. Unacceptable? Not really. Annoying? Very much so.
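
To make the "size and/or statistics" point concrete, here is a minimal sketch of a cost-based choice between two plans. It is a deliberately crude toy, not PostgreSQL's actual cost model. The three cost constants match PostgreSQL's documented defaults for the settings of the same names; ROWS_PER_PAGE, the function names, and the one-random-read-per-match assumption are invented for illustration.

```python
# Toy cost model. NOT PostgreSQL's real formulas.
# These three values are PostgreSQL's default settings of the same names.
SEQ_PAGE_COST = 1.0
RANDOM_PAGE_COST = 4.0
CPU_TUPLE_COST = 0.01

ROWS_PER_PAGE = 100  # assumption, chosen only to keep the arithmetic simple


def seq_scan_cost(table_rows: int) -> float:
    """Read every page sequentially and examine every row."""
    pages = table_rows / ROWS_PER_PAGE
    return pages * SEQ_PAGE_COST + table_rows * CPU_TUPLE_COST


def index_scan_cost(estimated_matches: int) -> float:
    """Pessimistic assumption: one random page read per matching row."""
    return estimated_matches * (RANDOM_PAGE_COST + CPU_TUPLE_COST)


def choose_plan(table_rows: int, estimated_matches: int) -> str:
    """Pick whichever plan has the lower estimated cost."""
    if index_scan_cost(estimated_matches) < seq_scan_cost(table_rows):
        return "index scan"
    return "seq scan"


# Same table, same query; only the row estimate from the statistics differs.
plan_before = choose_plan(1_000_000, 4_000)
plan_after = choose_plan(1_000_000, 6_000)
print(plan_before, "->", plan_after)
```

In this toy, the query and the schema are identical in both calls. Only the estimate moved across the break-even point, and the chosen plan flipped with it.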


I feel that is the least fair of the complaints (I agree with several of them and have some of my own too). Not because query hints are not desirable, but because who is going to pay for maintaining them? It is not really dogma (I, with some help, managed to convince them to merge one very specific query hint: MATERIALIZED for CTEs); rather, they do not want to expose more of the innards of the query planner than necessary, so as not to slow down planner development. Planner development is hard enough as is; with query hints it is going to become at least twice as hard due to even more fear of breaking applications.


STRAIGHT_JOIN is probably my favourite feature of MySQL in terms of planner hints; but there's actually a deeper inconsistency behind the philosophy.

Usually, when you get a bad query plan, it's because the join order isn't right. Outside the start table and hash joins, indexes need to match up with both predicates and the join keys. Get the wrong join order and then your indexes aren't used.

Since you need to specify which indexes to build and maintain, and such indexes are generally predicated on the query plan, why not ensure that the query is using the expected indexes?

If one really wants to go down the route of no optimizer hints, then the planner should start making decisions about what indexes to build and update. Go all in.


> why not ensure that the query is using the expected indexes?

How should your database know this unless you explicitly named the indexes in your query? Just because there exists an index on a particular predicate does not mean that using the index would result in a faster query.


Right, DB developers decide what indexes to add but aren't allowed to decide if and how they are used.

Join order / type and which indexes to use would go a long way; that's pretty much all I need to do on MS SQL Server if the planner is not cooperating.


> Join order

Had to fight this a few times. The planner thought it was smart to scan an index for a few million rows, then throw almost all of them away in a join further up, ending up with a few hundred rows.

That caused the query to take almost a minute. Once the join order was inverted (I think I ended up nesting queries), the thing took a second or two.
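
A rough sketch of why the order matters so much, using a pure-Python index nested-loop join that counts the rows it examines. The tables, sizes, column names, and helper functions are all hypothetical and far smaller than the case described above; this is not how any particular database implements joins.

```python
from collections import defaultdict


def build_index(rows, key):
    """Group rows by key, standing in for a database index."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return dict(index)


def nested_loop_join(outer_rows, inner_index, key):
    """Index nested-loop join. Returns (rows examined, joined pairs)."""
    examined = 0
    pairs = []
    for outer in outer_rows:
        examined += 1  # read one outer row
        matches = inner_index.get(outer[key], [])
        examined += len(matches)  # read each matching inner row
        pairs.extend((outer, inner) for inner in matches)
    return examined, pairs


# Hypothetical data: 100,000 events spread over 1,000 accounts,
# of which only 5 accounts pass the filter.
events = [{"id": i, "account_id": i % 1000} for i in range(100_000)]
flagged = [{"account_id": a} for a in range(5)]

# Order A: walk every event, then discard almost all of them in the join.
work_a, pairs_a = nested_loop_join(
    events, build_index(flagged, "account_id"), "account_id"
)

# Order B: start from the 5 flagged accounts and probe the events index.
work_b, pairs_b = nested_loop_join(
    flagged, build_index(events, "account_id"), "account_id"
)

print(work_a, work_b, len(pairs_a), len(pairs_b))
```

Both orders produce the same 500 joined pairs, but the first examines about 200 times as many rows to get there.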


I wouldn't paint all hints with the same brush - it's not like the very fact of having a hint exposes query planner internals for arbitrary usage. Some hints may be more useful than others, and some may be less complicated to maintain - why not try to investigate if there's an intersection of these two sets that would be a valuable addition to Postgres?


Maybe. The traditional query hinting implementations like MySQL's are all about exposing planner internals, but if someone proposed a less intrusive form of query hints, there might be fruitful discussion about it. I think a huge issue is that as soon as someone mentions query hints, people immediately think of MySQL's and similar solutions.


We've had a few cases like that at work with SQLAnywhere, where it suddenly switches to table scans of some critical table.

In almost all cases simply recalculating statistics fixes it. We had one or two cases where we needed to drop and recreate some of the indexes, which was much more annoying.

Doesn't happen often, but really annoying when it does.
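
For readers who have not seen what "recalculating statistics" does, here is a small sketch using SQLite through Python's standard sqlite3 module. SQLite is only a stand-in here, not SQLAnywhere or Postgres, and the table and index names are made up. In SQLite, the ANALYZE command gathers the statistics and stores them in the sqlite_stat1 table, which the planner then consults.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("CREATE INDEX orders_status ON orders (status)")
conn.executemany(
    "INSERT INTO orders (status) VALUES (?)",
    [("open" if i % 10 == 0 else "closed",) for i in range(1000)],
)


def has_stats(connection) -> bool:
    """True once the sqlite_stat1 statistics table exists."""
    row = connection.execute(
        "SELECT count(*) FROM sqlite_master WHERE name = 'sqlite_stat1'"
    ).fetchone()
    return row[0] == 1


stats_before = has_stats(conn)  # no statistics gathered yet
conn.execute("ANALYZE")         # recalculate statistics
stats_after = has_stats(conn)

# Each row is (table name, index name, space-separated statistics).
stats = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
indexed = [idx for _, idx, _ in stats]
print(stats_before, stats_after, stats)
```

Until statistics are refreshed, the planner works from whatever numbers it last saw (or from built-in defaults), which is one way a plan can go stale as the data changes.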


i think you need to provide more details for a good reply. what changed between the time the index was used and when it wasn’t? I also had to “convince” postgresql to use my index but that led to a much better design


> i think you need to provide more details for a good reply. what changed between the time the index was used and when it wasn’t? I also had to “convince” postgresql to use my index but that led to a much better design

I disagree: given that nothing changed, I don't think any details need to be provided.

The question is NOT "Is postgresql's choice better than mine?" The question is "A certain design was working and suddenly broke because one day the query planner decided to start choosing a different (and unusable) plan - is this ever acceptable?" and the answer is obviously No, regardless of the details.


I guarantee you that something changed. Maybe the row count passed a certain threshold. Maybe you upgraded the database version.

If you don't want the query planner to pull arbitrary execution behaviour out of its ass, why are you using an SQL database in the first place? The whole point of SQL is that you declare your queries and leave it up to the planner to decide, and for that to be at all workable the planner needs to be free to decide arbitrarily based on its own heuristics, which will sometimes be wrong.


Thing is, MySQL, with judicious use of STRAIGHT_JOIN, won't do the same thing. And generally MySQL is much more predictable because it's much less sophisticated: it only has a couple of join strategies (pre 8.0, only nested loop join) and quite limited query rewriting, so you can - with practice - expect a query plan as you write the SQL. And in practice, there's usually only two or three really big tables involved in performance-sensitive queries, tables which you need to pay attention that you don't end up doing scans on. The rest of the tables you can leave up to the planner.


> And in practice, there's usually only two or three really big tables involved in performance-sensitive queries, tables which you need to pay attention that you don't end up doing scans on.

The problem is that those few big tables are often critical to most of the queries and so each query has to carefully use the right hints or query order if the planner isn't doing much.

It almost makes me wonder if indexes themselves should get hints or at least priorities to help the planner order operations.


I honestly think that for live operations an SQL database is more trouble than it's worth - sooner or later you need more control than it gives you, so you're better off using a datastore that gives you lower-level access to construct and use your own indices explicitly. SQL makes sense for reporting-type use cases where you don't know exactly what queries and aggregations you'll be doing ahead of time (but have a rough idea of which columns you might need to index on), but that's all.


The problem is that something changed at a random time in a production DB, on the weekend, in the middle of the night. What changed? Is that logged somewhere?

Other databases show that you can have the planner decide when you don't specify anything, and still override it with some simple hints, because I as the developer am in charge, not the planner.


> I disagree: given that nothing changed, I don't think any details need to be provided.

You sound like a typical enterprise customer. "The whole system stopped working!!!!" "What did you change?" "Nothing!!!" "Are you sure?" "Yes!!!" .. searching around, looking into logs, and so on .. "Could it be that someone did x? The logs say x has happened and had to be done manually." "Oh yes. x was done by me."

But, obviously, nothing has changed.


You sound like you have to deal with idiots all the time and resent that.

You also sound like you don't know much about PostgreSQL if you can't immediately see what happened.



