Hacker News

Aurora (Postgres) looked like it would be great, and the performance for our use case was much better than RDS. But there was a showstopper for us with the replication: if the cluster leader fails, _every read replica is forced to restart_ while a new leader is elected, resulting in a minimum of ~30s of complete cluster downtime. In our situation a short outage in write availability is no problem, but the fact that we can't maintain read availability via the replicas is a deal breaker.


I'm no DBA, but is that not the entire point of a cluster/replicas?

Sure, maybe writes won't be accepted or replicated until a new master is elected, but you can't even read from the cluster?

I'm curious at this point: what is it exactly that you gain from a cluster? Is it only having to wait 30s vs. however long it takes for the master to come back online?


That was the mind-boggling part to me. After testing failover and noticing the behavior, we had a number of discussions with AWS reps and support engineers who confirmed it. Aurora's failover documentation hints at it but downplays the severity.


The thing you gain from the replicas is removing read load from the primary during normal operations.
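Concretely, that read-load offloading is usually done with read/write splitting at the application layer. A minimal sketch (the endpoint names below are hypothetical, though Aurora does expose a single writer endpoint and a load-balanced reader endpoint per cluster):

```python
# Hypothetical Aurora cluster endpoints -- the writer endpoint always points
# at the current primary; the reader endpoint load-balances across replicas.
WRITER_ENDPOINT = "mycluster.cluster-abc123.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "mycluster.cluster-ro-abc123.us-east-1.rds.amazonaws.com"

def pick_endpoint(sql: str) -> str:
    """Route plain SELECTs to the reader endpoint; send everything else
    (INSERT/UPDATE/DELETE/DDL) to the writer."""
    first_word = sql.lstrip().split(None, 1)[0].upper()
    return READER_ENDPOINT if first_word == "SELECT" else WRITER_ENDPOINT

# pick_endpoint("SELECT * FROM users")  -> reader endpoint
# pick_endpoint("INSERT INTO t ...")    -> writer endpoint
```

The catch described above is that during failover the reader endpoint doesn't keep serving: the replicas restart along with the leader election, so even read-only traffic stalls for the duration.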



