
Yeah, link aggregation doesn't work how they think it does. And not having a separate network for Ceph is going to bite them in the arse.

GitLab is fine software, but fuck me, they need to hire someone with actual ops experience (based on this post and their previous "we tried running a clustered file system in the cloud and for some reason it ran like shit" post).



They'd save themselves a whole lot of time, effort, and money if they looked at partitioning their data storage instead of plowing ahead with Ceph. They have customers with individual repositories. There is no need to have one massive filesystem / blast radius.
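A sketch of what that partitioning could look like (hypothetical function and shard count, not anything GitLab actually runs): deterministically hash each repository path to a shard, so each shard stays an ordinary local filesystem with a small blast radius.

```python
import hashlib

def shard_for_repo(repo_path: str, num_shards: int = 16) -> int:
    """Map a repository path to a storage shard deterministically.

    Illustrative sketch only: the name and shard count are assumptions.
    SHA-1 gives a stable hash across processes and restarts (unlike
    Python's built-in hash(), which is salted per process).
    """
    digest = hashlib.sha1(repo_path.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards

# Every request for the same repo lands on the same shard, so no
# shared distributed filesystem is needed for the common case.
```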


I don't know much about this stuff, but won't that stop working if they ever decide to expand to multiple geographical sites to reduce latency for customers in different locations? In that case, different sites can receive requests for the same repositories, and ideally each site would be able to provide read-only access without synchronization, with some smarts for maintaining caches, deciding which site should 'own' each file, etc. They could roll their own logic for that, but doesn't that pretty much describe the job of a distributed filesystem? So they'd end up wanting Ceph anyway, and may as well get experience with it now.


Seems this is one of the goals of the article: "We're hiring production engineers and if you're spotting mistakes in this post we would love to talk to you (and if you didn't spot many mistakes but think you can help us we also want to talk to you)".


My bad, I didn't make it that far down the post before hitting rant mode.


Ceph should get a separate network that is used only for re-replication when something happens, e.g. when a node goes down.
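For reference, that split is a couple of lines in ceph.conf (the subnets here are made up): clients and monitors use the public network, while OSD replication and recovery traffic go over the cluster network.

```ini
[global]
# Client and monitor traffic (assumed subnet)
public_network  = 10.0.0.0/24
# OSD replication / recovery traffic (assumed subnet)
cluster_network = 10.0.1.0/24
```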

Another thing I might recommend is a third network (just a simple 1GbE link and a quality switch) for consensus. Re-replication can max out the network, which causes consensus failures, which in turn trigger more re-replication, grinding everything down. If that's not possible, add firewall rules to give all consensus-related ports high priority.
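Assuming Ceph's default monitor ports (6789 for msgr1, 3300 for msgr2), one way to do the prioritization is to mark monitor traffic with a high-priority DSCP class; this is a sketch, and the switches have to be configured to honor the marking for it to help:

```shell
# Mark outgoing Ceph monitor traffic as CS6 (network control class).
# Ports are Ceph's defaults; adjust if yours differ.
iptables -t mangle -A OUTPUT -p tcp -m multiport --dports 3300,6789 \
         -j DSCP --set-dscp-class CS6
```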


They said very little about how they think link aggregation works: just that they can send packets on both ports and keep working with only one. That's basically the definition of link aggregation. So what's wrong with the post?
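For context, "send on both ports, keep working with one" is exactly what a Linux LACP bond gives you. A Debian-style ifupdown sketch (interface names, address, and timings are assumptions):

```shell
# /etc/network/interfaces fragment (requires the ifenslave package)
auto bond0
iface bond0 inet static
    address 10.0.0.5/24
    bond-slaves eth0 eth1
    bond-mode 802.3ad      # LACP: traffic hashed across both links
    bond-miimon 100        # check link state every 100 ms
    bond-lacp-rate fast    # faster partner failure detection
```

Note that 802.3ad hashes each flow to one link, so a single TCP connection never exceeds one port's bandwidth; you get aggregate throughput across many flows plus failover, not a single fat pipe.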



