My work-life balance at Amazon as an SDE is great, but I don't have 24/7 oncall; make of that what you will.
That being said, I agree that the internal tooling at Amazon is not that great. It's actually a point where newer employees tend to clash with longer-tenured Amazonians, because it compares unfavourably with GitHub Actions or any modern CI/CD while being significantly better than anything that existed in the early 2010s.
I've heard plenty of accounts of SDEs maintaining pipelines that make them miserable.
This is really baffling to me. In fact, since leaving Amazon in June, I've been really frustrated by how much extra work it is to set up a CI/CD pipeline in (say) Drone and Argo/Flux compared with Amazon's internal tooling. You can set up a standard Amazon pipeline with a single command and a few prompts about naming, whereas with the open-source systems you have to hand-craft everything yourself, hack together a way to update the infra repo every time the source repo changes (I blogged more about that here: https://blog.scubbo.org/posts/ci-cd-cd-oh-my/), and Drone doesn't even seem to expose metrics that indicate broken builds. To say nothing of how well-integrated Pipelines is with alerting and ticketing - yet _another_ integration you would have to build yourself in OSS-land.
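To make the "update the infra repo on every source change" hack concrete, here's a minimal sketch of the kind of post-build step you end up writing by hand. The file layout and tag format are hypothetical (a Helm-style `values.yaml` with a pinned `tag:`); I've stubbed out the repo with a temp directory so the snippet stands alone, with the real commit-and-push noted in a comment:

```shell
#!/bin/sh
# Sketch of the hand-built "bump" step: after a successful image build in the
# source repo's CI, rewrite the pinned image tag in the infra repo.
# File path and YAML shape are assumptions, not anyone's actual layout.
set -eu

work=$(mktemp -d)
# Stand-in for a checkout of the infra repo.
cat > "$work/values.yaml" <<'EOF'
image:
  repository: registry.example.com/my-app
  tag: abc1234
EOF

NEW_TAG=def5678   # e.g. the commit SHA the freshly built image was tagged with

# Rewrite the pinned tag in place (GNU sed syntax).
sed -i "s/^\( *tag:\).*/\1 ${NEW_TAG}/" "$work/values.yaml"
cat "$work/values.yaml"
# In the real hack, this is followed by:
#   git commit -am "Bump my-app image to ${NEW_TAG}" && git push
# which is what triggers Argo/Flux to roll the new version out.
```

None of this is hard, but it's exactly the glue that Amazon's Pipelines gives you without asking.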
Literally the only advantage I'm aware of for OSS systems is that they allow you to use whatever language, dependencies, build system, etc. that you want - which, yes, fair, if that's something that you need, then that's a deal-breaker, but for the 99.99...% of internal cases that Pipelines works for, it (seems to me to) work flawlessly and smoothly.
What are some unfavourable comparisons that I'm missing?
I've actually been of two minds about this since leaving AWS, though it's been a few years. On one hand, I remember spending an absurd amount of time debugging esoteric internal-tooling errors that were un-googleable. The cases where you were lucky enough to find someone else with the same issue on the internal Stack Overflow or wiki were the good ones.
On the other hand, I've found myself missing features from Amazon's internal tools at the jobs I've had since. One example: I remember a Pipelines feature where you could look at any change and quickly tell exactly how far that change had made it through your pipeline. At my current job, determining exactly when change X deployed to region Y has been a multi-step exercise in manually comparing commit hashes.
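For what it's worth, the hash-comparison exercise above can usually be collapsed into a single `git merge-base --is-ancestor` check, assuming your deployment tooling can tell you which commit SHA is currently live in region Y. This sketch fabricates a throwaway two-commit repo so it's self-contained; in real use `DEPLOYED_SHA` would come from your deploy system:

```shell
#!/bin/sh
# Demo: "has change X reached the build deployed in region Y?"
# We fake a tiny repo; in practice DEPLOYED_SHA comes from deployment tooling.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "change X"
CHANGE_SHA=$(git rev-parse HEAD)
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "later work"
DEPLOYED_SHA=$(git rev-parse HEAD)   # pretend this is what's live in region Y

# Exit code 0 means change X is contained in the deployed commit.
if git merge-base --is-ancestor "$CHANGE_SHA" "$DEPLOYED_SHA"; then
  echo "change X is deployed"
else
  echo "change X is NOT deployed"
fi
```

It still requires you to dig up the deployed SHA yourself, which is precisely the step Pipelines surfaced for free.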
Right!? The observability on OSS CI/CD pipelines seems to be pretty lacking. Maybe I'll look for an opportunity there for my next position... :)
In fairness, I guess this is the tradeoff you get for being forced into "one and only one way to do it". The benefit is that all the tools are likely to play nice with one another "all the way through the flow", because they only have one representation format to be compatible with. The downside is, well, there's only one way to do it; and if that way happens to suck for you (as, for instance, for an ex-coworker who needed to build multiple different versions of the same code against different base OS images for...reasons), the tools are going to hinder more than they help.
Perhaps it's just a green-engineer thing. I know GitHub Actions, CircleCI, Drone, etc. I don't know Pipelines, and sometimes it feels like dark magic to me.
As far as the other tooling goes, I often find it very Java-oriented (which is fine, considering AWS's background), and everything in Python feels janky.
Ah! OK, now "dark magic" I can totally understand. Pipelines _is_ a little opaque, which is a totally fair criticism. In my experience it does _most_ of the things that you would want if you know a few fairly common (and easily searchable) incantations ("How do I add a new stage to this pipeline?", "How do I put tests after this stage?", etc.) - but I can definitely sympathise with the feeling of "this tool is doing a bunch of stuff for me and I don't understand how it's doing it". Thanks!
One point that I've found _really_ strange after transitioning from Pipelines to other CI/CD (I'm using Drone and Argo, but I think this applies for Flux as well) is that infrastructure-definition repos define the _image-version_ that they deploy to a particular stage, rather than just defining the source repo (and automatically deploying the latest image-version which has passed all preceding tests). That seems...odd. Unless you hack together your own automation[0], that means you need two actions to get new App Code out to an environment:
* Push the App Code commit (and have a new image-version built)
* Make a change to your Infra Code, updating the desired image version
Am I missing something?
---
I'm surprised to hear that you consider AWS to be Java-oriented. When I think AWS, I think TypeScript - to the extent that the primary reason I started learning TypeScript was to be able to write CDK more fluently (and, then, to write Lambdas without a whole bunch of Java type boilerplate - though Python also works there).
[0] Which I've done, but the fact that I had to hand-build it suggests I'm doing something that the tool authors discourage