GitHub Actions was having an outage (githubstatus.com)
82 points by cube2222 on June 1, 2022 | hide | past | favorite | 64 comments


Friendly reminder of the day:

Always have an established and regularly tested workflow for manual deployments that doesn't rely on CI jobs. You'll need it sooner or later.
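As a sketch of what such a fallback might look like (every name here is hypothetical, assuming a Docker-based deployment):

```shell
#!/bin/sh
# deploy.sh -- hypothetical manual fallback; registry/host names are invented.
set -eu

deploy() {
  app="$1"; tag="$2"
  # In the real script these commands would actually run:
  #   docker build -t "registry.example.com/$app:$tag" .
  #   docker push "registry.example.com/$app:$tag"
  #   ssh deploy@prod.example.com \
  #     "docker pull registry.example.com/$app:$tag && docker compose up -d"
  echo "deploying $app:$tag"
}

deploy myapp "$(git rev-parse --short HEAD 2>/dev/null || echo manual)"
```

The point is that the same script runs identically from a laptop or a CI runner, so it gets exercised (and therefore tested) on every normal deploy.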


That's why my CI pipelines never rely on proprietary stuff.

I use multistage Dockerfiles + Buildkit to implement secrets caching, etc... My CI scripts just do "docker build" and "docker publish".

It makes doing things locally much easier.
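For illustration, a minimal version of that pattern might look like this (image names and the secret id are assumptions, not anything from the comment above):

```dockerfile
# syntax=docker/dockerfile:1
FROM golang:1.18 AS build
WORKDIR /src
COPY . .
# BuildKit mounts the secret only for this RUN step; it never ends up in a layer.
RUN --mount=type=secret,id=netrc,target=/root/.netrc \
    go build -o /out/app .

FROM gcr.io/distroless/static
COPY --from=build /out/app /app
ENTRYPOINT ["/app"]
```

The CI job (or a developer) then just runs `DOCKER_BUILDKIT=1 docker build --secret id=netrc,src=$HOME/.netrc -t myapp .` followed by `docker push`, locally or on a runner.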


I've always resisted the change to GHA, preferring instead tried and tested Jenkins and docker workflows. Sure, multi-agent GHA is nice, but when it's down, what do you do? We don't even work in offices to have swordfights in anymore.


I agree here; looking now, it seems I would need three pairs of hands to count how many times Actions has gone down each year. It feels like Actions itself is not ready for production use.

Also nice XKCD reference (https://xkcd.com/303/)!


I wasn't aware that there's a "docker publish" command. Is that an add-on or how does that work?


Could he mean "docker push"? I never heard of any publish command.


I meant docker push :(


Your CI should be the automation of an existing, manual process for this reason.

Should the automation fail, your manual process still works.


Anyone recommend any tools that can do CD pipelines on the CLI, maybe something a bit nicer than Makefiles?


There's also taskfile. https://taskfile.dev/
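For anyone curious, a minimal Taskfile sketch might look like this (task names and commands are purely illustrative):

```yaml
# Taskfile.yml
version: '3'

tasks:
  build:
    cmds:
      - docker build -t myapp:latest .
  deploy:
    deps: [build]
    cmds:
      - docker push registry.example.com/myapp:latest
```

`task deploy` then runs the build first and pushes, with no CI server involved.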


I’m liking Earthly a lot: https://earthly.dev/


Jenkinsfile Runner is exactly that!

https://github.com/jenkinsci/jenkinsfile-runner/



Or a shell script. Always have a backup shell script.


I feel like most of my build or deployment pipelines always end up as just a bash script executed by the CI/CD tool anyway.


Yeah. In the end that's the only thing Jenkins does for me.


The problem is who will maintain and test it. Without DR (disaster recovery) test days it's worth nothing, because something will have changed and not been propagated.


Use the script in your regular CI/CD pipeline. Just add arguments for things that vary between GitHub runners and local machines, like secrets and paths.
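A sketch of that idea, with invented flag names and defaults:

```shell
#!/bin/sh
# deploy.sh -- same script for CI and local use; all names are hypothetical.
set -eu

REGISTRY="registry.example.com"          # local default
SECRETS_FILE="$HOME/.myapp-secrets.env"  # local default; CI passes its own path

while [ "$#" -gt 0 ]; do
  case "$1" in
    --registry) REGISTRY="$2"; shift 2 ;;
    --secrets)  SECRETS_FILE="$2"; shift 2 ;;
    *) echo "unknown option: $1" >&2; exit 1 ;;
  esac
done

echo "pushing to $REGISTRY using secrets from $SECRETS_FILE"
# docker push "$REGISTRY/myapp:latest"   # real work elided in this sketch
```

A GitHub workflow step would then call something like `./deploy.sh --registry ghcr.io --secrets "$RUNNER_TEMP/secrets.env"`, while a laptop just runs `./deploy.sh`.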


It's certainly a risk to rely on any proprietary SaaS or cloud, but I don't really get the need to treat CI/CD as if it can never be down. There are a lot of risks in having credentials on hot-spare alternative systems you rarely need, and the number of times I've really had to roll back a deployment quickly is something I can count on one hand. Isn't it better to focus on quality so you don't need to deploy on a moment's notice?


You can keep the credentials in a password manager. Presumably the sysadmin already uses the same corporate PM for his regular credentials, so this doesn't increase risk much.

We're a small org but I'd be pretty annoyed if we planned a deployment on a day and ended up having to wait for GitHub to go back up. Deployments can be disruptive and we have to coordinate with other teams.


SCP, SSH, and cp to the rescue


Worth noting that this is a GitHub Actions outrage; as far as I can see, normal git operations are working normally.

Maybe it should be included in the title.


do you mean outage?


why not both?


We're living in an outage culture


Again? Last time that happened was just 5 days ago. [0] Looks like going all in on Github and GitHub Actions doesn't make any sense.

Perhaps for larger organizations like ARM, and open-source orgs like ReactOS, RedoxOS, GNOME, etc., it is now time to self-host?

[0] https://news.ycombinator.com/item?id=31527039


Keep in mind how likely it is that you'll be able to maintain better uptime when self-hosting, and what your expected MTTR is. GitHub's results may not be that bad.


In all my time, every internal downtime combined was better than every external downtime combined, and the worst internal downtime was way better than the worst external downtime.

This shit tends to just work, unless you fiddle with it. And if you do fiddle with a self-hosted version, you can do it after-hours, not in the middle of the day.


You got it exactly right. Every time I use an external service, there are times when it's down. Every time it's an internal service, there are zero times it's down outside of maintenance.

It's why I continue to favor running everything in house (preferably on-prem too).


I've read this trope so many times, but there are actually some things to consider on the other side as well.

In reality, even Gmail has had more outages than my self-hosted email, for example.

The other is that with self-hosted services and a somewhat sane setup, most outages come from updates. I wouldn't be surprised if that's the case for GitHub Actions too, but updates, especially for things not accessible to the public, are plannable.

Self-hosting often allows for very simple setups, and simple things don't break as easily as something that is huge and constantly being changed, due to growth, trying to have more features than the competition, etc.

With self-hosting you could theoretically just have an SSH server with repos on it and no other services running. That thing will likely have higher uptime than GitHub. You can add git's web interface, which still likely results in higher uptime. And depending on your needs you can use simple CIs. These, compared to SSH, cgit and some webserver, are probably the least stable, but something like buildbot is also pretty stable. And if you don't tinker on it all the time, things will remain that way.
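The bare-bones end of that spectrum really is tiny; a sketch (paths and host name invented):

```shell
set -eu
REPO="$(mktemp -d)/project.git"   # stands in for e.g. /srv/git/project.git
git init --bare "$REPO" >/dev/null
# Developers then clone/push over plain SSH, no other daemons required:
#   git clone git@git.internal:/srv/git/project.git
git --git-dir="$REPO" rev-parse --is-bare-repository   # prints "true"
```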

Of course, if you want/need all the features of GitHub and its competitors, you can get them, but depending on what you do you might not need them. Self-hosted solutions tend to be a lot less complex, not least because they lack all the things that make commercial SaaS products commercial.

In other words: if all of GitHub were hosting that self-hosted solution just for one customer/you, that's something where I'd say GitHub or others would likely do a better job, but for a person or a non-huge company self-hosting something simple, there are just so many counter-examples. Maybe GitHub is maintaining better uptime, maybe not.

Also, plain MTTR is not what you are looking for either: if you can't (easily) get out a critical update, or something for a presentation, because GitHub just shipped an update that really sucks, you can do literally nothing other than hope, or maybe build a workaround.

If you have a critical update to push out, you likely won't be doing an update of your CI infrastructure at the same time.

Of course you really should have a plan B for pushing out updates, regardless of the situation. Sadly, that's not the case in many companies. GitHub or other services having outages is considered force majeure, while self-hosted solutions being down are usually viewed as failure.

And since we are on biases already, I want to emphasize the tinkering part. A lot of us are developers and infrastructure engineers (SRE, DevOps, sysadmins, etc.). We are paid to tinker on stuff, but the reality is that we use tons of services that effectively never cause issues, be it OpenSSH, nginx, Apache or others. When was the last time even a major upgrade of any of these caused issues? Finished releases of software that is built to do one thing and doesn't try to do everything usually work extremely well, often well enough that even a complete novice running them on the side will achieve higher uptimes.

I think the same is true for simple CI-pipelines.

This is not meant to convince anyone to not use managed solutions or GitHub, but to not end up with wrong assumptions, because the comparisons we do in our heads aren't what we actually want to compare. Self-hosting CI pipelines doesn't mean running GitHub.

We don't think we could run a better restaurant just because we cook our own food, yet cooking one's own food might be a more reliable way of not going hungry than assuming the restaurant is always open.

But whatever you do, be sure to keep a backup solution in your freezer. ;)


I mean, if the price of unlimited CI/CD is just some outages every few months, then I think it's worth it.

Even though I know it's certainly a trap: GitHub won't keep Actions free forever.


GitHub Actions isn't free


It is for most startups, though not for megacorps, which have the money anyway. I've never gotten close to hitting the CI minutes cap in companies with < 100 engineers.


The cap is 2,000 minutes. Will 50 engineers really only use 40 minutes of build time each in an entire month? That's like 5-10 builds, depending on your build duration, just one or two a week. You might use all of that up in a single pull request.

In a really productive month my two person startup had to buy more minutes.


I could have sworn it was more, but fair enough! I do heavily optimize CI to reduce minutes, though, with easy access to local tests (and git hooks to enforce them) and more thorough tests run nightly on merged branches.


Well, a certain level is included in the price, but that doesn't mean it's free. That level is also quite low if you really want to use it.


For open source projects (my use case) it is.


GNOME (like KDE) self-host their own GitLab.


GitLab, Redox and ARM are all on custom GitLab deployments. They were probably intended as examples of self-hosting companies, just with confusing wording.


This is where a locally hosted gitlab instance becomes very handy.


Can you do CI/CD with it?


Yes Gitlab has support for CI/CD. It's available in OSS version also. https://docs.gitlab.com/ee/ci/
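As a rough idea of the shape (job names are illustrative; `$CI_REGISTRY_IMAGE` and `$CI_COMMIT_SHORT_SHA` are standard GitLab-provided variables):

```yaml
# .gitlab-ci.yml
stages: [build, deploy]

build:
  stage: build
  script:
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"

deploy:
  stage: deploy
  script:
    - ./deploy.sh "$CI_COMMIT_SHORT_SHA"
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
```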


Yes. We use a self-hosted GitLab instance with autoscaling runners.


Not that it's in any way official, but you can run GitHub Actions locally with https://github.com/nektos/act
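For reference, the basic invocations look roughly like this (act needs Docker available locally, so these aren't runnable everywhere):

```shell
act -l            # list jobs act finds under .github/workflows/
act               # simulate the default "push" event
act pull_request  # simulate a pull_request event
act -j build      # run only the job named "build"
```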


I found this so hard to get working. Actually, I didn't even get it working; I think my build had some dependencies that weren't in act, or something.


Seems to have been resolved in less than an hour.


What infrastructure/cloud is github using?


Azure.


Was this always the case or did they migrate after the acquisition?


I'm not convinced their constant outages are related to Azure itself, as Azure is very stable. It could be a change of internal staff.


It doesn't have to be due to Azure itself, but to the migration process (which can sometimes take years).


I misread the OP as suggesting it was stable before Azure and is no longer stable on Azure, as a general anti-Microsoft comment. Yes, you're right.


GitHub Actions was only launched after the acquisition. It always ran on Azure.


Given that it's only GitHub Actions I don't think it's likely that in this very situation Azure is to blame.


This is your daily GitHub Actions outage post


Third outage in the last week. I have to say GitHub has seemingly become a lot less reliable over the last 3-4 years.


It’s already resolved.


Shouldn't the title be "GitHub Actions having outage"? Because the current title implies that GitHub as a whole is not working.


Plus, I would argue that the "another" is unnecessary and editorialising.


Fixed both, thanks!


It is probably windows forcing a reboot on all the machines.


"Don't worry, all your files are exactly where you left them."


Some poor intern spent the morning walking from server to server signing them all into a Microsoft account


Wow, done quickly. Great.



