
  > It's particularly easy to see that the "atomic changes
  > across the whole repo" story is rubbish when you move
  > away from libraries, and also consider code that has 
  > any kind of more complicated deployment lifecycle, 
  > for example the interactions between services and 
  > client binaries that communicate over an RPC interface.
This seems exactly wrong to me. Getting rid of complicated deployment lifecycles is exactly the job that people use monorepos to solve. Wasn't that one of the reasons they are used at Google, Facebook, etc? As well as being able to do large-scale refactors in one place, of course.

You should be able to merge a PR to cause a whole system to deploy: clients, backend services, database schemas, libraries, etc. This doesn't preclude wanting to break up commits into meaningful chunks or to add non-breaking-change migration patterns into libraries -- but consider this: is it meaningful for a change to be broken into separate commits just because it is being done to independent services? What benefit does cutting commits this way give you?

What you want to avoid is needing to do separate PRs into many backend service and client repos, since: (1) when the review is split into 10+ places it's easier for reviewers to miss problems arising due to integration, (2) needing to avoid breaking changes can sometimes require developers to follow multi-stage upgrade processes that are so difficult that they cause mistakes, and (3) when there are separate PRs into different repositories these tend to start independent CI processes that will not test the system as-it-will-be (unless you test in production or have a very good E2E suite -- which would be a good idea in this situation).

I will say that, even in a monorepo, a big change might still happen gradually behind feature flags. But I think that generally it's nice to be able to deploy breaking changes in a more atomic fashion.



I can only attest to how Google does (did) it, but they use the monorepo as a serialized consistent history. There is no concept of deploying "the whole system" -- even deploying a single service requires, mechanically, multiple commits spread across time.

In fact, when I was last there in 2017, making a backwards-incompatible atomic change to multiple unrelated areas of the codebase was forbidden by policy and technical controls (the "components" system). You had to chop that thing up, and wait a day or two for the lower-level parts of it to work their way through to HEAD.

I would generalize this to say that the idea of deploying clients, schemas, backends, etc all at once is an inherently "small scale" approach.


Interesting, RE: Google. Though, even if no other large-scale company is doing this, it seems at face value to be an easier and safer way to develop software, up to perhaps a medium-scale system/problem (and assuming that you don't need to make database changes). I've yet to see a benefit to straddling changes across multiple repositories...


The benefit of changes straddling repositories is having separation of control.

For example, different npm (Cargo, etc.) packages are controlled by different entities. Semver is used to (loosely) account for compatibility issues and allow for rolling updates.
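To make the semver point concrete, here is a minimal sketch of a caret-style compatibility check. It is an assumption-laden simplification, not how npm or Cargo actually resolve versions: it ignores pre-release tags and the special rules for 0.x versions, and the function names are made up for illustration.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """Parse a plain 'MAJOR.MINOR.PATCH' string (no pre-release tags)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def caret_compatible(required: str, candidate: str) -> bool:
    """Simplified caret rule, assuming MAJOR >= 1:
    same major version, and candidate is not older than required."""
    req = parse(required)
    cand = parse(candidate)
    return cand[0] == req[0] and cand >= req


ok_minor_bump = caret_compatible("1.2.3", "1.4.0")   # same major, newer
ok_major_bump = caret_compatible("1.2.3", "2.0.0")   # major changed
ok_downgrade = caret_compatible("1.2.3", "1.2.2")    # older than required
```

The "loosely" matters: the check only trusts the version number, so a publisher who ships a breaking change in a minor release still passes it.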

A single company requiring multiple repositories for control reasons might be an antipattern and might indicate issues with alignment/etc.


Monorepo tooling (as opposed to a big repo with a bunch of stuff tossed into it) generally provides access controls.

Also remember that non-DVCS repos generally have fine-grained access controls.


  > The benefit of changes straddling repositories 
  > is having separation of control.
Good point, although a monorepo with a `CODEOWNERS` file could be used to give control of different areas of a codebase to different people/teams.


> This seems exactly wrong to me. Getting rid of complicated deployment lifecycles is exactly the job that people use monorepos to solve. Wasn't that one of the reasons they are used at Google, Facebook, etc? As well as being able to do large-scale refactors in one place, of course.

No, neither of those is why big companies use monorepos. Clearly the kinds of things you wrote are why the general public thinks big companies use monorepos, which is why this argument keeps popping up. But given that making atomic changes to tens, hundreds, or thousands of projects does not actually match the normal workflows used by those companies, it cannot be the real reason.

Monorepos are nice due to trunk based development, and a shared view of the current code base. Not due to the capability of making cross-cutting changes in one go.


You do not need monorepos for trunk-based development. In principle you "only" need a common build system; e.g., you can force the consumption of HEAD via the build system. I think rollback is still a factor, though: in a monorepo, rolling back a change is straightforward, whereas with multiple repos you would have to somehow track which commits from several repos belong together. That creates an additional layer, which a monorepo avoids.


Without a mono repo and the ability to build from HEAD all the components, it's much harder to be sure a change to a library _is_ actually backwards compatible (think a complicated refactoring).

Otherwise, there is much more fear that a change to an important library will have downstream impact and when the impact does arise, you've moved on from the change that caused it.


Even within the same repo, it is very likely that the old version of your code will coexist with the new version during the deploy rollout. Often, having different commits and deploys is a requirement. For instance, imagine that you add a column to the database and also use it in the backend service. You probably have to add the column first, then commit and deploy the usage of the column later, because you can't easily guarantee that the new code won't run before the column is added. The same would apply to a new field in the API contract.
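A rough sketch of that two-step ordering, using an in-memory SQLite database. The `users` table, the `nickname` column, and the function names are all hypothetical; the point is only that the additive schema change ships first and that the old code keeps working after it.

```python
import sqlite3


def migrate_add_column(conn: sqlite3.Connection) -> None:
    """Deploy 1: additive schema change only. Safe to run more than once."""
    columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
    if "nickname" not in columns:
        conn.execute("ALTER TABLE users ADD COLUMN nickname TEXT")


def old_code_insert(conn: sqlite3.Connection, name: str) -> None:
    """Old binary: does not know about the new column."""
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


def new_code_insert(conn: sqlite3.Connection, name: str, nickname: str) -> None:
    """Deploy 2: new binary that writes the new column."""
    conn.execute(
        "INSERT INTO users (name, nickname) VALUES (?, ?)", (name, nickname)
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

old_code_insert(conn, "ada")          # before the migration
migrate_add_column(conn)              # step 1: schema first
old_code_insert(conn, "grace")        # old instances still running mid-rollout
new_code_insert(conn, "linus", "lt")  # step 2: code that uses the column

rows = conn.execute("SELECT name, nickname FROM users ORDER BY id").fetchall()
```

If `new_code_insert` ran before `migrate_add_column`, the insert would fail with an `OperationalError`, which is the failure the ordering is meant to rule out.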


The old version won't coexist when the change is contained within a single binary, which seems like it would be true in a bunch of cases.

In our monorepo we have to treat database changes with care, like you mention, as well as HTTP client/server API changes, but a bunch of stuff can be refactored cleanly without concern for backwards compatibility.


Do you only have a single instance of the binary running across the whole org? And during deployment, do you stop the running instance before starting the new one?


Any change that won't cross deployable binary boundaries (think docker container) can be made atomically without care about subsequent deployment schedules. So this doesn't work for DB changes or client/server API changes as mentioned by OP, but does work for changes to shared libraries that get packaged into the deployment artifacts. For example, changing the interface in an internal shared library, or updating the version of a shared external library.


It seems to be a common misconception that you can never change an interface. You actually can, as long as it is not published for use outside of your repositories.


That is only likely to apply in very small-scale environments or companies.

And if only a single binary is produced, quite likely a single source code repo would be used as well - sounds like 'single developer mode', well 'small team' at most.


Our monorepo is millions of lines of source code, and hundreds of developers. Not small scale.

In this scenario, the single binary is the key encapsulation boundary, but your monorepo could be producing N binaries, each of which receives the change.

For example, if the change is to remove a single, unnecessary allocation in a low-level library function used across the repo, you can refactor it out and push the change as N binaries without worrying about compatibility.


We commit the column schema change and the code that uses it in a single commit.

This is handled by our deployment tool. It won't allow the new executable to run until the database has been updated to the new schema.
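I can't say how that deployment tool works internally, but the gating idea can be sketched roughly like this. Everything here is an assumption for illustration: the `orders` table, the required version number, and the use of SQLite's `user_version` pragma as the schema version marker.

```python
import sqlite3

# Hypothetical: the schema version the new executable was built against.
REQUIRED_SCHEMA_VERSION = 2


def current_schema_version(conn: sqlite3.Connection) -> int:
    """Read the schema version stored in the database (SQLite defaults to 0)."""
    return conn.execute("PRAGMA user_version").fetchone()[0]


def can_start(conn: sqlite3.Connection, required: int = REQUIRED_SCHEMA_VERSION) -> bool:
    """Gate: only let the new executable run once the schema has caught up."""
    return current_schema_version(conn) >= required


def apply_migration(conn: sqlite3.Connection) -> None:
    """The schema change shipped in the same commit as the code that needs it."""
    conn.execute("ALTER TABLE orders ADD COLUMN status TEXT")
    conn.execute("PRAGMA user_version = 2")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")

allowed_before = can_start(conn)  # schema not migrated yet
apply_migration(conn)
allowed_after = can_start(conn)   # schema now at the required version
```

Note that a gate like this only orders the rollout. Old instances still see the new schema while they drain, so the schema change itself has to be something they can tolerate.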


It's simply unfeasible to do that. The best you can do is blue-green deployments with a monorepo but you will still need at least data backwards compatibility to be able to roll back any change. The only thing you gain with blue-green deployments is slightly easier API evolution.


Nit: you need backward compatibility to roll out the new change. You need forward compatibility to roll it back.

Right?


Yes, but usually you need both.


Oh, agreed. My point was it was easy for folks to think they are safe because they have backwards compatibility, when they actually need both if they are concerned with running a rollback.
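A small sketch of the distinction, using made-up JSON payloads (the field names and reader/writer functions are hypothetical). Rolling out needs the new reader to accept old data; rolling back needs the old reader to accept data the new writer already produced.

```python
import json


def write_v1(name: str) -> str:
    """Old writer: only knows about 'name'."""
    return json.dumps({"name": name})


def write_v2(name: str, email: str) -> str:
    """New writer: adds an 'email' field."""
    return json.dumps({"name": name, "email": email})


def read_v1(payload: str) -> dict:
    """Old reader: ignores unknown fields, so it survives v2 data.
    This tolerance is what makes a rollback safe (forward compatibility)."""
    data = json.loads(payload)
    return {"name": data["name"]}


def read_v2(payload: str) -> dict:
    """New reader: treats 'email' as optional, so it survives v1 data.
    This is what makes the rollout safe (backward compatibility)."""
    data = json.loads(payload)
    return {"name": data["name"], "email": data.get("email")}


rollout = read_v2(write_v1("ada"))                     # new code, old data
rollback = read_v1(write_v2("ada", "ada@example.com"))  # old code, new data
```

If `read_v1` had rejected unknown fields, the rollout would still have worked, and the problem would only have shown up at rollback time.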



