I mean, it's not that surprising that you'll learn better in a stack you already know well - you know enough there to know what you don't know and need to learn. But if you don't know anything about a language, it will be very difficult for you to sort fact from fiction.
It's basically doing the same thing that, say, `return true` might do to indicate a function succeeded, but with more explicit types. However, because it uses `Result`, it can be used with the `?` ("try") operator, which can be convenient in some situations.
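To make the comparison concrete, here is a minimal Rust sketch. The `NameError` type and the `is_valid`/`validate`/`save` functions are hypothetical stand-ins, since the code under discussion isn't shown. `Result<(), E>` carries the same success/failure bit as a `bool`, plus a typed reason on failure, and it composes with `?`.

```rust
// Hypothetical error type for illustration; the real code being discussed isn't shown.
#[derive(Debug, PartialEq)]
enum NameError {
    Empty,
    TooLong(usize),
}

// bool-style: the caller learns *that* validation failed, but not why.
fn is_valid(name: &str) -> bool {
    !name.is_empty() && name.len() <= 16
}

// Result-style: the same success/failure signal, but failure carries a typed reason.
fn validate(name: &str) -> Result<(), NameError> {
    if name.is_empty() {
        return Err(NameError::Empty);
    }
    if name.len() > 16 {
        return Err(NameError::TooLong(name.len()));
    }
    Ok(())
}

// Because `validate` returns a Result, `?` can bail out early and forward the error.
fn save(name: &str) -> Result<String, NameError> {
    validate(name)?;
    Ok(format!("saved {name}"))
}

fn main() {
    assert!(is_valid("notes"));
    println!("{:?}", save("notes"));
    println!("{:?}", save(""));
    println!("{:?}", save("a-very-long-file-name"));
}
```

With a `bool`, `save` would need its own `if !is_valid(name) { ... }` branch and would have nothing specific to report.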
That said, a couple of the examples here feel a bit strange - they're clever things you can do, but they're not necessarily things you often have to do, particularly for a relatively simple task like this. I think the problem with the author's approach is that they can't distinguish between "weird because Rust is weird" and "weird because the LLM generated bad code", because they (understandably) don't have enough experience in what good Rust code looks like.
As I understand it, in teaching there's an idea of the "Zone of Proximal Development" (ZPD). Some things you can do without help, other things you can only do if others do it for you, and then in between there's all the stuff that you can do with some amount of assistance. Being in this zone is important for learning, at least in theory.
I suspect that's kind of happening here. If you're trying to learn something too abstract or distant from what you currently know, you'll probably use more polished or eli5-y sorts of material, because you don't yet have the skills to understand a more complex version. You're probably not in the ZPD. But if you can figure some things out by yourself, possibly with some amount of help, then you're in the learning zone and can meaningfully progress.
I have similar experiences to you with 3B1B - it's interesting, but I rarely retain anything meaningful after I've finished - and I think it's because he has to explain every part for me to understand what's going on. I'm not in the zone of proximal development because I can't do enough of the work myself. So the end result is an interesting video where someone explains a cool concept to me, but it's not learning, because I'm not also doing the foundational work that would get me to the point where I can understand the video for myself.
This is true, although I have to say as someone who doesn't own a car, good public transport can avoid most of those issues. I live in a small-ish city (500K - 1M pop, depending on how you count it), and I can get pretty much anywhere I need to without worrying about schedules and certainly without worrying about strikes. The biggest issue is getting out of the city - that's when it's usually more important to worry about schedules, but it's still mostly doable - and occasionally transporting furniture or something like that.
On the other hand, the benefits I get from that public transport are incredible - it's cheap, it's always there, it requires minimal logistics in groups (no trying to figure out who goes in what car and needs to be dropped off where at what time), it works regardless of my level of inebriation (admittedly I've not pushed that one to any sort of extreme yet), it's safe enough for children to travel independently (no dropping them off and picking them up), and it's largely accessible for people with difficulties walking or moving about.
I think a big part of the issue is that people have tried out poor public transport infrastructure and recognised - often correctly - that their car is way better for them. But good public infrastructure can often be far more convenient than cars, it just requires people to be motivated enough to build and finance it. A neighbour of mine didn't notice his car had been towed for a week because he used public transport so much and so rarely touched his car. When he'd parked his car it was fine, but then they needed to block off the street to do some work somewhere, and he didn't notice they'd confiscated all the cars there. That's the sort of effect that good public transport can have - so comfortable that you can forget you even have a car.
I think the right intuition to have with jj is that `jj st` should show an empty change unless you are actively working on something. `jj commit`, as mentioned below, is a good example of this - it automatically creates a new change and checks it out. The "squash flow" also does this well - you use the branch tip as a staging area and squash work into other changes on the branch as you go along. Either way, once the work is finished, there's an empty change at the tip of the branch.
This is also supported by jj implicitly - whenever you check out a different commit, if the change you were on is empty, has no description, and is the tip of a branch, it's automatically deleted to clean things up for you.
But branches are not just named pointers to a commit. If they were, then checking out the pointer would be the same as checking out the commit itself. But I can check out a commit and I can check out a branch and depending on which I've done, I'm in two different states.
Either I'm in branch state, where making a commit bumps the branch pointer and means the commit will be visible in the default log output, or I'm in "detached head" mode, and making a commit will just create a new commit somewhere that by default is hidden until I learn what a reflog is. And the kicker is: these two states look completely identical - I can have exactly the same files in my repository, and exactly the same parent commit checked out, but the hidden mode changes how git will respond to my commands.
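A toy Rust model of the two states might help. It is an assumption-laden sketch, not how git actually stores HEAD (the real thing is the `.git/HEAD` file, which holds either a ref name or a raw commit hash), and the `Repo`, `Head`, and method names are made up. What it shows is that `commit()` behaves differently depending on a mode that isn't visible in the checked-out files.

```rust
use std::collections::HashMap;

// A toy model of git's HEAD, not git's real data structures.
#[derive(Debug, Clone, PartialEq)]
enum Head {
    Branch(String), // "on a branch": HEAD points at a branch name
    Detached(u32),  // "detached HEAD": HEAD points straight at a commit
}

struct Repo {
    next_id: u32,
    parents: HashMap<u32, Option<u32>>, // commit id -> parent commit id
    branches: HashMap<String, u32>,     // branch name -> tip commit id
    head: Head,
}

impl Repo {
    // Start with a root commit 1 on `main`, with `main` checked out.
    fn new() -> Self {
        Repo {
            next_id: 2,
            parents: HashMap::from([(1, None)]),
            branches: HashMap::from([("main".to_string(), 1)]),
            head: Head::Branch("main".to_string()),
        }
    }

    // The checked-out commit: the files are identical whichever mode we're in.
    fn current(&self) -> u32 {
        match &self.head {
            Head::Branch(name) => self.branches[name],
            Head::Detached(id) => *id,
        }
    }

    fn checkout_branch(&mut self, name: &str) {
        self.head = Head::Branch(name.to_string());
    }

    fn checkout_commit(&mut self, id: u32) {
        self.head = Head::Detached(id);
    }

    // Same command, different effect depending on the hidden HEAD mode.
    fn commit(&mut self) -> u32 {
        let id = self.next_id;
        self.next_id += 1;
        let parent = self.current();
        self.parents.insert(id, Some(parent));
        match self.head.clone() {
            Head::Branch(name) => {
                self.branches.insert(name, id); // the branch pointer moves
            }
            Head::Detached(_) => {
                self.head = Head::Detached(id); // only HEAD moves
            }
        }
        id
    }

    // Commits reachable from any branch, roughly what `git log --branches` lists.
    fn reachable_from_branches(&self) -> Vec<u32> {
        let mut seen = Vec::new();
        for &tip in self.branches.values() {
            let mut cur = Some(tip);
            while let Some(id) = cur {
                if seen.contains(&id) {
                    break;
                }
                seen.push(id);
                cur = self.parents[&id];
            }
        }
        seen.sort();
        seen
    }
}

fn main() {
    let mut repo = Repo::new();
    let a = repo.commit(); // on `main`: main now points at a
    repo.checkout_commit(a); // same commit, same files, but now detached
    let b = repo.commit(); // main stays at a; only HEAD knows about b
    repo.checkout_branch("main");
    println!(
        "main = {}, b reachable from a branch: {}",
        repo.branches["main"],
        repo.reachable_from_branches().contains(&b)
    );
}
```

Once you switch back to `main`, commit `b` is only findable through something like the reflog, which is exactly the "lost work" panic described here.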
In fairness, none of this is so difficult that you can't eventually figure it out and learn it. But it's not intuitive. This is the sort of weirdness that junior developers stumble over regularly where they accidentally do the wrong kind of checkout, make a bunch of changes, and then suddenly seem to have lost all their work.
This is one of the ways that I think the JJ model is so much clearer. You always check out a commit. Any argument you pass to `jj new` will get resolved to a commit and that commit will be checked out. The disadvantage is that you need to manually bump the branch pointer, but the advantage is that you don't necessarily need branch pointers unless you want to share a particular branch with other people, or give it a certain name. Creating new commits on anonymous branches is perfectly normal and you'll never struggle to find commits by accidentally checking out the wrong thing.
No, they don't. As you noted, one state is "detached head" and any competently set up shell PS1 will tell you that, or that you're on a branch by displaying the name of the branch vs the commit.
> Creating new commits on anonymous branches is perfectly normal
Sorry, if that's an example of more intuitive behavior on jj's part, you've lost me. I've done that intentionally with git, but I know what I'm doing in that case. For someone new to version control, committing to an unnamed branch doesn't seem like a desired operation no matter which system you're using. What's wrong with requiring branches to be named?
> For someone new to version control, committing to an unnamed branch doesn't seem like a desired operation no matter which system you're using.
We have data on this! I can't cite anything public, but companies like Meta have to train people who are used to git to use tools like sapling, which does not require named branches. In my understanding, at first, people tend to name their branches, but because they don't have to, they quickly end up moving towards not naming.
> What's wrong with requiring branches to be named?
Because it's not necessary. It's an extra step that doesn't bring any real benefits, so why bother?
Now, in some cases, a name is useful. For example, knowing which branch is trunk. But for normal development and submitting changes? It's just extra work to name the branch, and it's going to go away anyway.
Fascinating. The benefit it brings is you can map the branch to its name. Of the, say, 10 branches you've got checked out, how do you know which branch maps to jira-123 and which one maps to jira-234, or if you're using names, which anonymous branch maps to addFeatureA or fixBugB?
More to the point though, what tooling is there on top of raw jj/git? Specifically, there's a jira cli (well, multiple) as well as a gh cli for github as well as gitlab has one as well. When you call the script that submits the branch to jira/github/gitlab, how does it get the ticket name to submit the code to the system under? Hopefully no one's actually opening up jira/github/gitlab by hand and having to click a bunch of buttons! So I'll be totally transparent about my bias here in that my tooling relies on the branch being named jira-123 so it submits it to jira and github from the command line and uses the branch name as part of the automated PR creation and jira ticket modification.
> Of the, say, 10 branches you've got checked out, how do you know which branch maps to jira-123 and which one maps to jira-234, or if you're using names, which anonymous branch maps to addFeatureA or fixBugB?
The descriptions of the changes. I shared some jj log output in another comment, here it is with more realistic messages, taken from a project of mine:
```text
@ vvxvznow
│ (empty) (no description set)
│ ○ uuowqquz
├─╯ Fix compiler panic in error rendering for anonymous struct methods (rue-fwi9)
│ ○ uvlpytpm
├─╯ Stabilize anonymous struct methods feature
◆ lwywpyls trunk
│ Fix array return type unification in type inference
```
That (rue-fwi9) is the equivalent of jira-123; if I super care about it being obvious, I might put it in the message. But I also might not, as you can see with the other two. You could also pass flags to see more verbose output if the first line isn't clear enough, but in general, the convention in git is also to have a short summary line that explains your change, so if it's confusing, you probably need to do better on that.
> Specifically, there's a jira cli (well, multiple) as well as a gh cli for github as well as gitlab has one as well.
These systems do require branches in order to open a pull request. In these cases, I use `jj git push -c <change id>`, which creates a branch name for me and pushes it up. This is configured to produce a branch name like steveklabnik/push-mrzwmwmvkowx for a change with the id mrzwmwmv, and locally it's still easier to refer to the change as m or mr, depending on whether the prefix is ambiguous. That said, from there I usually just click the button and then "open pull request" on GitHub, but all of these tools just work if you want to use them (gh is the only one I've used, but I can't imagine the others don't work, since ultimately it's a git repo).
Other systems do not even require a branch to submit, and so you don't even need to do this. I would say "submit mr" and it would return me the URL for the created change request. Gerrit does this on top of plain old git.
> how does it get the ticket name to submit the code to the system under?
I haven't worked with Jira in a long time, but with GitHub, if I make a change that fixes issue 5, I put "Fixes #5" in my description, and when the PR is created, it updates ticket #5 to link the PR to that change automatically, no other process needed.
Right, this is a good point: you can if you want to, or if you're working with a system that requires them.
Just in practice, anonymous branches end up feeling very natural, especially during development, and especially if your code review tooling doesn't require names.
They look identical to people who don't know what to look for, and who don't realise that these two states are different, which is the key thing. You can also distinguish them by running `git status`, but that's kind of the point: there's some magic state living in .git/ that changes how a bunch of commands you run work, and you need to understand how that state works in order to correctly use git. Why not just remove that state entirely, and make all checkouts behave identically to each other, the only difference being which files are present in the filesystem, and what the parent commit was?
What's wrong with unnamed branches? I mean, in git the main issue is that they're not surfaced very clearly (although they exist). But if you can design an interface where unnamed branches are the default, where they're always visible, and where you can clearly see what they're doing, what's wrong with avoiding naming your branches until you really need to?
I think this is the key thing that makes jj so exciting to me: it's consistently a simpler mental model. You don't need to understand the different states a checkout can be in, because there aren't any - a checkout is a checkout is a checkout. You don't need to have a separate concept of a branch, because branches are just chains of commits, and the default jj log command is very good at showing chains of commits.
fragmede@laptop:(abranch)~/projects/project-foo$
or
fragmede@laptop:(abcdef)~/projects/project-foo$
Depending on whether abranch is checked out, or abcdef (which may be the HEAD of abranch) is checked out.
If you're having to run `git status` by hand to figure out which of the two states you're in, something's gone wrong. (That something being your PS1 config.) If people are having trouble with that, I can see the appeal of switching to a system that doesn't have that problem; it's just that it doesn't seem like it should be a problem to begin with. (It's not that it's not useful to have unnamed branches and to commit to them, just that it's not an intro-to-git level skill. Throwing people into the deep end of the git pool and being surprised when some people sink isn't a good recipe for getting people to like using git.)
> What's wrong with unnamed branches?
As you point out, those commits kinda just go into the ether and must be dug out via the reflog, so operationally, why would you do that to yourself? Separate from that, though: do you "cd" into the project directory and then just randomly start writing code, or is there some idea of what you're working on? Either a (Jira) ticket name/number, or at least some idea of the bug or feature you wanna work on. Or am I crazy (which I am open to the possibility), and people do just "cd" into some code and start writing stuff?
VCS aside, there's nothing worse than opening Google Docs/a document folder and seeing a list of 50 "Untitled document" files, and my habit of naming branches comes from that. Even though I'm capable of digging random commits out of the reflog, if all of those commits are on unnamed branches and have helpful commit messages like "wip" or "poop", figuring out the right commit is gonna be an exercise in frustration.
As long as you've got something that works for you, though, to each their own. I've been using git too long to change.
The only thing that changed in the two things you wrote was `ranch` -> `cdef`. Every other part of that PS1 output was the same.
Now put yourself in the shoes of a git novice and ask yourself if you'd always notice the difference. In my experience, they often don't, especially if they're concentrating on something else, or if they're using an IDE where the visual information about which branch/commit is checked out is easy to miss.
I don't think you're crazy, I think you're just too used to this sort of stuff to remember what it was like to still be learning git. When I say people make these sorts of mistakes, I'm thinking about real colleagues of mine who have made exactly these mistakes and then panicked that commits suddenly had disappeared.
Similarly, I think to you, unnamed branches feel like something complicated because in git they are. Git makes it very easy for commits to seemingly disappear into the ether, even though they are still there. But in jj, they don't disappear - they remain very visible, and the log UI shows them in a way that makes it clear where they come from. The default log UI is something like git's --graph output, which means you see how the different commits interact with each other. I really recommend having a look at the output of `jj log`, because I think then it'll be a lot clearer what I mean when I say that it's not hard to figure out what the right commit is.
Sometimes it seems to me that it's only in SWE that we allow people to proceed in the workplace without any training. There's enough learning material that people should take a week or so to practice git and not be a git novice anymore.
Or you make tools that are easier to use, so that you can spend that week learning something more useful than the finicky details of branch vs detached head checkouts.
Don't get me wrong, git has some accidental complexity (as will any tool introduced, including what I am seeing with jj). But a lot of it is just incidental complexity. It doesn't matter how much lipstick you put on it, at the end of the day some concepts need to be learned.
Did you mean inherent complexity instead of incidental complexity?
I think the inherently complex things in git are (1) the content-addressable object store, snapshots, plus the Merkle tree approach to keeping track of commits and parenthood, (2) merges, rebases, and resolving conflicts between two different changes with a common ancestor, and (3) possibly syncing commits between different remotes, although I think Git's approach adds accidental complexity to the problem as well.
Everything else is a question of the user interface: how you choose to show a commit, how you choose to update the project files, how you choose to let people create new commits, etc. And I think the Git CLI is a poor user interface in a lot of places. There are a lot of features which are really powerful but difficult to use, whereas actually they could be just as powerful but far more intuitive to use.
In fairness, this is no slight to the Git developers - they are improving a lot of that interface all of the time. And finding a simpler interface is often a lot more hard work than finding the complicated interface, and I don't think I would have figured out how to create something like jj until I'd seen it before.
I think you need to set up something like magit, tig, and maybe lazygit to truly see the power of git. Most people don't do version control other than snapshotting things every once in a while. They might as well use git inside cron.
A patch is an idea to take the code from one state to another. It contains the mechanical work, the intent, and extra metadata. It's essential when doing distributed development work over a single code base. A git repo is a node, and it lets you accept and produce new messages to the other nodes.
If you care about the state of the canonical repo, you want every step to be able to compile/build/be verified. So that means accepting good patches from everyone else.
But the work of producing new patches can be messy. You still need to track each step, branch off to do experiments, catch up with new updates to the foundational model of the code, ...
The design of how git stores information makes all those operations simple while giving you maximum control over them. But there's one mechanism that is kinda the bane of every novice: diffing and patching. Git doesn't store file deltas; it stores the changed files. Diffing is the interface for inspecting the changes, and patching is how those changes are propagated to files.
The default diff mechanism relies heavily on lines, which suits most programming languages, as a line is often the unit of intent. A conflict occurs when git detects that two diffs are trying to alter the same section of a file. It's actually a resolution mechanism rather than the problem most novices see it as.
But I don't blame them, as a lot of novices don't have experience reading diff files or using the patch tool to apply such changes. The tree of commits and the object store are more easily explained (although they rarely get explained). But a lot of fellows I've met are genuinely terrified of resolving conflicts because they can't read diff files.
I genuinely think that git is an awesome tool. But you need to get familiar with some concepts like what a commit is, what a branch actually is, why HEAD is important, ... Then operations like fetch, pull, push, rebase, cherry-pick, commit, checkout, ... become way more obvious.
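A minimal Rust sketch of line-based conflict detection, under simplifying assumptions: a hunk replaces a half-open range of base lines, two hunks conflict only if their ranges overlap or start on the same line, and identical edits on both sides are applied once. Real tools are stricter; for example, git can also report a conflict when the two edits are merely on adjacent lines. `Hunk`, `hunk`, and `merge` are hypothetical names.

```rust
use std::cmp::Reverse;

// A hunk replaces base lines [start, end) with `lines` (start == end is a pure insert).
// Simplification: each side's own hunks are assumed disjoint, with distinct starts.
#[derive(Debug, Clone, PartialEq)]
struct Hunk {
    start: usize,
    end: usize,
    lines: Vec<String>,
}

fn hunk(start: usize, end: usize, lines: &[&str]) -> Hunk {
    Hunk { start, end, lines: lines.iter().map(|s| s.to_string()).collect() }
}

// Two hunks touch the same section if their ranges overlap or they start on the same line.
fn overlaps(a: &Hunk, b: &Hunk) -> bool {
    (a.start < b.end && b.start < a.end) || a.start == b.start
}

// Three-way merge of two diffs made against the same base, at line granularity.
fn merge(base: &[&str], ours: &[Hunk], theirs: &[Hunk]) -> Result<Vec<String>, (Hunk, Hunk)> {
    for a in ours {
        for b in theirs {
            // Identical edits on both sides agree, so they are not a conflict.
            if a != b && overlaps(a, b) {
                return Err((a.clone(), b.clone()));
            }
        }
    }
    let mut all: Vec<&Hunk> = ours.iter().collect();
    all.extend(theirs.iter().filter(|h| !ours.contains(*h)));
    // Apply bottom-up so earlier line numbers stay valid after each splice.
    all.sort_by_key(|h| Reverse(h.start));
    let mut out: Vec<String> = base.iter().map(|s| s.to_string()).collect();
    for h in all {
        let _ = out.splice(h.start..h.end, h.lines.iter().cloned());
    }
    Ok(out)
}

fn main() {
    let base = ["a", "b", "c", "d", "e"];
    let ours = vec![hunk(1, 2, &["B"])];
    // Edits to different, non-adjacent lines merge cleanly.
    println!("{:?}", merge(&base, &ours, &[hunk(3, 4, &["D"])]));
    // The same line changed two different ways is a conflict for a human to resolve.
    println!("{:?}", merge(&base, &ours, &[hunk(1, 2, &["b2"])]));
}
```

The conflict case is the tool refusing to guess which intent wins, which is the "resolution mechanism" framing above.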
I'd add jj to that list, tbh. It simplifies a lot of stuff, but in doing so it exposes a lot of those core ideas. That's where I got my list of essential complexity, really - the stuff that comes to the fore when you start using jj.
Is the difference between these 3 really that subtle? How do you miss that the line lengths are different and that one of them is a pile of letters and the other is a word? I'm not trying to die on some "git is easy to use" hill because it isn't. Just that the difference between unnamed branches and named branches isn't this hidden undiscoverable thing. I don't have access to this dataset of Mr Klabnik's, but apparently it is.
I don't know how much it matters in the AI-codepocalypse, because I'll be honest, I haven't used git manually in days. AI writes my commit messages and deals with that shit now. (Claude ends a session with "Want me to push it?" and I just reply "yes".)
> Similarly, I think to you, unnamed branches feel like something complicated because in git they are.
It's not that working with unnamed branches in git seems complicated, it's that it seems opposite to how I, and by extension, other people work. Now, obviously that assumption of mine doesn't hold true, otherwise we wouldn't be having this discussion, but going back to my Google docs example, staring at a page of documents called Untitled document isn't helpful, so in my mind, there's just a bit of digital hygiene that's necessary under any system.
> I really recommend having a look at the output of `jj log`, because I think then it'll be a lot clearer what I mean when I say that it's not hard to figure out what the right commit is.
You tell me which commit I want from this `jj log` output:
The effort of labeling those with `jj describe`, so that I know what "nrqwxvzl" is vs "nkuswmkt", could just as well be put into using named branches.
Again, I'm not trying to die on a "git is easy to use" hill, because it isn't. My point was that "> these two states look completely identical" isn't true if you set up PS1 to tell you what state you're in.
jj gets a lot of things right; there's no panicking that work got lost. That is a big deal! The emotional toll that the possibility of that happening with git takes on new users is hard to overstate. It's hard not to be scared of git after you've lost hours of work by running the wrong command.
I didn't raise $17m to build what comes after git, although I've thought a lot about that problem. Improvements in that area are welcome, and the easier it is for people to work with stacked change sets and branches, globally, can only be a good thing. I can't make everyone else get as good at git as I am (or better!), so I welcome better tooling.
I don't understand your example. Why haven't you added commit messages? Would you do that with git? In what situation are you creating different branches like that and not using either `jj commit` or `jj describe`?
> any competently set up shell PS1 will tell you that
I certainly hope your shell is not running `git` commands automatically for you. If so, that is an RCE vulnerability, since you could extract a tarball/zip that you don't expect to be a git repository but that contains a `.git` folder with an `fsmonitor` configured to execute a malicious script: https://github.com/califio/publications/blob/main/MADBugs/vi...
Might want to let git know. It's been a part of the git source code since 2006. If there were an RCE vulnerability from using __git_ps1, one would hope it would have been found by now!
I was able to reproduce it using that script in my PS1 when `GIT_PS1_SHOWUNTRACKEDFILES=1` which triggers a call to `git ls-files`. Without that, it seems to be just calling `git rev-parse` which does not execute fsmonitor.
I was also able to reproduce it with `GIT_PS1_SHOWDIRTYSTATE=1` which invokes `git diff`.
I find rebases are only a footgun because the standard git cli is so bad at representing them - things like --force being easier to write than --force-with-lease, there being no way to easily absorb quick fixes into existing commits, interdiffs not really being possible without guesswork, rebases halting the entire workflow if they don't succeed, etc.
I've switched over pretty much entirely to Jujutsu (or JJ), which is an alternative VCS that can use Git as its backend so it's still compatible with GitHub and other git repos. My colleagues can all use git, and I can use JJ without them noticing or needing to care. JJ has merges, and I still use them when I merge a set of changes into the main branch once I've finished working on it, but it also makes rebases really simple and eliminates most of the footguns. So while I'm working on my branch, I can iteratively make a change, and then squash it into the commit I'm working on. If I refactor something, I can split the refactor out so it's in a separate commit and therefore easier to review and test. When I get review feedback, I can squash it directly into the relevant commit rather than create a new commit for it, which means git blame tends to be much more accurate and helpful - the commit I see in the git blame readout is always the commit that did the change I'm interested in, rather than maybe the commit that was fixing some minor review details, or the commit that had some typo in it that was fixed in a later commit after review but that relationship isn't clear any more.
And while I'm working on a branch, I still have access to the full history of each commit and how it's changed over time, so I can easily make a change and then undo it, or see how a particular commit has evolved and maybe restore a previous state. It's just that the end result that gets merged doesn't contain all those details once they're no longer relevant.
+1 on this, I also switched to jj when working with any git repo.
What's funny is how much better I understand git now, and despite using jj full time, I have been explaining concepts like rebasing, squashing, and stacked PRs to colleagues who exclusively use git tooling.
The magic of the git cli is that it gives you control. Meaning whatever you want to do can be done. But it only gives you the raw tools. You'll need to craft your own workflow on top of that. Everyone's workflow is different.
> So while I'm working on my branch, I can iteratively make a[...]which means git blame tends to be much more accurate and helpful
Everything here I can do easily with Magit with a few keystrokes. And magit sits directly on top of git, just with interactivity, which means if I wanted to, I could write a few scripts with fzf (to help with selection) and they would be quite short.
> And while I'm working on a branch, I still have access to the full history of each commit...
Not sure why I would want the history for a specific commit. But there's the reflog in git which is the ultimate undo tool. My transient workspace is only a few branches (a single one in most cases), and those are the few commits I worry about. Rebase and revert have always been all I needed to alter them.
I think there's a sense that magit and jj are in some way equivalent tools, although I don't have enough experience with magit to be sure. They both sit on top of git and expose the underlying model of git far more cleanly and efficiently than the standard git cli. The difference is that magit uses interactivity to make git operations clearer, whereas jj tries to expose a cleaner model directly.
That said, there are additional features in jj that I believe aren't possible in magit (such as evolog/interdiffing, or checked-in conflicts), whereas magit-like UIs exist for jj.
You want the history of a specific commit because if you, say, fixup that commit, you want to know how the commit has changed exactly over time. This is especially useful for code review. Let's say you've got a PR containing a refactor commit and a fix commit. You get a review that says you should consider changing the refactor slightly, so you make that change and squash it into the existing refactor commit. You then push the result - how can the reviewer see only the changes you've made to only the refactor commit? That is an interdiff.
In this case, because you've not added any new commits, it's trivial to figure out which commit in the old branch maps to which commit in the new, fixed branch. But this isn't possible in general (consider adding new commits, or reordering something, or updating the commit message somewhere). In jj, each commit also has a change ID, and if multiple commits share the same change ID, then they must be different versions of the same changeset.
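Here's a rough Rust sketch of why change IDs make that mapping possible. The `Commit` struct, the IDs, and `rewritten_changes` are all made up for illustration, and comparing patch text stands in for whatever jj actually does internally. Commit IDs change on every rewrite, but grouping by change ID still pairs each old commit with its new version, even after reordering and inserting a commit.

```rust
use std::collections::HashMap;

// Hypothetical, simplified commit record.
#[derive(Debug)]
struct Commit {
    commit_id: &'static str, // changes whenever the commit is rewritten (amend, squash, rebase)
    change_id: &'static str, // stays the same across those rewrites
    patch: &'static str,     // stand-in for the commit's diff against its parent
}

// Pair up versions of the same change across two pushes. Returns
// (change_id, old commit_id, new commit_id) for every change whose patch actually
// changed; those pairs are what an interdiff would compare.
fn rewritten_changes(old: &[Commit], new: &[Commit]) -> Vec<(&'static str, &'static str, &'static str)> {
    let old_by_change: HashMap<&str, &Commit> = old.iter().map(|c| (c.change_id, c)).collect();
    new.iter()
        .filter_map(|n| {
            // Brand-new changes have no old version to compare against.
            let o = old_by_change.get(n.change_id)?;
            (o.patch != n.patch).then(|| (n.change_id, o.commit_id, n.commit_id))
        })
        .collect()
}

fn main() {
    // First push: a refactor commit and a fix commit.
    let old = [
        Commit { commit_id: "a1", change_id: "kxqp", patch: "-foo() +bar()" },
        Commit { commit_id: "b1", change_id: "mzvw", patch: "+guard clause" },
    ];
    // After review: feedback squashed into the refactor, a new commit added at the
    // bottom, and the fix rebased on top. Every commit id changed.
    let new = [
        Commit { commit_id: "c1", change_id: "qrst", patch: "+doc comment" },
        Commit { commit_id: "a2", change_id: "kxqp", patch: "-foo() +baz()" },
        Commit { commit_id: "b2", change_id: "mzvw", patch: "+guard clause" },
    ];
    for (change, before, after) in rewritten_changes(&old, &new) {
        println!("change {change}: interdiff {before} -> {after}");
    }
}
```

The rebased fix gets a new commit ID but an unchanged patch, so only the refactor shows up as needing re-review.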
You want the history of the repository which includes the history of each commit, because it's a lot easier to type `jj undo` to revert an operation you just did than it is to find the old state of the repository in the reflog and revert to it, including updating all the branch references to point at their original locations. The op log in jj truly is the ultimate undo tool - it contains every state the repository has ever been in, including changes to tags and branches that aren't recorded in the reflog, and is much easier to navigate. It is strictly more powerful than the reflog, while being simpler to understand.
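A minimal Rust sketch of the op-log idea, assuming each operation records a full snapshot of the repository's branch pointers and working-copy commit. `RepoState`, `OpLog`, `run`, and `undo` are hypothetical names, and the sketch only models undoing the most recent operation; it is not how jj implements `jj undo`. What it shows is that restoring one snapshot moves every branch back at once, instead of recovering each ref from its own reflog.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified view of what an operation records.
#[derive(Debug, Clone, PartialEq)]
struct RepoState {
    branches: BTreeMap<String, String>, // branch name -> commit id
    working_copy: String,               // commit id of the working-copy commit
}

// Each entry is (operation name, full state after that operation ran).
struct OpLog {
    ops: Vec<(String, RepoState)>,
}

impl OpLog {
    fn new(initial: RepoState) -> Self {
        OpLog { ops: vec![("init".to_string(), initial)] }
    }

    fn current(&self) -> &RepoState {
        &self.ops.last().expect("op log is never empty").1
    }

    // Every command edits a copy of the current state and is recorded as a new operation.
    fn run(&mut self, name: &str, edit: impl FnOnce(&mut RepoState)) {
        let mut next = self.current().clone();
        edit(&mut next);
        self.ops.push((name.to_string(), next));
    }

    // Undo restores the state from before the latest operation, all refs at once.
    // It is itself recorded, so nothing is lost, and running it twice toggles back.
    fn undo(&mut self) {
        let n = self.ops.len();
        if n >= 2 {
            let previous = self.ops[n - 2].1.clone();
            self.ops.push(("undo".to_string(), previous));
        }
    }
}

fn main() {
    let mut log = OpLog::new(RepoState {
        branches: BTreeMap::from([
            ("main".to_string(), "c1".to_string()),
            ("feature".to_string(), "c2".to_string()),
        ]),
        working_copy: "c2".to_string(),
    });
    let before = log.current().clone();
    // One bad operation that moves two branches and the working copy.
    log.run("rebase", |s| {
        s.branches.insert("main".to_string(), "c3".to_string());
        s.branches.insert("feature".to_string(), "c4".to_string());
        s.working_copy = "c4".to_string();
    });
    log.undo();
    println!("restored: {}", log.current() == &before);
}
```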
> But there's the reflog in git which is the ultimate undo tool.
That one sentence outs you as someone who isn't familiar with JJ.
Here is something to ponder. Despite claims to the contrary, there are many git commands that can destroy work, like `git reset --hard`. The reflog won't save you. However, there is literally no JJ command that can't be undone. So no JJ command will destroy your work irretrievably.
I’ve just tested that exact command and the reflog is storing the changes. It’s different from the log command, which displays the commit tree for the specified branch. The reflog stores information about operations that update branches and other references (rebase, reset, amend, commit,…). So I can revert the reset, or a pull.
`git reset --hard` destroys uncommitted changes. There is no git command to recover those files. JJ has a similar command of course, but it saves the files to a hidden commit before changing them.
Are there any major libraries for OT? I've been looking into this recently for a project at work, and OT would be completely sufficient for our use case, and does look simpler overall, but from what I could tell, we'd need to write a lot of stuff ourselves. The only vaguely active-looking project in JS at least seems to be DocNode (https://www.docukit.dev/docnode), and that looks very cool but also very early days.
Author here. I think it depends what you're doing! OT is a true distributed systems algorithm and to my knowledge there are no projects that implement true, distributed OT with strong support for modern rich text editor SDKs like ProseMirror. ShareJS, for example, is abandoned, and predates most modern editors.
If you are using a centralized server and ProseMirror, there are several OT and pseudo-OT implementations. Most popularly, there is prosemirror-collab[4], which is basically "OT without the stuff you don't need with an authoritative source for documents." Practically speaking that means "OT without T", but because it does not transform the ops to be order-independent, it has an extra step on conflict where the user has to rebase changes and re-submit. This is can cause minor edit starvation of less-connected clients. prosemirror-collab-commit[5] fixes this by performing the rebasing on the server... so it's still "OT without the T", but also with an authoritative conflict resolution pseudo-T at the end. I personally recommend prosemirror-collab-commit, it's what we use, and it's extremely fast and predictable.
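A minimal Rust sketch of the rebase-and-resubmit loop, simplified to plain-text inserts with one pending step per client. `Authority`, `Insert`, `receive`, and `rebase` are hypothetical names, not the prosemirror-collab API. The authority only accepts steps based on its current version; a client that was behind is rejected, maps its step over the steps it missed, and tries again. A slow client that keeps losing that race is the edit starvation described here.

```rust
// Hypothetical edit type: insert `text` at byte offset `pos` (ASCII-only to keep it short).
#[derive(Debug, Clone, PartialEq)]
struct Insert {
    pos: usize,
    text: String,
}

// The central authority: the single source of truth for step order.
struct Authority {
    doc: String,
    steps: Vec<Insert>, // the current version is steps.len()
}

impl Authority {
    // Accept steps only from a client that has seen every accepted step so far.
    // On rejection, return the current version so the client knows what it missed.
    fn receive(&mut self, base_version: usize, steps: &[Insert]) -> Result<(), usize> {
        if base_version != self.steps.len() {
            return Err(self.steps.len());
        }
        for s in steps {
            self.doc.insert_str(s.pos, &s.text);
            self.steps.push(s.clone());
        }
        Ok(())
    }

    fn steps_since(&self, version: usize) -> &[Insert] {
        &self.steps[version..]
    }
}

// Client-side rebase: map a pending insert's position over steps the authority accepted first.
// Ties go to the accepted step, so its text ends up before ours.
fn rebase(pending: &Insert, accepted: &[Insert]) -> Insert {
    let mut pos = pending.pos;
    for a in accepted {
        if a.pos <= pos {
            pos += a.text.len();
        }
    }
    Insert { pos, text: pending.text.clone() }
}

fn main() {
    let mut server = Authority { doc: "hello world".to_string(), steps: Vec::new() };
    // Alice and Bob both edit version 0; Alice's step reaches the server first.
    let alice = Insert { pos: 5, text: ",".to_string() };
    let bob = Insert { pos: 11, text: "!".to_string() };
    server.receive(0, &[alice]).expect("alice is up to date");
    // Bob is rejected, fetches what he missed, rebases, and resubmits.
    let version = server.receive(0, &[bob.clone()]).unwrap_err();
    let rebased = rebase(&bob, server.steps_since(0));
    server.receive(version, &[rebased]).expect("bob is now up to date");
    println!("{}", server.doc);
}
```

Moving the `rebase` call from the client to the server is, roughly, the difference the comment attributes to prosemirror-collab-commit.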
If you just want something pedagogically helpful, the blessed upstream collaborative editing solution for CodeMirror is OT. See the author's blog post[1], the @codemirror/collab package[2], and the live demo[3]. In general this implementation is quite good and worth reading if you are interested in this kind of thing. ShareJS and OTTypes are both very readable and very good, although we found them very challenging to adopt in a real-world ProseMirror-based editor.
When I was starting my research into collaborative editing as a PhD student 20+ years ago, rebase-and-resubmit was well known. It was used in one Microsoft team collab product (I forget the name). It is a 100% legit algo, except intermittently-connected clients may face challenges (screw them then).
Unless you have to support some complicated scenarios, it will work. I believe Google Docs initially used something of the sort (diff-match-patch based). It annoyed users with alerts "let's rebase your changes", esp on bad WiFi. So they borrowed proper OT from Google Wave and lived happily since (not really).
One way to think about it: how many users your product will have and how strange your data races / corner cases can get. At Google's scale, 0.1% of users complaining is a huge shit storm. For others, that is one crazy guy in the channel, no biggie. It all depends.
First of all, thanks for chiming in! I wish someone would collect stuff like this and write it down in some sort of "oral history of collab editing."
Second of all, I actually think we're more aligned than it seems here. What we're really advocating for is being super clear about what your end-user goals are, and deriving technology decisions from them, instead of the reverse. Our goals for this technology are (1) users should be able to predict what happens to their data, (2) the editor always runs at 60fps, and (3) we are reasonably tolerant of transient periods of disconnection (up to, say, 30s-1m).
Because of (1) in particular, a lot of our evaluation was focused on understanding in which situations users would be unable to predict what was going to happen to their data. This is only our own experience, but what we found (and the impetus for part 1 of this series) is that almost 100% of the time, when there is a direct editing conflict, users interpret the results of the dominant CRDT and OT implementations as silently corrupting their data. So, the name of the game is to decrease the likelihood of direct editing conflicts, e.g. presence carets in the live-collab case. In particular, we did not notice a meaningful difference between how users view reconciliations of OT and CRDT implementations.
Since our users could not tell the difference, and in fact viewed all options as equally bad ("disastrous" as one user said), this freed us up to consider a much broader class of algorithms, including prosemirror-collab and prosemirror-collab-commit.
I know there is a rich history of why OT is OT, but our final determination was made pretty simple by the fact that, in our view, the majority of race conditions come from the difficulty of integrating CRDTs and OT directly into modern editing stacks, like ProseMirror. As far as I am aware, prosemirror-collab-commit behaves as well as or better than, say, an OTTypes implementation would on every dimension... and mostly that is because it is native to the expressive `Transaction` model of the modern editor. If we had to do interop I think we would have shipped something noticeably worse, and much slower.
If you have a different experience I would love to hear about it, as we are perennially in the market for new ideas here.
> I wish someone would collect stuff like this and write it down in some sort of "oral history of collab editing."
I'd be very happy to contribute to this if someone wanted to do some storytelling.
I also interviewed Kevin Jahns, the author of Yjs, several years ago to get his take on how Yjs works and how he thinks about it all [1]. The conversation was long but I very much enjoyed it.
> This is only our own experience, but what we found (and the impetus for part 1 of this series) is that almost 100% of the time, when there is a direct editing conflict, users interpret the results of the dominant CRDT and OT implementations as silently corrupting their data.
That's not been my experience. Have edits been visible in realtime? I think of it as there being essentially 2 use cases where collaborative editing shows up:
1. Realtime collab editing. So long as both users' cursors are visible on screen at the same time, users are often hesitant to type at the same place & same time anyway. And if any problems happen, they will simply fix them.
2. Offline (async) collab editing. E.g., editing a project in git. In this case, I think we really want conflict markers & conflict ranges. I've been saying this for years hoping someone implements conflicts within a CRDT, but as far as I know, nobody has done it yet. CRDTs have strictly more information than git does about what has changed, so it would be very doable for a CRDT to support the sort of conflict merging that git can do.
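For what it's worth, here's a rough sketch of just the merge step being described, assuming both concurrent edits can be expressed as replacements against a shared base version (which a CRDT could in principle recover from its history). It handles one edit per side to stay short, and `Edit` and `mergeWithConflicts` are hypothetical names, not any existing CRDT library's API.

```typescript
// Hypothetical: a replacement of base[start, end) with `text`, expressed
// against a shared base version. Not an existing CRDT library API.
type Edit = { start: number; end: number; text: string };

// Apply `e` to `text`, where `text` begins at position `offset` of the base.
function applyEdit(text: string, e: Edit, offset = 0): string {
  return text.slice(0, e.start - offset) + e.text + text.slice(e.end - offset);
}

function sameEdit(a: Edit, b: Edit): boolean {
  return a.start === b.start && a.end === b.end && a.text === b.text;
}

// Concurrent edits conflict if their ranges overlap, or if they start at
// the same position (e.g. two inserts at the same spot).
function conflicts(a: Edit, b: Edit): boolean {
  return (a.start < b.end && b.start < a.end) || a.start === b.start;
}

function mergeWithConflicts(base: string, ours: Edit, theirs: Edit): string {
  if (sameEdit(ours, theirs)) return applyEdit(base, ours);
  if (!conflicts(ours, theirs)) {
    // Apply the later edit first so the earlier edit's offsets stay valid.
    const later = ours.start > theirs.start ? ours : theirs;
    const earlier = later === ours ? theirs : ours;
    return applyEdit(applyEdit(base, later), earlier);
  }
  // Overlap: instead of silently interleaving, keep both versions of the
  // smallest base range covering the two edits, git-style.
  const lo = Math.min(ours.start, theirs.start);
  const hi = Math.max(ours.end, theirs.end);
  const region = base.slice(lo, hi);
  return (
    base.slice(0, lo) +
    "<<<<<<< ours\n" + applyEdit(region, ours, lo) +
    "\n=======\n" + applyEdit(region, theirs, lo) +
    "\n>>>>>>> theirs" + base.slice(hi)
  );
}

const base = "The cat sat.";
// Non-overlapping edits merge cleanly: prints The dog ran.
console.log(mergeWithConflicts(base, { start: 4, end: 7, text: "dog" }, { start: 8, end: 11, text: "ran" }));
// Overlapping edits keep both sides between conflict markers.
console.log(mergeWithConflicts(base, { start: 4, end: 7, text: "dog" }, { start: 4, end: 7, text: "bird" }));
```

The point is that the merge can tell "these touched the same span" apart from "these touched nearby spans", which is exactly the information needed to show a conflict instead of an interleaved result.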
Sorry, I see why this was confusing. What I'm saying is that because users perceive the results of OT, CRDTs, prosemirror-collab, etc. as data corruption, they require presence carets as a UI affordance, to steer them away from direct edit conflicts.
If you can't have presence carets, yes... likely best to have a diff view. And in our case, we use git/jj for this, rather than CRDTs.
For the "history of" stuff... in spite of the fact that we have our disagreements, I actually think it would be very nice to have Kevin and everyone else on the record long-form talking about this. Just because I am a non-believer doesn't mean it wasn't worth trying!
> For the "history of" stuff... in spite of the fact that we have our disagreements, I actually think it would be very nice to have Kevin and everyone else on the record long-form talking about this. Just because I am a non-believer doesn't mean it wasn't worth trying!
I'd enjoy that too. Hit me up if you wanna chat about this stuff - I'd enjoy the opportunity to convince you over video. Maybe even recorded, if people are interested in that.
> Maybe even recorded, if people are interested in that.
I would be. IIRC you interviewed the Yjs creator years ago in a YT video I watched? This post has been fun, spicy discourse notwithstanding. It's not often a lot of the people in the collab space are together in the same spot, and the clash of academics and product builders is valuable.
As an aside, I'd put Marijn on the product-builder side. A lot of people dabble in collab algorithms and have a hobby editor on the side, but he created and maintains the most popular rich text editor framework on the planet (made-up statistic, but seems true!).
In our case, we're not using a text editor, but instead building a spreadsheet, so a lot of these collab-built-into-an-editor solutions are, like you say, pedagogically useful but less helpful as direct building blocks that we can just pull in and use. But the advice is very useful, thank you!
Interesting! I am building a spreadsheet and the next few months will be building the collaborative side of it. I think many of the things that work for text don't necessarily translate to spreadsheets.
We made a spreadsheet on top of OT several years ago. Most OT-related documentation doesn't talk about how to do this. But it worked pretty well for us.
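As an illustration of why text-oriented OT doesn't map directly onto spreadsheets, here's a toy transform function for two hypothetical operation types: setting a cell and inserting a row. It's a sketch under those assumptions, not the implementation mentioned above; a real spreadsheet would also need row/column deletes, column inserts, formulas that reference moved cells, and so on.

```typescript
// Hypothetical op types for a toy spreadsheet; not from any real library.
type SetCell = { kind: "setCell"; row: number; col: number; value: string };
type InsertRow = { kind: "insertRow"; index: number };
type Op = SetCell | InsertRow;
type Sheet = string[][];

function applyOp(sheet: Sheet, op: Op): Sheet {
  const next = sheet.map((row) => [...row]); // copy, don't mutate the input
  if (op.kind === "insertRow") {
    const width = next[0]?.length ?? 0;
    next.splice(op.index, 0, new Array<string>(width).fill(""));
  } else {
    const row = next[op.row];
    if (row === undefined) throw new Error(`no row ${op.row}`);
    row[op.col] = op.value;
  }
  return next;
}

// Transform `op` so it can be applied after the concurrent `other`.
// `opWins` breaks ties; the two sides must pass opposite values.
// Returns null when the op should be dropped.
function transform(op: Op, other: Op, opWins: boolean): Op | null {
  if (op.kind === "setCell") {
    if (other.kind === "insertRow") {
      // A row inserted at or above this cell pushes it down by one.
      return other.index <= op.row ? { ...op, row: op.row + 1 } : op;
    }
    // Concurrent writes to the same cell: only the winner's value survives.
    const sameCell = other.row === op.row && other.col === op.col;
    return sameCell && !opWins ? null : op;
  }
  // op is an insertRow from here on.
  if (other.kind === "insertRow") {
    const shift = other.index < op.index || (other.index === op.index && !opWins);
    return shift ? { ...op, index: op.index + 1 } : op;
  }
  return op; // a cell edit never moves rows
}

function applyAll(sheet: Sheet, ops: (Op | null)[]): Sheet {
  return ops.reduce<Sheet>((s, op) => (op ? applyOp(s, op) : s), sheet);
}

// Alice edits the "b" cell while Bob concurrently inserts a row at the top.
const start: Sheet = [["a"], ["b"]];
const alice: Op = { kind: "setCell", row: 1, col: 0, value: "B!" };
const bob: Op = { kind: "insertRow", index: 0 };
// Each replica applies its own op first, then the other's transformed op.
const atAlice = applyAll(start, [alice, transform(bob, alice, false)]);
const atBob = applyAll(start, [bob, transform(alice, bob, true)]);
console.log(JSON.stringify(atAlice) === JSON.stringify(atBob)); // prints true
```

Unlike text, where everything reduces to shifting offsets in one dimension, cell addresses move whenever rows (or columns) are inserted or removed, so every structural op needs its own transform rules against every cell-level op.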
Author of DocNode here. Yes, it’s still early days. But it’s a very robust library that I don’t expect will go through many breaking changes. It has been developed privately for over 2 years and has 100% test coverage. Additionally, each test uses a wrapper to validate things like operation reversibility, consistency across different replicas, etc.
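A wrapper like that might look roughly as follows. This is a hypothetical sketch over a toy list type, not DocNode's actual test harness: each op is serialized as if it were sent to another replica, checked for reversibility by applying its inverse, and then all replicas are compared.

```typescript
// Hypothetical list ops; `delete` records the removed value so it can be inverted.
type ListOp =
  | { kind: "insert"; index: number; value: string }
  | { kind: "delete"; index: number; value: string };

function applyListOp(doc: readonly string[], op: ListOp): string[] {
  const next = [...doc];
  if (op.kind === "insert") {
    next.splice(op.index, 0, op.value);
  } else {
    if (next[op.index] !== op.value) throw new Error(`delete mismatch at ${op.index}`);
    next.splice(op.index, 1);
  }
  return next;
}

function invert(op: ListOp): ListOp {
  return op.kind === "insert"
    ? { kind: "delete", index: op.index, value: op.value }
    : { kind: "insert", index: op.index, value: op.value };
}

const snapshot = (doc: readonly string[]): string => JSON.stringify(doc);

// Applies `op` to every replica while checking two invariants:
// 1. reversibility: applying the inverse restores the previous state;
// 2. consistency: all replicas (each given a serialized copy) end up equal.
function checkedApply(replicas: string[][], op: ListOp): string[][] {
  const wire = JSON.stringify(op); // simulate sending the op over the network
  const next = replicas.map((doc) => {
    const received: ListOp = JSON.parse(wire);
    const after = applyListOp(doc, received);
    if (snapshot(applyListOp(after, invert(received))) !== snapshot(doc)) {
      throw new Error("operation is not reversible");
    }
    return after;
  });
  if (new Set(next.map(snapshot)).size > 1) {
    throw new Error("replicas diverged");
  }
  return next;
}

let replicas: string[][] = [[], []];
replicas = checkedApply(replicas, { kind: "insert", index: 0, value: "hello" });
replicas = checkedApply(replicas, { kind: "insert", index: 1, value: "world" });
// Both replicas now hold ["hello", "world"].
```

Routing every op in the test suite through a wrapper like this means each ordinary test also exercises the invariants for free, rather than needing separate tests for them.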
DocSync, which is the sync engine mainly designed with DocNode in mind, I would say is a bit less mature.
I’d love it if you could take a look and see if there’s anything that doesn’t convince you. I’ll be happy to answer any questions.
I've looked through the site, and right now it's probably the thing I'd try out first, but my main concerns are the missing documentation, particularly the more cookbook-y kinds of documentation: how you might achieve such-and-such effect, etc. For example, the sync example is very terse, although I can understand why you'd like to encourage people to use the more robust, paid-for solution! Also just general advice on how to use DocNode effectively from your experience would be useful, things like schema design or notes about how each operation works and when to prefer one kind of operation or structure over another.
All that said, I feel like the documentation has improved since the last time I looked, and I suspect a lot of the finer details come with community and experience.
Thanks! I've recently made some improvements to the documentation. I agree the synchronization section could be improved more. I'll keep your feedback in mind.
If you'd like to try the library, feel free to ask me anything on Discord and I'll help you.
More recently, it's been designed so this is the case. Namespaces, enums, and parameter properties (the constructor shortcut) were all added relatively early on, before the philosophy of "just JS + types" had been fully defined.
These days, TypeScript will only add new features if they are either JavaScript features that have reached consensus (stage 3 iirc), or exist at the type system only.
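A small illustration of the "just JS + types" rule: the older constructs mentioned above emit runtime code, whereas the alternatives below become plain JavaScript once the annotations are deleted. If I recall correctly, TypeScript 5.8 added an `erasableSyntaxOnly` compiler option that reports errors on exactly these non-erasable constructs. `Direction`, `Counter` and `step` are made-up names for the example.

```typescript
// Not "just JS + types": each of these emits runtime code, so the
// annotations can't simply be stripped away (shown commented out).
//   enum Direction { Up, Down }                          // compiles to a runtime object
//   namespace Geometry { export const unit = 1; }        // compiles to an IIFE
//   class Counter { constructor(public value: number) {} } // parameter property

// Erasable equivalents: deleting every annotation leaves valid JS.
const Direction = { Up: "up", Down: "down" } as const;
type Direction = (typeof Direction)[keyof typeof Direction];

class Counter {
  value: number;
  constructor(value: number) {
    this.value = value; // written out by hand instead of a parameter property
  }
}

function step(c: Counter, dir: Direction): Counter {
  return new Counter(dir === Direction.Up ? c.value + 1 : c.value - 1);
}

console.log(step(new Counter(41), Direction.Up).value); // prints 42
```

The trade-off is a little more typing by hand, in exchange for source that any type-stripping tool can run without TypeScript having to generate code.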
There have been attempts to add type hints directly to JavaScript, so that you really could run something like TypeScript in the browser directly (with the types being automatically stripped out), but this causes a lot of additional parsing complexity and so nothing's really come of it yet. There's also the question of how useful it would even be in the end, given you can get much the same effect by using TypeScript's JSDoc-based annotations instead of `.ts` files, if you really need to be able to run your source code directly.