Nov 16, 2024

Lies I was Told About Collaborative Editing, Part 1: Algorithms for offline editing

By Alex Clemmer

In early 2024, I began investigating collaborative editing systems for use in Moment’s core text editor.

In some ways, we are in a golden era for the problem. Several algorithms now claim to solve not only the online case (where many people simultaneously edit a document), but also the offline case (where people edit offline for an unbounded amount of time, come online, and “magically” their changes are merged together). I had read the papers and listened to the talks, and like others, I was initially optimistic the community had settled on the “right answer” to the whole problem space.

One of the earliest surprises in the evaluation process was that this is not true: despite the marketing copy, blog posts, and technical papers stating otherwise, these collaborative editing algorithms provide a systematically poor offline editing experience. Specifically:

The two most popular families of collab editing algorithms (CRDTs and OT) resolve direct editing conflicts so unintuitively that, in our testing, users generally interpreted the results as data corruption.

Offline editing dramatically increases the likelihood of direct conflicts, which means these algorithms are not, on their own, appropriate for offline editing.

In many ways, this is bad news. And we will spend much of this post discussing why we think these things are true—example anomalies, failure modes, what the practical limitations are, and so on.

But, the news is not all bad. We are heartened to see some recent research (e.g.) reversing the trend of treating collab editing as a pure algorithms problem, and instead treating it as a UI/UX problem in which algorithms assist. One of the goals of this post is to help motivate this line of reasoning, and we will make time at the end to encourage that, too.

Example: a seemingly trivial editing conflict

Let’s start with a game. I provide a trivial editing conflict and you try to guess what happens. Ready? Here’s the scenario: Alice and Bob are both offline. They are editing a document containing the following text:

Original document:

    The Color of Pomegranates

Bob changes the spelling of Color to the British Colour. Alice deletes all of the text.

Bob's version (adds 'u' to 'color'):

    The Colour of Pomegranates

Alice's version (all contents deleted):

    (empty)

Later, both Alice and Bob come online. These edits conflict. The system must reconcile these two conflicting edits, even though it has no knowledge of which one came first.

Q: When all reconciliation is complete and the machines agree, what text will the document contain? Commit to a guess before reading on. Note that the empty string is a valid guess.

✋ 🛑 Did you settle on a guess? Seriously—you really should before continuing on.

The answer, for all three collaborative editing libraries we tested, is a document containing only the string 'u'. Yes, you read that right! The Color of Pomegranates is deleted, but Bob's addition, 'u', is kept:

    yjs (https://github.com/yjs/yjs) @ v13.6.20: final document text 'u'
    share.js (https://github.com/josephg/ShareJS) @ v0.7.40: final document text 'u'
    Peritext (https://github.com/inkandswitch/peritext) @ 89c162d: final document text 'u'

The reason is that these algorithms all work roughly the same way: in direct conflicts, they first accept any deletes, and then apply any additions.
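This result is easy to reproduce. Below is a minimal sketch of the scenario against the Yjs API; the document key "t" and the sync-by-full-state-update plumbing are our simplifications, not anything Yjs requires.

```js
import * as Y from "yjs";

// Build the base document that both peers start from.
const base = new Y.Doc();
base.getText("t").insert(0, "The Color of Pomegranates");
const baseState = Y.encodeStateAsUpdate(base);

// Alice and Bob each get a replica of the base document.
const alice = new Y.Doc();
const bob = new Y.Doc();
Y.applyUpdate(alice, baseState);
Y.applyUpdate(bob, baseState);

// Offline, concurrent edits: Bob inserts 'u' after "The Colo" (index 8);
// Alice deletes the entire contents.
bob.getText("t").insert(8, "u");
alice.getText("t").delete(0, alice.getText("t").length);

// Both come online and exchange updates; the order doesn't matter.
Y.applyUpdate(alice, Y.encodeStateAsUpdate(bob));
Y.applyUpdate(bob, Y.encodeStateAsUpdate(alice));

console.log(alice.getText("t").toString()); // "u"
console.log(bob.getText("t").toString());   // "u"
```

Both replicas converge, as promised; they just converge on a document no human wrote.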

These anomalies are common on direct conflicts.

The example above is interesting because it is a trivial, direct editing conflict which these (popular) collaborative editing algorithms do seem to claim to support, but which empirically produces a document that no human would ever write on their own. This is also a case where users interpreted the result as the Moment product corrupting their data.

So how common is this problem? Every use case is different, but anecdotally we found that roughly 20-30% of the direct conflicts we tried produced results that were similarly—in our opinion—unacceptable for our use case of offline editing. [nb., I'm not sure whether it's helpful to list more of the anomalies we encountered here; if it is, let me know at [email protected] or the Moment Discord server and I'll follow up with a more thorough review.]

Ultimately, though, the feedback was negative enough, and the anomalies frequent enough, that we felt we could not defend this result to our users.

These algorithms do seem to claim to support offline editing.

Initially, when we encountered this result, we thought that we must have misunderstood the semantics offered by these tools. Even now, though, when I read the technical copy for these projects, I can’t help but think this scenario really is in scope.

The Yjs readme explicitly states that it “supports […] offline editing.”

The ShareJS landing page claims to allow collaboration “with any amount of lag (from zero to an extended holiday).”

The Peritext authors describe it as “allow[ing] users to edit independent copies of a document” which can be “automatically merg[ed] […] back together in a way that preserves the users’ intent as much as possible.”

I admit to still not being quite sure what to make of this. We are in the market for a solution to this problem! I want to believe! I just don’t see how to reconcile these claims with the frequency and kinds of errors we’re seeing in practice.

Coordination-free algorithms will always have anomalies on direct conflicts.

At some point in the evaluation process we resigned ourselves to the fact that the algorithms did not do what we wanted. But a lingering question remained: is this a fundamental problem with the algorithms, or could we fix it with some careful upstream contributions?

Unfortunately, we think it’s a fundamental problem for the algorithms, for a few reasons.

The algorithms only have enough information to guess what the result should be. The algorithms can't email Alice and Bob to ask what they wanted, or review the change in the GitHub Pull Request UI. They receive Alice's proposal to delete all the text and Bob's proposal to change the spelling, without knowing what either user intended, and they must fall back on a heuristic to decide what to do (see the sketch following these three points).

The algorithms operate on characters and have very, very poor guarantees about their output. This is why in the example above, Alice and Bob end up with a document simply containing the letter u. Not a valid sentence, and not even a valid word!

Alice and Bob might make different decisions if they knew what the other was doing. Alice and Bob both made decisions based on the document containing the text, The Color of Pomegranates. If Alice knew Bob changed the spelling, or Bob knew Alice deleted the paragraph, they might have edited differently, or not at all.
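To make the first two points concrete, here is a deliberately simplified, hypothetical rendering of everything a merge function receives in this scenario. Real CRDTs identify characters by unique IDs rather than offsets, and none of these structures belong to any real library; the point is the information content.

```js
// Everything the merge sees: two bags of character-level operations.
// Note what is absent: no intent, no "why", no picture of the final
// document either user expected.
const bobOps = [
  { kind: "insert", afterIndex: 7, text: "u" }, // after "The Colo"
];
const aliceOps = [
  { kind: "delete", from: 0, to: 24 }, // the entire original text
];

// A coordination-free merge must turn (bobOps, aliceOps) into one
// deterministic document with no round-trip to either user. The common
// heuristic (honor deletes first, then apply surviving inserts)
// yields "u".
```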

I think this is probably worth a post all on its own, particularly with more treatment given to the fact that these algorithms don’t—and cannot—obey a causal ordering (cf., Jamie Brandon’s post). But for now, I think we can settle for the intuitive version of this argument.

Offline editing is a UI/UX problem, not an algorithms problem; and other conclusions.

So far the news has been bad. We entered the evaluation hopeful that we'd be able to support true offline editing "incidentally," simply by having implemented a tricky algorithm. And it is true that we left the evaluation process, well, bereft of hope—forced to conclude that users were right to think these algorithms were corrupting their data, that the anomalies were fundamental to the algorithms themselves, and that they occurred frequently enough to be a real issue. YMMV, but that is where we ended up.

As I said, though, the news is not all bad. Another way to view this result is as a strong argument for putting significant resources into collaborative editing as a UI/UX problem. The algorithms cannot completely solve the problem, but they can play a role in the solution.

There are a few reasons to be optimistic about this. One is that we already have one widely-adopted UI for merging documents: git! So in some sense, the research question is how much better it can be—more approachable, more accessible, more automatic, etc.
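To see why git counts as a UI for this problem, consider what it does with exactly the Alice-and-Bob conflict from earlier. Rather than guessing, it stops and hands the decision to a human. (A sketch; the file and branch names are invented.)

```
$ git merge bob
Auto-merging film.txt
CONFLICT (content): Merge conflict in film.txt
Automatic merge failed; fix conflicts and then commit the result.

$ cat film.txt
<<<<<<< HEAD
=======
The Colour of Pomegranates
>>>>>>> bob
```

This is a worse experience than a CRDT in one sense (the merge doesn't complete on its own), and a better one in the sense that matters here: a human resolves the conflict, and nobody mistakes the result for corruption.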

I think this resemblance is not superficial. In 2009, a surprising amount of the discourse focused on the algorithms git used to automatically merge changes together. git adapted the Myers O(ND) diff algorithm, at the time used mostly by biologists for BLAST-like sequence analysis. Bram Cohen thought its diff results were unintuitive, and invented the patience diff algorithm, which was then adopted by bzr, a now-defunct git competitor. The list goes on. The primary difference is that those debates were about producing diffs for humans to read; the debate now is whether the algorithms can produce a merge with no human involvement at all.

Another reason to be optimistic is that some researchers do seem to be concentrating on this problem as a UI/UX problem. I am thinking particularly of Ink & Switch, e.g., the work on collaborative history. I understand Ink & Switch to be operating purposefully a few years ahead of production, but I am excited to see where this lands when it settles down.

Acknowledgements

Heartfelt thanks to Kyle Kingsbury, Sean Gillespie, Ian Livingstone, David Adrian, Ben Linsay, Alex Rasmussen, Lita Cho, Lenny Pruss, Michael Yamnitsky, Camille Fournier, James Turnbull, and Ryan Cox for their comments and feedback.

👋 If you enjoyed this, you may enjoy other entries in the Moment devlog (https://www.moment.dev/blog). We're also hiring! [email protected]