Concurrent Compaction in Automerge Repo
November 17, 2024
Introduction
automerge-repo is a library for integrating automerge with pluggable storage and networking providers. One interesting challenge for storage is that Automerge documents have to be regularly compacted (losslessly), otherwise they take up a lot of storage space, but we wanted to have a very minimal storage interface so that it is simple to write a new storage plugin. In particular, we wanted it to be possible to have multiple concurrent writers using the same storage backend without requiring that the user implement transactions or other concurrency controls. Implementing compaction without requiring concurrency control is a bit fiddly and I’ve described our approach a few times in the Automerge discord, so I’m writing it up here to have something to point people at.
The Problem
We’ll start with an abstract description of this problem and later we’ll talk about some Automerge specifics. Imagine that our application is appending events to a log. Every event is immediately written to disk. The storage interface is a simple key/value store where the keys are strings and the values are byte arrays. To write an event to disk we’ll encode the event into a byte array somehow (maybe we just serialize it to JSON) and then hash that encoded byte array to get a key to use for the event.
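To make that concrete, here is a minimal sketch of what such a storage interface and the event write path might look like. The names (KeyValueStore, writeEvent, keysWithPrefix) are illustrative, not automerge-repo's actual API:

```typescript
// A deliberately minimal key/value storage interface: string keys, byte-array values.
interface KeyValueStore {
  save(key: string, value: Uint8Array): Promise<void>
  load(key: string): Promise<Uint8Array | undefined>
  remove(key: string): Promise<void>
  // List every key that starts with the given prefix.
  keysWithPrefix(prefix: string): Promise<string[]>
}

// Write an event: encode it, hash the encoded bytes, and use the hash as the key.
async function writeEvent(storage: KeyValueStore, event: unknown): Promise<string> {
  const encoded = new TextEncoder().encode(JSON.stringify(event))
  const digest = await crypto.subtle.digest("SHA-256", encoded)
  const key = Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("")
  await storage.save(key, encoded)
  return key
}
```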

There can be multiple instances of the application writing to the same storage at the same time, so things can actually look a bit like this:

Here the visual offset of the events and the event names suggest that these events are ordered somehow. The storage interface is just a key/value store though, so we have to encode the ordering into the events themselves. There are lots of ways to achieve this; for this example let’s assume all the applications have access to the same clock and they write the timestamp of the clock into the bytes of the event.
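Building on the sketch above, an event might carry that timestamp as a field, and readers would sort by it. Again, the shapes and names here are made up for illustration:

```typescript
interface LogEvent {
  timestamp: number // shared wall-clock time, used to order events on read
  payload: string
}

// Stamp the event with the clock before writing it.
async function appendEvent(storage: KeyValueStore, payload: string): Promise<string> {
  const event: LogEvent = { timestamp: Date.now(), payload }
  return writeEvent(storage, event)
}

// Reading the log means loading every key and sorting by the embedded timestamp.
async function readLog(storage: KeyValueStore): Promise<LogEvent[]> {
  const events: LogEvent[] = []
  for (const key of await storage.keysWithPrefix("")) {
    const bytes = await storage.load(key)
    if (bytes) events.push(JSON.parse(new TextDecoder().decode(bytes)) as LogEvent)
  }
  return events.sort((a, b) => a.timestamp - b.timestamp)
}
```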
Every so often the application compacts the log. The simplest way to describe how to do this is that we read all the events off of disk, perform the compression, write the compressed log back to disk and delete all the original events. What key should we write the compressed log to? Let’s say "compressed" for now.
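A sketch of this naive compaction, reusing the hypothetical interface from above (compressLog stands in for whatever lossless compaction the application performs):

```typescript
// WARNING: as described below, this version can lose data with concurrent writers.
async function compactNaive(
  storage: KeyValueStore,
  compressLog: (events: LogEvent[]) => Uint8Array
): Promise<void> {
  // Read every individual event currently on disk
  // (a fuller version would also fold in any existing "compressed" value).
  const eventKeys = (await storage.keysWithPrefix("")).filter((k) => k !== "compressed")
  const events: LogEvent[] = []
  for (const key of eventKeys) {
    const bytes = await storage.load(key)
    if (bytes) events.push(JSON.parse(new TextDecoder().decode(bytes)) as LogEvent)
  }
  // Write the compacted log to the single well-known key...
  await storage.save("compressed", compressLog(events))
  // ...and delete the originals.
  for (const key of eventKeys) {
    await storage.remove(key)
  }
}
```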

To read the log, an application first checks if there is a compressed key and reads that, and then reads any remaining keys. Unfortunately this scheme will have problems in the case of concurrent writers. Consider this situation:

Here there are two instances of the application which have loaded a shared log - events 1 through 3. Each instance then generates a new event and decides to compact the log. To compact the log each instance reads everything which is currently on disk and then starts compacting it. This means that the events each instance will see are the following:
Instance 1 | Instance 2
---|---
1,2,3,4 | 1,2,3,4,5
So far so good. The problem is that we have no guarantees about which order writes to storage will complete in. If instance 1 writes first and then instance 2 then everything is fine, but if instance 2 finishes first and then instance 1 then the compressed log on disk will be missing event 5.
The Solution
The basic problem here is that writing to the "compressed" key deletes whatever was there before, and because we don’t coordinate around writes to a key it’s always possible that a key has changed since we last loaded it. There’s one easy way to avoid this problem: give every write a unique name. We actually already do this with the individual events, which we write using their hash; we can extend this same principle to writing the compressed log.
Instead of writing the compressed log to the "compressed" key we’ll write it to "compressed/<hash>", where the hash is the hash of the compressed log. It’s possible that two nodes will independently compress the same events and write to the same key, but that’s fine as it will contain the same data. This means our read and compress operations now become:
- When we are reading from the log we’ll make sure to read every key which begins with compressed/. We will need to have some way of discarding duplicate entries from the log - we could give each event a unique ID for example.
- When we are compressing the log we read all the individual events off of disk as well as any compressed/<hash> keys, then write a new compressed/<hash> key with the new compressed log in it. Finally we delete all the <hash> keys we loaded and also any compressed/<hash> keys which we loaded.
Altogether what this means is that occasionally we will read some redundant data, but we never lose data and we don’t have to coordinate between processes using the same storage.
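Putting the two steps above together, a content-addressed compaction might look like the following sketch (still using the hypothetical KeyValueStore; compressLog is assumed to accept both raw events and previously compressed chunks):

```typescript
// Hash some bytes to get a content-derived key suffix.
async function contentHash(bytes: Uint8Array): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", bytes)
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("")
}

async function compactSafe(
  storage: KeyValueStore,
  compressLog: (chunks: Uint8Array[]) => Uint8Array
): Promise<void> {
  // Read everything we can currently see: individual events and compressed/<hash> chunks.
  const loadedKeys = await storage.keysWithPrefix("")
  const chunks: Uint8Array[] = []
  for (const key of loadedKeys) {
    const bytes = await storage.load(key)
    if (bytes) chunks.push(bytes)
  }

  // Write the new compressed log under a key derived from its own content.
  const compressed = compressLog(chunks)
  await storage.save(`compressed/${await contentHash(compressed)}`, compressed)

  // Delete only the keys we actually loaded; anything a concurrent writer
  // has added in the meantime is left untouched and survives for the next pass.
  for (const key of loadedKeys) {
    await storage.remove(key)
  }
}
```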
Optimising Loads
In cases where the application already has the log in memory when it produces new events (as is the case in Automerge), the scheme described above can lead to a lot of redundant reads. Every time we compress we read everything off of disk, even though we probably already have most of the data in memory. We can avoid this problem by having the application keep track of what keys it loaded when it first loaded the log - let’s call this the “live keyset”. Then, every time the application writes a new event to disk it adds that event’s key to its live keyset, and every time the application compresses the log it adds the new compressed log key to its live keyset and removes all of the previous keys. Some care must be taken to make sure the live keyset is not updated until after storage operations are complete (e.g. don’t add a key to the live keyset until the write operation is complete).
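A sketch of that bookkeeping, with the live keyset only updated after each storage operation resolves (the class and method names are illustrative, and contentHash and KeyValueStore come from the earlier sketches):

```typescript
class CompactingLog {
  // The log as this instance already holds it in memory.
  private events: LogEvent[] = []
  // The storage keys the in-memory log was loaded from or written to.
  private liveKeys = new Set<string>()

  constructor(private storage: KeyValueStore) {}

  async append(event: LogEvent): Promise<void> {
    const encoded = new TextEncoder().encode(JSON.stringify(event))
    const key = await contentHash(encoded)
    await this.storage.save(key, encoded)
    // Only update in-memory state once the write has completed.
    this.events.push(event)
    this.liveKeys.add(key)
  }

  async compact(compressLog: (events: LogEvent[]) => Uint8Array): Promise<void> {
    // No re-reading from disk: compact the log we already hold in memory.
    const keysToReplace = [...this.liveKeys]
    const compressed = compressLog(this.events)
    const newKey = `compressed/${await contentHash(compressed)}`
    await this.storage.save(newKey, compressed)

    // Update the keyset only after the write has succeeded, then delete the old keys.
    this.liveKeys.add(newKey)
    for (const key of keysToReplace) {
      await this.storage.remove(key)
      this.liveKeys.delete(key)
    }
  }
}
```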
Automerge Specifics
For automerge-repo we implement this scheme as follows:
- Every time we load a document we keep track of which storage keys we loaded it from
- Every time the user makes a change to a document (using DocHandle.change) we obtain any new changes using Automerge.getChangesSince and we write them to <document ID>/incremental/<change hash> and add the key to the live keyset
- Whenever the total size of the incremental changes on disk is larger than the size of the compressed document we compress the document, then we save the compressed document to <document ID>/snapshots/<hash of document heads> and remove all the previous keys from the live keyset (after deleting them)
Note that we use the hash of the document heads instead of the hash of the compressed file as the key for the compressed log. This is because the Automerge compression format uses DEFLATE internally, which is non-deterministic, and so it’s possible that writes with the same data would produce different encodings. Happily, the heads of an automerge document uniquely identify the changes in the document, so we can use them as a stable identifier.
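To illustrate, deriving the snapshot key from the heads might look something like the sketch below. This is not automerge-repo's actual code: Automerge.getHeads and Automerge.save are real calls, but the way the heads are hashed and combined into a key here is an assumption, and KeyValueStore is the hypothetical interface from earlier.

```typescript
import * as Automerge from "@automerge/automerge"

// Derive a stable snapshot key from the document heads rather than from the
// (non-deterministic) compressed bytes.
async function snapshotKey(documentId: string, doc: Automerge.Doc<unknown>): Promise<string> {
  // The heads uniquely identify the set of changes in the document.
  const heads = Automerge.getHeads(doc)
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(heads.join(",")))
  const headsHash = Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("")
  return `${documentId}/snapshots/${headsHash}`
}

async function saveSnapshot(
  storage: KeyValueStore,
  documentId: string,
  doc: Automerge.Doc<unknown>
): Promise<void> {
  // Automerge.save produces the compressed document; its bytes are not guaranteed
  // to be identical across runs, but the key above does not depend on them.
  const snapshot = Automerge.save(doc)
  await storage.save(await snapshotKey(documentId, doc), snapshot)
}
```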