Domain-driven CRDT patterns, are we CALM yet?
By Arslan Akhmetyanov (arakhmetyanov_1@edu.hse.ru)
Introduction
Development of distributed systems often involves the challenge of replicating data. Аccording to CAP-theorem, a distributed system can deliver only two of three desired characteristics: consistency, availability, and partition tolerance (the ‘C,’ ‘A’ and ‘P’ in CAP) [1] and this fact might lead to problems. To address these issues, the Conflict-Free Replicated Data Type (CRDT) was developed.
CRDT stands for Conflict-Free Replicated Data Type. It is a data structure that can be concurrently updated by multiple clients without the need for coordination, making it perfectly suitable for distributed environments where data conflicts may occur. CRDT ensures that any changes can always be merged into a consistent state [2].
Another important concept to be detailed is the abbreviation CALM, which stands for Consistency as Logical Monotonicity. This principle postulates that if the updates to a distributed system are monotonic, consistency can be achieved without coordination. Therefore, in the following essay, we will study whether the existing CRDTs follow this condition.
What is CRDT and if it is CALM?
As we have already mentioned, CRDTs are a class of data types that are specifically designed to handle conflicts in data by themselves, without the need of using external regulation mechanisms. To start with, let’s have an overview of existing CRDTs in order to understand what CRDT is and find out what the similarities and differences are. In fact, there are several existing approaches for building such data types:
- State-based CRDTs [3],
- Operation-based CRDTs [3],
- Delta-based CRDTs [4].
The first category is a type of CRDT that works by applying a replica of the entire data structure to each node in the system. Each replica is updated independently locally by merging with new external data, and changes are propagated to other replicas by exchanging messages between nodes. When two replicas are merged, they simply compare their states and take the union of any differences. They are relatively easy to implement, however, they are less efficient than the other two other types.
As a result, an idea of op-based CRDTs has been developed. Instead of copying the whole structure, which might be verbose enough, a log of operations is propagated through the replicas. Each node in the system independently applies these operations to its copy of the data structure, so when two replicas are being merged, they simply apply any operations that were not already present in their local log. Operation-based CRDTs also might include data compression, and as a result, this approach is more efficient than the previous one, as it processes and modifies less data.
Lastly, a combination of these two approaches were created - so called delta-based CRDT. As the updating the whole structure whether by another instance or operation is not desired any more, only a small partial operation might be applied. Delta-based CRDTs can be more efficient than both state-based and operation-based CRDTs in terms of memory usage and message passing, as they only need to store and transmit the minimal set of changes required to keep all replicas consistent. They also handle conflicts more naturally, as conflicting changes can simply be merged together. Nevertheless, such CRDTs are also the most complex in implementation.
Despite the different implementation approaches, it is important to notice that they should have something in common that allows CRDTs to deliver required properties. Such common part are the the conditions, that is followed by any implementation of the CRDT:
- Commutativity: The order in which updates are applied to different replicas should not affect the final state of the data.
- Associativity: The order in which updates are applied to the same replica should not affect the final state of the data.
- Idempotence: Applying the same update multiple times should have the same effect as applying it only once.
Optionally, sometimes eventual consistency is also stated as a condition [4]. Nevertheless, to our opinion it seems to be more of a consequence rather than a precondition, so it should not be included in the list.
Returning to the CALM-theorem, it is important to remember what its statement is. The theorem states that if operations performed on the data in the system are monotonic, then the system can achieve eventual consistency without requiring any explicit coordination mechanisms such as locking or synchronization. And it is the condition CRDTs designed for!
Let’s show this more explicitly. Correctly implemented CRDT, that follows all the requirements, guarantees that the operations might be applied in any order and multiple times providing the same result. As a result, monotonicity is also being guaranteed. In fact, this means that no extra organization is required, as data will become consistent because of its structural properties. This ensures that the data do not require any explicit coordination mechanisms, and, consequently, this is what is exactly meant by the CALM theorem.
Conclusion
In conclusion, while CRDTs are a powerful tool for achieving consistency in distributed systems, they are not a silver bullet. There are still challenges to be addressed, such as the complexity of implementing delta-based CRDTs and ensuring that all replicas converge to the same state. Additionally, CRDTs may not be suitable for all use cases, as might be an overkill for small systems.
Moreover, while CRDTs provide only eventual consistency, they do not guarantee strong consistency. In some applications, strong consistency may be necessary, and alternative solutions such as distributed transactions or consensus algorithms may need to be implemented.
Nonetheless, in the case of the essay these points are not so important. The main idea is still valid. As CRDTs fully satisfy the theorem requirements, it could be said - ‘Yes, we are CALM already’.
References
- What is the cap theorem?. IBM. (n.d.-a). https://www.ibm.com/topics/cap-theorem
- About crdts • conflict-free replicated data types. Conflict-free Replicated Data Types. (n.d.). https://crdt.tech/
- Keep calm and CRDT on - arxiv.org. (n.d.). https://arxiv.org/pdf/2210.12605.pdf
- Zagorskii, A. (2018, August 4). CRDT: Conflict-free replicated data types. Хабр. https://habr.com/ru/articles/418897/