Introduction

Data consistency is one of the main difficulties in distributed systems. Distributed transactions, conflict resolution mechanisms and consensus algorithms are commonly used to address it. However, achieving a high level of data consistency in a distributed system usually requires compromises on performance and availability (according to the PACELC theorem [1]), which makes the problem difficult. This essay analyzes several strategies that make different trade-offs between the components of the PACELC theorem. The two-phase commit protocol ensures a high degree of data consistency by having every participating node confirm a transaction before it is committed [2]. The saga pattern manages long-running transactions that span multiple services or components by dividing them into a sequence of local transactions [3]. In a system with low consistency requirements and an emphasis on simplicity and performance, using neither two-phase commit nor saga may be preferable.

Selection criteria

The choice between two-phase commit, saga, or neither (i.e. using a simpler approach like auto-commit) depends on the specific requirements and limitations of the system. Let's consider a comparison of these strategies based on the PACELC theorem:

  1. Consistency: If a system requires strict consistency of distributed transactions, two-phase commit may be an appropriate choice. If eventual consistency is acceptable, saga may be more appropriate. If consistency is not a critical requirement, a simpler approach without explicit coordination mechanisms, such as auto-commit, could be considered.
  2. Performance and latency: Two-phase commit adds latency, because the coordinator must wait for every participant's vote in the first phase, and it can block if one of the participants fails. Saga provides greater flexibility and reduces latency by allowing asynchronous, non-blocking interactions, even though its steps run sequentially. If low latency is a priority, saga may be a better fit. Using neither saga nor two-phase commit keeps delays to a minimum.
  3. The nature of the system: If the system is highly distributed and contains many independent services or components, the saga pattern is a more natural fit for managing long-running transactions between them. Two-phase commit, by contrast, is better suited to services or replicas that depend closely on each other. If simplicity comes first, auto-commit is preferable because it is the easiest to implement.

According to the PACELC theorem, the choice between two-phase commit, saga, or neither is not a foregone conclusion: it depends on trade-offs between consistency, performance, fault handling, and the distributed nature of the system being designed.

Usage examples

I think the two-phase commit protocol is most appropriate in the financial sector, where high consistency and reliability are crucial. Consider a banking system in which a transfer of funds between accounts involves many sub-operations. In this scenario, it is important that debiting one account and crediting another are recorded atomically and on the required number of nodes (the number needed for consistency can differ between systems). The two-phase commit protocol coordinates this distributed transaction across multiple resources and ensures that the required participants either all commit or all abort the transaction. This rules out violations of financial invariants, for example, leaving a debit account with a negative balance or exceeding a credit limit. Two-phase commit therefore keeps consistency high in a system where consistency is one of the main criteria for successful operation.
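
The bank transfer can be sketched as a toy two-phase commit. This is only an in-memory illustration under stated assumptions: the `Account` class, its `prepare`/`commit`/`abort` methods and the `transfer` coordinator are hypothetical names, and a real protocol would also need durable logs, timeouts and recovery after a crash.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical participant that holds one account balance."""
    name: str
    balance: int
    pending: int = 0  # change staged during the prepare phase

    def prepare(self, delta: int) -> bool:
        # Phase 1: vote "yes" only if the balance would stay non-negative.
        if self.balance + delta < 0:
            return False
        self.pending = delta
        return True

    def commit(self) -> None:
        # Phase 2 (success path): apply the staged change.
        self.balance += self.pending
        self.pending = 0

    def abort(self) -> None:
        # Phase 2 (failure path): discard the staged change.
        self.pending = 0


def transfer(src: Account, dst: Account, amount: int) -> bool:
    """Hypothetical coordinator: commit on every participant or on none."""
    steps = [(src, -amount), (dst, amount)]
    votes = [account.prepare(delta) for account, delta in steps]
    if all(votes):
        for account, _ in steps:
            account.commit()
        return True
    for account, _ in steps:
        account.abort()
    return False


alice = Account("alice", 100)
bob = Account("bob", 20)
print(transfer(alice, bob, 70), alice.balance, bob.balance)  # True 30 90
print(transfer(alice, bob, 70), alice.balance, bob.balance)  # False 30 90
```

The second transfer is rejected in the first phase, so neither balance changes and the "no negative balance" invariant holds.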

It seems to me that the saga pattern is applicable to operations that span independent or almost independent domains. Consider a scenario in which a customer places an order for goods, and the order fulfillment process includes steps such as inventory reservation, payment processing and notification. Each of these services operates in its own domain and barely depends on the others, so a compensation mechanism is a convenient way to handle a failure in any of them. Creating an order therefore involves several independent steps:

  1. The initial step may include reserving inventory for the ordered items. If the saga fails after this step, a compensating transaction may be started to release the reserved inventory.
  2. The next step may include processing the payment. If payment processing fails, a compensating transaction may be initiated to refund the customer and release any resources allocated during payment processing.
  3. The last step notifies the seller of the completed order and updates the seller's statistics (which can be stored with the seller record, so that they are not recalculated every time). This step may simply be retried on error, as it is not critical to the purchase and sale process.

Using the saga pattern, the order fulfillment process can maintain consistency and recover from failures by coordinating a series of local transactions with compensating actions.
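
The order scenario above can be sketched as a minimal saga runner. This is an illustration under assumptions, not a production design: `run_saga`, the step names and the in-memory `inventory` dictionary are all hypothetical, and a real saga would persist its progress so that it can resume after a crash.

```python
from typing import Callable, List, Tuple

# (name, action, compensation); all three are illustrative.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: List[str] = []
    done: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
            log.append(f"done:{name}")
            done.append(step)
        except Exception:
            log.append(f"failed:{name}")
            for prev_name, _, compensate in reversed(done):
                compensate()
                log.append(f"compensated:{prev_name}")
            break
    return log


inventory = {"widget": 5}  # hypothetical stock held by the inventory service


def reserve() -> None:
    inventory["widget"] -= 1


def release() -> None:
    inventory["widget"] += 1


def charge() -> None:
    raise RuntimeError("card declined")  # simulated payment failure


def refund() -> None:
    pass  # nothing was charged, so there is nothing to refund


log = run_saga([
    ("reserve_inventory", reserve, release),
    ("process_payment", charge, refund),
])
print(log)
# ['done:reserve_inventory', 'failed:process_payment', 'compensated:reserve_inventory']
print(inventory)  # {'widget': 5}
```

Each step commits locally as soon as it runs, so other services may briefly see the reservation before it is released. That is the consistency a saga gives up in exchange for not blocking.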

Using neither of the above strategies suits systems where speed of operation matters because of the large volume of data. For example, a social media platform with microservices for managing users, creating posts and sending notifications:

  1. User Service: Manages user profiles, authentication and authorization.
  2. Post Service: Handles the creation and management of user posts.
  3. Notification Service: Sends notifications to users about various events such as likes, comments or mentions.

My opinion is that in a scenario where a user creates a post, complex strategies for achieving consistency are unnecessary. When a post is created, the post service needs to update the user profile with the new post and send information about it to all subscribed users. Using two-phase commit for this operation would spend time on a level of consistency that is not critical in this scenario. The saga pattern is not needed here either, since the operation cannot easily be broken down into independent actions that are processed separately, and no compensating actions are required. Instead, an eventual consistency model can be used, in which the user service asynchronously updates the user profile after the post is created.
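
A minimal sketch of this eventual consistency approach is shown below. Everything in it is an assumption made for illustration: the in-memory `posts` and `profiles` stores stand in for each service's database, the `events` queue stands in for a message broker, and `create_post` and `drain_events` are hypothetical names.

```python
from collections import deque

posts = []                             # the post service's own store
profiles = {"ann": {"post_count": 0}}  # the user service's own store
events = deque()                       # stand-in for a message broker


def create_post(author: str, text: str) -> None:
    # The local write commits immediately, with no cross-service coordination.
    posts.append({"author": author, "text": text})
    events.append({"type": "post_created", "author": author})


def drain_events() -> None:
    # The user service applies queued events later, at its own pace.
    while events:
        event = events.popleft()
        if event["type"] == "post_created":
            profiles[event["author"]]["post_count"] += 1


create_post("ann", "hello")
print(profiles["ann"]["post_count"])  # 0: the profile is briefly stale
drain_events()
print(profiles["ann"]["post_count"])  # 1: the two stores have converged
```

The window between the two prints is the inconsistency that this approach accepts in exchange for a fast, simple write path.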

Conclusion

As a result, I think that if consistency is the most significant criterion, two-phase commit is preferable. If an operation affects several relatively independent components, the saga pattern is preferable. Where speed of development and simplicity of the application are the main priorities, auto-commit may be preferable.

References

  1. “PACELC Theorem: an ELC extension of CAP,” DEV Community, Nov. 23, 2022. https://dev.to/pragyasapkota/pacelc-theorem-an-elc-extension-of-cap-2nkm (accessed Dec. 24, 2023).
  2. P. A. Bernstein and E. Newcomer, Principles of transaction processing. Amsterdam: Morgan Kaufmann, 2009.
  3. “Saga Pattern Made Easy,” DEV Community, May 24, 2023. https://dev.to/temporalio/saga-pattern-made-easy-4j42 (accessed Dec. 24, 2023).