Microservices: What Is a Distributed Transaction and How to Achieve It

Roshan choudhary
8 min read · May 30, 2021
Microservices

NOTE: In this article I assume the reader has a basic understanding of microservice architecture and is here to learn a more advanced concept. If you are new to the microservice world, I recommend two websites that cover microservice architecture from beginner to advanced level:

https://martinfowler.com/microservices/

https://microservices.io

Microservice architecture has become very popular nowadays, but it also comes with a lot of challenges, and data management is one of them. One of the most common and important aspects of data management is the transaction. As we all know, a transaction means “all or nothing”. In a monolithic architecture, where there is one data repository for all the services, we can easily apply a transaction. But in a distributed system, where one task has to talk to multiple repositories, it becomes complex to maintain data consistency and the actual goal of a transaction.

So let's first understand the problem we face with data management and data consistency while developing microservices, and then we will see how to resolve it.

Let's take the example of an order-placement flow in the microservice architecture of an online store. To place an order, we have two microservices: one is the customer wallet service and the other is the order service.

Order flow of microservice.

Now let’s understand this order flow through a data flow diagram.

Flow diagram of order request.

Here the user with id 7 makes a request to the orchestrator to place an order, and the orchestrator then makes two further calls: one to the customer wallet service to check and deduct the balance, and a second to the order service to check the inventory and create the order.

The problem is that these are two different services and each has its own database, so while deducting the balance and creating the order we cannot have a single transaction that rolls everything back if one service fails. Let's say the first call deducts the balance in the wallet service and the second call, to the order service, is still preparing the order. In the meantime the orchestrator gets another request from the same user, and this time the wallet service already shows the deducted balance. That state should not be visible yet, because the order for the first request is not complete. So data consistency is missing here.

Also, let's say the first call deducts the balance in the wallet service, but the second call fails to create the order because the product is not in the inventory. Who will then roll back the operation that already happened in the wallet service for this place-order call?
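To make the problem concrete, here is a minimal sketch of the naive orchestrator described above. The class and method names (`WalletService`, `OrderService`, `place_order`) are hypothetical, and plain dictionaries stand in for the two separate databases.

```python
class OutOfStock(Exception):
    pass


class InsufficientBalance(Exception):
    pass


class WalletService:
    """Owns its own data store (a dict stands in for the wallet database)."""

    def __init__(self, balances):
        self.balances = dict(balances)

    def deduct(self, user_id, amount):
        if self.balances.get(user_id, 0) < amount:
            raise InsufficientBalance(user_id)
        self.balances[user_id] -= amount


class OrderService:
    """Owns a separate data store (inventory and orders)."""

    def __init__(self, stock):
        self.stock = dict(stock)
        self.orders = []

    def create_order(self, user_id, product, qty):
        if self.stock.get(product, 0) < qty:
            raise OutOfStock(product)
        self.stock[product] -= qty
        self.orders.append((user_id, product, qty))


def place_order(wallet, orders, user_id, product, qty, price):
    """Naive orchestrator: two independent calls, no shared transaction."""
    wallet.deduct(user_id, price * qty)         # call 1 commits on its own
    orders.create_order(user_id, product, qty)  # call 2 may still fail


wallet = WalletService({7: 100})
orders = OrderService({"book": 0})  # the product is out of stock
try:
    place_order(wallet, orders, 7, "book", 1, 40)
    outcome = "ok"
except OutOfStock:
    outcome = "order failed"

# The order failed, yet the money is gone: nobody rolled the wallet back.
print(outcome, wallet.balances[7], orders.orders)
```

The wallet ends up with a deducted balance even though no order exists, which is exactly the inconsistency the rest of the article tries to remove.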

To solve these problems and achieve a distributed transaction there are several methodologies and algorithms. Two Phase Commit and SAGA address the problem directly, while consensus algorithms such as Raft and lock services such as Chubby are related building blocks for coordination in distributed systems. Out of these, we will discuss two in detail here: Two Phase Commit and SAGA.

Before discussing these two methodologies in detail, we can also think of a simpler solution. Since the two services, wallet and order, depend on each other, can we point both of them at one database, so that we can simply use a SQL transaction?
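As a rough illustration of that shared-database idea, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table and column names (`wallet`, `inventory`, `orders`) are assumptions made for this example only.

```python
import sqlite3


def place_order(conn, user_id, product, qty, price):
    """Deduct the balance and create the order inside ONE local SQL transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            cur = conn.execute(
                "UPDATE wallet SET balance = balance - ? "
                "WHERE user_id = ? AND balance >= ?",
                (price * qty, user_id, price * qty),
            )
            if cur.rowcount != 1:
                raise ValueError("insufficient balance")
            cur = conn.execute(
                "UPDATE inventory SET stock = stock - ? "
                "WHERE product = ? AND stock >= ?",
                (qty, product, qty),
            )
            if cur.rowcount != 1:
                raise ValueError("out of stock")
            conn.execute(
                "INSERT INTO orders (user_id, product, qty) VALUES (?, ?, ?)",
                (user_id, product, qty),
            )
        return True
    except ValueError:
        return False


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wallet (user_id INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE inventory (product TEXT PRIMARY KEY, stock INTEGER);
    CREATE TABLE orders (user_id INTEGER, product TEXT, qty INTEGER);
    INSERT INTO wallet VALUES (7, 100);
    INSERT INTO inventory VALUES ('book', 0);
""")

ok = place_order(conn, 7, "book", 1, 40)  # the book is out of stock
balance = conn.execute(
    "SELECT balance FROM wallet WHERE user_id = 7"
).fetchone()[0]
print(ok, balance)  # the wallet deduction was rolled back with the failed order
```

Because both updates run in the same database transaction, the failed inventory check undoes the wallet deduction automatically. This only works because both tables live in one database.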

The answer is yes, we can do that, but it is not recommended in a distributed or microservice architecture. It is an anti-pattern: two different services should not be connected to the same database, because that couples them together and is neither very reliable nor scalable. Someone could also suggest a replication strategy where, instead of connecting to one database, each service has its own database instance and the instances replicate data to each other. That is somewhat fine, but because of replication lag, consistency will either be lost or be on a knife edge. Now let's discuss our solutions.

Two Phase Commit: As the name suggests, there are two phases. The first is Prepare and the second is Commit. So who tells the services to prepare and commit? If you remember the earlier diagram, we had an orchestrator, two services and their respective databases. Now we introduce one new component called the coordinator. For the sake of simplicity I have drawn it separately in the diagram below, but in the real world it could live inside any of the microservices or run as a separate service.

So the coordinator is the one that takes care of the two-phase commit. Let's see how it acts.

When the user places an order, the coordinator first creates a transaction id. This id is given to every service the coordinator is going to talk to during the whole process. After this the coordinator runs two phases: first the prepare phase and then the commit phase.

In the prepare phase the coordinator asks the wallet service and the order service to prepare their state, that is, to check whether they have the required balance and sufficient inventory. If they do, they return an ok message and lock the affected rows using a transaction in their respective databases. Once the coordinator has received an ok response from both services, it proceeds to the next phase. If the coordinator has not yet received a response from one of the services, it waits and does not start the commit phase until the response arrives.

In the commit phase the coordinator tells both services that they can now commit the state they prepared and locked. Both services again respond with an ok message for the commit, and the whole cycle of placing an order completes here.
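The two phases above can be sketched in a few lines. This is an in-memory illustration under simplifying assumptions, not a production protocol: `Participant`, `two_phase_commit` and the `"balance:7"` style keys are hypothetical names, and the `reserved` dictionary stands in for rows locked by a local database transaction.

```python
import uuid


class Participant:
    """A service with its own store; `reserved` models rows locked by prepare."""

    def __init__(self, name, store):
        self.name = name
        self.store = dict(store)  # e.g. {"balance:7": 100}
        self.reserved = {}        # tx_id -> (key, amount), held until commit/abort

    def prepare(self, tx_id, key, amount):
        locked = any(k == key for k, _ in self.reserved.values())
        if locked or self.store.get(key, 0) < amount:
            return False          # vote "no"
        self.reserved[tx_id] = (key, amount)
        return True               # vote "ok" and keep the row locked

    def commit(self, tx_id):
        key, amount = self.reserved.pop(tx_id)
        self.store[key] -= amount

    def abort(self, tx_id):
        self.reserved.pop(tx_id, None)  # release the lock, change nothing


def two_phase_commit(work):
    """`work` is a list of (participant, key, amount) tuples."""
    tx_id = str(uuid.uuid4())     # the transaction id shared with every service
    prepared = []
    # Phase 1: prepare. Every participant must vote "ok".
    for participant, key, amount in work:
        if participant.prepare(tx_id, key, amount):
            prepared.append(participant)
        else:
            for p in prepared:
                p.abort(tx_id)
            return False
    # Phase 2: commit. Everyone applies what it prepared and locked.
    for participant, _, _ in work:
        participant.commit(tx_id)
    return True


wallet = Participant("wallet", {"balance:7": 100})
orders = Participant("order", {"stock:book": 3})
ok = two_phase_commit([(wallet, "balance:7", 40), (orders, "stock:book", 1)])
print(ok, wallet.store, orders.store)
```

In a real system each `prepare` and `commit` would be a network call, and both the coordinator and the participants would persist their state so they can recover after a crash.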

Now let's discuss the failure scenarios. Let's say in the prepare phase the wallet service responds with an ok message and locks its state because it has enough balance, but the order service returns an error because the product is not in stock. In that case the coordinator aborts the whole transaction and the wallet service releases its lock. The coordinator also aborts if any service fails before it has voted ok in the prepare phase. The special scenario is a failure during the commit phase: what if the wallet service has committed, but the order service goes down before it can commit? Once every service has voted ok and the coordinator has decided to commit, the transaction can no longer be aborted, because some services may already have committed. So in this case, when the order service node comes back up, the coordinator tells it again to commit the pending transaction.
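The sketch below walks through two of these scenarios: a "no" vote during prepare, and a service that goes down during commit and is asked to commit again later. All names (`Coordinator`, `retry_commit`, `commit_failures`) are hypothetical, and the `decisions` dictionary stands in for the durable log a real coordinator would keep.

```python
class Participant:
    def __init__(self, name, can_prepare=True, commit_failures=0):
        self.name = name
        self.can_prepare = can_prepare
        self.commit_failures = commit_failures  # commit calls that fail first
        self.state = "idle"  # idle -> prepared -> committed | aborted

    def prepare(self, tx_id):
        if not self.can_prepare:
            return False
        self.state = "prepared"
        return True

    def commit(self, tx_id):
        if self.commit_failures > 0:            # simulate the node being down
            self.commit_failures -= 1
            raise ConnectionError(self.name + " is down")
        self.state = "committed"

    def abort(self, tx_id):
        if self.state != "committed":
            self.state = "aborted"


class Coordinator:
    def __init__(self):
        self.decisions = {}  # tx_id -> "commit" | "abort"
        self.pending = {}    # tx_id -> participants that still owe a commit

    def run(self, tx_id, participants):
        votes = [p.prepare(tx_id) for p in participants]
        if not all(votes):
            self.decisions[tx_id] = "abort"
            for p in participants:
                p.abort(tx_id)
            return "aborted"
        self.decisions[tx_id] = "commit"        # point of no return
        self.pending[tx_id] = list(participants)
        return self.retry_commit(tx_id)

    def retry_commit(self, tx_id):
        still_pending = []
        for p in self.pending[tx_id]:
            try:
                p.commit(tx_id)
            except ConnectionError:
                still_pending.append(p)         # ask again once it is back up
        self.pending[tx_id] = still_pending
        return "committed" if not still_pending else "commit-pending"


coord = Coordinator()

# Scenario 1: the order service votes "no" in the prepare phase.
wallet1, order1 = Participant("wallet"), Participant("order", can_prepare=False)
r1 = coord.run("tx-1", [wallet1, order1])

# Scenario 2: the order service is down for the first commit attempt.
wallet2, order2 = Participant("wallet"), Participant("order", commit_failures=1)
r2 = coord.run("tx-2", [wallet2, order2])
state_while_down = order2.state
r3 = coord.retry_commit("tx-2")                 # the node is back up
print(r1, r2, state_while_down, r3)
```

The key design point is that the decision is recorded before any commit message is sent, so a retry after recovery always replays the same decision.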

Isolation is also well maintained here. While the two-phase commit is in progress, the next request cannot read or write that particular row, because it is locked by the local transaction for the user with id 7.

One thing we definitely need here is a timeout. As discussed earlier, the coordinator moves to the next phase only after it has heard from both services. But what if one of the services never replies, because of downtime or any other reason? The coordinator does not know whether to abort or to continue, so it keeps waiting while the other service holds its locks. So we must implement a timeout in the system so that, after say 2 or 5 minutes, the request is aborted automatically.
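One possible way to add such a timeout to the prepare phase is sketched below with Python's standard `concurrent.futures` module. The functions `prepare_wallet`, `prepare_order` and `collect_votes` are hypothetical, and the timeout is a fraction of a second instead of minutes so the example runs quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def prepare_wallet():
    return True      # replies immediately with "ok"


def prepare_order():
    time.sleep(0.3)  # simulates a service that is too slow to answer
    return True


def collect_votes(prepare_calls, timeout_s):
    """Return "commit" only if every participant votes ok within timeout_s."""
    pool = ThreadPoolExecutor(max_workers=len(prepare_calls))
    futures = [pool.submit(call) for call in prepare_calls]
    done, not_done = wait(futures, timeout=timeout_s)
    pool.shutdown(wait=False)  # do not block the coordinator on stragglers
    if not_done:
        return "abort"         # a missing vote is treated as a "no"
    return "commit" if all(f.result() for f in done) else "abort"


decision = collect_votes([prepare_wallet, prepare_order], timeout_s=0.05)
print(decision)
```

Treating a missing vote as a "no" is safe during prepare, because nothing has been committed yet and every participant can simply release its locks.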

So two-phase commit gives us a strongly consistent transaction model, but it is a bit slow: every step is a synchronous call (HTTP here), so each phase adds latency. It also uses locks, so the resources stay held for as long as a request is in progress.

SAGA: We have seen how strong two-phase commit is in terms of consistency for distributed transactions. But it is a bit slow due to latency, and its main disadvantage is that it is synchronous. Saga is also a widely used pattern, and its main advantage is that it works asynchronously. Data is exchanged between the services asynchronously, so no service blocks waiting for another, and the flow is fast. In this methodology the microservices talk to each other through messages over a queue or an event bus.

Let's take the same example of placing an order. When the user places an order, the request comes to the order microservice, which checks whether the ordered product is in the inventory. If it is, the service creates the order and puts a message on the queue saying that the order has been created.

The wallet microservice listens to the messages on the queue, and once it gets the message that the order is created, it deducts the price of that product from the wallet. If all goes well and everything succeeds, the whole cycle is complete.

Now suppose the wallet service fails for some reason, let's say because the balance is not enough. In that case the wallet service adds a message to a different queue asking for a rollback. A microservice listens to this queue and takes care of the failure: it rolls back (compensates) the transactions that the other services already committed for this request. How exactly this is done depends on how you have configured your services to handle failure scenarios.
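The flow above can be sketched with a tiny in-memory event bus. Everything here is an assumption made for illustration: the topic names (`order_created`, `payment_done`, `payment_failed`), the `EventBus` class, and the choice to let the order service itself listen for the rollback message instead of a dedicated service.

```python
from collections import deque


class EventBus:
    """Tiny in-memory stand-in for a message queue or event bus."""

    def __init__(self):
        self.queue = deque()
        self.handlers = {}  # topic -> list of callbacks

    def subscribe(self, topic, handler):
        self.handlers.setdefault(topic, []).append(handler)

    def publish(self, topic, message):
        self.queue.append((topic, message))

    def run(self):
        while self.queue:   # deliver messages one at a time, in order
            topic, message = self.queue.popleft()
            for handler in self.handlers.get(topic, []):
                handler(message)


class OrderService:
    def __init__(self, bus, stock):
        self.bus, self.stock, self.orders = bus, dict(stock), {}
        bus.subscribe("payment_done", self.on_payment_done)
        bus.subscribe("payment_failed", self.on_payment_failed)

    def place_order(self, order_id, user_id, product, price):
        if self.stock.get(product, 0) < 1:
            self.orders[order_id] = "REJECTED"
            return
        self.stock[product] -= 1                       # local transaction
        self.orders[order_id] = "PENDING"
        self.bus.publish("order_created", {
            "order_id": order_id, "user_id": user_id,
            "product": product, "price": price,
        })

    def on_payment_done(self, msg):
        self.orders[msg["order_id"]] = "CONFIRMED"

    def on_payment_failed(self, msg):                  # compensating transaction
        self.stock[msg["product"]] += 1
        self.orders[msg["order_id"]] = "CANCELLED"


class WalletService:
    def __init__(self, bus, balances):
        self.bus, self.balances = bus, dict(balances)
        bus.subscribe("order_created", self.on_order_created)

    def on_order_created(self, msg):
        if self.balances.get(msg["user_id"], 0) < msg["price"]:
            self.bus.publish("payment_failed", msg)    # ask for a rollback
            return
        self.balances[msg["user_id"]] -= msg["price"]  # local transaction
        self.bus.publish("payment_done", msg)


bus = EventBus()
orders = OrderService(bus, {"book": 5})
wallet = WalletService(bus, {7: 50})
orders.place_order("o1", 7, "book", 40)  # enough balance
orders.place_order("o2", 7, "book", 40)  # balance runs out, gets compensated
bus.run()
print(orders.orders, orders.stock, wallet.balances)
```

Note that the second order is visible as `PENDING` and its stock is reserved until the rollback message is processed. That intermediate state is the price paid for not holding locks across services.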

This is an asynchronous approach. We are not holding locks across services, and each transaction is local to one service, unlike two-phase commit where the whole process is treated as a single transaction. So this methodology is faster than two-phase commit. The trade-off is that consistency is eventual: between the steps, other requests can see intermediate state, such as an order that is created but not yet paid for. Isolation can still be kept reasonably well as long as the messages that touch the same data are processed one after another (sequential + local transaction = isolation), so a service never handles two requests for the same data at the same time. You can scale these microservices horizontally, and if the queue still delivers the messages for a given key, such as a user id, to one consumer in order, the isolation problem largely goes away. Atomicity then comes from the compensating messages: in the end either every step is applied or the completed steps are undone.
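One common way to keep messages sequential per user while running several consumers is to route by key. The sketch below is an assumption about how that could look, not a description of any particular queue product; `partition_for`, `dispatch` and `apply_all` are hypothetical helpers.

```python
from collections import defaultdict


def partition_for(user_id, num_workers):
    """Route every message of one user to the same worker."""
    return user_id % num_workers


def dispatch(messages, num_workers):
    """Split a stream into per-worker queues while keeping per-user order."""
    queues = defaultdict(list)
    for msg in messages:
        queues[partition_for(msg["user_id"], num_workers)].append(msg)
    return queues


def apply_all(queues, balances):
    """Each worker drains its own queue sequentially, one local step per message."""
    for worker in sorted(queues):
        for msg in queues[worker]:
            if balances[msg["user_id"]] >= msg["amount"]:
                balances[msg["user_id"]] -= msg["amount"]
                msg["result"] = "deducted"
            else:
                msg["result"] = "rejected"
    return balances


messages = [
    {"user_id": 7, "amount": 60},
    {"user_id": 8, "amount": 10},
    {"user_id": 7, "amount": 60},  # second request for user 7
]
queues = dispatch(messages, num_workers=2)
balances = apply_all(queues, {7: 100, 8: 50})
print(balances, [m["result"] for m in messages])
```

Both requests for user 7 land on the same worker, so the second one sees the balance left by the first and is rejected instead of racing with it.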

So we have discussed two of the many ways of dealing with distributed transactions in microservices. Both have their pros and cons, and which one to choose depends entirely on our requirements.
