Yes, this sort of explains my situation. A requirement appears down the line that just completely breaks the service boundaries.
An example being something like an online gun store: you have a perfect service that handles orders. It's completely isolated and works fine. But now, 2 years later, some local government asks you: "whenever someone buys a gun, you need to call into our webservice the moment the order is placed so we know a gun was sold, and you need to do it successfully, or you can't sell the item."
Now you've got a situation where you need an atomic operation: place the order and call the web service, or don't do either at all. You could just place the order, do the web service call asynchronously, and delete the order afterwards if the call fails. But you might not have that choice, depending on what the regulations say. And you can't make the call before you place the order, because what if payment fails?
The order service should not have any idea about blocking orders in specific scenarios. And now the architecture has broken down. Do you add this to the order service and break its single responsibility? Will this be a bigger problem in the future, and do you need to completely rearchitect your solution?
I would say this is another problem. If an external call to a web service is involved, then you can NEVER have an atomic call in the first place. One always needs to just have a state machine to navigate these cases.
Even with a monolith, what if you have a power-off at the wrong moment?
What you are describing here is pretty much the job description of a backend programmer to me -- think through and prepare for what happens if power disappears between code line N and code line N+1, in all situations.
In your specific example one would probably use a reserve/capture flow with the payment services provider: first get a reservation for the amount, then do the external webservice call, then finally do a capture call.
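The point of that ordering is that no money moves until the mandatory call has succeeded. A rough sketch, where `psp` and `regulator` are hypothetical clients standing in for whatever payment provider and government API you actually have:

```python
# Sketch of a reserve/capture flow wrapped around a mandatory external
# call. `psp` and `regulator` are hypothetical clients, not a real API.

def place_gun_order(psp, regulator, order):
    # 1. Reserve (authorize) the amount -- no money moves yet.
    reservation = psp.reserve(order.amount, order.card_token)
    try:
        # 2. Report the sale to the government webservice.
        regulator.report_sale(order.item_id, order.buyer_id)
    except Exception:
        # 3a. Reporting failed: release the reservation, refuse the sale.
        psp.release(reservation)
        raise
    # 3b. Only capture the money once the mandatory call succeeded.
    psp.capture(reservation)
```

If the process dies between reserve and capture, the reservation simply expires and no money was taken, which is exactly the failure mode you want.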
In our code we pretty much always write "I am about to call external webservice" to our database in one DB transaction (as an event), then call the external webservice, and finally, if we get a response, write "I am done calling external webservice" as an event. And then there's a background worker that sits and monitors for cases of "about-to-call events without a matching completed-event within 5 minutes", and takes the required actions to clean up.
If a monolith "solves" this problem then I would say the monolith is buggy. A monolith should also be able to always have a sudden power-off without misbehaving.
A power-off between line N and N+1 in a monolith is pretty much the same as a call between two microservices failing at the wrong moment. Not a qualitative difference, only a quantitative one (in that a power-off MAY be rarer than network errors).
Where the difference lies is in the things that an ACID database allows you to commit atomically (changes to your internal data either all happening or none happening).
Well that's the thing, isn't it. As soon as you move away from the atomicity of a relational database you can't guarantee anything. And then we, like you do too, resort to cleanup jobs everywhere trying to rectify problems.
I think that's one of the things people rarely think of when moving to microservices. Just how much effort needs to be made to rectify errors.
You can always guarantee atomicity. You will just have to implement it yourself (which is not easy, but always possible, unless there are conflicting requirements around performance and network distribution).
And yes, the cleanup jobs are part of how you implement it. But you shouldn't be "trying to rectify the problems", you should be rectifying the problems, with certainty.
Create the order, and set the status to pending.
Keep checking the web service until it allows the transaction.
Set the status to authorized and move on to payment.
Keep trying the payment until it succeeds, then officially place the order and set the status to ordered.
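Those steps amount to a small state machine. A rough sketch, where `authorize` and `pay` are hypothetical stand-ins for the government webservice and the payment provider (real code would persist each status change so a crash can resume from the last state):

```python
# Rough sketch of the pending -> authorized -> ordered flow above.
# `authorize` and `pay` are hypothetical callables that return True
# on success; the retry caps stand in for real backoff/retry policy.

def run_order(order, authorize, pay, max_tries=5):
    order["status"] = "pending"           # create the order as pending
    # Keep checking the webservice until it allows the transaction.
    for _ in range(max_tries):
        if authorize(order):
            order["status"] = "authorized"
            break
    else:
        order["status"] = "cancelled"     # never got authorization
        return order
    # Keep trying the payment until it succeeds.
    for _ in range(max_tries):
        if pay(order):
            order["status"] = "ordered"   # officially placed
            return order
    order["status"] = "cancelled"         # payment never went through
    return order
```

The status column is doing the work here: at any point a restarted process (or the cleanup worker) can look at the status and know exactly which step to retry or roll back.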
I really find it hard to believe that the regulations won't allow you to check the web service for authorization before creating the order. If that's really the case, then create the order and check, and if it doesn't work, cancel the order and retry. It's only a few rows in the database. If this happens often, then show the data to your local politician or whatnot and tell them they need to add more flexibility to the regulation.