Small system, big system: Message queue & producer/consumer pattern
We are going to look at another common pattern that you can use in system design questions: the message queue with the producer-consumer pattern. It is especially useful when an application has to deal with a lot of unreliable tasks, such as network calls.
One of the most popular use cases is integrating with 3rd party services, such as a payment gateway or a notification service that delivers your emails and in-app notifications. These tasks are unreliable because the 3rd party service might become unavailable.
The second use case I can think of is increasing your application's parallelism. Imagine we need to process a very large file, such as a video. Instead of having one node handle the job alone, you could spread the load across multiple nodes so that the file is processed faster.
During your high-level design phase, you could still get the job done without this pattern, but at the cost of reliability and scalability. Let's talk about reliability first, using a super naive setup: an application managed by ourselves that calls the 3rd party services directly.
For example, let's say our application's HTTP requests fail for whatever reason. It could be a network problem, or it could be a 3rd party service outage. In this situation an application-layer retry might work, but we can't retry forever: retries consume our resources, and our own service might become unavailable while it retries. That in turn can lead to losing the data we were supposed to send with the HTTP request.
With this approach, we also don't have a way to scale. Let's imagine that we have to deal with large files like videos. We can only process files sequentially: the next file has to wait until the first one is complete.
Pros and Cons
This is where the message queue and producer-consumer pattern comes in. What are the pros and cons of introducing this pattern?
Before we interact with the 3rd party service, we always have a copy of the data inside our message queue, and this is what makes the pattern so powerful. If a 3rd party service becomes unavailable, our consumer can stop retrying and proceed with other tasks without wasting resources. In the meantime, our data is kept in the message queue and gets processed later, without the risk of data loss.
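To make the idea concrete, here is a minimal sketch of a consumer that puts a message back on the queue when the 3rd party call fails. It uses Python's in-process `queue.Queue` as a stand-in for a real message broker, and `ThirdPartyUnavailable`, `consume_once`, and the `send` callable are hypothetical names made up for this illustration.

```python
import queue


class ThirdPartyUnavailable(Exception):
    """Hypothetical error raised when the vendor cannot be reached."""


def consume_once(q, send):
    """Take one (message, attempts) pair off the queue and try to deliver it.

    If the vendor is unavailable, the message goes back on the queue with an
    incremented attempt counter, so the consumer can move on to other work
    instead of retrying in a tight loop.
    Returns True if the message was delivered, False if it was requeued.
    """
    message, attempts = q.get()
    try:
        send(message)
        return True
    except ThirdPartyUnavailable:
        # The data is not lost: it stays in the queue for a later attempt.
        q.put((message, attempts + 1))
        return False
```

A real broker would persist the message on disk and usually cap the number of attempts; the sketch only shows that the consumer itself does not have to hold on to the data.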
Pros: Reliability and Scalability
With our data stored inside the message queue, our consumers become stateless, as they don't have to persist any data. They can connect to and disconnect from the message queue at any time without the risk of losing data. This is a big win for scalability: we can always add consumers horizontally to increase our system's throughput.
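As a rough sketch of horizontal scaling, the snippet below starts several stateless consumers that all pull from the same queue. Threads and an in-process `queue.Queue` stand in for separate nodes and a real broker; `run_consumers` and `handle` are hypothetical names used only for this example.

```python
import queue
import threading


def run_consumers(jobs, handle, num_consumers):
    """Process all jobs with num_consumers workers sharing one queue."""
    q = queue.Queue()
    for job in jobs:
        q.put(job)

    results = []
    lock = threading.Lock()

    def worker():
        # Each consumer keeps no state of its own: it just pulls the next job.
        while True:
            try:
                job = q.get_nowait()
            except queue.Empty:
                return  # nothing left to do, this consumer can disconnect
            out = handle(job)
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(num_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Changing `num_consumers` is the only thing needed to add or remove workers, which is the property the paragraph above relies on. The order of `results` depends on thread scheduling.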
Cons: Asynchronous communication, time lag
Of course, like in life, every choice comes with some trade-offs:
Our programming model has to change from synchronous to asynchronous. If you are processing a long-running task for your user, you have to find a way to notify the user after the task completes. Communication just becomes harder.
We also introduce time lag, since there is now another component between the producer and the third-party service.
These are trade-offs that we are willing to make in most cases. Of course, you also have one more point of failure in the system, since we introduced another component. Most of the time we are willing to accept this in exchange for scalability and reliability.
One thing to consider is whether we want separate queues. Let's say we're integrating with two different payment gateway vendors, and each vendor has an SLA of 95%. If you use a single queue for both vendors, an outage at either vendor can hold up the whole pipeline, so the pipeline's SLA is 95% multiplied by 95%, which is 90.25%, lower than either vendor's 95%.
To prevent that, we could have a separate queue for each vendor. That way each pipeline gets its vendor's full 95% SLA.
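The arithmetic can be checked in a few lines. This is only a back-of-the-envelope sketch: it assumes the two vendors fail independently and that a shared queue is only healthy when both vendors are up. The variable names are made up for the example.

```python
vendor_sla = 0.95  # availability promised by each vendor

# One shared queue: the pipeline needs both vendors to be up.
shared_queue_sla = vendor_sla * vendor_sla

# One queue per vendor: each pipeline depends on a single vendor.
separate_queue_sla = vendor_sla

print(f"shared queue:   {shared_queue_sla:.2%}")
print(f"separate queue: {separate_queue_sla:.2%}")
```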
Separate queues also help with priorities. Let's imagine that we are designing a news system. In our system there is breaking news and normal news, which have different priorities. We want to prioritize breaking news to give our users the best experience. In this case we can use one queue per priority, although our consumer has to be a little bit smart, i.e. able to recognize the priority of each queue.
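One simple way a consumer could recognize priority is to always check the breaking-news queue first and only fall back to the normal queue when it is empty. The sketch below assumes two in-process `queue.Queue` objects as stand-ins for real broker queues; `next_article` is a hypothetical helper name.

```python
import queue


def next_article(breaking, normal):
    """Return the next article to process, preferring breaking news.

    Returns None when both queues are empty.
    """
    try:
        return breaking.get_nowait()
    except queue.Empty:
        pass
    try:
        return normal.get_nowait()
    except queue.Empty:
        return None
```

With this strict ordering, normal news can wait a long time if breaking news keeps arriving, so it is a design choice rather than the only option.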
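At-least-once guaranteed queue vs at-most-once guaranteed queue
Finally, we could also discuss whether you want an at-most-once or an at-least-once guaranteed queue. An at-most-once queue never delivers a message twice but may lose it; an at-least-once queue never loses a message but may deliver it more than once, so consumers need to tolerate duplicates.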
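In practice the difference often comes down to when the consumer acknowledges a message relative to processing it. The sketch below uses a tiny in-memory `AckQueue` class invented for this illustration (it is not the API of any real broker) to show both orderings.

```python
from collections import deque


class AckQueue:
    """Toy queue: a message stays at the head until it is acknowledged."""

    def __init__(self, messages):
        self._pending = deque(messages)

    def receive(self):
        # Peek at the head without removing it; None if the queue is empty.
        return self._pending[0] if self._pending else None

    def ack(self):
        # Acknowledge (remove) the message at the head.
        self._pending.popleft()


def consume_at_most_once(q, handle):
    msg = q.receive()
    if msg is None:
        return
    q.ack()      # ack BEFORE processing: a crash in handle() loses the message
    handle(msg)


def consume_at_least_once(q, handle):
    msg = q.receive()
    if msg is None:
        return
    handle(msg)  # process first
    q.ack()      # ack AFTER: a crash before this line means redelivery
```

If `handle` fails in the at-least-once version, the same message is received again on the next call, which is why the handler should be safe to run twice for the same message.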