Instant Messaging or Real-time Communication: Behind the Scenes

Published in CodeX · 8 min read · Sep 8, 2021

To many people, especially those working with Node.js, this topic is nothing new. There are plenty of tutorials like “Create a simple chat app using Node.js” or “Realtime chat app using Node.js, Express, and Socket.io”. They have more or less turned the chat application into the “hello world” for anyone starting out with Node.js. The good thing about them is that they give you confidence: how could you not feel confident when you can build something like Facebook Messenger in such a short time (an hour, maybe)? The bad thing is that it is not that easy.

The fact is, no matter how much effort has been invested, real-time communication (RTC) is still a hot topic in both industry and academia. Because of many factors outside your control, such as unstable network connectivity, duplicate or conflicting connections, and duplicate messages, it is challenging to deliver a truly real-time experience (no, not the one you get after finishing a hello-world tutorial). In this post, I will give you an overview of the practical issues that big players like Facebook, Slack, Discord, and Telegram have had to consider.

Note that this post is only about real-time communication, so I suggest you first look at basic concepts like two-way communication, WebSocket, server-sent events, short/long polling, and MQTT. It is also not about database system design, database selection, or coding, just RTC.

From the big guys

What would you do if your boss asked you to build a chat system like Facebook Messenger or WhatsApp? Here is the list you might go through:

Find a tutorial
Read a system design ebook
Take “Design a chat app for millions of users” on educative.io
Hope that the Facebook engineering blog has leaked something about their work
Google “how to make a chat app” and get more than 3 billion results

But there is one essential step you should take, either at the beginning or after all those steps: open their web applications and inspect the data exchanged between the browser and the servers. That is what I am going to show you in the next section, organized around the main features of a chat backend:

List threads, list messages
Send message
Receive message
Typing
Seen

Ready? Let’s go.

Facebook Messenger

Technology: MQTT over WebSocket

Feature: sending/receiving messages, typing, data requests, etc., all over WebSocket

Look at the data in the WebSocket frames exchanged with edge-chat.messenger.com. All requests to load data, list items, send/receive messages, and report typing are performed by publishing and subscribing to messages over WebSocket, not through an HTTP API.

Instagram

Same company, and thus the same technologies. But Instagram did not originate there, so there are some differences.

Technology: MQTT over WebSocket + API

Feature: sending/receiving messages and typing over WebSocket; listing items, seen status, the list of online users, etc. over the API.

Telegram

Technology: WebSocket

Feature: everything over WebSocket

Slack

Technology: WebSocket + API

Feature: receiving messages and typing over WebSocket; listing items and sending messages over the API.

Discord

Technology: WebSocket + API

Feature: receiving messages over WebSocket; listing items, sending messages, typing, and seen status over the API.

Zalo

Technology: API and HTTP long polling

Feature: receiving messages over HTTP long polling; listing items, sending messages, typing, and seen status over the API.

Our turn

So we come up with some observations:

The biggest players rely completely on WebSocket/TCP.
The not-quite-as-big players combine WebSocket, to receive messages from the server, with an API, to send messages to the server.
The Zalo case: no idea.

Here are some of my thoughts:

While WebSocket performs much better than repeated HTTP requests, scaling a WebSocket server is non-trivial because the protocol is stateful: each client keeps a single, persistent connection open. That is why it is often used only for servers to push messages to clients. If you also use it for other logic, such as recording messages from clients, you are asking it to do something it is not well suited for.
An HTTP API is useful for sending messages from clients because stateless APIs are much easier to scale than WebSocket servers. It can also reuse existing middleware layers for authentication, authorization, rate limiting, etc.
For those who can build scalable WebSocket systems, it is preferable to move all HTTP requests onto WebSocket in order to optimize message transmission. This requires moving the entire data-request mechanism, even for non-real-time tasks, onto the same synchronization channel.
It is understandable that Instagram differs: it was acquired, not originally developed, by Facebook, so there must be a gap between Facebook’s synchronization system (WebSocket-based) and Instagram’s existing one.
Apps like Slack and Discord leverage the scalability of HTTP APIs and use WebSocket only to push messages from servers or for high-rate events like typing.
Zalo’s approach can be explained by three advantages: i) it supports very old browsers, which are compatible with HTTP long polling but not WebSocket; ii) the web version of Zalo may be just a companion to the mobile app; iii) an API is easier to scale than WebSocket.
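To make the split between a stateless send path and a stateful push path more concrete, here is a minimal in-memory sketch. It is not how Slack or Discord actually implement it. All names (ChatMessage, PushGateway, handleSendMessage) are hypothetical, the "database" is an array, and each "WebSocket connection" is just a callback.

```typescript
// Hypothetical sketch: stateless "send" handler + stateful push gateway.

interface ChatMessage {
  id: number;
  threadId: string;
  senderId: string;
  body: string;
}

type PushFn = (msg: ChatMessage) => void;

// Stateful part: tracks which users currently hold an open connection.
// In a real system each entry would wrap a WebSocket; here it is a callback.
class PushGateway {
  private connections = new Map<string, PushFn>();

  connect(userId: string, push: PushFn): void {
    this.connections.set(userId, push);
  }

  disconnect(userId: string): void {
    this.connections.delete(userId);
  }

  // Returns true if the user was online and the message was pushed.
  pushTo(userId: string, msg: ChatMessage): boolean {
    const push = this.connections.get(userId);
    if (push === undefined) return false;
    push(msg);
    return true;
  }
}

// Shared storage stands in for the database.
const storedMessages: ChatMessage[] = [];
const threadMembers = new Map<string, string[]>([
  ["t1", ["alice", "bob", "carol"]],
]);

// Stateless part: what an HTTP handler for "send message" would do.
// It keeps no per-connection state, so any API instance could run it.
function handleSendMessage(
  gateway: PushGateway,
  senderId: string,
  threadId: string,
  body: string
): { messageId: number; pushed: string[] } {
  const msg: ChatMessage = {
    id: storedMessages.length + 1,
    threadId,
    senderId,
    body,
  };
  storedMessages.push(msg); // persist first, then fan out
  const pushed: string[] = [];
  for (const member of threadMembers.get(threadId) ?? []) {
    if (member !== senderId && gateway.pushTo(member, msg)) {
      pushed.push(member);
    }
  }
  return { messageId: msg.id, pushed };
}

// Usage: bob is online, carol is not.
const gateway = new PushGateway();
const bobInbox: ChatMessage[] = [];
gateway.connect("bob", (m) => {
  bobInbox.push(m);
});
const sendResult = handleSendMessage(gateway, "alice", "t1", "hello");
```

In this sketch carol is offline, so nothing is pushed to her. She would pick the message up later through the "list messages" API, which is why the message is stored before it is fanned out.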

Next time you hear someone suggest using WebSocket for everything, both sending and receiving messages, that person must be:

Either new to real-time requirements and lacking practical experience, in which case the system will end up hard to scale and unable to cope as traffic grows.
Or very, very experienced with large-scale systems like Facebook or Telegram.

Behind the scenes

In this last section, let’s (theoretically) experience the things (pains in the a**) that hello-world teachers never had a chance to tell you about.

Socket.io?

This library looks like a panacea, since it spares us boring tasks like setting up a server, ping/pong, keep-alive, session storage, and so on. Anyone working on real-time features in Node.js can start with it. Unfortunately, there are things you will run into once you get deeply involved with it.

No QoS mechanism. You will soon feel the pain when the client-side network is not as stable as expected. Messages get lost as connectivity goes up and down, and socket.io, which does not guarantee QoS, can’t be trusted to deliver them.
Scaling? When a WebSocket node is overloaded, you scale it up or out. That’s what you normally do, correct? At that point, the Redis adapter comes into the picture as a bridge between nodes via Redis pub/sub. The thing is, its callback-counting mechanism, which depends on the number of physical nodes, caused us several problems when we attached other processes that emit messages. And because Redis pub/sub is scoped to the entire Redis server rather than to a database number, messages from different environments easily get mixed up.
Version breaks. socket.io is still under active development, and some features are only available in the latest version. When we adopted it, the TypeScript and adapter parts were not complete, and we hit a lot of errors during development.
Ack by callback. I don’t like callbacks, that’s it. I only use fire-and-forget events, nothing more, because I don’t think I can fully handle the consequences of overusing callbacks.
Unstable ping/pong. Even with good connectivity, the socket.io client still frequently disconnects after a while, regardless of changing the default ping/pong timeout or increasing the load balancer timeout. These issues are all still open, and if you hit them, you are simply unlucky.
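To illustrate what a QoS guarantee would have to add on top of fire-and-forget events, here is a minimal sketch of at-least-once delivery: the sender keeps each message until it is acknowledged, and the receiver drops duplicates by message ID. This is not a socket.io feature or API. Outbox, DedupInbox, and Envelope are hypothetical names, and the flaky network is simulated by simply not delivering the first ack.

```typescript
// Hypothetical sketch of at-least-once delivery with receiver-side dedup.

interface Envelope {
  id: string; // unique per message, chosen by the sender
  payload: string;
}

// Sender side: keep every message until the receiver acknowledges it.
class Outbox {
  private pending = new Map<string, Envelope>();
  private nextId = 1;

  enqueue(payload: string): Envelope {
    const env: Envelope = { id: `m${this.nextId++}`, payload };
    this.pending.set(env.id, env);
    return env;
  }

  ack(id: string): void {
    this.pending.delete(id);
  }

  // Everything still unacknowledged, e.g. to resend after a reconnect.
  unacked(): Envelope[] {
    return Array.from(this.pending.values());
  }
}

// Receiver side: resends can arrive twice, so drop IDs already seen.
class DedupInbox {
  private seen = new Set<string>();
  readonly delivered: string[] = [];

  // Returns the ID to acknowledge. A duplicate is acknowledged again
  // but not delivered to the application a second time.
  receive(env: Envelope): string {
    if (!this.seen.has(env.id)) {
      this.seen.add(env.id);
      this.delivered.push(env.payload);
    }
    return env.id;
  }
}

// Simulate a flaky link: the first ack is lost, so the sender resends.
const outbox = new Outbox();
const inbox = new DedupInbox();

const firstEnvelope = outbox.enqueue("hi");
inbox.receive(firstEnvelope); // delivered, but the ack never reaches the sender
for (const env of outbox.unacked()) {
  // "reconnect": resend everything unacked; this time the ack arrives
  outbox.ack(inbox.receive(env));
}
```

After the resend, the receiver has seen the envelope twice but delivered "hi" only once, and the sender's outbox is empty. A real implementation would also need retry timers and a bound on how many IDs the receiver remembers.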

In general, socket.io does its job at a basic level. To adopt it for a real project with stability requirements, there is still a lot of work to do: QoS guarantees, scaling, tracking this, tracking that. The most challenging part is synchronizing the state of the client and the server after a disconnection.
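One common way to approach that synchronization problem, sketched here under my own assumptions rather than as anything socket.io provides, is to give every message in a thread a server-assigned sequence number and let the client ask for everything after the last one it applied. ThreadLog, ThreadView, and SeqMessage are hypothetical names.

```typescript
// Hypothetical sketch: catching up after a disconnect via sequence numbers.

interface SeqMessage {
  seq: number; // per-thread, strictly increasing, assigned by the server
  body: string;
}

// Server side: an append-only log per thread (stands in for the database).
class ThreadLog {
  private entries: SeqMessage[] = [];

  append(body: string): SeqMessage {
    const msg: SeqMessage = { seq: this.entries.length + 1, body };
    this.entries.push(msg);
    return msg;
  }

  // Everything the client has not seen yet.
  since(lastSeq: number): SeqMessage[] {
    return this.entries.filter((m) => m.seq > lastSeq);
  }
}

// Client side: remembers the highest sequence number it has applied.
class ThreadView {
  lastSeq = 0;
  readonly bodies: string[] = [];

  apply(msg: SeqMessage): void {
    if (msg.seq <= this.lastSeq) return; // already applied (duplicate push)
    this.lastSeq = msg.seq;
    this.bodies.push(msg.body);
  }

  // Called after reconnecting: fetch and apply the gap.
  resync(source: ThreadLog): void {
    for (const msg of source.since(this.lastSeq)) this.apply(msg);
  }
}

const threadLog = new ThreadLog();
const view = new ThreadView();

view.apply(threadLog.append("one")); // pushed live
threadLog.append("two"); // client offline: push is lost
threadLog.append("three"); // client offline: push is lost
view.resync(threadLog); // reconnect and catch up
```

The sketch assumes the client always resyncs before it applies live pushes again. A more careful client would also treat a jump in seq as a signal to resync.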

MQTT

MQTT is popular in IoT applications because of its small message overhead, which makes it suitable for the unreliable networks of smart devices. It is designed around a pub/sub mechanism and has good built-in features like QoS, persistent sessions, and last will messages. As mentioned above, this is the protocol Facebook uses for its real-time app: MQTT over WebSocket.

We tried it because we wanted our app to reach the top of the world. But, again, it is not that easy.

Changing our mindset. MQTT has no concept of rooms or backend-side routing. In socket.io, rooms are managed at the backend and make it easy to create a chat room, add new users, and automatically emit messages to the room. With MQTT, you have to manage rooms yourself and emit messages to each topic yourself, because which topics are listened to depends on the client. A subscription to a topic happens only after the connection is established, and you have limited ability to force clients to unsubscribe or subscribe to new topics.
Scaling? No. It is just marketing, or advertising, whatever you call it, when brokers like Mosquitto, VerneMQ, or EMQ X say they scale very well. When we tried several of those message brokers in clusters under high traffic, a lot of issues came up, and they were too painful to fix one by one. C’est la vie: you use their products, you depend on them, and if they don’t fix the issue, what can you say?
Conflicting connections. This is very tricky on the client side. To leverage the persistent session feature, we need to keep the client ID unchanged, which is challenging if the app is not well coded. Many times, even though only one MQTT connection instance was defined, we had no idea how on earth several instances with the same ID could exist. As a result, the clients kick each other out and none of them can stay connected. In these cases, nobody other than the frontend developers can troubleshoot the issue.
Security. It is more difficult to meet security requirements for an MQTT system than for a socket.io one, mainly because of how topics and subscription patterns are organized.
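The "manage rooms yourself" point can be sketched as follows. One possible design, and only an assumption on my part, is to let each client subscribe once to a personal inbox topic and have the backend keep room membership and publish one copy per member. RoomRegistry, PublishFn, and the topic layout chat/user/<id>/inbox are hypothetical, and the publish function is a stand-in for whatever MQTT client the backend uses.

```typescript
// Hypothetical sketch: backend-managed rooms on top of per-user topics.

type PublishFn = (topic: string, payload: string) => void;

// MQTT has no room concept, so membership lives in the backend.
class RoomRegistry {
  private members = new Map<string, Set<string>>();

  join(roomId: string, userId: string): void {
    let set = this.members.get(roomId);
    if (set === undefined) {
      set = new Set<string>();
      this.members.set(roomId, set);
    }
    set.add(userId);
  }

  leave(roomId: string, userId: string): void {
    this.members.get(roomId)?.delete(userId);
  }

  // Publish one copy per member to that member's personal topic.
  // Returns how many copies were published.
  broadcast(roomId: string, payload: string, publish: PublishFn): number {
    let count = 0;
    this.members.get(roomId)?.forEach((userId) => {
      publish(`chat/user/${userId}/inbox`, payload);
      count++;
    });
    return count;
  }
}

// Usage with a fake publish function that just records what was sent.
const rooms = new RoomRegistry();
const published: string[] = [];
const fakePublish: PublishFn = (topic, payload) => {
  published.push(`${topic} <- ${payload}`);
};

rooms.join("room-1", "alice");
rooms.join("room-1", "bob");
rooms.leave("room-1", "alice");
const copiesSent = rooms.broadcast("room-1", "hello", fakePublish);
```

With this layout, joining or leaving a room never requires the client to change its subscriptions, which sidesteps the problem of forcing clients to subscribe or unsubscribe. The price is that the backend publishes N copies for a room of N members.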

As you might have noticed, MQTT is more exposed than the API approach. Security issues come not only from our own implementation but also from third-party message brokers. A security issue in EMQ X that allowed bypassing authentication to send fake data to other clients has been fixed within our team but is still an open issue upstream. At least, implementing MQTT improved the reliability of our chat system and stopped the message loss we had before.
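A large part of the topic-related risk comes from wildcards: in MQTT topic filters, + matches exactly one level and # matches all remaining levels, so a careless access rule lets one client listen to everyone. The sketch below shows the matching rule and one possible subscribe check. The function names and the chat/user/<id>/... layout are my assumptions, it ignores the special handling of topics starting with $, and it assumes user IDs contain no "/", "+", or "#".

```typescript
// Hypothetical sketch: MQTT-style topic filter matching and a subscribe ACL.

// Does a topic filter (with + and # wildcards) match a concrete topic?
function topicMatches(filter: string, topic: string): boolean {
  const f = filter.split("/");
  const t = topic.split("/");
  for (let i = 0; i < f.length; i++) {
    if (f[i] === "#") return i === f.length - 1; // multi-level wildcard must be last
    if (i >= t.length) return false; // filter is longer than the topic
    if (f[i] !== "+" && f[i] !== t[i]) return false; // literal level mismatch
  }
  return f.length === t.length;
}

// Allow a subscription only if the filter stays strictly below the user's
// own prefix chat/user/<userId>/, with no wildcard inside that prefix.
function canSubscribe(userId: string, filter: string): boolean {
  const own = ["chat", "user", userId];
  const levels = filter.split("/");
  if (levels.length <= own.length) return false;
  return own.every((level, i) => levels[i] === level);
}

// This filter would match bob's inbox, which is exactly why alice
// must not be allowed to subscribe to it.
const wildcardFilter = "chat/user/+/inbox";
const leaksBobInbox = topicMatches(wildcardFilter, "chat/user/bob/inbox");
const aliceAllowedWildcard = canSubscribe("alice", wildcardFilter);
const aliceAllowedOwn = canSubscribe("alice", "chat/user/alice/#");
```

Whether such a check can be enforced depends on the broker's authorization hooks, which differ from product to product, so treat this as an illustration of the rule rather than a drop-in fix.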

Conclusion

Real-time communication is a challenging problem that requires a lot of knowledge and experience. The same requirement, e.g. “a messenger like Facebook”, can be done in one hour, three hours, or a couple of years, depending on the situation.

Acknowledgment

I would like to send my big thanks to Quang Minh (a.k.a. Minh Monmen) for permission to translate his original post.