What made Discord move to another Database?


Ankit Raj
·Aug 17, 2021·


If you were to design a system like Discord, how would you go about it? Specifically, how would you store all the messages?

Discord has about 150 million monthly active users and 19 million weekly active servers. It stores all chat history forever so users can come back at any time and have their data available on any device. That adds up to billions of messages, a data set that keeps growing in both velocity and size. 🤯


What's your first thought or question while designing such a system?

The usage pattern. It largely decides how you should store the messages. Discord's reads and writes are split roughly evenly, and its reads are extremely random. It has voice chats, private chats, and large public servers that rack up millions of messages in a month. 📈

Earlier, Discord stored everything in a single MongoDB replica set so they could iterate quickly. They created a single compound index on channel_id and created_at. Over time, with millions of messages pouring in, the data and the index could no longer fit in RAM, and latencies became unpredictable. They had to move to another database.

Here came Cassandra! An open-source, linearly scalable, distributed database. Discord could now add nodes to scale out and keep replicas to tolerate the loss of nodes. Cassandra stores related data contiguously on disk, which keeps seeks to a minimum.

Understanding Cassandra can be easy. Its primary key has two parts: a partition key, which determines which node the data lives on and where it sits on disk, and a clustering key, which identifies a row within that partition.
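To make the two key roles concrete, here is a minimal in-memory sketch in Python. It is a toy model, not how Cassandra is implemented: the `ToyTable` class and its method names are made up for illustration, and Python's `hash()` modulo the node count stands in for Cassandra's real token ring.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """Toy model: rows grouped by partition key, sorted by clustering key."""

    def __init__(self, node_count):
        self.node_count = node_count
        # partition key -> list of (clustering key, value), kept sorted
        self.partitions = defaultdict(list)

    def node_for(self, partition_key):
        # Stand-in for Cassandra's token ring (an assumption for illustration):
        # the partition key alone decides which node owns the data.
        return hash(partition_key) % self.node_count

    def insert(self, partition_key, clustering_key, value):
        # Keeping each partition sorted mimics contiguous, ordered storage.
        bisect.insort(self.partitions[partition_key], (clustering_key, value))

    def read_range(self, partition_key, start, end):
        # Return values with start <= clustering key < end from one partition.
        rows = self.partitions.get(partition_key, [])
        lo = bisect.bisect_left(rows, (start,))
        hi = bisect.bisect_left(rows, (end,))
        return [value for _, value in rows[lo:hi]]


table = ToyTable(node_count=3)
table.insert(42, 2, "second message")
table.insert(42, 1, "first message")
print(table.read_range(42, 1, 3))
```

Because every row of a partition sits together and in order, a range read touches one node and one sorted run of rows, which is the property the article credits for keeping seeks low.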

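Cassandra partition keys can be compound, so the new primary key became ((channel_id, bucket), message_id). Here channel_id and bucket together form the partition key, and message_id is the clustering key. Adding a bucket to the partition key splits a channel's history across several partitions, so a busy channel does not pile up in a single ever-growing one.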
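The article does not say how the bucket is computed, so the following Python sketch rests on stated assumptions: message IDs are Snowflake-style integers whose upper bits hold milliseconds since a Discord epoch of 2015-01-01, and each bucket covers a fixed ten-day window. The function names are hypothetical.

```python
DISCORD_EPOCH_MS = 1420070400000            # 2015-01-01T00:00:00Z (assumed epoch)
BUCKET_SIZE_MS = 10 * 24 * 60 * 60 * 1000   # assumed ten-day bucket window
TIMESTAMP_SHIFT = 22                        # assumed: low 22 bits are not timestamp


def snowflake_timestamp_ms(message_id: int) -> int:
    """Unix timestamp in milliseconds encoded in a Snowflake-style ID."""
    return (message_id >> TIMESTAMP_SHIFT) + DISCORD_EPOCH_MS


def make_bucket(message_id: int) -> int:
    """Index of the fixed time window, counted from the assumed epoch."""
    return (message_id >> TIMESTAMP_SHIFT) // BUCKET_SIZE_MS


def partition_key(channel_id: int, message_id: int) -> tuple:
    """Compound partition key (channel_id, bucket) for a message."""
    return (channel_id, make_bucket(message_id))


# A hypothetical ID created exactly one bucket window after the epoch.
example_id = BUCKET_SIZE_MS << TIMESTAMP_SHIFT
print(partition_key(1234, example_id))
```

Under these assumptions the bucket is derived from the message ID itself, so a reader who knows the channel and an approximate time can compute which partition to query without any extra lookup.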

Discord had a 20-node cluster a few years ago, and as the data grows, they will continue to add nodes as needed. They should do fine even with more data, since companies like Netflix and Apple run clusters of hundreds of nodes.


People often argue about one database being the best, but it's all about use cases and trade-offs. 🤷🏻

How do you store your product data, and why? 🤔

Cover Photo by Alexander Shatov on Unsplash

