/Database

Just Make It Scale: An Aurora DSQL Story

- Werner Vogels tl;dr: “In this post, Niko and Marc - the two senior principal engineers who built DSQL - provide deep technical insights on Rust and how we’ve used it to build DSQL. It’s an interesting story on the pursuit of engineering efficiency and why it’s so important to question past decisions – even if they’ve worked very well in the past.”

featured in #620


Why Query Caching Is the Most Cost-Effective Way To Scale Databases

- Gautam Gopinadhan tl;dr: Most teams try to scale databases by throwing hardware at the problem, duplicating data, or rewriting slow queries, often at great cost. But there's a quieter and far more efficient path: SQL-layer query caching. It cuts load, reduces tail latency, and simplifies scaling, without migrations or infrastructure sprawl.

featured in #619


On The Road To Your Own Vector DB

- Doug Turnbull tl;dr: “These vectors correspond to vector embeddings, a representation of a word, sentence, image, or, really anything. Embeddings come out of models that move similar items closer. Our model might know that "Mary had a little lamb" is very similar to "Little bo peep had a sheep" - yielding nearly identical embeddings - despite sharing no important words.”

featured in #613


How Discord Indexes Trillions Of Messages

- Vicki Niu tl;dr: “As guilds on Discord grow larger with longer histories, more and more of them bump up against Lucene’s MAX\_DOC limit of ~2 billion messages. We needed a solution to scale search for these special cases, which we call BFGs, or Big Freaking Guilds. We wanted to retain the performance gains from storing all messages for a given guild on the same Elasticsearch shard, since that still works for the vast majority of guilds, but we needed a solution to scale search for BFGs as well.”

featured in #611


Database Design For Google Calendar: A Tutorial

- Alexey Makhotkin tl;dr: “In this database design tutorial I’m going to show how to design the database tables for a real-world project of substantial complexity. We’ll design a clone of Google Calendar. We will model as much as possible of the functionality that is directly related to the calendar.”

featured in #587 and #586


Database Sharding Explained

- Mahdi Yusuf tl;dr: Mahdi discusses why we shard data stores, when to use sharding, how it can be set up, and the options you have before resorting to it.

featured in #584


Migrating Billions Of Records: Moving Our Active DNS Database While It’s In Use

- Alex Fattouche and Corey Horton tl;dr: “When initially measured in 2022, DNS data took up approximately 40% of the storage capacity in Cloudflare’s main database cluster (cfdb). This database cluster, consisting of a primary system and multiple replicas, is responsible for storing DNS zones, propagated to our data centers in over 330 cities via our distributed KV store.”

featured in #565


Things You Should Know About Databases

- Mahdi Yusuf tl;dr: “So, without fully getting into the weeds on database-specific quirks, I will cover everything you should understand about RDBMS indexes. I will touch briefly on transactions and isolation levels and how they can impact your reasoning about specific transactions.”

featured in #553


Dealing With Large Tables

- Benjamin Dicken tl;dr: “Large databases often have a small number of very large tables that makes scaling difficult. How can you scale with these while keeping your database performant? This article covers vertical scaling, vertical sharding and horizontal sharding.”

featured in #550