Understanding DynamoDB Secondary Indexes

- Alex DeBrie tl;dr: Indexes are a crucial part of proper data modeling for all databases, and DynamoDB is no exception. Alex DeBrie, author of The DynamoDB Book, explains the problems secondary indexes solve, how to use them effectively, and how to choose between secondary indexes and alternatives like Rockset.

featured in #503
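The core problem secondary indexes solve can be shown with a toy model. A key-value table is cheap to read by its primary key, but reading it by any other attribute means scanning every item unless something maintains an alternate lookup. The following is a minimal, hypothetical Python sketch of that idea. The class `Table` and its methods are invented for illustration and are not the DynamoDB API. The toy updates its index synchronously on each write, whereas DynamoDB maintains indexes for you, and global secondary indexes are updated asynchronously.

```python
from collections import defaultdict


class Table:
    """Toy key-value table with one secondary index (hypothetical; not the DynamoDB API)."""

    def __init__(self, index_attr):
        self.items = {}                # primary key -> item
        self.index_attr = index_attr   # attribute the secondary index is built on
        self.index = defaultdict(set)  # indexed attribute value -> set of primary keys

    def put(self, pk, item):
        # If this write overwrites an existing item, drop its old index entry first.
        old = self.items.get(pk)
        if old is not None and self.index_attr in old:
            self.index[old[self.index_attr]].discard(pk)
        self.items[pk] = item
        # Items without the indexed attribute are simply left out of the index.
        if self.index_attr in item:
            self.index[item[self.index_attr]].add(pk)

    def get(self, pk):
        # Access pattern the base table already supports: lookup by primary key.
        return self.items.get(pk)

    def query_index(self, value):
        # Alternate access pattern: lookup by the indexed attribute, no full scan.
        return [self.items[pk] for pk in sorted(self.index.get(value, ()))]


orders = Table(index_attr="customer")
orders.put("o1", {"customer": "alice", "total": 30})
orders.put("o2", {"customer": "bob", "total": 12})
orders.put("o3", {"customer": "alice", "total": 5})
print(orders.query_index("alice"))
```

The trade-off the sketch makes visible: every write now does extra work to keep the index in step, which is the cost paid for the cheaper reads.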

How LedgerStore Supports Trillions Of Indexes At Uber

- Kaushik Devarajaiah tl;dr: “LedgerStore is an immutable storage solution at Uber that provides verifiable data completeness and correctness guarantees to ensure data integrity for these transactions... This blog covers the significance of LedgerStore indexing and its architecture, which powers trillions of indexes, with a petabyte-scale index storage footprint.”

featured in #503

The Problem With Using a UUID Primary Key In MySQL

- Brian Morrison tl;dr: “UUIDs [are] especially useful in a distributed architecture, where you have a number of systems and databases responsible for creating records. You might think that using UUIDs as a primary key in a database is a great idea, but when used incorrectly, they can drastically hurt database performance. In this article, you'll learn about the downsides of using UUIDs as a primary key in your MySQL database.”

featured in #502
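One way to see why random UUIDs can hurt: InnoDB stores table rows in a clustered index ordered by primary key. The Python sketch below is a deliberately simplified model, not MySQL internals. It uses a sorted list as a stand-in for that key order and counts how many inserts land at the right-hand end versus somewhere in the middle. It ignores pages, the buffer pool and secondary indexes entirely; it only shows where new keys land.

```python
import bisect
import uuid


def count_appends(keys):
    """Insert keys into a sorted list (a stand-in for primary-key order) and
    count how many land at the very end rather than in the middle."""
    ordered = []
    appends = 0
    for key in keys:
        pos = bisect.bisect_left(ordered, key)
        if pos == len(ordered):
            appends += 1  # key is larger than every existing key: pure append
        ordered.insert(pos, key)
    return appends


n = 10_000
sequential = list(range(n))                             # like AUTO_INCREMENT ids
random_uuids = [uuid.uuid4().bytes for _ in range(n)]   # random 16-byte UUIDv4 keys

print(count_appends(sequential), "of", n, "sequential inserts appended at the end")
print(count_appends(random_uuids), "of", n, "random UUID inserts appended at the end")
```

Sequential keys always append, so writes concentrate on the last page of the index. Random UUIDv4 keys almost never do, so each insert can touch an arbitrary page, which in a real B-tree means more page splits and scattered I/O.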

DuckDB As The New Jq

- Paul Gross tl;dr: “Recently, I’ve been interested in the DuckDB project. And one of the amazing features is that it has many data importers included without requiring extra dependencies. This means it can natively read and parse JSON as a database table, among many other formats.” Paul discusses how this has impacted his work. 

featured in #499

How Figma’s Databases Team Lived To Tell The Scale

- Sammy Steele tl;dr: “The data revealed that some of our tables, containing several terabytes and billions of rows, were becoming too large for a single database. At this size, we began to see reliability impact during Postgres vacuums, which are essential background operations that keep Postgres from running out of transaction IDs and breaking down. Our highest write tables were growing so quickly that we would soon exceed the maximum IO operations per second supported by Amazon’s Relational Database Service. Vertical partitioning couldn’t save us here because the smallest unit of partitioning is a single table. To keep our databases from toppling, we needed a bigger lever.”

featured in #498

Postgres Is Eating The Database World

- Ruohang Feng tl;dr: “PostgreSQL isn’t just a simple relational database; it’s a data management framework with the potential to engulf the entire database realm. The trend of ‘Using Postgres for Everything’ is no longer limited to a few elite teams but is becoming a mainstream best practice.”

featured in #498

Better Benchmarks Through Graphs

- Marc Brooker tl;dr: “I believe that one of the things that’s holding back databases as an engineering discipline is a lack of good benchmarks, especially ones available at the design stage. The gold standard is designing for and benchmarking against real application workloads, but there are some significant challenges achieving this ideal.” Marc discusses an approach to developing benchmarks that shed light on a database’s design decisions.

featured in #493

How Uber Serves Over 40 Million Reads Per Second From Online Storage Using An Integrated Cache

tl;dr: “Docstore is Uber’s in-house, distributed database built on top of MySQL. Storing tens of PBs of data and serving tens of millions of requests/second, it is one of the largest database engines at Uber used by microservices from all business verticals. Docstore users and use cases are growing, and so are the request volume and data footprint. This post discusses the challenges serving applications that require low-latency read access and high scalability.”

featured in #491

The Billion Row Challenge (1BRC) - Step-By-Step From 71s To 1.7s

- Marko Topolnik tl;dr: “The main thing I'd like to show you in this post is that a good part of that amazing speed comes from easy-to-grasp, reusable tricks that you could apply in your code as well. Towards the end, I'll also show you some of the magical parts that take it beyond that level.”

featured in #491
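For reference, the task itself is simple: read lines of the form `station;temperature` and report the minimum, mean and maximum temperature per station, sorted by station name. The post's speedups come from optimized Java; the Python function below is only a naive single-threaded baseline of the computation, with none of those tricks. The name `aggregate` is hypothetical, and rounding uses Python's `round()`, which may differ from the challenge's reference rounding in edge cases.

```python
import io


def aggregate(lines):
    """Compute per-station (min, mean, max) from 'station;temperature' lines."""
    stats = {}  # station -> [min, max, sum, count]
    for line in lines:
        station, _, value = line.rstrip("\n").partition(";")
        temp = float(value)
        s = stats.get(station)
        if s is None:
            stats[station] = [temp, temp, temp, 1]
        else:
            if temp < s[0]:
                s[0] = temp
            if temp > s[1]:
                s[1] = temp
            s[2] += temp
            s[3] += 1
    # Output ordered alphabetically by station, mean rounded to one decimal.
    return {
        station: (s[0], round(s[2] / s[3], 1), s[1])
        for station, s in sorted(stats.items())
    }


sample = io.StringIO("Hamburg;12.0\nBulawayo;8.9\nHamburg;34.2\n")
print(aggregate(sample))
```

Every per-line step here (splitting, float parsing, dictionary lookup) is the kind of cost the optimized versions attack.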

Let's Talk About Joins

- Crystal Lewis tl;dr: “In general, there are two ways to link our data, horizontally or vertically. When linking or joining data horizontally we are matching rows by one or more variables (i.e., keys), making a wider dataset. When joining vertically, column names are matched and datasets are stacked on top of each other, making a longer dataset. Joins can be done in many different programs (e.g., SQL, R, Stata, SAS). Most of this post will be applicable to any language, but examples in R will be provided.”

featured in #481
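The two directions Crystal describes can be sketched on plain lists of row dictionaries. The post's own examples are in R; this is a hypothetical Python illustration, not her code, and it implements only an inner join (rows without a match are dropped). The function names `inner_join` and `stack` are invented for the example.

```python
def inner_join(left, right, key):
    """Horizontal join: match rows on `key`, producing wider rows."""
    by_key = {}
    for row in right:
        by_key.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in by_key.get(row[key], []):
            joined.append({**row, **match})  # combine columns from both sides
    return joined


def stack(*datasets):
    """Vertical join: match on column names and append rows, producing a longer
    dataset. Columns missing from a row are filled with None."""
    columns = []
    for rows in datasets:
        for row in rows:
            for col in row:
                if col not in columns:
                    columns.append(col)
    return [{col: row.get(col) for col in columns} for rows in datasets for row in rows]


students = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Ben"}, {"id": 3, "name": "Cy"}]
scores = [{"id": 1, "score": 90}, {"id": 2, "score": 85}]
print(inner_join(students, scores, "id"))  # wider: id, name and score

fall = [{"id": 1, "score": 90}]
spring = [{"id": 1, "score": 88}, {"id": 2, "score": 70}]
print(stack(fall, spring))  # longer: three rows, same columns
```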