Essential Reading For Engineering Leaders

How DoorDash Leverages LLMs To Evaluate Search Result Pages

- Yulei Liu

LLM
Architecture

tl;dr: “Traditionally, evaluating search relevance relied on human annotations, which posed challenges in scale, latency, consistency, and cost. To solve this, we built AutoEval, a human-in-the-loop system for automated search quality evaluation that is powered by large language models (LLMs). Through leveraging LLMs and our whole-page relevance (WPR) metric, AutoEval enables scalable, accurate, and near-real-time search result assessments.”

featured in #613

Systems Ideas That Sound Good But Almost Never Work

- Steven Sinofsky

Architecture

tl;dr: “I started my list with “let’s just” because 9 out of 10 times when someone says “let’s just” what follows is going to be ultimately way more complicated than anyone in the room thought it would be. I’m going to say “9 out of 10 times” a lot below on purpose because…experience. I offer an example or two below but for each there are probably a half dozen I lived through.”

featured in #613 and #612

How Discord Indexes Trillions Of Messages

- Vicki Niu

Database
Architecture

tl;dr: “As guilds on Discord grow larger with longer histories, more and more of them bump up against Lucene’s MAX\_DOC limit of ~2 billion messages. We needed a solution to scale search for these special cases, which we call BFGs, or Big Freaking Guilds. We wanted to retain the performance gains from storing all messages for a given guild on the same Elasticsearch shard, since that still works for the vast majority of guilds, but we needed a solution to scale search for BFGs as well.”

featured in #611

Advancing Invoice Document Processing At Uber Using GenAI

AI
Architecture

tl;dr: “In today’s fast-paced business environment, efficiently managing operational tasks is vital for maintaining workflows. Uber, with its large network of suppliers worldwide, faces considerable challenges in processing a high volume of invoices daily. Invoice processing is a critical function for Uber’s financial operations, directly impacting the efficiency and accuracy of our accounts payable processes. This blog explores how we used GenAI to solve this problem, setting a new standard in financial operations management.”

featured in #610

Human In The Loop Software Development Agents

- Jirat Pasuksmit

Architecture
AI

tl;dr: “Recently, we created the ‘Human-in-the-loop LLM-based agents framework’, or HULA. HULA reads a Jira work item, creates a plan, writes code, and even raises a pull request. And it does all of this while keeping the engineer in the driver’s seat. So far, HULA has merged ~900 pull requests for Atlassian software engineers, saving their time and allowing them to focus on other important tasks.”

featured in #610

Lessons From Building And Maintaining Distributed Systems At Scale

- Eliran Turgeman

Architecture

tl;dr: “When your architecture grows beyond a single container, things you thought were simple can now break in a variety of ways. In this post I want to highlight different lessons I learned while developing and maintaining large distributed systems at scale.”

featured in #609

Overclocking DBT: Discord's Custom Solution In Processing Petabytes Of Data

- Chris Dong

Architecture

tl;dr: “At Discord, we faced a challenge that would make most data teams flinch: scaling dbt to process petabytes of data while supporting 100+ developers simultaneously working across 2,500+ models. What started as a simple implementation quickly hit critical limitations to accommodate millions of concurrent users generating petabytes of data.”

featured in #606

Making Uber’s Experiment Evaluation Engine 100x Faster

Architecture
Testing

tl;dr: This blog post describes how we made efficiency improvements to Uber’s Experimentation platform to reduce the latency of experiment evaluations by a factor of 100, from milliseconds to microseconds. We accomplished this by going from a remote evaluation architecture to a local evaluation architecture.

featured in #603

In Defense Of Simple Architectures

- Dan Luu

Architecture

tl;dr: Dan discusses the effectiveness of simple architectures in software development, using Wave, a $1.7B company, as an example. Wave's architecture is a Python monolith on top of Postgres, allowing engineers to focus on delivering value to users. The article emphasizes that simple architectures can be created more cheaply and easily than complex ones, even for high-traffic apps. Despite the trend towards complex, microservice-based architectures, Dan argues for the "unreasonable effectiveness" of monoliths, detailing Wave's choices, mistakes, and areas of unavoidable complexity. Simplicity in architecture can lead to success, allowing companies to allocate complexity where it benefits the business.

featured in #599
