/Architecture

Odin: Uber’s Stateful Platform

- Jesper Borlum, Gianluca Mezzetti tl;dr: “The Odin platform aims to provide a unified operational experience by encompassing all aspects of managing stateful workloads. These aspects include host lifecycle, workload scheduling, cluster management, monitoring, state propagation, operational user interfaces, alerting, auto-scaling, and automation. Uber deploys stateful systems at global, regional, and zonal levels, and Odin is designed to manage these systems consistently and in a technology-agnostic manner.” This post provides an overview of Odin’s origins, the fundamental principles, and the challenges encountered early on.

featured in #534


Building And Scaling Notion’s Data Lake

tl;dr: “In the past three years Notion’s data has expanded 10x due to user and content growth, with a doubling rate of 6-12 months. Managing this rapid growth while meeting the ever-increasing data demands of critical product and analytics use cases, especially our recent Notion AI features, meant building and scaling Notion’s data lake. Here’s how we did it.”

featured in #533


How Discord Uses Open-Source Tools For Scalable Data Orchestration & Transformation

- Zach Bluhm tl;dr: “Until recently, we’ve been using an in-house orchestration system that’s provided the foundation for Discord’s data analytics over the last five years. As our data organization grew, it became apparent that both self-service and top-notch observability would be key for our ability to effectively scale as a team. The team embraced an ambitious project: to overhaul our data orchestration infrastructure using modern, open-source tools.” This post shares the candid lessons learned along the way and how the new system is powering over 2000 dbt tables today.

featured in #532


How Canva Collects 25 Billion Events Per Day

- Long Nguyen tl;dr: “These use cases are powered by a stream of analytics events at a rate of 25 billion events per day (800 billion events per month), with 99.999% uptime. Our Product Analytics Platform team manages this data pipeline. Their mission is to provide a reliable, ergonomic, and cost-effective way to collect user interaction events and distribute the data to a wide range of destinations for consumption.”

featured in #531


How We Build Experiments In-House

- Vincey Au tl;dr: “Experimentation is an invaluable decision-making tool, and at Canva, it’s a pivotal step in our product development process to quickly test ideas, measure impact, and safeguard the customer experience of over 100 million monthly active users. We split our experimentation platforms into 2 core components: (1) Experiment setup: Creating feature flags and assignments. (2) Experiment analysis: Measuring the impact of the change. In this blog post, we will dive into the second component, experiment analysis.”

featured in #529


How Meta Trains Large Language Models At Scale

tl;dr: “Our AI model training has involved training a massive number of models that required a comparatively smaller number of GPUs. This was the case for our recommendation models that would ingest vast amounts of information to make accurate recommendations that power most of our products. With the advent of generative AI, we’ve seen a shift towards fewer jobs, but incredibly large ones. Supporting GenAI at scale has meant rethinking how our software, hardware, and network infrastructure come together.”

featured in #525


What PowerSync Open Edition Means For Local-First

- Conrad Hofmeyr tl;dr: Local-first app architecture promises instant, collaborative UX and full offline support — but tooling has been proprietary or immature. The release of PowerSync Open Edition opens access to a mature sync layer that solves local-first complexities.

featured in #520


24 Fundamental Techniques For Software Architects

- Patrick Roos tl;dr: “This comprehensive collection gives architects the techniques they need to not only design solid architectures, but to seamlessly align them with business goals. Learn how these techniques enable architects and teams to make informed decisions, minimize risk, and communicate effortlessly with stakeholders.”

featured in #515 and #514


Test Clocks: How We Made It Easier To Test Stripe Billing Integrations

- Ji Huang tl;dr: Test clocks simulate the passage of time in billing scenarios without waiting for actual seconds to tick by in the real world. This post discusses the technical details of how Stripe built test clocks and how they updated systems to account for the different ways that time passes.

featured in #513