Genie: Uber’s Gen AI On-Call Copilot
tl;dr: “For building an on-call copilot, we chose between fine-tuning an LLM or leveraging Retrieval-Augmented Generation (RAG). Fine-tuning requires curated data with high-quality, diverse examples for the LLM to learn from. It also requires compute resources to keep the model updated with new examples.”
featured in #558
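The RAG pattern the post contrasts with fine-tuning can be sketched in a few lines: retrieve the most relevant documents for a query, then prepend them to the prompt, so knowledge lives in the retrieval corpus rather than in model weights. The sketch below is illustrative only (not Uber's implementation) and uses naive keyword-overlap scoring where a real system would use vector embeddings; all names are hypothetical.

```python
# Minimal RAG sketch: retrieve top-k docs, then build an augmented prompt.
# Scoring is naive word overlap; production systems use embeddings.

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank docs by shared-word count with the query and keep the top k."""
    q_words = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Assemble the context-augmented prompt an LLM would receive."""
    context = "\n".join(retrieve(query, docs))
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

# Hypothetical on-call runbook snippets standing in for a real corpus.
runbooks = [
    "Restart the payments service when queue depth exceeds 10k.",
    "Rotate TLS certs every 90 days via the cert-manager job.",
    "Page the storage on-call if disk usage passes 85 percent.",
]
prompt = build_prompt("What do I do when the payments queue is backed up?",
                      runbooks)
```

Because the corpus can be re-indexed at any time, new runbooks are picked up without retraining — the trade-off the tl;dr highlights.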
Making Uber’s Experiment Evaluation Engine 100x Faster
tl;dr: “How we made efficiency improvements to Uber’s Experimentation platform to reduce the latencies of experiment evaluations by a factor of 100x (milliseconds to microseconds). We accomplished this by going from a remote evaluation architecture (client to server RPC requests) to a local evaluation architecture (client-side computation). Some of the terminology in this blog post (e.g., parameters, experiments, etc.) is referenced from our previous blog post on Uber Experimentation. To learn more, check out Supercharging A/B Testing at Uber.”
featured in #556
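The core of the remote-to-local shift is that the client holds a periodically synced snapshot of experiment configs and computes bucket assignment in-process, turning a network round trip into a hash computation. A toy sketch under that assumption (config names and the hashing scheme are illustrative, not Uber's actual engine):

```python
# Local experiment evaluation: deterministic bucketing from a config
# snapshot pushed to clients in the background -- no per-evaluation RPC.
import hashlib

CONFIG_SNAPSHOT = {  # hypothetical synced state
    "new_checkout_flow": {"salt": "v3", "treatment_pct": 20},
}

def bucket(experiment: str, user_id: str) -> str:
    """Assign a user to control/treatment entirely client-side."""
    cfg = CONFIG_SNAPSHOT[experiment]
    digest = hashlib.md5(f"{cfg['salt']}:{user_id}".encode()).hexdigest()
    point = int(digest[:8], 16) % 100  # stable 0-99 bucket per user
    return "treatment" if point < cfg["treatment_pct"] else "control"
```

Because the hash is deterministic, every evaluation for the same user and salt lands in the same bucket, so correctness survives the move off the server while latency drops to microseconds.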
Introducing Netflix’s Key-Value Data Abstraction Layer
tl;dr: “In this post, we dive deep into how Netflix’s KV abstraction works, the architectural principles guiding its design, the challenges we faced in scaling diverse use cases, and the technical innovations that have allowed us to achieve the performance and reliability required by Netflix’s global operations.”
featured in #552
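The abstraction the post describes models data as a two-level map: a record id pointing to a sorted map of item keys to values. A minimal in-memory sketch of that shape (illustrative only; Netflix's layer sits over distributed storage backends and this toy version elides all of that):

```python
# Two-level key-value model: record_id -> sorted map of item_key -> value.
from collections import defaultdict

class KVStore:
    """Toy in-memory stand-in for a two-level KV abstraction."""

    def __init__(self):
        self._data = defaultdict(dict)  # record_id -> {item_key: value}

    def put(self, record_id: str, item_key: str, value: bytes) -> None:
        self._data[record_id][item_key] = value

    def get_items(self, record_id: str) -> list[tuple[str, bytes]]:
        """Return the record's items in sorted item-key order."""
        return sorted(self._data[record_id].items())
```

The sorted inner map is what lets one API serve both simple point lookups and ordered range-style reads within a record.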
Should We Decompose Our Monolith?
- Will Larson tl;dr: “Even as popular sentiment has generally turned away from microservices, many engineering organizations have a bit of both, often the remnants of one or more earlier but incomplete migration efforts. This strategy looks at a theoretical organization stuck with a bit of both approaches, let’s call it Theoretical Compliance Company, which is looking to determine its path forward.”
featured in #550
Meet Chrono, Our Scalable, Consistent, Metadata Caching Solution
tl;dr: From the team at Dropbox, “If we wanted to solve our high-volume read QPS problem while upholding our clients’ expectation of read consistency, traditional caching solutions would not work. We needed to find a scalable, consistent caching solution to solve both problems at once. This article discusses Chrono, a scalable, consistent caching system built on top of Dropbox’s key-value storage system.”
featured in #536
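One way to get consistent reads from a cache is to stamp each cached entry with a version and serve it only while that version still matches the authoritative store, falling through to a full read otherwise. The toy model below illustrates that idea under those assumptions; it is not Chrono's actual protocol, and both classes are hypothetical:

```python
# Version-stamped cache reads: a hit is served only if its stored
# version matches the store's current version for that key.

class Store:
    """Toy authoritative store that bumps a version on every write."""
    def __init__(self):
        self._data, self._ver = {}, {}

    def write(self, key, value):
        self._data[key] = value
        self._ver[key] = self._ver.get(key, 0) + 1

    def current_version(self, key):
        return self._ver.get(key, 0)

    def read(self, key):
        return self._data.get(key)

class VersionedCache:
    def __init__(self, store):
        self.store = store
        self.cache = {}  # key -> (version, value)

    def read(self, key):
        version = self.store.current_version(key)  # cheap version check
        hit = self.cache.get(key)
        if hit and hit[0] == version:
            return hit[1]                          # consistent cache hit
        value = self.store.read(key)               # expensive full read
        self.cache[key] = (version, value)
        return value
```

A stale entry can never be returned: any write bumps the version, so the next read misses the cache and refetches, which is the consistency property the tl;dr says traditional caches failed to provide at Dropbox's read volume.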
Odin: Uber’s Stateful Platform
- Jesper Borlum, Gianluca Mezzetti tl;dr: “The Odin platform aims to provide a unified operational experience by encompassing all aspects of managing stateful workloads. These aspects include host lifecycle, workload scheduling, cluster management, monitoring, state propagation, operational user interfaces, alerting, auto-scaling, and automation. Uber deploys stateful systems at global, regional, and zonal levels, and Odin is designed to manage these systems consistently and in a technology-agnostic manner.” This post provides an overview of Odin’s origins, the fundamental principles, and the challenges encountered early on.
featured in #534
Building And Scaling Notion’s Data Lake
tl;dr: “In the past three years Notion’s data has expanded 10x due to user and content growth, with a doubling rate of 6-12 months. Managing this rapid growth while meeting the ever-increasing data demands of critical product and analytics use cases, especially our recent Notion AI features, meant building and scaling Notion’s data lake. Here’s how we did it.”
featured in #533
How Discord Uses Open-Source Tools For Scalable Data Orchestration & Transformation
- Zach Bluhm tl;dr: “Until recently, we’ve been using an in-house orchestration system that’s provided the foundation for Discord’s data analytics over the last five years. As our data organization grew, it became apparent that both self-service and top-notch observability would be key for our ability to effectively scale as a team. The team embraced an ambitious project: to overhaul our data orchestration infrastructure using modern, open-source tools, sharing the candid lessons learned along the way, and how our new system is powering over 2000 dbt tables today.”
featured in #532
How Canva Collects 25 Billion Events Per Day
- Long Nguyen tl;dr: “These use cases are powered by a stream of analytics events at a rate of 25 billion events per day (800 billion events per month), with 99.999% uptime. Our Product Analytics Platform team manages this data pipeline. Their mission is to provide a reliable, ergonomic, and cost-effective way to collect user interaction events and distribute the data to a wide range of destinations for consumption.”
featured in #531