Essential Reading For Engineering Leaders

How Airbnb Measures Listing Lifetime Value

- Carlos Sanchez Martinez

DataScience

tl;dr: “In this blog post, we explained how we approach listing lifetime value at Airbnb. We covered our measurement framework, including baseline LTV, incremental LTV, and marketing-induced incremental LTV. We also zoomed into measurement challenges, like when travel patterns changed drastically during the COVID pandemic and accurately estimating LTV became more difficult.”

featured in #606

Measuring Product Impact Without A/B Testing: How Discord Used the Synthetic Control Method for Voice Messages

- Alec Brevé, Angela Ambroz

Product
DataScience

tl;dr: “Sometimes, you just can’t randomize - it’s either not possible, or it’s unethical, or you sacrifice too much precision. In those cases, you can release your treatment to one group and create a composite, synthetic control made up of a weighted combination of your untreated groups.”

featured in #570
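
To make the "weighted combination of your untreated groups" idea concrete, here is a minimal sketch, not the method from the Discord article. It assumes three hypothetical untreated groups with invented numbers. It picks non-negative weights that sum to 1 by brute-force search over a coarse grid, a stand-in for the constrained optimizer a real analysis would use. All function and variable names are illustrative.

```python
from itertools import product


def simplex_grid(n_donors, resolution):
    """Yield weight tuples (non-negative, summing to 1) in steps of 1/resolution."""
    for counts in product(range(resolution + 1), repeat=n_donors - 1):
        remainder = resolution - sum(counts)
        if remainder >= 0:
            yield tuple(c / resolution for c in counts) + (remainder / resolution,)


def synthetic_series(weights, donors):
    """Weighted combination of the donor (untreated) series, period by period."""
    n_periods = len(donors[0])
    return [sum(w * d[t] for w, d in zip(weights, donors)) for t in range(n_periods)]


def fit_weights(treated_pre, donors_pre, resolution=20):
    """Pick the grid weights whose synthetic series best matches the
    treated group before the launch (smallest sum of squared errors)."""
    def pre_period_error(weights):
        synth = synthetic_series(weights, donors_pre)
        return sum((s - y) ** 2 for s, y in zip(synth, treated_pre))

    return min(simplex_grid(len(donors_pre), resolution), key=pre_period_error)


# Invented data: three untreated groups and one treated group.
# The pre-period values were constructed so the best mix is 0.5 / 0.3 / 0.2.
donors_pre = [[10, 12, 11, 13], [20, 19, 21, 22], [5, 7, 6, 5]]
treated_pre = [12.0, 13.1, 13.0, 14.1]

donors_post = [[14, 15], [23, 24], [6, 6]]
treated_post = [17.0, 18.0]  # observed after the treatment was released

weights = fit_weights(treated_pre, donors_pre)
counterfactual = synthetic_series(weights, donors_post)

# Estimated effect: observed outcome minus the synthetic control's outcome.
effect = [obs - cf for obs, cf in zip(treated_post, counterfactual)]
print(weights, effect)
```

The grid search only scales to a handful of groups, since its cost grows exponentially with the number of donors. It is used here because it has no dependencies and keeps the non-negativity and sum-to-one constraints easy to see.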

Unit Testing Analytics Code

- Matt Kaye

Testing
DataScience

tl;dr: “I face lots of pushback when it comes to unit testing. Usually the objections come in the form of either not knowing why you might test, since the code is just so simple and straightforward that nothing could go wrong, or not understanding the value added. In my opinion, both of these objections come from the same place.”

featured in #405

The Data Science Interview Book

DataScience

tl;dr: "This book does not cover the topics in depth, it covers just enough to get you ready for the interview. The assumption here is that the person using it is already familiar with the topic and is here to brush up on the same. Additional resources for someone eager to explore the topic in depth is added. In short, don’t use this as text book, use it as a revision note."

featured in #364

Introduction To Streaming For Data Scientists

- Chip Huyen

DataScience

tl;dr: "With luck you shouldn’t have to build or maintain a streaming system yourself. Your company should have infrastructure to help you with this. However, understanding where streaming is useful and why streaming is hard could help you evaluate the right tools and allocate sufficient resources for your needs."

featured in #342

Data Mesh — A Data Movement and Processing Platform @ Netflix

DataScience

tl;dr: "As the system evolves to solve more and more use cases, we have expanded its scope to handle not only the CDC use cases but also more general data movement and processing use cases:" (1) Events can be sourced from more generic applications. (2) Catalog of available DB connectors is growing. (3) More processing patterns such as filter, projection, union, join, etc...

featured in #341

Stop Aggregating Away The Signal In Your Data

- Zan Armstrong

Data
DataScience

tl;dr: "Aggregation is the standard best practice for analyzing time series data, but it can create problems by stripping away crucial context so that you’re not even aware of how much potential insight you’ve lost. In this article, I’ll start by discussing how aggregation can be problematic, before walking through three specific alternatives to aggregation with before / after examples."

featured in #339

Organizing And Scaling An Effective Data Team

- Rob Dearborn

tl;dr: The scope of a data team should include: (1) Ensuring focus on the right hierarchy of input & output metrics. (2) Steering the roadmap through insightful analysis & research. (3) Driving optimization through experimentation and ML. (4) Developing and maintaining data infrastructure. Rob outlines how the data team, and its function within a startup, should evolve as the company grows.

featured in #302

Algorithms For Decision Making

- Mykel Kochenderfer, Tim Wheeler, Kyle Wray

Algo
DataScience

tl;dr: "This book provides a broad introduction to algorithms for decision making under uncertainty. We cover a wide variety of topics related to decision making, introducing the underlying mathematical problem formulations and the algorithms for solving them."

featured in #299

On Owning A Software Problem

- Vicki Boykis

tl;dr: What are the small, low-friction things that most people will not notice but that, once noticed, signal craftsmanship, expertise, and pride in one's work? Vicki has created a list relevant to ML and data science: (1) Python code has type annotations. (2) Accurate documentation of a repo and an easy, reproducible way to run the project. (3) Formatted and linted SQL statements. And more.

featured in #293
