/Data Science

Unit Testing Analytics Code

- Matt Kaye tl;dr: “I face lots of pushback when it comes to unit testing. Usually the objections come in the form of either not knowing why you might test, since the code is just so simple and straightforward that nothing could go wrong, or not understanding the value added. In my opinion, both of these objections come from the same place.”

featured in #405
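
To make the "so simple nothing could go wrong" objection concrete, here is a minimal sketch of a unit test for a one-line analytics calculation. It is not taken from Matt Kaye's post: the `conversion_rate` function, its edge-case rules, and the test names are all hypothetical, chosen only to show how even trivial code has edge cases worth pinning down.

```python
def conversion_rate(conversions: int, visits: int) -> float:
    """Share of visits that converted (hypothetical example metric).

    Assumed conventions: zero visits yields 0.0, and impossible
    counts raise ValueError instead of returning a misleading number.
    """
    if visits < 0 or conversions < 0:
        raise ValueError("counts must be non-negative")
    if conversions > visits:
        raise ValueError("conversions cannot exceed visits")
    if visits == 0:
        return 0.0
    return conversions / visits


def test_typical_case():
    assert conversion_rate(25, 100) == 0.25


def test_no_visits_returns_zero():
    # Without the explicit guard this would raise ZeroDivisionError.
    assert conversion_rate(0, 0) == 0.0


def test_impossible_counts_are_rejected():
    try:
        conversion_rate(5, 3)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for conversions > visits")
```

The tests are plain functions with bare `assert` statements, so a runner such as pytest would collect them, but they can also be called directly.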


The Data Science Interview Book

tl;dr: "This book does not cover the topics in depth, it covers just enough to get you ready for the interview. The assumption here is that the person using it is already familiar with the topic and is here to brush up on the same. Additional resources for someone eager to explore the topic in depth is added. In short, don’t use this as text book, use it as a revision note."

featured in #364


Introduction To Streaming For Data Scientists

- Chip Huyen tl;dr: "With luck you shouldn’t have to build or maintain a streaming system yourself. Your company should have infrastructure to help you with this. However, understanding where streaming is useful and why streaming is hard could help you evaluate the right tools and allocate sufficient resources for your needs."

featured in #342


Data Mesh — A Data Movement and Processing Platform @ Netflix

tl;dr: "As the system evolves to solve more and more use cases, we have expanded its scope to handle not only the CDC use cases but also more general data movement and processing use cases:" (1) Events can be sourced from more generic applications. (2) The catalog of available DB connectors is growing. (3) More processing patterns are supported, such as filter, projection, union, and join.

featured in #341


Stop Aggregating Away The Signal In Your Data

- Zan Armstrong tl;dr: "Aggregation is the standard best practice for analyzing time series data, but it can create problems by stripping away crucial context so that you’re not even aware of how much potential insight you’ve lost. In this article, I’ll start by discussing how aggregation can be problematic, before walking through three specific alternatives to aggregation with before / after examples."

featured in #339
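
As a rough illustration of how aggregation can hide signal, the sketch below compares two made-up series that share the same mean but have very different shapes. The data and the `summarize` helper are invented for this example and are not one of the three alternatives described in Zan Armstrong's article.

```python
from statistics import mean

# Two hypothetical hourly series (made-up numbers for illustration).
steady = [10, 10, 10, 10, 10, 10]
spiky = [2, 2, 2, 2, 2, 50]


def summarize(values):
    """Return the mean alongside the range that the mean alone hides."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


steady_summary = summarize(steady)
spiky_summary = summarize(spiky)

# The aggregate is identical for both series...
same_mean = steady_summary["mean"] == spiky_summary["mean"]

# ...but the spread shows that only one of them contains a spike.
steady_spread = steady_summary["max"] - steady_summary["min"]
spiky_spread = spiky_summary["max"] - spiky_summary["min"]
```

Reporting only the mean would make the two series indistinguishable; keeping the minimum and maximum (or the raw points) preserves the difference.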


Organizing And Scaling An Effective Data Team

- Rob Dearborn tl;dr: The scope of a data team should include: (1) Ensuring focus on the right hierarchy of input & output metrics. (2) Steering the roadmap through insightful analysis & research. (3) Driving optimization through experimentation and ML. (4) Developing and maintaining data infrastructure. Rob outlines how the data team, and its function within a startup, should evolve as the company grows.

featured in #302


Algorithms For Decision Making

- Mykel Kochenderfer, Tim Wheeler, Kyle Wray tl;dr: "This book provides a broad introduction to algorithms for decision making under uncertainty. We cover a wide variety of topics related to decision making, introducing the underlying mathematical problem formulations and the algorithms for solving them."

featured in #299


On Owning A Software Problem

- Vicki Boykis tl;dr: What is a low-friction small thing that most will not notice, but that when they do, is a sign of craftsmanship, expertise, and pride in one's work? Vicki has created a list relevant for ML and Data Science: (1) Python code has type annotations. (2) Accurate documentation of a repo and an easy, reproducible way to run the project. (3) Formatted and linted SQL statements. And more.

featured in #293


Data To Engineers Ratio: US vs Europe

- Mikkel Dengsøe tl;dr: "The median data to engineers ratio for the US companies I looked at is 1:7 compared to 1:4 for European companies. And the design to engineers ratio is 1:9 for both groups. This post gives some answers to why this is but also leaves some questions unanswered."

featured in #282


What Is The Right Level Of Specialization? For Data Teams And Anyone Else

- Erik Bernhardsson tl;dr: The specialization of data teams into many different roles (e.g., data scientist, data engineer, analytics engineer, ML engineer) is "generally a bad thing driven by the fact that tools are bad and too hard to use." He elaborates on this stance in the post.

featured in #255