/Data

Struggling with Snowflake Costs? Try our Cost Optimization Calculator

tl;dr: Snowflake costs skyrocket for SaaS providers because the demand for real-time, interactive analytics never switches off. If your Snowflake bill is spiraling, try our cost optimization calculator to discover your potential savings when using a Snowflake warehouse for ad-hoc queries. (No form required)

featured in #501


Top 5 Challenges of Designing Your Data Warehouse for Multi-Tenant Analytics

tl;dr: Data warehouses are built to store large volumes of data from numerous sources, not to serve SaaS platforms delivering multi-tenant analytics, where data security is vital. This guide helps you avoid the headaches that come with that architecture mismatch, featuring solutions from our analytics experts.

featured in #499


Custom Data Models: The Key to Unlocking Powerful Embedded Analytics

- Brian Dreyer tl;dr: Without custom data models, even the most advanced analytics fail to deliver value, leading to customer churn. If you’re a SaaS leader, learn why custom data models are imperative for multi-tenant software platforms and four features of conventional data warehousing that are limiting your growth.

featured in #495


How DoorDash Used A Service Mesh To Manage Data Transfer, Reducing Hops And Cloud Spend

- Levon Stepanian and Hochuen Wong tl;dr: DoorDash gained many benefits from its evolution from a monolithic application architecture to one based on microservices: the new architecture reduced the time required for development, testing, and deployment, and improved scalability and resiliency. However, DoorDash also observed an uptick in data transfer costs, which prompted the engineering team to investigate alternative ways to provide the same level of service more efficiently.

featured in #483


Data Quality Score: The Next Chapter Of Data Quality At Airbnb

- Clark Wright tl;dr: "With 1.4 billion cumulative guest arrivals as of year-end 2022, Airbnb’s growth pushed us to an inflection point where diminishing data quality began to hinder our data practitioners. Weekly metric reports were difficult to land on time. Seemingly basic metrics like “Active Listings” relied on a web of upstream dependencies. Conducting meaningful data work required significant institutional knowledge to overcome hidden caveats in our data." Clark discusses the implementation of a Data Quality Score.
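
To make the idea concrete, here is a minimal sketch of how a weighted data quality score might be computed. The dimensions and weights are illustrative assumptions for this sketch, not Airbnb's actual scoring model.

```python
# A minimal sketch of a weighted data quality score. The dimensions and
# weights below are illustrative assumptions, not Airbnb's actual model.
DIMENSION_WEIGHTS = {
    "accuracy": 0.35,      # is the data correct?
    "reliability": 0.25,   # does it land on time?
    "stewardship": 0.20,   # is it owned and documented?
    "usability": 0.20,     # is it easy to consume?
}

def data_quality_score(dimension_scores: dict[str, float]) -> float:
    """Weighted average of per-dimension scores, each in [0, 100]."""
    return sum(w * dimension_scores[d] for d, w in DIMENSION_WEIGHTS.items())

print(data_quality_score(
    {"accuracy": 90, "reliability": 70, "stewardship": 80, "usability": 60}
))  # 77.0
```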

featured in #471


Clerk Webhooks: Data Sync with Convex

- Dev Agrawal tl;dr: “Composing an application out of multiple data sources can be challenging, and while Clerk’s Backend APIs work great for most use cases, Webhooks offer the next level of integration that enables you to take full advantage of your existing stack. This post will cover how to synchronize user data from Clerk into your own backend using Webhooks.”
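
As a rough illustration of the pattern the post covers, here is a minimal webhook receiver, assuming FastAPI and the `svix` Python package (Clerk signs its webhooks with Svix). The endpoint path, secret handling, and the `upsert_user()` helper are hypothetical.

```python
# Minimal sketch of a Clerk webhook receiver, assuming FastAPI and the
# `svix` package. Endpoint path, secret handling, and upsert_user() are
# illustrative assumptions, not Clerk's prescribed setup.
import os
from fastapi import FastAPI, Request, HTTPException
from svix.webhooks import Webhook, WebhookVerificationError

app = FastAPI()
WEBHOOK_SECRET = os.environ["CLERK_WEBHOOK_SECRET"]  # from the Clerk dashboard

def upsert_user(data: dict) -> None:
    """Hypothetical helper: write the Clerk user record to your own database."""
    ...

@app.post("/webhooks/clerk")
async def clerk_webhook(request: Request):
    body = await request.body()
    headers = {
        "svix-id": request.headers.get("svix-id", ""),
        "svix-timestamp": request.headers.get("svix-timestamp", ""),
        "svix-signature": request.headers.get("svix-signature", ""),
    }
    try:
        # Verify the signature before trusting the payload.
        event = Webhook(WEBHOOK_SECRET).verify(body, headers)
    except WebhookVerificationError:
        raise HTTPException(status_code=400, detail="invalid webhook signature")

    if event["type"] in ("user.created", "user.updated"):
        upsert_user(event["data"])
    return {"ok": True}
```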

featured in #470


The Ultimate Guide To Modernizing Your Data Import Solution

tl;dr: When should you invest in modernizing your tech stack to drive long-term success? Discover crucial business milestones that signal the need for a tech upgrade and learn how to evaluate alternative solutions.

featured in #465


From Big Data To Better Data: Ensuring Data Quality With Verity

- Michael McPhillips tl;dr: Michael emphasizes that "data quality is paramount for accurate insights," highlighting the challenge of ensuring data reliability. Michael introduces Lyft’s in-house data quality platform, Verity, whose comprehensive flow starts with the following steps: (1) Data Profiling: Incoming data is scrutinized for its structure, schema, and content, allowing Verity to identify potential anomalies and inconsistencies. (2) Customizable Rules Engine: Enables data experts to define data quality rules tailored to their unique needs, from data format validations to more intricate domain-specific checks. (3) Automated Quality Checks: Once the rules are set, they are applied to incoming data streams, scanning each data point for discrepancies.
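
Here is a minimal sketch of the customizable-rules idea described in step (2); this is not Lyft's actual Verity API, and the rule names and example fields are invented for illustration.

```python
# A minimal sketch of a customizable data quality rules engine — not
# Lyft's actual Verity API. Each rule is a named predicate applied to
# every incoming record; failures are collected as quality violations.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Rule:
    name: str
    check: Callable[[dict[str, Any]], bool]  # True = record passes

RULES = [
    Rule("ride_id is present", lambda r: bool(r.get("ride_id"))),
    Rule("fare is non-negative", lambda r: r.get("fare", 0) >= 0),
    Rule("city is a known region", lambda r: r.get("city") in {"SFO", "NYC", "SEA"}),
]

def run_quality_checks(records, rules=RULES):
    """Apply every rule to every record; return (record_index, rule_name) violations."""
    return [
        (i, rule.name)
        for i, record in enumerate(records)
        for rule in rules
        if not rule.check(record)
    ]

violations = run_quality_checks([
    {"ride_id": "r1", "fare": 12.5, "city": "SFO"},
    {"ride_id": "", "fare": -3.0, "city": "LAX"},  # fails all three rules
])
print(violations)
```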

featured in #457


Best Practices For Collecting And Querying Data From Multiple Sources

- Zoe Steinkamp tl;dr: In a data-centric era, efficiently collecting and querying data from diverse sources is paramount. Zoe emphasizes best practices in data collection, such as optimizing ingestion pipelines and advanced querying. With varied data streams like IoT and cloud computing, storing everything in a single database is outdated; instead, strategies like effective data modeling and understanding your data sources are vital. Tools like InfluxDB, a time series database, and Pandas, a Python library, facilitate data management and analysis. Leveraging multiple data sources optimizes cost, efficiency, and user experience.
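
As a small sketch of the InfluxDB + Pandas pairing mentioned above, the snippet below pulls a time series into a DataFrame using the `influxdb-client` package; the URL, token, bucket, and measurement names are placeholder assumptions.

```python
# A minimal sketch of querying InfluxDB into Pandas via `influxdb-client`.
# URL, token, org, bucket, and measurement names are placeholders.
from influxdb_client import InfluxDBClient

FLUX_QUERY = '''
from(bucket: "sensors")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "temperature")
'''

with InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org") as client:
    # query_data_frame() returns the result as a Pandas DataFrame
    # (multi-table results may come back as a list of frames).
    df = client.query_api().query_data_frame(FLUX_QUERY)

# From here, standard Pandas analysis applies — e.g. a 5-minute rolling mean.
df = df.set_index("_time").sort_index()
print(df["_value"].rolling("5min").mean().tail())
```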

featured in #449


You Don’t Have To Sacrifice Streaming Data Performance To Cut Cloud Costs

tl;dr: Redpanda is faster and more efficient than Apache Kafka… but how much faster exactly? We ran 200+ hours of benchmarks to find out how both platforms perform across various workloads and hardware configurations. Here’s our breakdown of how Redpanda achieves 10x the performance while cutting cloud spend by over $500k.

featured in #407