/Data

From Archival To Access: Config-Driven Data Pipelines

- Abhishek Dobliyal, Aakash Bhardwaj tl;dr: “In 2021 our team managed 65 regulatory reports, consuming terabytes of storage. By Q2 2024, this number surged to over 500 reports majorly covering areas related to trips across a given jurisdiction, significantly increasing resource consumption. Although existing solutions could archive and retrieve data, they often risked data mutation, especially during backfills, which isn’t ideal for regulatory and audit purposes. Additionally, retrieving smaller partitions and range-based retrieval wasn’t feasible with the existing solutions, complicating efficient data access.” The Uber team discusses some of the challenges of implementing their new system.

featured in #623


Reservoir Sampling

- Sam Rose tl;dr: “Reservoir sampling is a technique for selecting a fair random sample when you don't know the size of the set you're sampling from. By the end of this essay you will know: (1) When you would need reservoir sampling. (2) The mathematics behind how it works, using only basic operations: subtraction, multiplication, and division. (3) A simple way to implement reservoir sampling if you want to use it.”

featured in #617
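
For readers who want a concrete picture of the technique described above, here is a minimal sketch of one common formulation (often called Algorithm R). It is not necessarily the implementation the essay presents; the function name `reservoir_sample` and its parameters are illustrative assumptions.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return up to k items chosen uniformly at random from a stream of unknown length."""
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # randint is inclusive on both ends, so j is uniform over i + 1 values.
            # The new item is kept with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Calling `reservoir_sample(open("events.log"), 10)` (the file name is hypothetical) would pick 10 lines in a single pass without first counting how many lines the file has.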


Introducing Impressions At Netflix

- Tulika Bhatt tl;dr: “Capturing these moments and turning them into a personalized journey is no simple feat. It requires a state-of-the-art system that can track and process these impressions while maintaining a detailed history of each profile’s exposure. This nuanced integration of data and technology empowers us to offer bespoke content recommendations.”

featured in #591


Dump The Golden Dataset: Switch To Random Sampling

- Nishant Shukla tl;dr: Golden Datasets have long been a reliable method for measuring AI prompt performance. But as AI innovation moves fast, companies need a more agile, flexible, and cost-effective solution to stay ahead of their competition. Enter random sampling of AI prompt performance, a cutting-edge approach that adapts to real-world data and drives scalable performance for QA Wolf customers. Stay ahead of the curve: watch the webinar now.

featured in #574, #573


Designing Data Products

- Kiran Prakash tl;dr: “Working backwards from the end goal is a core principle of software development, and we’ve found it to be highly effective in modelling data products. In this article we'll explore a step-by-step, methodical approach to identifying data products that avoids overdesign while providing just enough clarity for teams to begin implementation.”

featured in #572


Dump The Golden Dataset: Switch To Random Sampling

- Nishant Shukla tl;dr: Golden Datasets have long been a reliable method for measuring AI prompt performance. But as AI innovation moves fast, companies need a more agile, flexible, and cost-effective solution to stay ahead of their competition. Enter random sampling of AI prompt performance, a cutting-edge approach that adapts to real-world data and drives scalable performance for QA Wolf customers. Stay ahead of the curve: watch the webinar now.

featured in #570, #568, #566


Control Data Access with Targeted Row-Level Security

tl;dr: Integrate Clerk with Neon Authorize to enforce Row-Level Security (RLS) in Postgres using JWTs. This setup strengthens security by scoping database queries to the user's identity. For team leads, it simplifies security management and reduces risk, allowing teams to focus on development.

featured in #566