ML

Copilot Internals

- Parth Thakkar tl;dr: "In this post, I try to answer specific questions about the internals of Copilot, while also describing some interesting observations I made as I combed through the code. I will provide pointers to the relevant code for almost everything I talk about, so that interested folks can take a look at the code themselves."

featured in #376


Improving Instagram Notification Management With Machine Learning And Causal Inference

- Nailong Zhang tl;dr: "The key to solving this problem is figuring out the incremental value of sending a daily digest notification compared to not sending... For some cohorts, they would be active without receiving the daily digest notifications and thus the incremental values would be small; selecting these cohorts to send the digest notifications is inefficient and may even spam these users."
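The "incremental value" idea above is the core of uplift estimation: compare activity rates between users who received the digest and a randomized holdout who did not, per cohort. A minimal sketch with invented cohort numbers (not Instagram's data or code):

```python
# Hypothetical cohort data: (active, total) counts for users sent the daily
# digest vs. a randomized holdout that was not sent it. Numbers are made up
# purely to illustrate the cohort-selection logic described in the post.
cohorts = {
    "high_engagement": {"sent": (900, 1000), "holdout": (880, 1000)},
    "low_engagement":  {"sent": (400, 1000), "holdout": (250, 1000)},
}

def incremental_value(sent, holdout):
    """Uplift = P(active | notified) - P(active | not notified)."""
    return sent[0] / sent[1] - holdout[0] / holdout[1]

for name, d in cohorts.items():
    uplift = incremental_value(d["sent"], d["holdout"])
    print(f"{name}: uplift = {uplift:+.3f}")
```

The high-engagement cohort would be active anyway, so its uplift is small and sending it the digest is inefficient; the low-engagement cohort shows a large incremental value and is the better target.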

featured in #366


RecSysOps: Best Practices for Operating a Large-Scale Recommender System

- Ehsan Saberian, Justin Basilico tl;dr: "In this blog post, we introduce RecSysOps, a set of best practices and lessons that we learned while operating large-scale recommendation systems at Netflix. These practices helped us to keep our system healthy while: (1) reducing our firefighting time, (2) focusing on innovations and (3) building trust with our stakeholders."

featured in #360


What I Learned Building Platforms At Stitch Fix

tl;dr: "I was lucky enough to spend the last six years focusing on “engineering for data science” and learning to build great platforms." Stefan guides us through 5 lessons he learned: (1) Focus on adoption, not completeness. (2) Your users are not all equal. (3) Abstract away the internals of your system. (4) Live your users’ life cycle. (5) The two layer API trick. 

featured in #359


Machine Learning For Fraud Detection in Streaming Services

tl;dr: "Many users across many platforms make for a uniquely large attack surface that includes content fraud, account fraud, and abuse of terms of service. Detection of fraud and abuse at scale and in real-time is highly challenging."

featured in #355


How The New York Times Uses Machine Learning To Make Its Paywall Smarter

- Rohit Supekar tl;dr: "When the paywall was launched, the meter limit was the same for all users. However, as The Times has transformed into a data-driven digital company, we are now successfully using a causal machine learning model called the Dynamic Meter to set personalized meter limits and to make the paywall smarter."

featured in #345


Introducing Natural Language Search For Podcast Episodes

- Alexandre Tamborrino tl;dr: "To enable users to find more relevant content with less effort, we started investigating a technique called Natural Language Search, also known as Semantic Search. In a nutshell, Natural Language Search matches a query and a textual document that are semantically correlated instead of needing exact word matches. It matches synonyms, paraphrases, etc., and any variation of natural language that expresses the same meaning."
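The retrieval mechanics behind this kind of semantic search are worth sketching: queries and episode descriptions are mapped into a shared vector space, and ranking uses cosine similarity rather than exact word overlap. The tiny 3-d vectors below are invented for illustration; in a real system they would come from a trained bi-encoder model, not hand-written constants:

```python
import math

# Hypothetical episode embeddings (in practice produced by an encoder model).
episodes = {
    "ep1: sleep science":   [0.9, 0.1, 0.2],
    "ep2: startup funding": [0.1, 0.8, 0.3],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, top_k=1):
    """Rank episodes by similarity to the query embedding."""
    ranked = sorted(episodes.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [title for title, _ in ranked[:top_k]]

# A query like "how to get better rest" would embed near the sleep episode
# even though it shares no words with the title:
print(search([0.85, 0.05, 0.1]))  # → ['ep1: sleep science']
```

This is why synonyms and paraphrases match: closeness in embedding space, not shared tokens, drives the ranking.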

featured in #336


The Berkeley Crossword Solver

tl;dr: "The BCS uses a two-step process to solve crossword puzzles. First, it generates a probability distribution over possible answers to each clue using a question answering (QA) model; second, it uses probabilistic inference, combined with local search and a generative language model, to handle conflicts between proposed intersecting answers."
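The second stage can be illustrated with a toy version of the conflict-handling step: given per-clue candidate distributions (invented here, standing in for the QA model's outputs), choose the jointly most probable assignment that agrees at intersecting letters. The BCS uses probabilistic inference with local search and a generative language model for this; a brute-force search over two clues is only a sketch of the constraint being enforced:

```python
import itertools

# Hypothetical QA-model outputs: answer -> probability, per clue.
# 1-Across and 1-Down are assumed to intersect at their first letter.
candidates = {
    "1A": {"CAT": 0.6, "BAT": 0.4},
    "1D": {"BOAT": 0.7, "COAT": 0.3},
}

def best_consistent():
    """Return the highest-joint-probability pair satisfying the crossing."""
    best, best_p = None, 0.0
    for (a, pa), (d, pd) in itertools.product(candidates["1A"].items(),
                                              candidates["1D"].items()):
        if a[0] == d[0] and pa * pd > best_p:  # intersection constraint
            best, best_p = (a, d), pa * pd
    return best, best_p

print(best_consistent())
```

Note that the individually most likely answers (CAT, BOAT) conflict at the crossing square, so the solver backs off to BAT + BOAT, whose joint probability (0.28) beats the consistent alternative CAT + COAT (0.18).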

featured in #331


In Search Of The Least Viewed Article On Wikipedia

- Colin Morris tl;dr: "Based on our findings above, the least viewed articles on Wikipedia are not going to be merely about topics with little popular interest - they must also be “unlucky” in the sense of having very small random gaps... Of these 600,000 least lucky articles, all received at least a few views in 2021. The booby prize for least popular article of 2021 is shared by two articles which received exactly 3 probably-human pageviews."

featured in #322


Evolution Of ML Fact Store

- Vivek Kaushal tl;dr: "This post will focus on the large volume of high-quality data stored in Axion — our fact store that is leveraged to compute ML features offline. We built Axion primarily to remove any training-serving skew and make offline experimentation faster. We will share how its design has evolved over the years and the lessons learned while building it."

featured in #321