/ML

How To Improve Search Without Looking At Queries Or Results

tl;dr: “Canva celebrated the milestone of 200M monthly active users (MAUs). Our customers have over 30 billion designs on Canva and create almost 300 new designs every second. With this growth rate, the ability for Canva Community members to effectively search for and find their designs, as well as those shared to them by team members, is becoming an increasingly challenging and essential problem to solve.”

featured in #569


No GPS Required: Our App Can Now Locate Underground Trains

tl;dr: “Thanks to our clever engineering, we can now predict your location in a subway tunnel using your phone’s vibration signature.” This post dives into how. 

featured in #568


Classifying All Of The PDFs On The Internet

- Santiago Pedroza tl;dr: “I classified the entirety of SafeDocs using a mixture of LLMs, Embeddings Models, XGBoost and just for fun some LinearRegressors. In the process I too created some really pretty graphs!”

featured in #545


Machine Unlearning In 2024

- Ken Liu tl;dr: “As our ML models today become larger and their (pre-)training sets grow to inscrutable sizes, people are increasingly interested in the concept of machine unlearning to edit away undesired things like private data, stale knowledge, copyrighted materials, toxic / unsafe content, dangerous capabilities, and misinformation, without retraining models from scratch.” Ken provides us with an introduction. 

featured in #515


Building A Weather Data Warehouse Part I: Loading A Trillion Rows Of Weather Data Into TimescaleDB

- Ali Ramadhan tl;dr: “I think it would be cool to have historical weather data from around the world to analyze for signals of climate change we’ve already had rather than think about potential future change.” Ali discusses the implementation of this analysis tool. 

featured in #510


Personalizing The DoorDash Retail Store Page Experience

tl;dr: "In this post, we show how we built a personalized shopping experience for our new business vertical stores, which include grocery, convenience, pets, and alcohol, among many others. Following a high-level overview of our recommendation framework, we home in on the modeling details, the challenges we have encountered along the way, and how we addressed those challenges."

featured in #479


Ship Shape

- Kerry Halupka, Rowan Katekar tl;dr: How Canva recognizes hand-drawn shapes in the browser, using machine learning to convert user-drawn scribbles into vector graphics while treating classification latency as a priority. "We wanted to make sure the experience was snappy but still accurate. Therefore, we decided to deploy the solution in the browser, which allows for real-time shape recognition and drawing assistance, providing a seamless and interactive user experience. Users can draw shapes and receive immediate feedback without experiencing delays associated with server-based processing."

featured in #474


Navigating The Chaos: Why You Don’t Need Another MLOps Tool

tl;dr: AI/ML development lacks systematic processes, leading to errors and biases in deployed models. The MLOps landscape is fragmented, and teams need to glue together a ton of bespoke and third-party tools to meet basic needs. We don’t think you should, so we’re building Openlayer to condense and simplify AI evaluation.

featured in #469


Building In-Video Search

tl;dr: "Suppose it’s Christmas, and you want to create a great Instagram piece out of all the best scenes across Netflix films of people shouting “Merry Christmas”! Or suppose it’s Anya Taylor-Joy’s birthday, and you want to create a highlight reel of all her most iconic and dramatic shots. Creating these involves sifting through hundreds of thousands of movies and TV shows to find the right line of dialogue or the appropriate visual elements (objects, scenes, emotions, actions, etc.). We have built an internal system that allows someone to perform in-video search across the entire Netflix video catalog, and we’d like to share our experience in building this system."

featured in #464


Hey, Computer, Make Me A Font

- Sergey Tselovalnikov tl;dr: “This is a story of my journey learning to build generative ML models from scratch and teaching a computer to create fonts in the process.” FontoGen is a generative ML model that creates fonts from user descriptions. The author covers the complexities of text-to-SVG generation and of keeping the style consistent across glyphs. Drawing on the IconShop paper, the project uses a sequence-to-sequence model with text embeddings from BERT and font embeddings from tokenized glyph shapes.

featured in #454