by Yulei Liu

How DoorDash Leverages LLMs To Evaluate Search Result Pages tl;dr: “Traditionally, evaluating search relevance relied on human annotations, which posed challenges in scale, latency, consistency, and cost. To solve this, we built AutoEval, a human-in-the-loop system for automated search quality evaluation that is powered by large language models (LLMs). By leveraging LLMs and our whole-page relevance (WPR) metric, AutoEval enables scalable, accurate, and near-real-time search result assessments.”
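For a sense of the general pattern the post describes, here is a minimal Python sketch of LLM-as-judge relevance grading plus a position-weighted page score. Everything here is an illustrative assumption, not DoorDash's implementation: the 0–2 grading scale, the prompt wording, the 1/rank weighting, and the `call_llm` stand-in are all hypothetical, and the post does not disclose WPR's exact formula.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SearchResult:
    query: str
    item_name: str

# Hypothetical rubric; the actual annotation guidelines are not public.
PROMPT = (
    "Rate how relevant this result is to the search query on a 0-2 scale\n"
    "(0 = irrelevant, 1 = partially relevant, 2 = highly relevant).\n"
    "Query: {query}\nResult: {item}\nAnswer with a single digit."
)

def grade_result(result: SearchResult, call_llm: Callable[[str], str]) -> int:
    """Ask an LLM to grade one result and parse the single digit it returns."""
    reply = call_llm(PROMPT.format(query=result.query, item=result.item_name))
    return int(reply.strip()[0])

def whole_page_relevance(grades: list[int], max_grade: int = 2) -> float:
    """Aggregate per-result grades into one page score, weighting earlier
    positions more heavily (assumed 1/rank weights), normalized to [0, 1]."""
    weights = [1.0 / (rank + 1) for rank in range(len(grades))]
    score = sum(w * g for w, g in zip(weights, grades))
    return score / (max_grade * sum(weights))

if __name__ == "__main__":
    results = [SearchResult("vegan burger", name)
               for name in ["Veggie Grill", "Burger King", "Sushi Stop"]]
    fake_llm = lambda prompt: "2"  # stand-in for a real LLM API call
    grades = [grade_result(r, fake_llm) for r in results]
    print(f"WPR: {whole_page_relevance(grades):.2f}")
```

The human-in-the-loop part of AutoEval would sit around a pipeline like this: sampled LLM grades are audited by human annotators, and disagreements feed back into prompt and rubric refinement.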

featured in #613