5 Misconceptions About AI Agents

- Zach Lloyd tl;dr: The biggest unlock from AI isn’t just speed — it’s parallelism. Great developers are starting to multithread themselves, spinning up agents to handle multiple tasks at once. But today’s tools aren’t built for this kind of parallel work. We need systems that give developers visibility, control, and oversight across all those moving parts — or we risk the chaos outpacing the gains.

featured in #613


Innovations In Evaluating AI Agent Performance

tl;dr: Just like athletes need more than one drill to win a competition, AI agents require consistent training based on real-world performance metrics to excel in their role. At QA Wolf, we’ve developed weighted “gym scenarios” to simulate real-world challenges and track agents’ progress over time. How does our AI use these metrics to continuously improve its accuracy?

featured in #612


Cross-Channel Messaging For AI Agents

tl;dr: Knock enables AI agents to send messages (Email, Push, SMS, In-app, and Slack) and powers human-in-the-loop workflows with deferred tool calls. You design the templates, and your agent sends on-brand messages. Works seamlessly with frameworks like LangChain and Vercel’s AI SDK.

featured in #612


Sycophancy Is The First LLM “Dark Pattern”

- Sean Goedecke tl;dr: “The principle here is something like the psychological trick door-to-door evangelists use on new converts - encouraging them to knock on doors knowing that many people will be rude, driving the converts back into the comforting arms of the church. It’s even possible to imagine AI models deliberately doing this exact thing: setting users up for failure in the real world in order to optimize time spent chatting to the model.”

featured in #611


How I Use LLMs As A Staff Engineer

- Sean Goedecke tl;dr: “Personally, I feel like I get a lot of value from AI. I think many of the people who don’t feel this way are “holding it wrong”: i.e. they’re not using language models in the most helpful ways. In this post, I’m going to list a bunch of ways I regularly use AI in my day-to-day as a staff engineer.”

featured in #611


Advancing Invoice Document Processing At Uber Using GenAI

tl;dr: “In today’s fast-paced business environment, efficiently managing operational tasks is vital for maintaining workflows. Uber, with its large network of suppliers worldwide, faces considerable challenges in processing a high volume of invoices daily. Invoice processing is a critical function for Uber’s financial operations, directly impacting the efficiency and accuracy of our accounts payable processes. This blog explores how we used GenAI to solve this problem, setting a new standard in financial operations management.”

featured in #610


Human In The Loop Software Development Agents

- Jirat Pasuksmit tl;dr: “Recently, we created the ‘Human-in-the-loop LLM-based agents framework’, or HULA. HULA reads a Jira work item, creates a plan, writes code, and even raises a pull request. And it does all of this while keeping the engineer in the driver’s seat. So far, HULA has merged ~900 pull requests for Atlassian software engineers, saving their time and allowing them to focus on other important tasks.”

featured in #610


The Second Half

- Shunyu Yao tl;dr: Shunyu, a researcher at OpenAI, claims we’re at AI’s halftime. The second half of AI — starting now — will shift focus from solving problems to defining problems. In this new era, evaluation becomes more important than training. Instead of just asking, “Can we train a model to solve X?”, we’re asking, “What should we be training AI to do, and how do we measure real progress?” To thrive in this second half, we’ll need a timely shift in mindset and skill set, ones perhaps closer to those of a product manager.

featured in #609