6 Hard Lessons We Learned About Automated Testing For GenAI Apps
- John Gluck tl;dr: Testing LLMs is not simple: their probabilistic output makes failures hard to identify, and running the models repeatedly becomes expensive quickly. In this blog post, QA Wolf engineer John Gluck covers 6 things the team learned about building automated black-box regression tests for genAI applications.
featured in #529
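One common tactic for regression-testing probabilistic output is to assert on invariants of the response rather than exact strings. The sketch below is a minimal, hypothetical illustration of that idea (it is not from the article); `fake_llm` is a stand-in for a real, nondeterministic model call.

```python
import json

def fake_llm(prompt: str) -> str:
    # Stand-in for a nondeterministic model call; returns structured output.
    return json.dumps({"sentiment": "positive", "confidence": 0.91})

def check_sentiment_response(raw: str) -> dict:
    # Property-based checks: these should hold on every run,
    # even though the exact wording of the model output varies.
    data = json.loads(raw)                      # output must be valid JSON
    assert data["sentiment"] in {"positive", "negative", "neutral"}
    assert 0.0 <= data["confidence"] <= 1.0     # confidence must be in range
    return data

result = check_sentiment_response(fake_llm("Review: great product!"))
```

Exact-match assertions would flake on every rerun; checking structure and ranges keeps the test stable while still catching real regressions.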
Autotrader Saved $620K/YR Trading In Manual Testing For Automation
tl;dr: Putting automated testing on cruise control allowed Autotrader to: (1) Offset the need to hire six QA engineers, saving $600K+/year. (2) Return more than 1,000 hours per year to the customer support team, saving $20,000/year. (3) Increase release velocity 15–20%. (4) Reduce QA cycles from 3+ days to 15 minutes.
featured in #528
Getting 100% Code Coverage Doesn't Eliminate Bugs
- Kostis Kapelonis tl;dr: “There are many articles already on the net explaining why this is a fallacy, but I recently discovered that sharing an actual code example goes a long way towards proving why 100% code coverage doesn’t mean zero bugs. These people have their ‘aha’ moment when they look at real code, instead of recycling theoretical arguments over and over.”
featured in #527
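A tiny, hypothetical example (not the one from the article) makes the point: a single test can execute every line, reporting 100% coverage, while a crashing bug survives untested inputs.

```python
def average(values):
    # Bug: raises ZeroDivisionError when `values` is empty.
    return sum(values) / len(values)

def test_average():
    # This one test executes every line of `average` -> 100% line coverage.
    # It never exercises the empty-list case, so the bug goes undetected.
    assert average([2, 4, 6]) == 4
```

Coverage measures which lines ran, not which inputs were tried; `average([])` still fails despite the perfect coverage score.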
Debugging With Production Neighbors
tl;dr: SLATE is Uber’s E2E testing tool for microservice architectures that allows testing of services alongside production dependencies. It enables developers to generate test requests mimicking production flows while targeting services under test. This blog explores three debugging options in SLATE: remote debugging of deployed instances, local debugging on developer machines, and debugging through filtered monitoring. These features aim to simplify troubleshooting in production-like environments.
featured in #525
Test Failures Should Be Actionable
- Titus Winters tl;dr: “When a test fails, you should be able to begin investigation with nothing more than the test’s name and its failure messages — no need to add more information and rerun the test.” Titus shares examples.
featured in #513
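The contrast is easy to show in a small, hypothetical example (not from the article; `parse_price` is an invented function): a bare assertion tells the reader nothing on failure, while a message carrying input, expected, and actual values lets investigation start from the failure text alone.

```python
def parse_price(text: str) -> float:
    # Hypothetical helper: "$4.50" -> 4.5
    return float(text.strip("$"))

def test_parse_price_opaque():
    # Non-actionable: on failure, the report only says the assertion failed.
    assert parse_price("$4.50") == 4.5

def test_parse_price_actionable():
    result = parse_price("$4.50")
    # Actionable: the failure message names the input, the actual value,
    # and the expectation, so no rerun is needed to begin debugging.
    assert result == 4.5, (
        f'parse_price("$4.50") returned {result}, expected 4.5'
    )
```

Test runners like pytest will rewrite bare asserts to show values, but an explicit message still wins when the computation behind the value isn’t visible in the assertion line.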
Generative AI For High-Quality Mobile Testing
tl;dr: “The Developer Platform team at Uber is consistently developing new and innovative ideas to enhance the developer’s experience and strengthen the quality of our apps. Quality and testing go hand in hand, and in 2023 we took on a new and exciting challenge to change how we test our mobile applications, with a focus on machine learning (ML). Specifically, we are training models to test our applications just like real humans would.”
featured in #509