6 Hard Lessons We Learned About Automated Testing For GenAI Apps
- John Gluck tl;dr: Testing LLMs is not simple: their probabilistic output makes failures hard to identify, and running the models repeatedly becomes expensive quickly. In this blog post, QA Wolf engineer John Gluck covers 6 things the team learned about building automated black-box regression tests for genAI applications.
featured in #529
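One common tactic for regression-testing probabilistic output is to assert on invariants of the response rather than exact strings. The sketch below is a minimal, hypothetical illustration of that idea (it is not from the article); `fake_llm` is a stand-in for a real, nondeterministic model call.

```python
import json

def fake_llm(prompt: str) -> str:
    # Stand-in for a nondeterministic model call; returns structured output.
    return json.dumps({"sentiment": "positive", "confidence": 0.91})

def check_sentiment_response(raw: str) -> dict:
    # Property-based checks: these should hold on every run,
    # even though the exact wording of the model output varies.
    data = json.loads(raw)                      # output must be valid JSON
    assert data["sentiment"] in {"positive", "negative", "neutral"}
    assert 0.0 <= data["confidence"] <= 1.0     # confidence must be in range
    return data

result = check_sentiment_response(fake_llm("Review: great product!"))
```

Exact-match assertions would flake on every rerun; checking structure and ranges keeps the test stable while still catching real regressions.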
Autotrader Saved $620K/YR Trading In Manual Testing For Automation
tl;dr: Putting automated testing on cruise control allowed Autotrader to: (1) Offset the need to hire six QA engineers, saving $600K+/year. (2) Return more than 1,000 hours per year to the customer support team, saving $20,000/year. (3) Increase release velocity 15–20%. (4) Reduce QA cycles from 3+ days to 15 minutes.
featured in #528
Getting 100% Code Coverage Doesn't Eliminate Bugs
- Kostis Kapelonis tl;dr: “There are many articles already on the net explaining why this is a fallacy, but I recently discovered that sharing an actual code example goes a long way towards proving why 100% code coverage doesn’t mean zero bugs. These people have their ‘aha’ moment when they look at real code, instead of recycling theoretical arguments over and over.”
featured in #527
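A tiny, hypothetical example (not the one from the article) makes the point: a single test can execute every line, reporting 100% coverage, while a crashing bug survives untested inputs.

```python
def average(values):
    # Bug: raises ZeroDivisionError when `values` is empty.
    return sum(values) / len(values)

def test_average():
    # This one test executes every line of `average` -> 100% line coverage.
    # It never exercises the empty-list case, so the bug goes undetected.
    assert average([2, 4, 6]) == 4
```

Coverage measures which lines ran, not which inputs were tried; `average([])` still fails despite the perfect coverage score.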
Debugging With Production Neighbors
tl;dr: SLATE is Uber’s E2E testing tool for microservice architectures that allows testing of services alongside production dependencies. It enables developers to generate test requests mimicking production flows while targeting services under test. This blog explores three debugging options in SLATE: remote debugging of deployed instances, local debugging on developer machines, and debugging through filtered monitoring. These features aim to simplify troubleshooting in production-like environments.
featured in #525
Test Failures Should Be Actionable
- Titus Winters tl;dr: “When a test fails, you should be able to begin investigation with nothing more than the test’s name and its failure messages — no need to add more information and rerun the test.” Titus shares examples.
featured in #513
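The contrast is easy to show in a small, hypothetical example (not from the article; `parse_price` is an invented function): a bare assertion tells the reader nothing on failure, while a message carrying input, expected, and actual values lets investigation start from the failure text alone.

```python
def parse_price(text: str) -> float:
    # Hypothetical helper: "$4.50" -> 4.5
    return float(text.strip("$"))

def test_parse_price_opaque():
    # Non-actionable: on failure, the report only says the assertion failed.
    assert parse_price("$4.50") == 4.5

def test_parse_price_actionable():
    result = parse_price("$4.50")
    # Actionable: the failure message names the input, the actual value,
    # and the expectation, so no rerun is needed to begin debugging.
    assert result == 4.5, (
        f'parse_price("$4.50") returned {result}, expected 4.5'
    )
```

Test runners like pytest will rewrite bare asserts to show values, but an explicit message still wins when the computation behind the value isn’t visible in the assertion line.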
Generative AI For High-Quality Mobile Testing
tl;dr: “The Developer Platform team at Uber is consistently developing new and innovative ideas to enhance the developer’s experience and strengthen the quality of our apps. Quality and testing go hand in hand, and in 2023 we took on a new and exciting challenge to change how we test our mobile applications, with a focus on machine learning (ML). Specifically, we are training models to test our applications just like real humans would.”
featured in #509