/Post-mortem

What If Everybody Did Everything Right?

- Lorin Hochstein tl;dr: In the wake of an incident, we are inevitably led to ask two questions: “What did we do wrong here? What didn’t we do that we should have?” Lorin argues that these questions impose a specific lens on how the incident is scrutinized. “An alternative lens for making sense of an incident is to ask the question ‘how did this incident happen, assuming that everybody did everything right?’ Assume that everybody whose actions contributed to the incident made the best possible decision based on the information they had, and the constraints and incentives that were imposed upon them.” This prompts different questions: (1) What information did people have in the moment? (2) What constraints were people operating under?

featured in #492


I Won Free Load Testing

tl;dr: "My main site received about 34M requests over 72h - in three spikes." The author discusses how the DDoS attack was executed, why it happened, and what its consequences were.

featured in #313


A List Of Post-mortems!

- Dan Luu tl;dr: Dan links to many public post-mortems by both small and large companies.

featured in #210


A Terrible, Horrible, No-Good, Very Bad Day At Slack

- Laura Nolan tl;dr: "This story describes the technical details of the problems that caused the Slack downtime on May 12th, 2020."

featured in #190