James Frost

Save Your Engineers' Sleep: Best Practices For On-call Processes

tl;dr: Eight best practices are shared, including: (1) Treat alerts as code, i.e. put them through code review and generate them from existing modules. (2) Use percentiles over averages to get a higher-quality signal. (3) Use playbooks to document each alert, explaining what is broken and how to investigate and fix it.
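To illustrate point (2), here is a minimal sketch of why a percentile can be a better alerting signal than an average. The latency samples, the 500 ms threshold, and the `percentile` helper are all hypothetical and chosen for illustration; they do not come from the linked article.

```python
import math
import statistics


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds: 95 fast requests, 5 very slow ones.
latencies_ms = [100] * 95 + [3000] * 5

# Hypothetical alert threshold, assumed for this example only.
THRESHOLD_MS = 500

mean_ms = statistics.mean(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

# The mean stays under the threshold even though 5% of users wait 3 seconds.
mean_would_alert = mean_ms > THRESHOLD_MS
# The 99th percentile exposes the slow tail and crosses the threshold.
p99_would_alert = p99_ms > THRESHOLD_MS

print(f"mean={mean_ms} ms, p50={p50_ms} ms, p99={p99_ms} ms")
print(f"alert on mean: {mean_would_alert}, alert on p99: {p99_would_alert}")
```

In this made-up data set the average hides the slow tail, while the p99 reflects what the worst-served users actually experience.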

featured in #274