Ex-Google SRE on Alerting

The content of this document is derived from a document titled My Philosophy on Alerting by Rob Ewaschuk.

The original document is publicly available at this link.

Some ideas from the document linked above:

over-monitoring is a harder problem to solve than under-monitoring
alert rules must be able to classify problems into one of the following classes:
- availability and basic functionality
- correctness (completeness, freshness and durability of data)
- feature-specific problems
focus on symptoms to catch the problem. Include cause-based information, but base the alert itself on symptoms. For example, alert on queries failing rather than on the database server being down.
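
The symptom-versus-cause idea above can be sketched in a few lines of Python. This is only an illustration, not something taken from the original document: the names `ProblemClass`, `Alert`, and `evaluate_query_alert`, and the 5% failure threshold, are all assumptions made for the example. The rule fires only on the symptom (queries failing) and carries the cause-based signal (whether the database server is up) as context for whoever handles the alert.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ProblemClass(Enum):
    """The three problem classes listed above (names are hypothetical)."""
    AVAILABILITY = "availability and basic functionality"
    CORRECTNESS = "correctness"
    FEATURE_SPECIFIC = "feature-specific problem"


@dataclass
class Alert:
    problem_class: ProblemClass
    summary: str
    # Cause-based details are attached for debugging; they never trigger the alert.
    context: dict = field(default_factory=dict)


def evaluate_query_alert(
    total_queries: int,
    failed_queries: int,
    db_server_up: bool,
    max_failure_ratio: float = 0.05,  # assumed threshold, tune per service
) -> Optional[Alert]:
    """Return an Alert when users are visibly affected, otherwise None."""
    if total_queries == 0:
        return None  # no traffic, so no user-visible symptom to report

    failure_ratio = failed_queries / total_queries
    if failure_ratio <= max_failure_ratio:
        # A database server may be down, but users are not affected: stay quiet.
        return None

    return Alert(
        problem_class=ProblemClass.AVAILABILITY,
        summary=f"{failure_ratio:.0%} of queries are failing",
        context={"db_server_up": db_server_up},
    )
```

With this shape, a database server that is down while queries still succeed (for example, because a replica is serving them) produces no alert, whereas a high failure ratio raises one regardless of what the underlying cause turns out to be.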