Monitor What Matters Not What Is Easy
Monitoring tools trick people into thinking observability is an easy problem. It is not. Most organizations drown in metrics they never look at while missing the signals that actually predict failure.
"Monitoring and observability tools commit the cardinal sin of tricking people into thinking this is an easy problem. It is very simple to monitor a small application or service. Almost none of those approaches scale." Mathew Duggan
The fundamental problem with monitoring is scope creep. Logs start as debugging aids print statements saved to disk. Then they become the customer service tool, the audit trail, the business intelligence source of truth, the deployment verification system. Suddenly the simple syslog is mission-critical infrastructure, and the cost of monitoring an application can easily exceed the cost of hosting it. Each expansion of purpose adds requirements (retention, searchability, reliability) without anyone stepping back to ask whether this accretion of responsibilities makes sense.
Metrics follow the same pattern. Prometheus works beautifully on one server. Then you need federation, long-term storage, cross-service queries, and stakeholders outside engineering want to track customer behavior and marketing campaigns through the same pipeline. The complexity jump from "store everything and let god figure it out" to hierarchical federation or cross-service aggregation is enormous and rarely anticipated.
The discipline that matters is distinguishing between operational metrics and analytical data. Operational metrics need to be real-time, reliable, and actionable they tell you whether to wake someone up at 3 AM. Analytical data can tolerate delay, sampling, and lower availability. Conflating "how we do things" with "how we decide which things to do" is a fatal mistake. Sample aggressively for low-priority signals. Store compliance-critical records in a dedicated system, not the logging pipeline. Set and enforce SLAs for your monitoring infrastructure itself. And above all, invest your monitoring budget in the characteristic metrics that actually predict system state transitions, not in capturing every 200-OK response for eternity.
Takeaway: The value of monitoring is in the decisions it enables, not the data it collects measure what changes your behavior, and sample or discard the rest.
See also: Leading Indicators Beat Lagging Ones | Cognitive Load Is the Real Bottleneck in System Design | Complexity Has Three Sources | The Streetlight Effect Distorts What We Know | Wittgenstein's Ruler Measures the Measurer