The problem WHAWIT solves

Modern digital systems are built from microservices, serverless functions, data pipelines, and third-party APIs. This distributed architecture increases flexibility but also increases fragility: small regressions or configuration mistakes can cascade into major outages. At the same time:
  • Downtime is extremely expensive, with costs often reaching hundreds of thousands or millions of dollars per hour.
  • Engineering teams spend a large fraction of their time on unplanned incident work instead of building features.
  • Telemetry volumes (logs, metrics, traces) grow rapidly, driving up observability costs without a proportional increase in understanding.
The WHAWIT extended version dives into these trends in detail, including MTTR/MTTD benchmarks and the economics of observability tooling.

WHAWIT as an intelligence layer

WHAWIT is designed as an intelligence layer that sits on top of your existing observability stack rather than replacing it. You keep using tools like Datadog, New Relic, CloudWatch, and Elastic for data collection and low-level queries. WHAWIT connects to those tools and focuses on:
  • Interpreting telemetry semantically.
  • Explaining what is happening during incidents.
  • Connecting symptoms in production back to code and configuration.
Conceptually:
  • Existing tools: Collect and store telemetry; provide dashboards and search.
  • WHAWIT: Understands patterns in that telemetry; explains incidents; proposes improvements.

Core reasoning capabilities

The WHAWIT intelligence layer continuously analyzes incoming logs, metrics, and events and builds a model of “normal” versus “abnormal” behavior. When something unusual happens, WHAWIT applies several kinds of reasoning:
  • Temporal reasoning: What changed just before the incident started? Which services began failing first?
  • Topological reasoning: How are affected services connected? Is a downstream dependency causing upstream failures?
  • Historical reasoning: Have we seen a similar pattern before? What was the root cause and fix last time?
The result is an incident summary in natural language, backed by concrete pointers into your telemetry. Engineers can still drill into raw data, but WHAWIT removes much of the initial cognitive load.
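
The three kinds of reasoning above can be illustrated with a small Python sketch. This is not WHAWIT's implementation or API: the names `ChangeEvent`, `changes_before`, `likely_origins`, and `most_similar`, the 30-minute window, and the sample services and incidents are all assumptions made for illustration. The topological check only looks at direct dependencies, and the historical match uses plain Jaccard overlap between symptom sets as a stand-in for whatever similarity measure a real system would use.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeEvent:
    """A deploy or config change (hypothetical shape, not a WHAWIT type)."""
    service: str
    description: str
    at: datetime


def changes_before(incident_start, changes, window=timedelta(minutes=30)):
    """Temporal: changes that landed within `window` before the incident, newest first."""
    recent = [c for c in changes if incident_start - window <= c.at <= incident_start]
    return sorted(recent, key=lambda c: c.at, reverse=True)


def likely_origins(failing, depends_on):
    """Topological: failing services with no failing direct dependency."""
    return sorted(
        s for s in failing
        if not any(dep in failing for dep in depends_on.get(s, ()))
    )


def most_similar(symptoms, history):
    """Historical: past incident whose symptom set overlaps most (Jaccard index)."""
    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    if not history:
        return None
    best = max(history, key=lambda name: jaccard(symptoms, history[name]))
    return best, jaccard(symptoms, history[best])


# Illustrative data only; every name below is made up.
start = datetime(2024, 1, 1, 12, 0)
changes = [
    ChangeEvent("payments-db", "config change", start - timedelta(minutes=5)),
    ChangeEvent("checkout", "deploy", start - timedelta(hours=3)),
]
depends_on = {"checkout": ["payments"], "payments": ["payments-db"], "payments-db": []}
failing = {"checkout", "payments", "payments-db"}
history = {
    "past-incident-a": {"5xx", "db-timeouts"},
    "past-incident-b": {"high-latency"},
}

print(changes_before(start, changes))
print(likely_origins(failing, depends_on))
print(most_similar({"5xx", "db-timeouts", "queue-backlog"}, history))
```

In this toy example, only the `payments-db` change falls inside the window, `payments-db` is the only failing service without a failing dependency, and `past-incident-a` is the closest historical match. A production system would need transitive dependency analysis and far richer signals than this.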

The On-Call Hub

The On-Call Hub is the primary interface for humans responding to incidents. It is built around explanation and coordinated response rather than raw charts. Typical elements include:
  • A timeline of key events as WHAWIT understands them.
  • A summary of the incident that can be read quickly by anyone joining the response.
  • Links into logs, metrics, and traces that support the summary.
  • Context for escalations, handovers, and post-incident review.
These elements help new responders and cross-functional teams get up to speed quickly and let senior engineers focus on diagnosis and remediation.
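
To make the listed elements concrete, here is one possible way to model an incident record in Python. This is a sketch under assumptions, not the On-Call Hub's actual data model: `TimelineEntry`, `IncidentRecord`, their fields, and the `briefing` method are hypothetical names, and the example.com URL is a placeholder.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class TimelineEntry:
    at: datetime
    text: str
    evidence_url: str | None = None  # link into logs, metrics, or traces


@dataclass
class IncidentRecord:
    title: str
    summary: str
    timeline: list[TimelineEntry] = field(default_factory=list)
    handover_notes: list[str] = field(default_factory=list)

    def briefing(self) -> str:
        """Render a short text briefing for someone joining the response."""
        lines = [self.title, self.summary, "Timeline:"]
        for entry in sorted(self.timeline, key=lambda e: e.at):
            link = f" ({entry.evidence_url})" if entry.evidence_url else ""
            lines.append(f"  {entry.at:%H:%M} {entry.text}{link}")
        if self.handover_notes:
            lines.append("Handover notes:")
            lines.extend(f"  - {note}" for note in self.handover_notes)
        return "\n".join(lines)


# Illustrative record; all values are made up.
record = IncidentRecord(
    title="Checkout errors",
    summary="Checkout requests fail because the payments database rejects connections.",
    timeline=[
        TimelineEntry(datetime(2024, 1, 1, 12, 0), "Error rate rises on checkout"),
        TimelineEntry(
            datetime(2024, 1, 1, 11, 55),
            "Config change on payments-db",
            "https://example.com/logs/placeholder",
        ),
    ],
    handover_notes=["Rollback of the config change is in progress."],
)
print(record.briefing())
```

Keeping the summary, timeline, evidence links, and handover notes in one record means the same object can serve the live response, an escalation, and the post-incident review.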

The autonomous code feedback loop

Traditional observability stops when the incident is resolved. WHAWIT goes further by integrating with version control systems and CI/CD pipelines. Over time, WHAWIT learns:
  • Which modules or services are statistically more failure-prone.
  • Which recent commits or deployments correlate with observed regressions.
  • Which types of fixes and patterns have been effective in the past.
From this context, WHAWIT can:
  • Highlight risky areas of the codebase connected to recent incidents.
  • Propose concrete improvements to code or configuration (for example, safer retries, better timeouts, more robust error handling).
  • Generate candidate patches or pull requests for human review.
This closes the loop:
  1. Production systems generate telemetry.
  2. WHAWIT interprets telemetry and explains incidents.
  3. Incidents are mapped back to code and configuration.
  4. Code is improved through targeted recommendations and patches.
  5. Future incidents become less frequent or easier to resolve.
The extended version refers to this as autonomous reliability engineering.
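
Step 3 of the loop, mapping incidents back to code, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not how WHAWIT works internally: `Deployment`, `suspect_deployment`, and `incidents_per_deploy` are hypothetical names, and the heuristic (blame the most recent deployment of the affected service) only captures correlation, not causation.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Deployment:
    service: str
    commit: str
    at: datetime


def suspect_deployment(service, incident_start, deployments):
    """Most recent deployment of `service` at or before the incident, or None."""
    prior = [d for d in deployments if d.service == service and d.at <= incident_start]
    return max(prior, key=lambda d: d.at, default=None)


def incidents_per_deploy(deployments, incidents):
    """Rank services by incidents per deployment, highest first.

    `incidents` is a list of (service, start_time) pairs.
    """
    deploy_counts = Counter(d.service for d in deployments)
    incident_counts = Counter(service for service, _ in incidents)
    rates = {svc: incident_counts[svc] / n for svc, n in deploy_counts.items()}
    return sorted(rates.items(), key=lambda kv: (-kv[1], kv[0]))


# Illustrative data only; services and commit IDs are made up.
t0 = datetime(2024, 1, 1, 9, 0)
deployments = [
    Deployment("checkout", "a1", t0),
    Deployment("search", "c3", t0 + timedelta(hours=1)),
    Deployment("checkout", "b2", t0 + timedelta(hours=2)),
]
incidents = [("checkout", t0 + timedelta(hours=2, minutes=10))]

print(suspect_deployment("checkout", incidents[0][1], deployments))
print(incidents_per_deploy(deployments, incidents))
```

Here the `checkout` incident points at commit `b2`, and `checkout` ranks above `search` with 0.5 incidents per deployment. Any suggestion derived from such a ranking would still go through human review, as described above.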

Where to go next

  • Follow the quickstart to connect your own observability stack.
  • Share the whitepapers with stakeholders who need deeper economic and strategic context.
  • Watch the demos to see WHAWIT handling real-world incidents.