The problem WHAWIT solves
Modern digital systems are built from microservices, serverless functions, data pipelines, and third-party APIs. This distributed architecture increases flexibility but also increases fragility: small regressions or configuration mistakes can cascade into major outages. At the same time:
- Downtime is extremely expensive, often reaching hundreds of thousands or millions of dollars per hour.
- Engineering teams spend a large fraction of their time on unplanned incident work instead of building features.
- Telemetry volumes (logs, metrics, traces) grow rapidly, driving up observability costs without a proportional increase in understanding.
WHAWIT as an intelligence layer
WHAWIT is designed as an intelligence layer that sits on top of your existing observability stack rather than replacing it. You keep using tools like Datadog, New Relic, CloudWatch, and Elastic for data collection and low-level queries. WHAWIT connects to those tools and focuses on:
- Interpreting telemetry semantically.
- Explaining what is happening during incidents.
- Connecting symptoms in production back to code and configuration.
In short, the division of responsibilities is:
- Existing tools: Collect and store telemetry; provide dashboards and search.
- WHAWIT: Understands patterns in that telemetry; explains incidents; proposes improvements.
Core reasoning capabilities
The WHAWIT intelligence layer continuously analyzes incoming logs, metrics, and events and builds a model of “normal” versus “abnormal” behavior. When something unusual happens, WHAWIT applies several kinds of reasoning:
- Temporal reasoning: What changed just before the incident started? Which services began failing first?
- Topological reasoning: How are affected services connected? Is a downstream dependency causing upstream failures?
- Historical reasoning: Have we seen a similar pattern before? What was the root cause and fix last time?
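As a rough illustration of the temporal and topological questions above, the following minimal sketch orders failing services by when they first failed and flags a failing service as a probable root when none of its own dependencies are failing. It is not WHAWIT's implementation; the `Failure` type, the `depends_on` map, and the service names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    service: str           # hypothetical service name
    first_error_at: float  # seconds since epoch (assumed field)


def first_to_fail(failures):
    """Temporal reasoning: order services by when they started failing."""
    ordered = sorted(failures, key=lambda f: f.first_error_at)
    return [f.service for f in ordered]


def probable_roots(failures, depends_on):
    """Topological reasoning: a failing service is a probable root if
    none of the services it depends on are failing themselves."""
    failing = {f.service for f in failures}
    roots = [
        s for s in failing
        if not any(dep in failing for dep in depends_on.get(s, ()))
    ]
    return sorted(roots)


# Hypothetical topology: checkout calls payments and cart; payments calls db.
depends_on = {
    "checkout": ["payments", "cart"],
    "payments": ["db"],
    "cart": [],
    "db": [],
}
failures = [
    Failure("checkout", 1030.0),
    Failure("payments", 1010.0),
    Failure("db", 1000.0),
]

print(first_to_fail(failures))               # ['db', 'payments', 'checkout']
print(probable_roots(failures, depends_on))  # ['db']
```

Here both views agree: `db` failed first and has no failing dependencies, so the failures in `payments` and `checkout` are likely downstream symptoms. In a real system the two signals can disagree, which is one reason to combine them with historical context.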
The On-Call Hub
The On-Call Hub is the primary interface for humans responding to incidents. It is built around explanation and coordinated response rather than raw charts. Typical elements include:
- A timeline of key events as WHAWIT understands them.
- A summary of the incident that can be read quickly by anyone joining the response.
- Links into logs, metrics, and traces that support the summary.
- Context for escalations, handovers, and post-incident review.
The autonomous code feedback loop
Traditional observability stops when the incident is resolved. WHAWIT extends further by integrating deeply with version control systems and CI/CD pipelines. Over time, WHAWIT learns:
- Which modules or services are statistically more failure-prone.
- Which recent commits or deployments correlate with observed regressions.
- Which types of fixes and patterns have been effective in the past.
Based on this knowledge, WHAWIT can:
- Highlight risky areas of the codebase connected to recent incidents.
- Propose concrete improvements to code or configuration (for example, safer retries, better timeouts, more robust error handling).
- Generate candidate patches or pull requests for human review.
Together, these steps form a closed feedback loop:
- Production systems generate telemetry.
- WHAWIT interprets telemetry and explains incidents.
- Incidents are mapped back to code and configuration.
- Code is improved through targeted recommendations and patches.
- Future incidents become less frequent or easier to resolve.
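One way to picture the step that maps incidents back to code is a simple time-window correlation between deployments and incident start times. The sketch below is illustrative only and does not describe WHAWIT's actual method; the `Deployment` and `Incident` types, the 30-minute window, and the sample data are assumptions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    service: str
    commit: str         # hypothetical short commit hash
    deployed_at: float  # seconds since epoch


@dataclass(frozen=True)
class Incident:
    incident_id: str
    started_at: float   # seconds since epoch


def suspect_deployments(incident, deployments, window_s=1800.0):
    """Deployments that landed at most window_s seconds before the
    incident started, most recent first."""
    hits = [
        d for d in deployments
        if 0 <= incident.started_at - d.deployed_at <= window_s
    ]
    return sorted(hits, key=lambda d: d.deployed_at, reverse=True)


def failure_prone_services(incidents, deployments, window_s=1800.0):
    """Count, per service, how many incidents followed one of its deploys."""
    counts = Counter()
    for inc in incidents:
        suspects = suspect_deployments(inc, deployments, window_s)
        for service in {d.service for d in suspects}:
            counts[service] += 1
    return counts.most_common()


# Hypothetical sample data.
deployments = [
    Deployment("payments", "a1b2c3", 1000.0),
    Deployment("cart", "d4e5f6", 5000.0),
    Deployment("payments", "0a9b8c", 9000.0),
]
incidents = [
    Incident("INC-1", 1600.0),
    Incident("INC-2", 9300.0),
    Incident("INC-3", 20000.0),
]

print(failure_prone_services(incidents, deployments))  # [('payments', 2)]
```

A time window only surfaces candidates: a deployment that precedes an incident is correlated with it, not proven to be its cause, which is why the suggested patches above go to human review.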
Where to go next
- Follow the quickstart to connect your own observability stack.
- Share the whitepapers with stakeholders who need deeper economic and strategic context.
- Watch the demos to see WHAWIT handling real-world incidents.

