Overview

This guide details how WHAWIT interacts with the OpenTelemetry “Astronomy Shop” demo deployed on Google Kubernetes Engine (GKE). This scenario demonstrates WHAWIT’s ability to analyze complex, distributed microservices architectures and correlate infrastructure issues with code repositories.

The Scenario

The OpenTelemetry Astronomy Shop is a microservices-based application intended to illustrate the use of OpenTelemetry. It consists of over a dozen microservices written in different languages (Go, Java, Python, .NET, etc.) communicating via gRPC and HTTP. We have deployed this application to a GKE Autopilot cluster to simulate a realistic production environment where infrastructure constraints (like resource quotas) can impact application stability.

Repository Structure

To manage this deployment and enable WHAWIT’s intelligent analysis, we use two GitHub repositories:

1. Infrastructure Repository (whawit-demos/opentelemetry-demo-codeops)

This repository contains the Infrastructure as Code (IaC) and deployment configurations.
  • Terraform: Defines the GKE cluster, VPCs, and networking.
  • Helm: Contains the opentelemetry-demo chart and values files defining resource limits (LimitRange), quotas (ResourceQuota), and scaling settings.
  • ArgoCD: (Optional) GitOps configurations.
Why WHAWIT connects here: Incidents are often caused by infrastructure constraints, not just bad code. WHAWIT uses this repository to understand the topology of the cluster and the constraints (CPU/Memory limits) applied to namespaces.

2. Application Repository (whawit-demos/opentelemetry-demo)

This repository contains the source code for the microservices.
  • Business Logic: The actual code for frontend, cartservice, checkoutservice, etc.
  • Instrumentation: OpenTelemetry SDK setup.
Why WHAWIT connects here: When a service fails due to a bug or unhandled exception, WHAWIT correlates the stack trace or error log back to the specific commit and line of code in this repository.

[Figure: Architecture Diagram]

Troubleshooting with WHAWIT

WHAWIT acts as an Intelligence Layer on top of your existing observability stack (Google Cloud Logging and Managed Prometheus in this case).

1. Detection

WHAWIT continuously ingests logs and metrics. In this scenario, it detects a CrashLoopBackOff event for the fraud-detection service.

[Figure: K9s View]

At the same time, it sees error logs in Google Cloud Logging indicating a failure to create pods or containers.

[Figure: GCP Logs]
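
To make the detection signal concrete, here is a minimal sketch of how a CrashLoopBackOff state can be read from a Kubernetes pod object, such as one returned by `kubectl get pod -o json`. It is not WHAWIT's implementation. The function name, the pod name, and the `otel-demo` namespace are assumptions made for illustration; only the `status.containerStatuses[].state.waiting.reason` field layout comes from the Kubernetes API.

```python
from typing import Any


def crashlooping_containers(pod: dict[str, Any]) -> list[str]:
    """Return the names of containers in `pod` that are waiting in CrashLoopBackOff."""
    statuses = pod.get("status", {}).get("containerStatuses", [])
    return [
        s["name"]
        for s in statuses
        if s.get("state", {}).get("waiting", {}).get("reason") == "CrashLoopBackOff"
    ]


# Hypothetical, trimmed-down pod object used only for illustration.
sample_pod = {
    "metadata": {"name": "fraud-detection-abc123", "namespace": "otel-demo"},
    "status": {
        "containerStatuses": [
            {
                "name": "fraud-detection",
                "restartCount": 7,
                "state": {"waiting": {"reason": "CrashLoopBackOff"}},
            }
        ]
    },
}

print(crashlooping_containers(sample_pod))
```

A container that is running or terminated has no `waiting` entry in its state, so it is skipped by the filter.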

2. Correlation

WHAWIT analyzes the telemetry and correlates it with the connected repositories.
  • It identifies that the fraud-detection service is failing.
  • It scans the codeops repository and finds LimitRange and ResourceQuota definitions in the helm/ directory.
  • It recognizes that the error messages in the logs (e.g., “exceeded quota”) match the constraints defined in the infrastructure code.
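
The matching step above can be sketched as follows, assuming the log line follows the usual Kubernetes admission error shape (`exceeded quota: <name>, requested: ..., used: ..., limited: ...`). The sample log line, the quota name `demo-quota`, and the `helm/values.yaml` path are hypothetical placeholders. How WHAWIT performs this correlation internally is not described in this guide.

```python
import re
from typing import Optional

# Pattern for the quota rejection message emitted by the Kubernetes API server.
_QUOTA_ERROR = re.compile(
    r"exceeded quota: (?P<quota>[^,]+), "
    r"requested: (?P<requested>.+?), "
    r"used: (?P<used>.+?), "
    r"limited: (?P<limited>.+)$"
)


def _pairs(text: str) -> dict:
    """Turn 'limits.cpu=500m,limits.memory=512Mi' into a dict."""
    return dict(item.split("=", 1) for item in text.split(","))


def parse_quota_error(message: str) -> Optional[dict]:
    """Extract the quota name and resource figures, or None if the line does not match."""
    match = _QUOTA_ERROR.search(message.strip())
    if match is None:
        return None
    return {
        "quota": match["quota"],
        "requested": _pairs(match["requested"]),
        "used": _pairs(match["used"]),
        "limited": _pairs(match["limited"]),
    }


# Hypothetical log line and quota-to-file index, for illustration only.
sample_log = (
    'pods "fraud-detection-abc123" is forbidden: exceeded quota: demo-quota, '
    "requested: limits.cpu=500m,limits.memory=512Mi, "
    "used: limits.cpu=3800m,limits.memory=7Gi, "
    "limited: limits.cpu=4,limits.memory=8Gi"
)
quota_sources = {"demo-quota": "helm/values.yaml"}

parsed = parse_quota_error(sample_log)
if parsed is not None:
    print(parsed["quota"], "is defined in", quota_sources.get(parsed["quota"], "unknown"))
```

Keying the lookup on the quota name is what links a runtime error to a specific definition in the infrastructure repository.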

3. Analysis

Instead of just alerting “Pod Failed”, WHAWIT provides a comprehensive analysis:
  • Root Cause: The pod is failing to start because the namespace has hit its ResourceQuota limits for CPU/Memory.
  • Evidence: Cites the specific log entries from GCP and the configuration files from the codeops repo.
  • Recommendation: Suggests increasing the quota in namespaces.tf or the Helm values, or reducing the resource requests of the fraud-detection service.
[Figure: WHAWIT Analysis]
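
The arithmetic behind such a recommendation is simple. The sketch below, written under stated assumptions, converts Kubernetes CPU and memory quantities to integers and computes how far a request overshoots the quota. The helper names and the example figures are hypothetical, and only the `m`, `Ki`, `Mi`, and `Gi` suffixes are handled.

```python
_MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '500m' or '2' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '512Mi' or '8Gi' to bytes."""
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: already plain bytes


def shortfall(requested: int, used: int, hard: int) -> int:
    """Amount by which used + requested exceeds the hard limit (0 if it fits)."""
    return max(0, used + requested - hard)


# Hypothetical figures: the pod asks for 500m CPU while 3800m of a 4-core quota is in use.
cpu_gap = shortfall(cpu_millicores("500m"), cpu_millicores("3800m"), cpu_millicores("4"))
print(f"CPU quota is short by {cpu_gap}m")
```

A non-zero shortfall is the minimum amount by which the quota would need to grow, or the request shrink, for the pod to be admitted.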

Integration with Datadog

While the example above uses Google Cloud Logging and Managed Prometheus, WHAWIT integrates equally well with Datadog for the same OpenTelemetry demo.

1. Setup Changes

In a Datadog-integrated environment, the observability data flow changes slightly:
  • Datadog Agent: Deployed as a DaemonSet on the GKE cluster to collect logs, metrics, and traces.
  • OpenTelemetry Collector: Configured to export traces and OTLP metrics to the Datadog Agent or directly to Datadog’s backend.

2. WHAWIT Configuration

To enable this integration in WHAWIT:
  1. Go to Integrations and select Datadog.
  2. Provide your Datadog API Key and Application Key.
  3. WHAWIT will begin indexing alerts, events, and relevant telemetry from Datadog.
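
Before entering the keys, it can help to confirm that the API Key is valid. The sketch below builds a request against Datadog's public key-validation endpoint (`GET /api/v1/validate` with a `DD-API-KEY` header). It is independent of WHAWIT, the function name is an assumption, and the default `datadoghq.com` site must be changed for other Datadog regions. It checks the API Key only, not the Application Key.

```python
import urllib.request


def build_validate_request(api_key: str, site: str = "datadoghq.com") -> urllib.request.Request:
    """Build (but do not send) a request that asks Datadog to validate an API key."""
    url = f"https://api.{site}/api/v1/validate"
    return urllib.request.Request(
        url,
        headers={"DD-API-KEY": api_key, "Accept": "application/json"},
        method="GET",
    )


# Placeholder key for illustration; substitute a real key before sending.
request = build_validate_request("<your-api-key>")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` sends it. A response outside the 2xx range raises `urllib.error.HTTPError`, which is the case to handle for a rejected key.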

3. The WHAWIT Advantage

With Datadog as the telemetry source, the Correlation and Analysis steps are still performed by WHAWIT:
  • Alert Ingestion: WHAWIT receives a monitor alert from Datadog (e.g., “High Error Rate on frontend”).
  • Deep Linking: The WHAWIT incident summary provides direct links back to your Datadog dashboards and notebooks.
  • Repo Connection: WHAWIT still connects to whawit-demos/opentelemetry-demo to find the code causing the errors, enriching the Datadog alert with root cause analysis that points to specific commits or config files.

Summary

By connecting to both the Application and Infrastructure repositories, WHAWIT bridges the gap between “Code” and “Ops”. It tells you not only what is broken but also why, whether the cause is a buggy commit in the app repo or a tight resource quota in the infra repo.