
AI Ops Center

Production intelligence powered by LLM reasoning

12 services  |  3 active alerts  |  4.2 min MTTR (AI)  |  847 resolved
INCIDENT #2847 — API Latency Spike (P1)
started: 12:03  |  affected: 3 services

AI Root Cause Analysis (LLM reasoning)

Click Analyze to start AI-powered root cause analysis

Remediation Plan

1. Rollback deploy #4521: git revert abc1234 && git push
2. Apply hotfix, batch the preference query: SELECT * FROM preferences WHERE user_id IN (...) (sketched below)
3. Scale the connection pool temporarily: max_connections: 100 → 200
4. Verify service recovery: p99 latency < 500ms for 5 min (an automated check is sketched after the log stream)
Waiting for AI analysis...
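Step 2 is the classic N+1 fix: the timing-out POST /api/users/preferences handler presumably issues one query per user, holding a pooled connection for each, and batching them into a single IN (...) query relieves pressure on db-primary. A minimal sketch, assuming a psycopg2 connection; only the preferences table and the user_id filter come from the incident data, and the prefs column name is illustrative:

```python
# Sketch of the step-2 hotfix: one batched query instead of N per-user queries.
# Assumes psycopg2; the "prefs" column name is illustrative.
import psycopg2

def fetch_preferences(conn, user_ids):
    """Fetch preferences for many users in one round trip,
    so one pooled connection is held briefly instead of N."""
    if not user_ids:
        return {}
    with conn.cursor() as cur:
        # psycopg2 expands a tuple parameter into IN (a, b, c)
        cur.execute(
            "SELECT user_id, prefs FROM preferences WHERE user_id IN %s",
            (tuple(user_ids),),
        )
        return {row[0]: row[1] for row in cur.fetchall()}
```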

Service Status

api-gateway        2,340ms   critical
user-service         890ms   warning
db-primary                   critical
auth-service          12ms   healthy
payment-service       45ms   healthy
notification-svc      23ms   healthy
[Service dependency diagram: client → api-gw → user-svc → db, plus auth-svc and payment-svc]

Live Log Stream

12:03:24  ERR  api-gateway   upstream timeout after 30000ms on POST /api/users/preferences
12:03:25  WRN  user-service  connection pool utilization at 95% (95/100)
12:03:26  ERR  db-primary    max_connections reached: 100/100 — rejecting new connections
12:03:27  ERR  api-gateway   upstream timeout after 30000ms on GET /api/users/12847
12:03:28  WRN  user-service  connection pool utilization at 98% (98/100)
12:03:29  ERR  db-primary    too many connections for role "app_user"
12:03:30  ERR  api-gateway   circuit breaker OPEN for user-service — 23 failures in 60s
12:03:31  INF  auth-service  health check OK — no anomalies detected
12:03:32  ERR  user-service  connection pool exhausted — 0 available connections
12:03:33  WRN  api-gateway   response time p99 = 28,400ms (threshold: 500ms)
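The last log line is the same signal step 4 of the remediation plan gates on: api-gateway's p99 against the 500ms threshold. A minimal sketch of that recovery check, assuming a Prometheus-style query API; the URL and metric name are illustrative:

```python
# Sketch of the step-4 recovery gate: p99 < 500ms sustained for 5 minutes.
# Assumes a Prometheus-compatible HTTP API; URL and metric name are illustrative.
import time
import requests

PROM_URL = "http://prometheus.internal:9090/api/v1/query"
QUERY = ('histogram_quantile(0.99, sum(rate('
         'http_request_duration_seconds_bucket{service="api-gateway"}[1m])) by (le))')

def p99_seconds():
    resp = requests.get(PROM_URL, params={"query": QUERY}, timeout=10)
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else float("inf")

def verify_recovery(threshold=0.5, hold_seconds=300, interval=15):
    """Return True once p99 stays under the threshold for the full hold window."""
    start = None
    while True:
        if p99_seconds() < threshold:
            start = start or time.monotonic()
            if time.monotonic() - start >= hold_seconds:
                return True
        else:
            start = None  # any breach resets the 5-minute window
        time.sleep(interval)
```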

Traditional monitoring alerts when thresholds are exceeded — but only an LLM can read unstructured logs, understand code semantics, correlate events across services, and reason about causality to identify the actual root cause.
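That correlation step is simple to wire up: hand the raw log window and the dependency graph to a model and ask for the causal chain. A minimal sketch, assuming the OpenAI Python client; the model name and prompt are illustrative, and in practice the log stream and topology above would be injected as context:

```python
# Sketch of LLM-driven root cause analysis over raw logs and topology.
# Assumes the openai Python package; model name and prompt are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def analyze_incident(logs: str, topology: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "You are an SRE. Given raw service logs and a dependency graph, "
                "identify the root cause, the causal chain across services, and "
                "a remediation plan ordered by risk."
            )},
            {"role": "user", "content": f"Topology:\n{topology}\n\nLogs:\n{logs}"},
        ],
    )
    return response.choices[0].message.content
```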