AI-powered SRE and Security agent for OpenShift. Claude diagnoses, triages, and remediates — 24/7.
Full-spectrum cluster operations. Autonomous when you trust it.
Investigate pod failures, crash loops, OOM kills, scheduling problems. Correlate events, logs, and Prometheus metrics for root cause analysis.
Privileged containers, RBAC audit, network policy gaps, image security, SCC analysis, secret hygiene. Full cluster security posture in one scan.
16 scanners running every 60 seconds. Crashloops, pending pods, failed deployments, node pressure, cert expiry, firing alerts, OOM kills, image pull errors, degraded operators, DaemonSet gaps, HPA saturation + 5 audit scanners.
Trust level 3: safe fixes without approval. Trust level 4: full autonomous. Rollback snapshots for every action. Rate-limited with cooldown.
Auto-routing agent classifies SRE, Security, or View Designer intent per message. Keyword-based fast routing, shared context bus, cross-agent handoff.
Learns from every interaction. Extracts runbooks from confirmed resolutions. Detects patterns. Augments prompts with relevant past incidents.
Every finding, investigation, and action includes a 0-100% confidence score. Reasoning chains with evidence and alternatives considered.
Structured error types with 7 categories. Thread-safe ring buffer tracking. Circuit breaker with auto-recovery. Database resilience decorators.
Time-aware greeting, action summary, investigation count, current cluster state. The 30-second standup replacement for your cluster.
73 PromQL recipes across 16 categories. Semantic auto-layout engine. discover_metrics → verify_query → build. Data-first, never empty charts.
Feeds analytics back into the system prompt. Query reliability scores, dashboard patterns, error hotspots, token efficiency. The agent gets smarter over time.
71% prompt reduction (28KB → 8KB). Selective component schemas. Selective runbooks. Token tracking per turn. Cache hit rate monitoring.
Full audit log of every tool invocation. Chain discovery via bigram analysis. Next-tool hints injected into prompts. Token tracking and usage stats API.
Multi-cluster operations across your fleet. 5 fleet tools for cross-cluster visibility, comparison, and coordinated actions.
6 ArgoCD integration tools. Application sync status, health checks, rollback operations, and drift detection across GitOps-managed clusters.
84 evaluation prompts for systematic agent testing. Measures tool selection accuracy, response quality, and diagnostic reasoning across SRE and security scenarios.
Talk to your cluster like a colleague.
Progressive autonomy. You decide how much to trust.
Monitor Only
Observe & report
Suggest
Dry-run previews
Ask First
Propose + confirm
Auto-Fix Safe
Low-risk auto
Full Auto
All categories
From zero to scanning in 60 seconds.
Pair with OpenShift Pulse for the full experience, or run standalone as a CLI.