Runbooks to Guide Debugging

Create Runbooks to Guide Automated Debugging

Runbooks in Relvy do more than document investigation steps: they guide the behavior of Relvy’s AI agent during real-time incident analysis. When an incident occurs, Relvy’s AI-based investigation engine uses your runbooks as strategic instructions to plan and execute the debugging process.

How Runbooks Work in Relvy

Relvy’s AI investigation system consists of:

- A Planner, which orchestrates the overall debugging strategy.
- Specialized Data Source Agents for logs, metrics (dashboards), events, and traces.

When an investigation is triggered (manually or via alert), the planner:

1. Reads the runbooks relevant to the incident’s symptoms or tags.
2. Uses their instructions to prioritize certain signals and tools.
3. Dispatches tasks to the data source agents accordingly.
4. Aggregates the findings into a unified Root Cause Analysis.
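
The flow above can be pictured with a minimal Python sketch. It is not Relvy’s implementation or API: `Runbook`, `select_runbooks`, `plan_investigation`, and the stub agents are hypothetical names, and simple tag overlap stands in for whatever relevance matching the planner actually performs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Runbook:
    # Hypothetical model of a runbook; field names are illustrative only.
    title: str
    tags: Set[str]
    steps: List[Tuple[str, str]]  # (agent name, instruction) pairs


def select_runbooks(runbooks: List[Runbook], incident_tags: Set[str]) -> List[Runbook]:
    """Keep runbooks that share at least one tag with the incident."""
    return [rb for rb in runbooks if rb.tags & incident_tags]


def plan_investigation(
    runbooks: List[Runbook],
    incident_tags: Set[str],
    agents: Dict[str, Callable[[str], str]],
) -> List[str]:
    """Dispatch each runbook step to its agent and collect findings in order."""
    findings = []
    for rb in select_runbooks(runbooks, incident_tags):
        for agent_name, instruction in rb.steps:
            agent = agents.get(agent_name)
            if agent is None:
                continue  # no agent registered for this data source
            findings.append(f"[{agent_name}] {agent(instruction)}")
    return findings


# Stub agents that only echo the task they were given.
agents = {
    "metrics": lambda task: f"checked: {task}",
    "logs": lambda task: f"searched: {task}",
}
runbooks = [
    Runbook("Latency in Core APIs", {"latency"},
            [("metrics", "API service dashboard"), ("logs", "errors in range")]),
    Runbook("Debugging Kafka issues", {"kafka"},
            [("metrics", "kafka dashboard")]),
]
report = plan_investigation(runbooks, {"latency", "user-facing"}, agents)
```

Only the latency runbook matches the incident tags here, so the Kafka runbook contributes no steps to `report`.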

Runbooks as AI Guidance

Runbooks let your team configure and influence the planner in natural language. You define what your team considers best practice, and the planner follows it when forming its investigation plan. This lets you encode organizational knowledge and system-specific workflows directly into the AI.

Creating a Runbook

To add a runbook:

1. Navigate to the Runbooks tab in the Discovery section on the left sidebar.
2. Click Create New Runbook.
3. Fill out the form:
   - Title: A clear and concise label.
   - Type: e.g., General Planning, Log Analysis, Event Analysis.
   - Symptom / When to Use: Describe when this runbook is relevant.
   - Instructions: Write your investigation steps in natural language.
   - Tags (optional): Add tags like user-facing, latency, kubernetes, etc.
4. Click Create Runbook to save.
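
Before filling out the form, it can help to draft the runbook as structured data. The sketch below is only a drafting aid under assumed names: `RunbookDraft` and `missing_fields` are hypothetical and not part of Relvy; the fields simply mirror the form described above, with tags treated as optional.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RunbookDraft:
    # Mirrors the form fields listed above; this class itself is hypothetical.
    title: str
    type: str
    when_to_use: str
    instructions: str
    tags: List[str] = field(default_factory=list)  # optional in the form

    def missing_fields(self) -> List[str]:
        """Return the names of required fields that are empty or whitespace."""
        required = ("title", "type", "when_to_use", "instructions")
        return [name for name in required if not getattr(self, name).strip()]


draft = RunbookDraft(
    title="Latency in Core APIs",
    type="General Planning",
    when_to_use="When alerts mention increased latency in APIs",
    instructions="",
    tags=["latency"],
)
missing = draft.missing_fields()
```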

Example Runbook Instructions

Example 1: General Debugging Instructions


Title: General Debugging Instructions
When to Use: When debugging any incident
Instructions:
1. Check the RED metrics dashboard as a starting point for most investigations.
2. For user-facing issues, check frontend service logs and metrics.
3. To check recent deployments, filter events by @source:kubernetes and look for pod restarts, scaling, or service deployments.
4. Check the Runtime metrics dashboard for CPU/memory utilization.
5. To locate traces from logs, use otel.trace_id and otel.span_id.

💡 Relvy’s planner will use these general guidelines as a foundation for any investigation, adapting the approach based on the specific incident context.
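
Step 5 of this example amounts to a lookup from log records to trace references. As a rough sketch, assuming log records arrive as flat dictionaries with a `level` key (an assumption about the log shape, not something the runbook specifies), the `otel.trace_id` and `otel.span_id` fields can be collected like this. `trace_refs` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def trace_refs(log_records: List[Dict[str, str]], level: str = "error") -> List[Tuple[str, str]]:
    """Collect unique (trace_id, span_id) pairs from logs at the given level."""
    seen: List[Tuple[str, str]] = []
    for record in log_records:
        if record.get("level") != level:
            continue
        trace_id = record.get("otel.trace_id")
        span_id = record.get("otel.span_id", "")
        # Skip logs that carry no trace context, and avoid duplicates.
        if trace_id and (trace_id, span_id) not in seen:
            seen.append((trace_id, span_id))
    return seen


logs = [
    {"level": "info", "otel.trace_id": "t1", "otel.span_id": "s1"},
    {"level": "error", "otel.trace_id": "t2", "otel.span_id": "s7"},
    {"level": "error", "otel.trace_id": "t2", "otel.span_id": "s7"},
    {"level": "error", "message": "no trace context"},
]
refs = trace_refs(logs)
```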

Example 2: Latency in Backend Services


Title: Latency in Core APIs
When to Use: When alerts mention increased latency in APIs
Instructions:
1. Begin with the API service metrics dashboard
2. Compare current latency to 1h and 24h baselines
3. Check for saturation in DB or cache services connected to the API
4. Investigate traces for slow spans and associated service calls
5. Review logs for errors or warnings in the same time range

💡 The planner will dynamically follow these steps, dispatching tasks to metric, trace, and log agents to execute them in order.
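
Step 2 of this runbook, comparing current latency to 1h and 24h baselines, reduces to a ratio of summary statistics. The sketch below uses the median of each sample window; that choice, the function name `latency_ratios`, and the sample values are all illustrative assumptions, not how Relvy computes baselines.

```python
from statistics import median
from typing import Dict, Sequence


def latency_ratios(
    current_ms: Sequence[float],
    baseline_1h_ms: Sequence[float],
    baseline_24h_ms: Sequence[float],
) -> Dict[str, float]:
    """Return the current median latency relative to each baseline median."""
    now = median(current_ms)
    # Assumes non-empty windows with non-zero baseline medians.
    return {
        "vs_1h": now / median(baseline_1h_ms),
        "vs_24h": now / median(baseline_24h_ms),
    }


# Illustrative samples in milliseconds.
ratios = latency_ratios(
    current_ms=[400, 420, 380],
    baseline_1h_ms=[200, 210, 190],
    baseline_24h_ms=[100, 105, 95],
)
```

A ratio well above 1 against both baselines suggests a sustained regression, while a high 24h ratio with a 1h ratio near 1 suggests the slowdown began more than an hour ago.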

Example 3: Debugging Kafka Issues


Title: Debugging Kafka issues
When to Use: When debugging a Kafka-related alert
Instructions:
1. Look at the Kafka dashboard for the appropriate topic / consumer group and identify the specific affected partitions.
2. Identify whether any consumer pods are down.
3. Check logs for the consumer and producer services.
4. For lag issues, check whether the lag is caused by a traffic surge.
5. Finally, check the Kafka infra metrics dashboards for issues with Kafka itself.

💡 Relvy’s planner will interpret this and structure its investigation to answer the above questions.
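
To make step 1 concrete: consumer lag for a partition is the log end offset minus the consumer group’s committed offset, and the affected partitions are those whose lag exceeds some threshold. The sketch below works on plain dictionaries of offsets; the function names, the threshold, and the sample numbers are assumptions for illustration, and a partition with no committed offset is treated as committed at 0.

```python
from typing import Dict, List


def partition_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Lag per partition = log end offset - committed consumer offset."""
    # A missing committed offset is treated as 0 (an assumption of this sketch).
    return {p: end - committed.get(p, 0) for p, end in end_offsets.items()}


def affected_partitions(lag: Dict[int, int], threshold: int) -> List[int]:
    """Return partition ids whose lag exceeds the threshold, in ascending order."""
    return sorted(p for p, value in lag.items() if value > threshold)


# Illustrative offsets for a three-partition topic.
end_offsets = {0: 1500, 1: 1480, 2: 9000}
committed = {0: 1490, 1: 1480, 2: 2000}

lag = partition_lag(end_offsets, committed)
hot = affected_partitions(lag, threshold=1000)
```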

Example 4: Application Architecture Overview


Title: Application and System Description
When to Use: When debugging all incidents
Instructions:
This is an ecommerce application (Astronomy Shop). This is the list of critical services:
- accounting
- ad
- cart
- checkout
- currency
- email
- frontend
- payment
- product-catalog
- quote
- recommendation
- shipping

💡 Relvy’s planner will use this architectural knowledge to prioritize services during investigations.

Runbook Management

- Runbooks can be searched and filtered by tags.
- You can edit them at any time so they evolve with your system.
- After an investigation, Relvy highlights the runbooks that were followed and lets you view or modify their instructions for future investigations.

Benefits of Configurable AI Planning

With runbooks:

- You encode team knowledge into the AI, turning experience into automation.
- Investigations become standardized, repeatable, and transparent.
- New team members benefit from a guided process, and experts can continuously improve it.

Whether you’re debugging application errors, latency spikes, infrastructure issues, or deployment regressions, Relvy’s AI follows your instructions step by step.
