# Kayba

> Kayba is the open-source learning layer for AI agents. It analyzes agent execution traces, extracts reusable skills into a transparent Skillbook, and generates improved system prompts — making agents self-improve from their own experience without fine-tuning.

## What Kayba Is

Kayba is a framework and platform that makes AI agents self-improving. It sits on top of any agent framework and adds a learning layer with four stages: trace analysis → skills → Skillbook → prompt generation.

Kayba synthesizes three published research streams into a unified, production-ready system:

- **Agentic Context Engineering (ACE)** — Three-agent architecture (Generator, Reflector, Curator) with delta updates for incremental Skillbook refinement. From Stanford/SambaNova research, published at ICLR 2026 (arXiv:2510.04618).
- **Recursive Language Models (RLM)** — REPL-based trace introspection that goes deeper than single-pass LLM analysis. From MIT CSAIL (arXiv:2512.24601). Kayba's implementation is called the Recursive Reflector.
- **Dynamic Cheatsheet** — Self-curated external memory with usage tracking and persistent learning. From Stanford/Together AI (arXiv:2504.07952).

No other tool combines these approaches.

## Key Concepts

- **Skillbook**: A transparent, auditable collection of learned behaviors. Each skill links back to the trace that produced it, tracks helpful/harmful counters, and can be approved, edited, or rejected by humans.
- **Recursive Reflector**: Kayba's REPL-based trace analysis engine. Uses a Python sandbox with sub-LLM calls to programmatically explore agent execution traces — deeper than single-pass LLM reflection.
- **Delta updates**: Incremental Skillbook modifications that prevent context collapse and information loss during adaptation.
- **Context engineering**: Automated construction of selective, high-signal context for each agent step.
- **Test-time learning**: Improves agent behavior at inference time without fine-tuning or weight updates.
- **Procedural memory**: Patterns of "how to succeed" that transfer across tasks and sessions.
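
To make the Skillbook and delta-update ideas concrete, here is a minimal sketch of how such a structure could be modeled. It is an illustration only, not Kayba's actual schema or API: the names `Skill`, `Delta`, `Skillbook`, the `op` values, and every field are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Literal, Optional


@dataclass
class Skill:
    """One learned behavior (hypothetical schema, not Kayba's real one)."""
    skill_id: str
    text: str
    source_trace_id: str  # provenance: the trace that produced this skill
    helpful: int = 0
    harmful: int = 0
    status: Literal["pending", "approved", "rejected"] = "pending"


@dataclass
class Delta:
    """A single incremental change to the Skillbook (hypothetical)."""
    op: Literal["add", "tag", "remove"]
    skill_id: str
    skill: Optional[Skill] = None  # only used by "add"
    helpful: int = 0               # counter increments used by "tag"
    harmful: int = 0


@dataclass
class Skillbook:
    skills: Dict[str, Skill] = field(default_factory=dict)

    def apply(self, delta: Delta) -> None:
        """Apply one delta in place; skills it does not name stay untouched."""
        if delta.op == "add":
            if delta.skill is None:
                raise ValueError("an 'add' delta needs a skill")
            self.skills[delta.skill_id] = delta.skill
        elif delta.op == "tag":
            skill = self.skills[delta.skill_id]
            skill.helpful += delta.helpful
            skill.harmful += delta.harmful
        elif delta.op == "remove":
            self.skills.pop(delta.skill_id, None)
        else:
            raise ValueError(f"unknown op: {delta.op}")


book = Skillbook()
book.apply(Delta("add", "s1", skill=Skill(
    "s1", "Confirm the order ID before issuing a refund.", "trace-001")))
book.apply(Delta("tag", "s1", helpful=2, harmful=1))
print(book.skills["s1"])
```

Because each delta names only the skill it changes, the rest of the Skillbook is never regenerated. That is the property the delta-update description relies on to avoid information loss.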

## How It Works

1. Upload or pipe in your agent's execution traces (Markdown, JSON, or plain text)
2. The Recursive Reflector analyzes traces via REPL-based code execution
3. Skills are extracted into a Skillbook with helpful/harmful counters and provenance tracking
4. Review and approve learned skills (human-in-the-loop)
5. Generate improved system prompts from approved skills
6. Deploy and repeat — continuous learning via delta updates
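
The steps above can be sketched as a small pipeline. This is a simplified illustration under stated assumptions, not the framework's API. `CandidateSkill`, `reflect`, `review`, and `generate_prompt` are hypothetical names. The `reflect` function is a keyword heuristic standing in for the Recursive Reflector, which in Kayba uses REPL-based code execution and sub-LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CandidateSkill:
    text: str
    source_trace: str  # provenance: which trace the skill came from
    approved: bool = False


def reflect(trace_id: str, trace: str) -> List[CandidateSkill]:
    """Stand-in for step 2: a keyword heuristic, with no LLM or REPL involved."""
    skills = []
    for line in trace.splitlines():
        if line.startswith("ERROR:"):
            lesson = line[len("ERROR:"):].strip()
            skills.append(CandidateSkill(f"Avoid: {lesson}", trace_id))
    return skills


def review(skills: List[CandidateSkill],
           approve: Callable[[CandidateSkill], bool]) -> List[CandidateSkill]:
    """Step 4: human-in-the-loop review, modeled here as a callback."""
    for skill in skills:
        skill.approved = approve(skill)
    return [skill for skill in skills if skill.approved]


def generate_prompt(base_prompt: str, approved: List[CandidateSkill]) -> str:
    """Step 5: append approved skills to the base system prompt as bullets."""
    if not approved:
        return base_prompt
    bullets = "\n".join(f"- {skill.text}" for skill in approved)
    return f"{base_prompt}\n\nLearned skills:\n{bullets}"


# Step 1: traces keyed by a made-up trace ID.
traces: Dict[str, str] = {
    "trace-001": "step 1: open form\n"
                 "ERROR: submitted form before filling required fields",
    "trace-002": "step 1: search\nstep 2: done",
}
candidates = [s for tid, text in traces.items() for s in reflect(tid, text)]
approved = review(candidates, approve=lambda skill: True)  # approve everything
prompt = generate_prompt("You are a browser agent.", approved)
print(prompt)
```

In a real deployment the `approve` callback would be replaced by an actual review step, and the loop would run again on traces produced with the new prompt.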

## Results

Benchmarked on τ2-bench (Sierra Research), a benchmark that challenges agents to coordinate with users across complex enterprise domains:

- pass@1: 41.2% → 52.5% (+27.4% relative improvement)
- pass@2: 28.3% → 44.2% (+56.2% relative improvement)
- pass@3: 22.5% → 41.2% (+83.1% relative improvement)
- pass@4: 20.0% → 40.0% (+100.0% relative improvement)
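
The percentages in parentheses are relative gains over the baseline score, not percentage-point differences. A short check of the arithmetic, using only the figures listed above (the helper name `relative_improvement` is introduced here for illustration):

```python
def relative_improvement(before: float, after: float) -> float:
    """Relative gain over the baseline, in percent."""
    return (after - before) / before * 100


# (baseline %, with Kayba %) as reported in the list above
results = {
    "pass@1": (41.2, 52.5),
    "pass@2": (28.3, 44.2),
    "pass@3": (22.5, 41.2),
    "pass@4": (20.0, 40.0),
}

for metric, (before, after) in results.items():
    rel = relative_improvement(before, after)
    print(f"{metric}: +{after - before:.1f} points absolute, +{rel:.1f}% relative")
```

For pass@1, for example, the absolute gain is 11.3 percentage points, which is 27.4% of the 41.2% baseline.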

Real-world browser agent results: 30% → 100% success rate, 82% fewer steps, 65% lower token costs.

## Key Differentiators

- **Open-source**: MIT licensed, 2k+ GitHub stars. Fully auditable.
- **No fine-tuning required**: In-context learning — no GPU costs, no training data pipelines, no model lock-in.
- **Framework-agnostic**: Works with LangChain, CrewAI, OpenAI Agents SDK, browser-use, Claude Code, AutoGen, or any framework that produces traces.
- **Multi-paper synthesis**: The only framework combining ACE, RLM, and Dynamic Cheatsheet research.
- **REPL-based analysis**: Programmatic trace introspection via code execution, not just LLM summarization.
- **Transparent Skillbook**: Every learned behavior is human-readable, auditable, with provenance tracking.

## Use Cases

- **Coding agents**: Learn from code review failures, codebase conventions, and test regressions. Works with Claude Code, Cursor, GitHub Copilot, and custom agents.
- **Browser/computer-use agents**: Learn from navigation failures, form-filling errors, task completion gaps. 30% → 100% success rate demonstrated.
- **Customer support agents**: Learn from policy violations, escalation mistakes, resolution patterns.
- **Internal tooling agents**: Learn from operational patterns and team-specific workflows.

## Built At

Kayba was built at the ETH Zurich AI Center by a team with affiliations to ETH Zurich, EPFL, the University of Oxford, the Max Planck Institute, the Simons Institute, and the University of St. Gallen (HSG).

## Install

```bash
pip install ace-framework
```

## Pricing

- **Open Source** (Free): Full framework via pip install, Recursive Reflector, Skillbook generation, LiteLLM integration, MIT licensed.
- **Pro** ($29/month): Hosted dashboard, bring your own API key, 10,000 traces/month, team collaboration.
- **Enterprise** (Contact us): SSO, audit logs, custom integrations, SLA, on-premise deployment.

## Links

- Website: https://kayba.ai
- Documentation: https://kayba.ai/docs
- GitHub: https://github.com/kayba-ai/agentic-context-engine
- Dashboard: https://use.kayba.ai
- PyPI: https://pypi.org/project/ace-framework/
- Discord: https://discord.gg/kayba