# Kayba

> Kayba is the open-source learning layer for AI agents. It analyzes agent execution traces, extracts reusable skills into a transparent Skillbook, and generates improved system prompts, so agents improve from their own experience without fine-tuning.

## What Kayba Is

Kayba is a framework and platform that makes AI agents self-improving. It sits on top of any agent framework and adds a learning layer: analyze traces, extract skills, build a Skillbook, generate better prompts. The pipeline is: Trace analysis → Skills → Skillbook → Prompt generation.

Kayba synthesizes three published research streams into a unified, production-ready system:

- **Agentic Context Engineering (ACE)**: Three-agent architecture (Generator, Reflector, Curator) with delta updates for incremental Skillbook refinement. From Stanford/SambaNova research, published at ICLR 2026 (arXiv:2510.04618).
- **Recursive Language Models (RLM)**: REPL-based trace introspection that goes deeper than single-pass LLM analysis. From MIT CSAIL (arXiv:2512.24601). Kayba's implementation is called the Recursive Reflector.
- **Dynamic Cheatsheet**: Self-curated external memory with usage tracking and persistent learning. From Stanford/Together AI (arXiv:2504.07952).

No other tool combines these approaches.

## Key Concepts

- **Skillbook**: A transparent, auditable collection of learned behaviors. Each skill links back to the trace that produced it, tracks helpful/harmful counters, and can be approved, edited, or rejected by humans.
- **Recursive Reflector**: Kayba's REPL-based trace analysis engine. It uses a Python sandbox with sub-LLM calls to programmatically explore agent execution traces, going deeper than single-pass LLM reflection.
- **Delta updates**: Incremental Skillbook modifications that prevent context collapse and information loss during adaptation.
- **Context engineering**: Automated construction of selective, high-signal context for each agent step.
- **Test-time learning**: Improves agent behavior at inference time without fine-tuning or weight updates.
- **Procedural memory**: Patterns of "how to succeed" that transfer across tasks and sessions.

## How It Works

1. Upload or pipe in your agent's execution traces (markdown, JSON, or plain text).
2. The Recursive Reflector analyzes the traces via REPL-based code execution.
3. Skills are extracted into a Skillbook with helpful/harmful counters and provenance tracking.
4. Review and approve learned skills (human-in-the-loop).
5. Generate improved system prompts from approved skills.
6. Deploy and repeat: continuous learning via delta updates.

## Results

Benchmarked on τ2-bench (Sierra Research), a benchmark that challenges agents to coordinate with users across complex enterprise domains. Scores are pass^k, the probability that all k independent trials of a task succeed, which is why they fall as k grows; the gains in parentheses are relative improvements:

- pass^1: 41.2% → 52.5% (+27.4%)
- pass^2: 28.3% → 44.2% (+56.2%)
- pass^3: 22.5% → 41.2% (+83.1%)
- pass^4: 20.0% → 40.0% (+100.0%)

Real-world browser agent results: 30% → 100% success rate, 82% fewer steps, 65% lower token costs.

## Key Differentiators

- **Open-source**: MIT licensed, 2k+ GitHub stars. Fully auditable.
- **No fine-tuning required**: In-context learning, so there are no GPU costs, no training data pipelines, and no model lock-in.
- **Framework-agnostic**: Works with LangChain, CrewAI, OpenAI Agents SDK, browser-use, Claude Code, AutoGen, or any framework that produces traces.
- **Multi-paper synthesis**: The only framework combining ACE, RLM, and Dynamic Cheatsheet research.
- **REPL-based analysis**: Programmatic trace introspection via code execution, not just LLM summarization.
- **Transparent Skillbook**: Every learned behavior is human-readable and auditable, with provenance tracking.

## Use Cases

- **Coding agents**: Learn from code review failures, codebase conventions, and test regressions. Works with Claude Code, Cursor, GitHub Copilot, and custom agents.
- **Browser/computer-use agents**: Learn from navigation failures, form-filling errors, task completion gaps. 30% → 100% success rate demonstrated.
- **Customer support agents**: Learn from policy violations, escalation mistakes, resolution patterns.
- **Internal tooling agents**: Learn from operational patterns and team-specific workflows.

## Built At

Kayba was built at ETH Zurich's AI Center, with affiliations to ETH Zurich, EPFL, University of Oxford, Max Planck Institute, Simons Institute, and University of St. Gallen (HSG).

## Install

```
pip install ace-framework
```

## Pricing

- **Open Source** (Free): Full framework via pip install, Recursive Reflector, Skillbook generation, LiteLLM integration, MIT licensed.
- **Pro** ($29/month): Hosted dashboard, bring your own API key, 10,000 traces/month, team collaboration.
- **Enterprise** (Contact us): SSO, audit logs, custom integrations, SLA, on-premise deployment.

## Links

- Website: https://kayba.ai
- Documentation: https://kayba.ai/docs
- GitHub: https://github.com/kayba-ai/agentic-context-engine
- Dashboard: https://use.kayba.ai
- PyPI: https://pypi.org/project/ace-framework/
- Discord: https://discord.gg/kayba
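The Skillbook mechanics described under Key Concepts and How It Works include helpful/harmful counters, provenance back to a source trace, human approval, delta updates, and prompt generation from approved skills. The minimal Python sketch below illustrates these ideas. It is not the `ace-framework` API. The names `Skill`, `Skillbook`, `DeltaOp`, `apply_delta`, and `render_prompt` are hypothetical. The ADD/UPDATE/TAG/REMOVE operation set and the rule that only approved, not-net-harmful skills reach the prompt are also assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Literal

# Hypothetical data model for illustration; NOT the ace-framework API.


@dataclass
class Skill:
    skill_id: str
    text: str
    source_trace: str        # provenance: ID of the trace that produced this skill
    helpful: int = 0         # times the skill was judged helpful
    harmful: int = 0         # times the skill was judged harmful
    approved: bool = False   # flipped by a human reviewer


@dataclass
class DeltaOp:
    kind: Literal["ADD", "UPDATE", "TAG", "REMOVE"]
    skill_id: str
    text: str = ""
    source_trace: str = ""
    tag: Literal["helpful", "harmful", ""] = ""


@dataclass
class Skillbook:
    skills: dict[str, Skill] = field(default_factory=dict)

    def apply_delta(self, ops: list[DeltaOp]) -> None:
        """Apply small, targeted edits instead of regenerating the whole Skillbook."""
        for op in ops:
            if op.kind == "ADD":
                self.skills[op.skill_id] = Skill(op.skill_id, op.text, op.source_trace)
            elif op.kind == "UPDATE":
                skill = self.skills[op.skill_id]   # assumes the skill already exists
                skill.text = op.text
                skill.approved = False             # edited skills go back to review
            elif op.kind == "TAG":
                skill = self.skills[op.skill_id]
                if op.tag == "helpful":
                    skill.helpful += 1
                elif op.tag == "harmful":
                    skill.harmful += 1
            elif op.kind == "REMOVE":
                self.skills.pop(op.skill_id, None)

    def render_prompt(self, base_prompt: str) -> str:
        """Append approved skills that are not net-harmful to a base system prompt."""
        lines = [
            f"- {s.text}"
            for s in self.skills.values()
            if s.approved and s.helpful >= s.harmful
        ]
        if not lines:
            return base_prompt
        return base_prompt + "\n\nLearned strategies:\n" + "\n".join(lines)


# Usage: two extracted skills, one approved by a reviewer and tagged helpful.
book = Skillbook()
book.apply_delta([
    DeltaOp("ADD", "s1", text="Confirm the order ID before issuing a refund.", source_trace="trace-042"),
    DeltaOp("ADD", "s2", text="Retry a failed form submission once.", source_trace="trace-017"),
])
book.skills["s1"].approved = True  # human-in-the-loop approval
book.apply_delta([DeltaOp("TAG", "s1", tag="helpful")])
prompt = book.render_prompt("You are a support agent.")
print(prompt)  # only s1 appears; s2 is still awaiting approval
```

Each delta touches only the named skill, so earlier entries are never rewritten wholesale. Key Concepts credits this property of delta updates with preventing context collapse.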
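The Results scores fall as k grows. This matches the pass^k reliability metric used in the τ-bench family rather than pass@k, which cannot decrease with k. The Python sketch below estimates pass^k and recomputes the reported relative gains. It assumes the standard pass^k estimator: for a task with n trials of which c succeeded, pass^k is C(c, k) / C(n, k), averaged over tasks. The function names `pass_hat_k` and `relative_gain` are illustrative. The toy trial counts are invented for demonstration and are not benchmark data. The relative-gain check uses the before/after percentages listed under Results.

```python
from __future__ import annotations

from math import comb


def pass_hat_k(successes: list[int], n: int, k: int) -> float:
    """Estimate pass^k: the chance that k of a task's n recorded trials, drawn
    without replacement, all succeed, averaged over tasks.

    successes[i] is how many of the n trials of task i succeeded.
    """
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # math.comb(c, k) is 0 when k > c, so tasks with fewer than k successes add 0.
    return sum(comb(c, k) / comb(n, k) for c in successes) / len(successes)


def relative_gain(before: float, after: float) -> float:
    """Relative improvement in percent: (after - before) / before * 100."""
    return (after - before) / before * 100


# Toy data (invented, not benchmark results): 3 tasks with 4 trials each.
toy = [4, 2, 0]
for k in range(1, 5):
    print(f"pass^{k} = {pass_hat_k(toy, n=4, k=k):.3f}")

# Recompute the relative gains from the reported before/after scores.
reported = [(41.2, 52.5), (28.3, 44.2), (22.5, 41.2), (20.0, 40.0)]
for before, after in reported:
    print(f"{before}% -> {after}%: +{relative_gain(before, after):.1f}%")
```

On the toy data the loop prints pass^k values of 0.500, 0.389, 0.333, and 0.333 for k = 1 to 4. This shows why a stricter k can only lower the score. The second loop reproduces the +27.4%, +56.2%, +83.1%, and +100.0% figures.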