The Short Answer
DSPy is a framework for building LLM programs with automatic prompt optimization. Kayba is a learning layer that analyzes agent execution traces and generates improved prompts from experience.
DSPy optimizes prompts through search. Kayba learns from what actually happened in production.
If you're building structured LLM pipelines from scratch and want the framework to find optimal prompts for you, DSPy is excellent. If you already have an agent running (in any framework) and want it to systematically improve from its own successes and failures, that's what Kayba does.
They solve different problems at different stages of the agent lifecycle.
What Each Tool Does
DSPy
DSPy (31.8k+ GitHub stars), developed at Stanford NLP, introduces a "programming not prompting" paradigm:
- Signatures: Declare input/output behavior (e.g., question -> answer) instead of writing prompts
- Modules: Composable building blocks (ChainOfThought, ReAct, ProgramOfThought) that replace hand-written prompts
- Optimizers (Compilers): Automatically search for the best prompts, few-shot examples, or fine-tuning data by evaluating against a metric
- Typed predictors: Enforce structured outputs with Pydantic models
- Assertions: Runtime constraints that guide LLM behavior
DSPy's core insight is that prompts should be compiled, not written. You define what you want, provide a metric and training examples, and the optimizer finds prompts that work.
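To make the compile idea concrete, here is a toy sketch in plain Python (deliberately not the real dspy API, and with a fake "model" standing in for LLM calls): given candidate instructions, labeled examples, and a metric, "compiling" means searching for the candidate that scores best.

```python
from dataclasses import dataclass

@dataclass
class Example:
    question: str
    answer: str

def exact_match(pred: str, gold: str) -> bool:
    return pred.strip() == gold.strip()

def run_program(instruction: str, question: str) -> str:
    # Stand-in for an LLM call: this fake "model" only uppercases
    # its output when the instruction asks for it.
    return question.upper() if "UPPERCASE" in instruction else question

def compile_prompt(candidates, trainset, metric):
    # The "optimizer": score each candidate instruction on the trainset
    # and keep the one the metric likes best.
    def score(instruction):
        return sum(metric(run_program(instruction, ex.question), ex.answer)
                   for ex in trainset)
    return max(candidates, key=score)

trainset = [Example("abc", "ABC"), Example("xy", "XY")]
best = compile_prompt(
    candidates=["Answer the question.", "Answer in UPPERCASE."],
    trainset=trainset,
    metric=exact_match,
)
print(best)  # → Answer in UPPERCASE.
```

Real DSPy optimizers search over far richer spaces (few-shot demos, instruction rewrites, Bayesian candidates), but the shape is the same: specification plus metric in, best-scoring prompt out.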
Kayba
Kayba (2k+ stars, MIT license) is an open-source learning layer for AI agents, built at ETH Zurich AI Center:
- Trace analysis: The Recursive Reflector programmatically analyzes agent execution traces via REPL-based code execution, extracting patterns from real-world behavior
- Skill extraction: Successes and failures are distilled into atomic, reusable skills with helpful/harmful tracking
- Skillbook: A persistent, transparent collection of learned behaviors — auditable, with provenance linking each skill to the trace that produced it
- Prompt generation: Approved skills compile into optimized system prompts
- Continuous learning: Delta updates refine the Skillbook incrementally as new traces arrive
Kayba synthesizes three research contributions: ACE (arXiv:2510.04618), RLM (arXiv:2512.24601), and Dynamic Cheatsheet (arXiv:2504.07952).
The Key Difference: Optimization vs. Learning from Experience
This is the fundamental distinction.
DSPy optimizes at build time. You provide training examples and a metric, and the optimizer searches the space of possible prompts (via bootstrapping, random search, or Bayesian methods) to find one that scores well. The result is a static, optimized prompt that you deploy.
Kayba learns at run time. Your agent runs in production, Kayba analyzes the traces of what actually happened, extracts skills from real successes and failures, and generates improved prompts. The Skillbook grows continuously as the agent encounters new situations.
| | DSPy | Kayba |
|---|---|---|
| When optimization happens | Before deployment (compile step) | After deployment (continuous learning) |
| Input | Training examples + metric | Real execution traces |
| Method | Search/compilation over prompt space | Trace analysis via Recursive Reflector |
| Output | Optimized prompt or few-shot examples | Skillbook + generated prompts |
| Human review | Inspect compiled prompts | Review and approve/reject individual skills |
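The two timelines in the table can be caricatured in a few lines of Python (toy stand-ins, not either tool's real API):

```python
def build_time_optimization(candidates, score):
    # DSPy-style: one compile step before deployment; the winner never changes.
    return max(candidates, key=score)

def run_time_learning(base_prompt, traces, extract_skill):
    # Kayba-style: each production trace may yield a skill and refresh the prompt.
    skillbook = []
    for trace in traces:
        skill = extract_skill(trace)
        if skill:
            skillbook.append(skill)
    return base_prompt + "".join(f"\n- {s}" for s in skillbook)

# Illustrative stand-ins: length as a "metric", a lesson field as a "skill".
compiled = build_time_optimization(["short", "detailed"], score=len)
prompt = run_time_learning(
    "Base prompt.",
    traces=[{"failed": True, "lesson": "Validate inputs first."},
            {"failed": False}],
    extract_skill=lambda t: t.get("lesson") if t.get("failed") else None,
)
```

The first function runs once and its output is frozen; the second is a loop that never finishes, which is exactly the compile-time versus run-time distinction.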
Think of it this way: DSPy is a compiler that produces good prompts from specifications. Kayba is a learning system that produces better prompts from experience.
Comparison
| Dimension | DSPy | Kayba |
|---|---|---|
| Primary function | Programmatic prompt optimization | Experience-based agent improvement |
| Integration model | You build programs in DSPy | You add Kayba on top of any framework |
| Framework dependency | DSPy is the framework — your code uses DSPy modules | Framework-agnostic (works with LangChain, CrewAI, custom agents, etc.) |
| Prompt strategy | Compiler finds optimal prompts via search | Skills extracted from traces, compiled into prompts |
| Learning source | Training examples you provide | Production traces the agent generates |
| Knowledge representation | Compiled prompts (opaque optimization result) | Skillbook (transparent, auditable skills with provenance) |
| Fine-tuning | Supported (can compile to fine-tuning data) | Not required (prompt-level improvement only) |
| Open source | Yes (MIT, 31.8k+ stars) | Yes (MIT, 2k+ stars) |
When DSPy Is the Better Choice
DSPy is stronger when:
- You're building a new LLM pipeline from scratch and want a structured framework with composable modules. DSPy's signatures and modules are a clean abstraction for pipeline design.
- You have clear training data and metrics upfront. DSPy's optimizers shine when you can define exactly what "good" looks like and provide examples.
- You need fine-tuning support. DSPy can compile programs down to fine-tuning data for smaller models — Kayba operates at the prompt level only.
- Your task is well-scoped and static. Classification, extraction, QA over a fixed domain — tasks where the optimal prompt doesn't need to evolve over time.
- You want to eliminate prompt engineering entirely. DSPy's philosophy is that you should never write a prompt — the compiler handles it.
When Kayba Is the Better Choice
Kayba is stronger when:
- You already have an agent running and want to improve it without rewriting it in a new framework. Kayba works on top of what you have.
- Your agent encounters diverse, evolving scenarios where training examples can't cover everything upfront. Kayba learns from real production behavior.
- You need transparency in what was learned. The Skillbook shows exactly what the agent learned, where it came from, and whether it helps or hurts — not an opaque optimized prompt.
- You want continuous improvement, not a one-time optimization step. As your agent handles more cases, the Skillbook grows and prompts improve automatically.
- Your team wants to review changes before deployment. Kayba's approve/edit/reject workflow gives you control over every skill before it enters the prompt.
Using Them Together
DSPy and Kayba can complement each other in a mature agent stack:
- DSPy structures your initial pipeline — use signatures and modules to build a clean, composable agent architecture. Run the optimizer to get a strong starting prompt.
- Deploy the DSPy-optimized agent to production.
- Kayba analyzes real execution traces, finding failure patterns and edge cases the optimizer's training examples didn't cover.
- Skills from Kayba can inform updates to your DSPy program — either by adding new training examples for the next optimization round, or by incorporating learned rules directly.
The combination gives you the best of both: DSPy's structured optimization for the initial build, and Kayba's continuous learning for production improvement.
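One way the loop described above could close, sketched as hypothetical glue code (every name here is illustrative, taken from neither tool): reviewed skills pair a situation from a real trace with the behavior that worked, which is exactly the shape of a training example for the next optimization round.

```python
def skills_to_trainset(skills):
    # Approved skills become (situation, behavior) training pairs;
    # rejected skills never reach the optimizer.
    return [
        {"question": s["situation"], "answer": s["good_behavior"]}
        for s in skills
        if s["approved"]
    ]

skills = [
    {"situation": "User asks for a refund past the deadline",
     "good_behavior": "Escalate to a human agent instead of refusing outright",
     "approved": True},
    {"situation": "Tool call times out",
     "good_behavior": "Retry once with backoff",
     "approved": False},  # rejected in review
]

trainset = skills_to_trainset(skills)
```

The same review gate that protects the production prompt also curates the data fed back into the compile step.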
Results
Kayba's experience-based approach has shown significant improvements in benchmarks and production deployments:
- τ²-bench: pass@1 improvement of +27.4%, scaling to +100% at pass@4
- Browser agents: success rate improved from 30% to 100%, with 82% fewer steps and 65% lower costs
These gains come from the Skillbook accumulating procedural knowledge — the kind of "how to handle this situation" insights that are difficult to capture in training examples but emerge naturally from analyzing real traces.
Getting Started
Both tools are open-source and well-documented.
DSPy:

```shell
pip install dspy
```

Kayba:

```shell
pip install ace-framework
```
- Documentation — Setup guides and API reference
- GitHub — Source code and examples
- Dashboard — Hosted version with visual Skillbook management