
Kayba vs DSPy

Compare Kayba's experience-based agent learning with DSPy's programmatic prompt optimization. Different approaches to making AI agents better — learn when to use each.

March 11, 2026
Comparison · DSPy · Prompt Optimization · Stanford NLP

The Short Answer

DSPy is a framework for building LLM programs with automatic prompt optimization. Kayba is a learning layer that analyzes agent execution traces and generates improved prompts from experience.

DSPy optimizes prompts through search. Kayba learns from what actually happened in production.

If you're building structured LLM pipelines from scratch and want the framework to find optimal prompts for you, DSPy is excellent. If you already have an agent running (in any framework) and want it to systematically improve from its own successes and failures, that's what Kayba does.

They solve different problems at different stages of the agent lifecycle.

What Each Tool Does

DSPy

DSPy (31.8k+ GitHub stars), developed at Stanford NLP, introduces a "programming not prompting" paradigm:

  • Signatures: Declaratively specify input/output behavior (e.g., question -> answer)
  • Modules: Composable building blocks (ChainOfThought, ReAct, ProgramOfThought) that replace hand-written prompts
  • Optimizers (Compilers): Automatically search for the best prompts, few-shot examples, or fine-tuning data by evaluating against a metric
  • Typed predictors: Enforce structured outputs with Pydantic models
  • Assertions: Runtime constraints that guide LLM behavior

DSPy's core insight is that prompts should be compiled, not written. You define what you want, provide a metric and training examples, and the optimizer finds prompts that work.
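DSPy's real optimizers (bootstrapping, random search, Bayesian methods) are far more sophisticated, but the compile-don't-write idea can be sketched as a toy search loop. Everything below — `call_model`, `metric`, `compile_prompt` — is an illustrative stand-in, not DSPy's API:

```python
# Toy illustration of "prompts are compiled, not written": score each
# candidate prompt against a metric over training examples, keep the best.

def call_model(prompt: str, question: str) -> str:
    # Stand-in for an LLM call: answers tersely only when asked to be concise.
    return "Paris" if "concise" in prompt else "The capital city of France is Paris."

def metric(predicted: str, expected: str) -> float:
    # Exact-match metric, the kind of scoring function an optimizer consumes.
    return 1.0 if predicted == expected else 0.0

def compile_prompt(candidates, trainset):
    # Search the (tiny) prompt space; return the highest-scoring candidate.
    def score(prompt):
        return sum(metric(call_model(prompt, q), a) for q, a in trainset)
    return max(candidates, key=score)

trainset = [("What is the capital of France?", "Paris")]
candidates = ["Answer the question.", "Answer the question concisely."]
best = compile_prompt(candidates, trainset)
print(best)  # -> "Answer the question concisely."
```

The output of this step is what you deploy: a static prompt selected by search, which is exactly the "compile step" the comparison table below refers to.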

Kayba

Kayba (2k+ stars, MIT license) is an open-source learning layer for AI agents, built at ETH Zurich AI Center:

  • Trace analysis: The Recursive Reflector programmatically analyzes agent execution traces via REPL-based code execution, extracting patterns from real-world behavior
  • Skill extraction: Successes and failures are distilled into atomic, reusable skills with helpful/harmful tracking
  • Skillbook: A persistent, transparent collection of learned behaviors — auditable, with provenance linking each skill to the trace that produced it
  • Prompt generation: Approved skills compile into optimized system prompts
  • Continuous learning: Delta updates refine the Skillbook incrementally as new traces arrive

Kayba synthesizes three research contributions: ACE (arXiv:2510.04618), RLM (arXiv:2512.24601), and Dynamic Cheatsheet (arXiv:2504.07952).
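The Skillbook idea — atomic skills with helpful/harmful tracking and provenance, compiled into a system prompt — can be sketched with a small data model. The class and field names here are illustrative, not Kayba's actual schema:

```python
# Hypothetical sketch of a Skillbook: skills carry provenance and
# helpful/harmful counters; only approved, net-helpful skills are
# compiled into the generated prompt.
from dataclasses import dataclass, field

@dataclass
class Skill:
    rule: str              # the atomic, reusable behavior
    source_trace: str      # provenance: the trace that produced it
    helpful: int = 0       # times the skill contributed to a success
    harmful: int = 0       # times it contributed to a failure
    approved: bool = False # human approve/reject decision

@dataclass
class Skillbook:
    skills: list = field(default_factory=list)

    def add(self, skill: Skill) -> None:
        self.skills.append(skill)

    def compile_prompt(self, base: str) -> str:
        # Filter to approved, net-helpful skills before prompt generation.
        rules = [s.rule for s in self.skills
                 if s.approved and s.helpful > s.harmful]
        return base + "\n" + "\n".join(f"- {r}" for r in rules)

book = Skillbook()
book.add(Skill("Retry API calls once on timeout.", "trace-0042",
               helpful=3, approved=True))
book.add(Skill("Guess missing form fields.", "trace-0017",
               helpful=1, harmful=4, approved=True))
prompt = book.compile_prompt("You are a careful assistant.")
print(prompt)
```

Because the second skill has hurt more than it has helped, it is dropped at compile time even though it was approved — the counters make the pruning auditable.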

The Key Difference: Optimization vs. Learning from Experience

This is the fundamental distinction.

DSPy optimizes at build time. You provide training examples and a metric, and the optimizer searches the space of possible prompts (via bootstrapping, random search, or Bayesian methods) to find one that scores well. The result is a static, optimized prompt that you deploy.

Kayba learns at run time. Your agent runs in production, Kayba analyzes the traces of what actually happened, extracts skills from real successes and failures, and generates improved prompts. The Skillbook grows continuously as the agent encounters new situations.
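The contrast with a compile step is that each new trace produces a small delta rather than a full re-optimization. A toy version, with an illustrative dict shape and outcome labels that are not Kayba's API:

```python
# Toy delta update: one new trace adjusts a single skill's counters in
# place instead of re-running an optimization pass over training data.

skill = {"rule": "Confirm before deleting records.", "helpful": 5, "harmful": 1}

def apply_delta(skill: dict, outcome: str) -> dict:
    # In Kayba, the Recursive Reflector would judge whether the skill
    # helped or hurt; here the caller supplies that verdict directly.
    key = "helpful" if outcome == "success" else "harmful"
    skill[key] += 1
    return skill

apply_delta(skill, "success")
print(skill)  # helpful count rises to 6; harmful stays at 1
```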

| | DSPy | Kayba |
| --- | --- | --- |
| When optimization happens | Before deployment (compile step) | After deployment (continuous learning) |
| Input | Training examples + metric | Real execution traces |
| Method | Search/compilation over prompt space | Trace analysis via Recursive Reflector |
| Output | Optimized prompt or few-shot examples | Skillbook + generated prompts |
| Human review | Inspect compiled prompts | Review and approve/reject individual skills |

Think of it this way: DSPy is a compiler that produces good prompts from specifications. Kayba is a learning system that produces better prompts from experience.

Comparison

| Dimension | DSPy | Kayba |
| --- | --- | --- |
| Primary function | Programmatic prompt optimization | Experience-based agent improvement |
| Integration model | You build programs in DSPy | You add Kayba on top of any framework |
| Framework dependency | DSPy is the framework — your code uses DSPy modules | Framework-agnostic (works with LangChain, CrewAI, custom agents, etc.) |
| Prompt strategy | Compiler finds optimal prompts via search | Skills extracted from traces, compiled into prompts |
| Learning source | Training examples you provide | Production traces the agent generates |
| Knowledge representation | Compiled prompts (opaque optimization result) | Skillbook (transparent, auditable skills with provenance) |
| Fine-tuning | Supported (can compile to fine-tuning data) | Not required (prompt-level improvement only) |
| Open source | Yes (MIT, 31.8k+ stars) | Yes (MIT, 2k+ stars) |

When DSPy Is the Better Choice

DSPy is stronger when:

  • You're building a new LLM pipeline from scratch and want a structured framework with composable modules. DSPy's signatures and modules are a clean abstraction for pipeline design.
  • You have clear training data and metrics upfront. DSPy's optimizers shine when you can define exactly what "good" looks like and provide examples.
  • You need fine-tuning support. DSPy can compile programs down to fine-tuning data for smaller models — Kayba operates at the prompt level only.
  • Your task is well-scoped and static. Classification, extraction, QA over a fixed domain — tasks where the optimal prompt doesn't need to evolve over time.
  • You want to eliminate prompt engineering entirely. DSPy's philosophy is that you should never write a prompt — the compiler handles it.

When Kayba Is the Better Choice

Kayba is stronger when:

  • You already have an agent running and want to improve it without rewriting it in a new framework. Kayba works on top of what you have.
  • Your agent encounters diverse, evolving scenarios where training examples can't cover everything upfront. Kayba learns from real production behavior.
  • You need transparency in what was learned. The Skillbook shows exactly what the agent learned, where it came from, and whether it helps or hurts — not an opaque optimized prompt.
  • You want continuous improvement, not a one-time optimization step. As your agent handles more cases, the Skillbook grows and prompts improve automatically.
  • Your team wants to review changes before deployment. Kayba's approve/edit/reject workflow gives you control over every skill before it enters the prompt.

Using Them Together

DSPy and Kayba can complement each other in a mature agent stack:

  1. DSPy structures your initial pipeline — use signatures and modules to build a clean, composable agent architecture. Run the optimizer to get a strong starting prompt.
  2. Deploy the DSPy-optimized agent to production.
  3. Kayba analyzes real execution traces, finding failure patterns and edge cases the optimizer's training examples didn't cover.
  4. Skills from Kayba can inform updates to your DSPy program — either by adding new training examples for the next optimization round, or by incorporating learned rules directly.

The combination gives you the best of both: DSPy's structured optimization for the initial build, and Kayba's continuous learning for production improvement.
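Step 4 of that loop can be sketched as a small piece of glue code: skills learned from production become training examples for the next optimization round. Every function name and field below is a hypothetical illustration, not an API of either library:

```python
# Hypothetical glue: convert skills learned from production traces into
# (input, expected output) pairs for the next prompt-optimization round.

def skills_to_trainset(skills: list) -> list:
    # Each skill keeps the input it was learned from plus the corrected
    # output, which map directly onto optimizer training examples.
    return [(s["failing_input"], s["corrected_output"]) for s in skills]

learned = [
    {"rule": "Echo ticket IDs exactly as given.",
     "failing_input": "Close ticket #1042",
     "corrected_output": "Closed ticket #1042."},
]
next_trainset = skills_to_trainset(learned)
print(next_trainset)
```

The design point is that the two systems meet at the training set: production experience flows backward into the same interface the optimizer already consumes.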

Results

Kayba's experience-based approach has shown significant improvements in benchmarks and production deployments:

  • t2-bench: pass@1 improvement of +27.4%, scaling to +100% at pass@4
  • Browser agents: Success rate from 30% to 100%, with 82% fewer steps and 65% lower costs

These gains come from the Skillbook accumulating procedural knowledge — the kind of "how to handle this situation" insights that are difficult to capture in training examples but emerge naturally from analyzing real traces.

Getting Started

Both tools are open-source and well-documented.

DSPy:

pip install dspy

Kayba:

pip install ace-framework

  • Documentation — Setup guides and API reference
  • GitHub — Source code and examples
  • Dashboard — Hosted version with visual Skillbook management