
Kayba vs DSPy

Compare Kayba's experience-based agent learning with DSPy's programmatic prompt optimization. Different approaches to making AI agents better — learn when to use each.

March 11, 2026
Comparison · DSPy · Prompt Optimization · Stanford NLP

The Short Answer

DSPy is a framework for building LLM programs with automatic prompt optimization. Kayba is a learning layer that analyzes agent execution traces and generates improved prompts from experience.

DSPy optimizes prompts through search. Kayba learns from what actually happened in production.

If you're building structured LLM pipelines from scratch and want the framework to find optimal prompts for you, DSPy is excellent. If you already have an agent running (in any framework) and want it to systematically improve from its own successes and failures, that's what Kayba does.

They solve different problems at different stages of the agent lifecycle.

What Each Tool Does

DSPy

DSPy (31.8k+ GitHub stars), developed at Stanford NLP, introduces a "programming not prompting" paradigm:

  • Signatures: Declaratively specify input/output behavior (e.g., question -> answer)
  • Modules: Composable building blocks (ChainOfThought, ReAct, ProgramOfThought) that replace hand-written prompts
  • Optimizers (Compilers): Automatically search for the best prompts, few-shot examples, or fine-tuning data by evaluating against a metric
  • Typed predictors: Enforce structured outputs with Pydantic models
  • Assertions: Runtime constraints that guide LLM behavior

DSPy's core insight is that prompts should be compiled, not written. You define what you want, provide a metric and training examples, and the optimizer finds prompts that work.
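DSPy's real optimizers (bootstrapping, random search, Bayesian methods) are far more sophisticated, but the compile-don't-write idea can be sketched as a toy search loop. Everything below — `call_model`, `metric`, `compile_prompt` — is an illustrative stand-in, not DSPy's API:

```python
# Toy illustration of "prompts are compiled, not written": score each
# candidate prompt against a metric over training examples, keep the best.

def call_model(prompt: str, question: str) -> str:
    # Stand-in for an LLM call: answers tersely only when asked to be concise.
    return "Paris" if "concise" in prompt else "The capital city of France is Paris."

def metric(predicted: str, expected: str) -> float:
    # Exact-match metric, the kind of scoring function an optimizer consumes.
    return 1.0 if predicted == expected else 0.0

def compile_prompt(candidates, trainset):
    # Search the (tiny) prompt space; return the highest-scoring candidate.
    def score(prompt):
        return sum(metric(call_model(prompt, q), a) for q, a in trainset)
    return max(candidates, key=score)

trainset = [("What is the capital of France?", "Paris")]
candidates = ["Answer the question.", "Answer the question concisely."]
best = compile_prompt(candidates, trainset)
print(best)  # -> "Answer the question concisely."
```

The output of this step is what you deploy: a static prompt selected by search, which is exactly the "compile step" the comparison table below refers to.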

Kayba

Kayba (2k+ stars, MIT license) is an open-source learning layer for AI agents, built at ETH Zurich AI Center:

  • Trace analysis: The Recursive Reflector programmatically analyzes agent execution traces via REPL-based code execution, extracting patterns from real-world behavior
  • Skill extraction: Successes and failures are distilled into atomic, reusable skills with helpful/harmful tracking
  • Skillbook: A persistent, transparent collection of learned behaviors — auditable, with provenance linking each skill to the trace that produced it
  • Prompt generation: Approved skills compile into optimized system prompts
  • Continuous learning: Delta updates refine the Skillbook incrementally as new traces arrive

Kayba synthesizes three research contributions: ACE (arXiv:2510.04618), RLM (arXiv:2512.24601), and Dynamic Cheatsheet (arXiv:2504.07952).
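The Skillbook idea — atomic skills with helpful/harmful tracking and provenance, compiled into a system prompt — can be sketched with a small data model. The class and field names here are illustrative, not Kayba's actual schema:

```python
# Hypothetical sketch of a Skillbook: skills carry provenance and
# helpful/harmful counters; only approved, net-helpful skills are
# compiled into the generated prompt.
from dataclasses import dataclass, field

@dataclass
class Skill:
    rule: str              # the atomic, reusable behavior
    source_trace: str      # provenance: the trace that produced it
    helpful: int = 0       # times the skill contributed to a success
    harmful: int = 0       # times it contributed to a failure
    approved: bool = False # human approve/reject decision

@dataclass
class Skillbook:
    skills: list = field(default_factory=list)

    def add(self, skill: Skill) -> None:
        self.skills.append(skill)

    def compile_prompt(self, base: str) -> str:
        # Filter to approved, net-helpful skills before prompt generation.
        rules = [s.rule for s in self.skills
                 if s.approved and s.helpful > s.harmful]
        return base + "\n" + "\n".join(f"- {r}" for r in rules)

book = Skillbook()
book.add(Skill("Retry API calls once on timeout.", "trace-0042",
               helpful=3, approved=True))
book.add(Skill("Guess missing form fields.", "trace-0017",
               helpful=1, harmful=4, approved=True))
prompt = book.compile_prompt("You are a careful assistant.")
print(prompt)
```

Because the second skill has hurt more than it has helped, it is dropped at compile time even though it was approved — the counters make the pruning auditable.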

The Key Difference: Optimization vs. Learning from Experience

This is the fundamental distinction.

DSPy optimizes at build time. You provide training examples and a metric, and the optimizer searches the space of possible prompts (via bootstrapping, random search, or Bayesian methods) to find one that scores well. The result is a static, optimized prompt that you deploy.

Kayba learns at run time. Your agent runs in production, Kayba analyzes the traces of what actually happened, extracts skills from real successes and failures, and generates improved prompts. The Skillbook grows continuously as the agent encounters new situations.
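The contrast with a compile step is that each new trace produces a small delta rather than a full re-optimization. A toy version, with an illustrative dict shape and outcome labels that are not Kayba's API:

```python
# Toy delta update: one new trace adjusts a single skill's counters in
# place instead of re-running an optimization pass over training data.

skill = {"rule": "Confirm before deleting records.", "helpful": 5, "harmful": 1}

def apply_delta(skill: dict, outcome: str) -> dict:
    # In Kayba, the Recursive Reflector would judge whether the skill
    # helped or hurt; here the caller supplies that verdict directly.
    key = "helpful" if outcome == "success" else "harmful"
    skill[key] += 1
    return skill

apply_delta(skill, "success")
print(skill)  # helpful count rises to 6; harmful stays at 1
```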

| | DSPy | Kayba |
| --- | --- | --- |
| When optimization happens | Before deployment (compile step) | After deployment (continuous learning) |
| Input | Training examples + metric | Real execution traces |
| Method | Search/compilation over prompt space | Trace analysis via Recursive Reflector |
| Output | Optimized prompt or few-shot examples | Skillbook + generated prompts |
| Human review | Inspect compiled prompts | Review and approve/reject individual skills |

Think of it this way: DSPy is a compiler that produces good prompts from specifications. Kayba is a learning system that produces better prompts from experience.

Comparison

| Dimension | DSPy | Kayba |
| --- | --- | --- |
| Primary function | Programmatic prompt optimization | Experience-based agent improvement |
| Integration model | You build programs in DSPy | You add Kayba on top of any framework |
| Framework dependency | DSPy is the framework — your code uses DSPy modules | Framework-agnostic (works with LangChain, CrewAI, custom agents, etc.) |
| Prompt strategy | Compiler finds optimal prompts via search | Skills extracted from traces, compiled into prompts |
| Learning source | Training examples you provide | Production traces the agent generates |
| Knowledge representation | Compiled prompts (opaque optimization result) | Skillbook (transparent, auditable skills with provenance) |
| Fine-tuning | Supported (can compile to fine-tuning data) | Not required (prompt-level improvement only) |
| Open source | Yes (MIT, 31.8k+ stars) | Yes (MIT, 2k+ stars) |

When DSPy Is the Better Choice

DSPy is stronger when:

  • You're building a new LLM pipeline from scratch and want a structured framework with composable modules. DSPy's signatures and modules are a clean abstraction for pipeline design.
  • You have clear training data and metrics upfront. DSPy's optimizers shine when you can define exactly what "good" looks like and provide examples.
  • You need fine-tuning support. DSPy can compile programs down to fine-tuning data for smaller models — Kayba operates at the prompt level only.
  • Your task is well-scoped and static. Classification, extraction, QA over a fixed domain — tasks where the optimal prompt doesn't need to evolve over time.
  • You want to eliminate prompt engineering entirely. DSPy's philosophy is that you should never write a prompt — the compiler handles it.

When Kayba Is the Better Choice

Kayba is stronger when:

  • You already have an agent running and want to improve it without rewriting it in a new framework. Kayba works on top of what you have.
  • Your agent encounters diverse, evolving scenarios where training examples can't cover everything upfront. Kayba learns from real production behavior.
  • You need transparency in what was learned. The Skillbook shows exactly what the agent learned, where it came from, and whether it helps or hurts — not an opaque optimized prompt.
  • You want continuous improvement, not a one-time optimization step. As your agent handles more cases, the Skillbook grows and prompts improve automatically.
  • Your team wants to review changes before deployment. Kayba's approve/edit/reject workflow gives you control over every skill before it enters the prompt.

Using Them Together

DSPy and Kayba can complement each other in a mature agent stack:

  1. DSPy structures your initial pipeline — use signatures and modules to build a clean, composable agent architecture. Run the optimizer to get a strong starting prompt.
  2. Deploy the DSPy-optimized agent to production.
  3. Kayba analyzes real execution traces, finding failure patterns and edge cases the optimizer's training examples didn't cover.
  4. Skills from Kayba can inform updates to your DSPy program — either by adding new training examples for the next optimization round, or by incorporating learned rules directly.

The combination gives you the best of both: DSPy's structured optimization for the initial build, and Kayba's continuous learning for production improvement.
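Step 4 of that loop can be sketched as a small piece of glue code: skills learned from production become training examples for the next optimization round. Every function name and field below is a hypothetical illustration, not an API of either library:

```python
# Hypothetical glue: convert skills learned from production traces into
# (input, expected output) pairs for the next prompt-optimization round.

def skills_to_trainset(skills: list) -> list:
    # Each skill keeps the input it was learned from plus the corrected
    # output, which map directly onto optimizer training examples.
    return [(s["failing_input"], s["corrected_output"]) for s in skills]

learned = [
    {"rule": "Echo ticket IDs exactly as given.",
     "failing_input": "Close ticket #1042",
     "corrected_output": "Closed ticket #1042."},
]
next_trainset = skills_to_trainset(learned)
print(next_trainset)
```

The design point is that the two systems meet at the training set: production experience flows backward into the same interface the optimizer already consumes.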

Results

Kayba's experience-based approach has shown significant improvements in benchmarks and production deployments:

  • t2-bench: pass@1 improvement of +27.4%, scaling to +100% at pass@4
  • Browser agents: Success rate from 30% to 100%, with 82% fewer steps and 65% lower costs

These gains come from the Skillbook accumulating procedural knowledge — the kind of "how to handle this situation" insights that are difficult to capture in training examples but emerge naturally from analyzing real traces.

Getting Started

Both tools are open-source and well-documented.

DSPy:

pip install dspy

Kayba:

pip install ace-framework

  • Documentation — Setup guides and API reference
  • GitHub — Source code and examples
  • Dashboard — Hosted version with visual Skillbook management