
Kayba vs Lemma

Compare Kayba's open-source agent learning framework with Lemma's closed-source prompt optimization. Transparent Skillbook vs black-box optimization.

March 11, 2026
Comparison, Lemma, Direct Competitor, Prompt Optimization

The Short Answer

Both Kayba and Lemma aim to automatically improve AI agent performance over time. The fundamental difference is in approach and transparency: Kayba is an open-source learning framework where every improvement is auditable through the Skillbook. Lemma is a closed-source prompt optimization service where changes happen inside a black box.

Kayba shows you exactly what it learned and why. Lemma optimizes your prompts behind closed doors.

What Each Tool Does

Lemma

Lemma (YC F25) provides continuous prompt optimization as a service:

  • Drift detection: Monitors agent performance and flags when outputs degrade
  • Prompt optimization: Automatically rewrites prompts to improve results
  • Delivery via API or PR: Pushes optimized prompts through your existing workflow
  • Managed service: Handles the optimization loop end-to-end

Lemma is solving a real problem — agents drift over time, and manual prompt tuning is tedious. Their approach is to handle it as a managed service with minimal setup.
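
To make the drift problem concrete, here is a minimal sketch of one generic way to flag degrading outputs: compare a rolling success rate against a baseline. This is not Lemma's implementation, which is closed-source; the `DriftMonitor` class, its parameters, and the threshold rule are all hypothetical and chosen only for illustration.

```python
from collections import deque


class DriftMonitor:
    """Hypothetical monitor: flags drift when the recent success rate
    drops below a baseline by more than a tolerated margin."""

    def __init__(self, baseline: float, window: int = 20, margin: float = 0.10):
        self.baseline = baseline              # expected success rate, e.g. 0.90
        self.margin = margin                  # tolerated drop before flagging
        self.outcomes = deque(maxlen=window)  # most recent pass/fail results

    def record(self, success: bool) -> None:
        self.outcomes.append(success)

    def recent_rate(self) -> float:
        if not self.outcomes:
            return self.baseline
        return sum(self.outcomes) / len(self.outcomes)

    def drifted(self) -> bool:
        # Only judge once the window is full, to avoid noisy early flags.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.recent_rate() < self.baseline - self.margin


monitor = DriftMonitor(baseline=0.90, window=5, margin=0.10)
for ok in [True, True, False, False, False]:
    monitor.record(ok)

print(monitor.recent_rate(), monitor.drifted())
```

A real service would likely use richer signals than pass/fail, but the same shape applies: a baseline, a recent window, and a rule for when the gap is large enough to act on.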

Kayba

Kayba is an open-source learning layer (MIT, 2k+ stars) that synthesizes three published research papers into a unified framework:

  • Recursive Reflector: REPL-based trace analysis that programmatically examines agent execution — grounded in the ACE framework (arXiv:2510.04618) and Reflective LLM Methods (arXiv:2512.24601)
  • Skill extraction: Failures and successes are distilled into atomic, reusable skills with helpful/harmful counters
  • Skillbook: A persistent, transparent collection of everything the agent has learned — organized, auditable, with provenance tracking back to source traces. Inspired by the Dynamic Cheatsheet approach (arXiv:2504.07952)
  • Prompt generation: Approved skills are compiled into optimized system prompts
  • Continuous learning: Delta updates refine the Skillbook incrementally as new traces come in
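
The pieces above can be sketched as a small data structure. This is an illustrative model only, not the `ace-framework` API: the `Skill` and `Skillbook` classes, the field names, and the delta format (`add`, `tag`, `remove`) are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Skill:
    skill_id: str
    text: str
    source_trace: str          # provenance: the trace the skill came from
    helpful: int = 0           # times the skill was judged to help
    harmful: int = 0           # times the skill was judged to hurt
    status: str = "pending"    # "pending", "approved", or "rejected"


@dataclass
class Skillbook:
    skills: Dict[str, Skill] = field(default_factory=dict)

    def apply_delta(self, delta: dict) -> None:
        """Apply one incremental update instead of rebuilding the whole book."""
        op = delta["op"]
        if op == "add":
            skill = Skill(delta["skill_id"], delta["text"], delta["source_trace"])
            self.skills[skill.skill_id] = skill
        elif op == "tag":
            skill = self.skills[delta["skill_id"]]
            if delta["tag"] == "helpful":
                skill.helpful += 1
            elif delta["tag"] == "harmful":
                skill.harmful += 1
            else:
                raise ValueError(f"unknown tag: {delta['tag']}")
        elif op == "remove":
            self.skills.pop(delta["skill_id"], None)
        else:
            raise ValueError(f"unknown op: {op}")


book = Skillbook()
book.apply_delta({
    "op": "add",
    "skill_id": "s1",
    "text": "Confirm the order ID before issuing a refund.",
    "source_trace": "trace-001",
})
book.apply_delta({"op": "tag", "skill_id": "s1", "tag": "helpful"})
book.apply_delta({"op": "tag", "skill_id": "s1", "tag": "helpful"})
book.apply_delta({"op": "tag", "skill_id": "s1", "tag": "harmful"})
```

Keeping updates as small deltas means each change to the Skillbook is a discrete, reviewable event rather than a wholesale rewrite.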

The framework is agent-agnostic and requires no fine-tuning — it works by improving the context your agent receives, not by retraining weights.
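
As a rough illustration of improving context rather than weights, the sketch below appends approved skills to a base system prompt. The `compile_prompt` function, the dictionary fields, and the "Learned skills" heading are hypothetical and are not taken from Kayba's codebase.

```python
from typing import Dict, List


def compile_prompt(base_prompt: str, skills: List[Dict[str, str]]) -> str:
    """Append approved skills to the base prompt; the model itself is untouched."""
    approved = [s for s in skills if s["status"] == "approved"]
    if not approved:
        return base_prompt
    lines = [f"- {s['text']}" for s in approved]
    return base_prompt + "\n\nLearned skills:\n" + "\n".join(lines)


skills = [
    {"text": "Confirm the order ID before issuing a refund.", "status": "approved"},
    {"text": "Always apologize twice.", "status": "rejected"},
]
prompt = compile_prompt("You are a support agent.", skills)
print(prompt)
```

Because only the prompt text changes, the same approach works with any model or agent that accepts a system prompt.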

The Key Difference: Transparency

When Lemma optimizes your prompt, you get a new prompt. You don't see the reasoning, the intermediate analysis, or the specific failure patterns it detected. If something breaks, you're debugging a black box.

When Kayba generates an improved prompt, every step is traceable:

Step | What you can inspect
Trace analysis | Which traces were analyzed, what the Recursive Reflector found
Skill extraction | Each skill links to the specific trace and failure pattern that produced it
Skillbook | Every learned behavior is visible: helpful count, harmful count, source, status
Review | You approve, edit, or reject skills before they affect prompts
Prompt generation | The generated prompt maps directly to approved Skillbook entries

With Kayba, if an agent's behavior changes, you can trace exactly which skill caused it, which trace that skill came from, and whether the skill is actually helping. With Lemma, you get an optimized prompt and trust that it's better.
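
The kind of audit described here might look like the following lookup. It is a sketch under assumptions: the `skillbook` dictionary layout and the `explain` helper are invented for this example and do not reflect Kayba's actual storage format or API.

```python
# Hypothetical Skillbook entries keyed by skill ID.
skillbook = {
    "s1": {
        "text": "Confirm the order ID before issuing a refund.",
        "source_trace": "trace-001",
        "helpful": 5,
        "harmful": 0,
    },
    "s2": {
        "text": "Skip verification for returning customers.",
        "source_trace": "trace-017",
        "helpful": 1,
        "harmful": 4,
    },
}


def explain(skill_id: str, book: dict) -> dict:
    """Answer: where did this skill come from, and is it helping?"""
    skill = book[skill_id]
    net = skill["helpful"] - skill["harmful"]
    return {
        "skill_id": skill_id,
        "source_trace": skill["source_trace"],
        "net_score": net,
        "verdict": "helping" if net > 0 else "not helping",
    }


for sid in skillbook:
    print(explain(sid, skillbook))
```

If a behavior change is traced to `s2`, the counters show it has hurt more often than it helped, and the `source_trace` field points to the trace to re-examine.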

Comparison

Dimension | Kayba | Lemma
Open source | Yes, MIT license | No, closed-source
Transparency | Full: Skillbook shows every learned behavior with provenance | Black-box: optimized prompts delivered without visible reasoning
Research backing | 3 published papers (ACE, RLM, Dynamic Cheatsheet) | No published research
Approach | Trace analysis, skill extraction, Skillbook curation, prompt generation | Drift detection, prompt rewriting
Human review | Built-in: approve, edit, or reject skills before deployment | Limited: you receive optimized prompts
Self-hosting | Yes, run entirely on your infrastructure | No, managed service only
Framework dependency | Framework-agnostic (any agent, any trace format) | Integration-dependent
Fine-tuning required | No: improves context, not weights | No: prompt-level optimization
Pricing | Free (OSS) / $29/month (hosted dashboard) | Contact sales (demo-led)
Maturity | Production-ready, 2k+ GitHub stars, active community | Early-stage (YC F25)

Benchmarks

Kayba's approach is validated on public benchmarks:

  • τ²-bench: pass@1 improvement of +27.4%, scaling to +100% at pass@4
  • Browser agents: Success rate from 30% to 100%, with 82% fewer steps and 65% lower costs

These results come from the published research papers and are reproducible with the open-source framework.

When to Choose Lemma

Lemma may be a fit if:

  • You want a fully managed service with zero infrastructure to maintain
  • You're comfortable with a black-box approach and trust the output without needing to inspect the reasoning
  • Prompt optimization is your primary concern, not building a persistent knowledge base of agent behaviors
  • You prefer a demo-led sales process over self-serve

When to Choose Kayba

Kayba is the stronger choice if:

  • You need to understand exactly what changed in your agent's behavior and why
  • Auditability matters — regulated industries, enterprise compliance, or teams that need to review changes before deployment
  • You want to own your learning data, not send it to a third-party service
  • Self-hosting is a requirement (data sovereignty, air-gapped environments)
  • You value open-source — inspect the code, contribute, fork if needed
  • You want research-backed methods rather than proprietary optimization

Getting Started

Kayba is open-source and ready to use today:

pip install ace-framework
  • Documentation — Setup guides and API reference
  • GitHub — Source code and examples
  • Dashboard — Hosted version with visual Skillbook management