Changelog

Track the latest updates, improvements, and fixes to the Kayba framework.

v0.10.0 — April 13, 2026

Added

  • Usage metering hook — RecursiveConfig.usage_callback: (RequestUsage, model_id) -> None fires once per pydantic-ai model request (orchestrator turns, sub-agent runs, tool-call follow-ups). Implemented via ace.rr.MeteredModel, a pydantic_ai.models.wrapper.WrapperModel subclass, so metering lives at the framework's own model boundary: one firing site, no per-call-site plumbing. Callback exceptions are caught and logged so metering never crashes the pipeline.
  • Pre-built model instance support — RRStep, create_rr_agent, create_sub_agent, and RecursiveConfig.subagent_model now accept either a model-id string or a pre-built pydantic_ai.models.Model instance. Callers that need a custom provider (e.g. a Bedrock model carrying STS-assumed credentials) can now inject a fully configured model rather than resolving from a string.
  • Sub-agent model_settings — create_sub_agent now threads an explicit ModelSettings parameter into its PydanticAgent constructor.
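The usage callback described above can be exercised with a plain function. A minimal sketch follows; RequestUsage here is an illustrative stand-in for pydantic-ai's usage object (the real attribute names may differ), and the RecursiveConfig wiring is shown only as a comment.

```python
from dataclasses import dataclass


# Illustrative stand-in for pydantic-ai's RequestUsage; real field names may differ.
@dataclass
class RequestUsage:
    total_tokens: int


totals: dict[str, int] = {}


def meter(usage: RequestUsage, model_id: str) -> None:
    # Fired once per model request (orchestrator turns, sub-agent runs,
    # tool-call follow-ups). Exceptions are caught and logged by the
    # framework, but keeping the callback simple is still good practice.
    totals[model_id] = totals.get(model_id, 0) + usage.total_tokens


# Hypothetical wiring: RecursiveConfig(usage_callback=meter)
meter(RequestUsage(total_tokens=120), "gpt-4o-mini")
meter(RequestUsage(total_tokens=80), "gpt-4o-mini")
```

Because the callback receives the model id on every firing, per-model aggregation like this needs no extra plumbing at call sites.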

Back-compat

Existing RRStep(model="...") callers are unchanged. The widened type signatures are additive.

v0.9.7 — April 11, 2026

TypeScript SDK

v0.9.5 — April 11, 2026

What's new

  • TypeScript tracing SDK (@kayba_ai/tracing) — instrument Node.js agents and send traces to Kayba, mirroring the Python SDK API
  • Standalone Python tracing package (kayba-tracing) — can be installed independently without the full ace-framework
  • ace.tracing continues to work as before (re-exports from kayba-tracing)
  • CI publishes all three packages (ace-framework, kayba-tracing, @kayba_ai/tracing) on release
v0.9.4 — April 11, 2026
  • Kayba tracing SDK — ace.tracing module wraps MLflow tracing with Kayba-native configuration, folder organization, and input sanitization (pip install ace-framework[tracing])
v0.9.3 — April 1, 2026
  • Structured design docs — split ACE_DESIGN.md into architecture, reference, and decisions docs under docs/design/
  • Simplified Skill model — removed unused tag counters (helpful/harmful/neutral) and TagStep from the pipeline
  • Cleaner InsightSource provenance — restored error_identification and learning_text fields
v0.9.2 — March 31, 2026

What's changed

Added

  • Insight source provenance — InsightSource typed model captures the origin of each skillbook update (trace ID, sample question, epoch/step, reflection summary, integration metadata); AttachInsightSourcesStep automatically enriches UpdateBatch operations with provenance and is wired into the default learning tail
  • Claude SDK step — ClaudeSDKStep integration for running Claude Code sub-agents from within ACE pipelines
  • RR sub-agent code execution — Recursive Reflector can now delegate to code-execution sub-agents at runtime
  • RR raw trace batch helpers — build_raw_trace_batches and related runtime utilities for feeding raw traces directly into the RR pipeline

Fixed

  • Logfire scrubbing — added scrubbing callback to stop Logfire over-redacting trace content (reasoning, answers, messages now visible in Logfire UI)
  • RR combined-batch normalization — fixed ordering/deduplication of combined task batches in multi-sample runs

Docs

  • Logfire query API guide clarifications
  • MCP client setup guide and compatibility tests
  • Design docs updated to reflect insight source provenance model
v0.9.1 — March 26, 2026

Fixed

  • CLI packaging — include .md data files in wheel so kayba setup and skill install work on pip/uv-installed packages
v0.9.0 — March 26, 2026

Added

  • PydanticAI migration — ACE roles (Agent, Reflector, SkillManager) rebuilt on PydanticAI agents with structured output, replacing the legacy role system
  • Recursive Reflector — PydanticAI-powered trace analysis agent with sandboxed code execution, sub-agent delegation, and working memory (save_notes tool)
  • Kayba CLI — full hosted API client with trace upload/management, interactive run, insights, prompts, batch processing, materialization, and integration commands (kayba entry point)
v0.8.9 — March 18, 2026

Thread CancellationToken through TraceAnalyser.run() for pipeline cancellation support.

v0.8.8 — March 17, 2026

Introducing the Kayba CLI: automated agent self-improvement from your terminal

We built a CLI that plugs into Claude Code, Codex, or any coding agent and turns your agent's execution traces into improvements.

Upload traces → Kayba surfaces failure patterns → your coding agent proposes edits to your codebase. Pick what makes sense, implement, and repeat.

First test on tau2-bench: 34.3% improvement after a single cycle auto-accepting all changes.

🚀 Try it free

7-day free trial (no credit card required) at kayba.ai:

  • Automated agent self-improvement
  • CLI for Claude Code, Codex & more
  • Hosted dashboard & analytics
  • Team collaboration

The core engine (ACE) stays open source and MIT licensed. Run kayba setup to get started.


Added

  • Pipeline hooks & cancellation — PipelineHook protocol and CancellationToken for observing and controlling pipeline execution
  • Kayba pipeline skills for Claude Code — 7-stage dynamic evaluation pipeline that generates custom benchmarks tailored to your agent's domain. Instead of static test suites, the skills analyze your API, build domain-aware metrics and rubrics, create action plans, and run human-in-the-loop validation — all as composable Claude Code skills
  • kayba setup command — one command to install the full evaluation skill pipeline into your .claude/skills/ directory, ready to use inside Claude Code out of the box
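Cooperative cancellation of the kind CancellationToken provides can be sketched in a few lines. This is an illustrative stand-in, not the real ace API; class and method names below are assumptions.

```python
import threading


class CancellationToken:
    # Illustrative sketch of a cancellation token; the real CancellationToken
    # in ace may expose a different interface.
    def __init__(self) -> None:
        self._event = threading.Event()

    def cancel(self) -> None:
        self._event.set()

    @property
    def cancelled(self) -> bool:
        return self._event.is_set()


def run_pipeline(steps, token: CancellationToken) -> list:
    # The pipeline checks the token between steps and stops early when
    # cancellation has been requested, rather than interrupting mid-step.
    results = []
    for step in steps:
        if token.cancelled:
            break
        results.append(step())
    return results


token = CancellationToken()
ran = run_pipeline([lambda: "a", lambda: "b"], token)
```

Backing the token with threading.Event makes it safe to cancel from another thread, which is the typical use for long-running pipelines.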
v0.8.7 — March 17, 2026
  • Improved Opik trace naming — traces now display the question text (first 80 chars) instead of generic names like "ace_pipeline" or "rr_reflect"
  • Thread ID support for Opik — OpikStep and RROpikStep accept an optional thread_id parameter for grouping related traces
v0.8.6 — March 12, 2026

What's New

  • Kayba CLI — New kayba CLI for the hosted API with commands: upload, insights generate/list/triage, prompts generate/list/pull, status, materialize, batch, setup
  • HTTP client — KaybaClient with Bearer auth for the Kayba hosted API
  • Agent integration — kayba setup prints/appends coding agent instructions (CLAUDE.md, AGENTS.md, .cursorrules)

Full Changelog

https://github.com/kayba-ai/agentic-context-engine/compare/v0.8.5...v0.8.6

v0.8.5 — March 4, 2026
  • Self-contained RR module (ace_next/rr/) — sandbox, subagent, trace_context, config extracted from ace/reflector/
  • v5.6 prompt promoted as default — prompt evolution (v4 → v5.1–v5.6) for the RR pipeline
  • build_steps() API — all runners gain a build_steps() classmethod for pipeline customization
  • Shared CallBudget — single budget instance shared across RR pipeline steps
  • ACE MCP server (optional) — stdio MCP server with tools: ace.ask, ace.learn.sample, ace.learn.feedback, ace.skillbook.get/save/load
  • MCP packaging + CLI — optional mcp extra and ace-mcp entrypoint
  • Composing pipelines guide — new docs/guides/composing-pipelines.md
  • RR examples — rr_demo.py, rr_opik_demo.py, compose_custom_pipeline.py
  • Opik made opt-in — moved from hard dependency to observability extra
v0.8.4 — February 27, 2026

What's New

  • OpenClaw integration — learn from OpenClaw session transcripts (JSONL) via new OpenClawToTraceStep and LoadTracesStep pipeline steps (#86)
  • ExportSkillbookMarkdownStep — export skillbook to markdown file
  • OpenClaw example script and integration docs
v0.8.3 — February 21, 2026
  • Pipeline engine — generic pipeline framework with branching, async boundaries, and parallel execution (#78)
  • Trace passthrough — _build_traces() helper and raw trace data passed to RecursiveReflector sandbox
v0.8.2 — February 18, 2026
  • RecursiveReflector None-response guard — gracefully handles empty/None LLM responses (e.g. from Gemini) with retry prompt instead of crashing
  • LiteLLMClient.complete_messages() — native multi-turn completion that preserves structured message lists
v0.8.1 — February 18, 2026

Insight Source Tracing

Track where every skill in your skillbook came from.

Added

  • Insight source tracing — InsightSource dataclass tracks skill provenance (epoch, sample, trace refs, error identification, learning text)
  • Sample.id promoted to first-class field with UUID auto-generation
  • Skillbook query API — source_map(), source_summary(), source_filter() for skill lineage
  • Insight sources wired through OfflineACE, OnlineACE, and async learning pipelines
  • UpdateOperation.learning_index for linking operations to reflector learnings
  • Bedrock e2e example (examples/litellm/bedrock_insight_source_test.py)
  • docs/INSIGHT_SOURCES.md guide
v0.8.0 — February 17, 2026

What's New

  • TAU-bench integration: Full benchmark framework for evaluating agents on TAU-bench tasks
  • Recursive Reflector: New reflector module with sandbox execution, trace context, and sub-agent support
  • Skillbook tools: Clean, consolidate, and merge skillbooks via new utility scripts
v0.7.3 — February 4, 2026

Release v0.7.3

v0.7.2 — January 26, 2026

What's New

Agentic System Prompting

New workflow to automatically optimize your agent's system prompts using your own data. Feed in past traces or conversations, and ACE analyzes what worked and what failed to generate actionable prompt suggestions.

Traces / Conversations → ACE → Prompt Suggestions

Each suggestion includes the recommended prompt text, justification for why it helps, and evidence from your actual traces. You review and decide what to implement.

See examples/agentic-system-prompting/ for the full workflow.

Other Changes

  • Fix: Align test matrix with Python 3.12 requirement
  • Fix: Use setup-uv action for Windows CI compatibility
v0.7.1 — December 8, 2025

Fix: Forward credentials (api_key, base_url, etc.) to Instructor client (#44)

This patch fixes an issue where custom API credentials weren't being forwarded to all internal LLM calls, causing authentication errors when using OpenAI-compatible endpoints.

v0.7.0 — December 4, 2025

⚠️ Breaking Changes

Complete terminology rename - Playbook → Skillbook, Bullet → Skill

Old → New
Playbook → Skillbook
Bullet → Skill
Generator → Agent
Curator → SkillManager
OfflineAdapter → OfflineACE
OnlineAdapter → OnlineACE
DeltaOperation → UpdateOperation
DeltaBatch → UpdateBatch

Migration:

# Old
from ace import Playbook, Bullet, Generator, Curator, OfflineAdapter

# New
from ace import Skillbook, Skill, Agent, SkillManager, OfflineACE

JSON files: Change "bullets" key to "skills" in saved skillbooks.
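The JSON rename above is mechanical, so a small script can migrate saved files in place. A sketch, assuming the only change needed is the top-level key (migrate_skillbook is an illustrative helper name, not part of the package):

```python
import json


def migrate_skillbook(data: dict) -> dict:
    # Rename the top-level "bullets" key to "skills"; idempotent, so it is
    # safe to run on files that have already been migrated.
    if "bullets" in data and "skills" not in data:
        data["skills"] = data.pop("bullets")
    return data


raw = '{"bullets": [{"id": "s1"}]}'
migrated = migrate_skillbook(json.loads(raw))
```

Run it over each saved skillbook file and write the result back with json.dump.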

Fixed

  • Deduplication now properly applies consolidation operations
v0.6.0 — November 29, 2025

Summary

Async learning pipeline with parallel Reflectors, bullet deduplication, and Instructor integration.

🚀 Async Learning

Non-blocking background learning - answers return immediately while learning continues in background threads.

agent.learn(samples, env, async_learning=True, max_reflector_workers=3)

🔍 Bullet Deduplication

Vector embedding-based duplicate detection prevents playbook bloat.

agent = ACELiteLLM(model="gpt-4o-mini", dedup_config=DeduplicationConfig(similarity_threshold=0.80))
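The idea behind the threshold can be shown with a toy check: a new bullet is dropped when its cosine similarity to any existing bullet's embedding meets the threshold (0.80 in the config above). This sketch uses hand-rolled vectors; the real implementation's embedding model and function names are not shown here.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    # Standard cosine similarity: dot(a, b) / (|a| * |b|).
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def is_duplicate(new_vec, existing_vecs, threshold=0.80) -> bool:
    # A candidate bullet is a duplicate if it is too similar to any
    # existing bullet's embedding.
    return any(cosine(new_vec, v) >= threshold for v in existing_vecs)


dup = is_duplicate([1.0, 0.0], [[0.9, 0.1]], threshold=0.80)
```

With real sentence embeddings the vectors have hundreds of dimensions, but the comparison logic is the same.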

📋 Instructor Integration

Robust JSON parsing with Pydantic schema validation and automatic retries.

Other Changes

  • Reorganized examples by integration type (litellm/, langchain/, local-models/)
  • Fixed Claude temperature+top_p conflict
  • Improved Curator prompt for better deduplication and imperative strategy format
  • Increased default max_tokens from 512 to 2048 to prevent truncation
  • Added comprehensive test suites (~1600 lines)

Tests

291 passed, 67% coverage

v0.5.1 — November 25, 2025

Bug Fixes

  • Fixed Opik integration warnings for base installations
  • Improved Opik configuration for local usage
v0.5.0 — November 20, 2025

⚠️ Breaking Changes

  • Playbook format changed to TOON (Token-Oriented Object Notation)
    • Playbook.as_prompt() now returns TOON format instead of markdown
    • Reason: 16-62% token savings for improved scalability and reduced inference costs
    • Migration: No action needed if using playbook with Generator/Curator/Reflector
    • Debugging: Use playbook._as_markdown_debug() or str(playbook) for human-readable output
    • Details: Uses tab delimiters and excludes internal metadata (created_at, updated_at)

Added

  • ACELiteLLM integration - Simple conversational agent with automatic learning
  • ACELangChain integration - Wrap LangChain Runnables with ACE learning
  • Custom integration pattern - Wrap ANY agentic system with ACE learning
    • Base utilities in ace/integrations/base.py with wrap_playbook_context() helper
    • Complete working example in examples/custom_integration_example.py
    • Integration Pattern: Inject playbook → Execute agent → Learn from results
  • Integration exports - Import ACEAgent, ACELiteLLM, ACELangChain from ace package root
  • TOON compression for playbooks - 16-62% token reduction vs markdown
  • Citation-based tracking - Strategies cited inline as [section-00001], auto-extracted from reasoning
  • Enhanced browser traces - Full execution logs (2200+ chars) passed to Reflector
  • Test coverage - Improved from 28% to 70% (241 tests total)
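The inline citation format above ([section-00001]) lends itself to regex extraction. A sketch of how cited strategy IDs might be pulled from free-form reasoning; the pattern and function name are illustrative, not the package's actual extractor.

```python
import re

# Matches inline citations like [section-00001]: a lowercase word,
# a hyphen, and a five-digit index.
CITATION_RE = re.compile(r"\[([a-z]+-\d{5})\]")


def extract_citations(reasoning: str) -> list[str]:
    # Returns cited strategy IDs in order of appearance.
    return CITATION_RE.findall(reasoning)


cites = extract_citations("Applied [section-00001] then verified via [section-00042].")
```

Extracted IDs can then be matched against the playbook to credit the strategies that contributed to an answer.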

Changed

  • Renamed SimpleAgent → ACELiteLLM - Clearer naming for conversational agent integration
  • Playbook.__str__() returns markdown (TOON reserved for LLM consumption via as_prompt())

Fixed

  • Browser-use trace integration - Reflector now receives complete execution traces
    • Fixed initial query duplication (task appeared in both question and reasoning)
    • Fixed missing trace data (reasoning field now contains 2200+ chars vs 154 chars)
    • Fixed screenshot attribute bug causing AttributeError on step.state.screenshot
    • Fixed invalid bullet ID filtering - hallucinated/malformed citations now filtered out
    • Added comprehensive regression tests to catch these issues
    • Impact: Reflector can now properly analyze browser agent's thought process
    • Test coverage improved: 69% → 79% for browser_use.py
  • Prompt v2.1 test assertions updated to match current format
  • All 206 tests now pass (was 189)
v0.4.0 — November 8, 2025

Changes

  • Fixed GitHub Actions workflow triggering
  • Fixed all 46 mypy type checking errors
  • Improved type annotations across codebase
  • Python 3.11+ required
v0.3.0 — October 16, 2025

🚀 Highlights

We're excited to introduce experimental v2 prompts that bring state-of-the-art prompt engineering to ACE! This release adds confidence scoring, domain-specific optimizations, and comprehensive prompt management capabilities.

✨ What's New

Experimental v2 Prompts (Beta)

  • 🎯 Confidence Scoring: Know when your AI is certain vs uncertain
    • Bullet-level confidence (how applicable each strategy is)
    • Answer-level confidence (overall certainty of the response)
  • 📝 Enhanced Reasoning: 23% more detailed step-by-step explanations
  • 🔧 Domain Optimization: Specialized prompts for math and code generation
  • ✅ Better Structure: Based on analysis of 80+ production AI systems

Prompt Management System

  • PromptManager class for version control and A/B testing
  • Easy switching between v1 (stable) and v2 (experimental)
  • Domain-specific prompt selection
  • Usage tracking and statistics

Playbook Persistence

  • Save trained playbooks with playbook.save_to_file("model.json")
  • Load pre-trained playbooks with Playbook.load_from_file("model.json")
  • Full JSON serialization support

Documentation & Examples

  • 📚 Comprehensive prompt engineering guide (docs/PROMPT_ENGINEERING.md)
  • 🔬 v1 vs v2 comparison script (examples/compare_v1_v2_prompts.py)
  • 💡 Advanced v2 examples (examples/advanced_prompts_v2.py)
  • 🎨 Mermaid flowchart visualization of ACE learning loop in README

🔄 Changes

  • Enhanced docstrings with comprehensive examples throughout
  • Improved README with visual diagrams and v2 prompts section
  • Code formatting standardized with Black

🐛 Fixes

  • Fixed Black formatting issues for CI/CD compliance
  • Corrected README references to non-existent directories
  • Fixed test badge URL in README

📊 v1 vs v2 Performance

Feature | v1 | v2 (Experimental)
Token Usage | Baseline | +30-50% more
Confidence Scoring | ❌ | ✅
Reasoning Detail | Basic | Enhanced (+23%)
Domain Variants | ❌ | ✅ Math, Code

🚀 Quick Start with v2

from ace.prompts_v2 import PromptManager

⚠ Important Notes

  • v2 prompts are experimental and in active development
  • They use 30-50% more tokens due to enhanced structure
  • Test with your use case before production deployment
  • v1 prompts remain the default for stability
v0.1.1 — October 15, 2025

Fixed Release - v0.1.1-alpha

This release fixes the GitHub Actions workflow for PyPI publishing.

Changes

  • Updated artifact upload/download actions from v3 to v4
  • Fixed deprecation errors preventing package publication

Installation

pip install ace-framework

All features remain the same as v0.1.0. This is an infrastructure fix only.

v0.1.0 — October 15, 2025

Initial Alpha Release of ACE Framework

This is the first alpha release of the Agentic Context Engine (ACE) framework, a Python implementation based on the paper "Agentic Context Engineering" from Stanford/SambaNova.

Alpha Status

This is an alpha release for early adopters and contributors. The API may change in future releases as we refine the framework based on community feedback.

Features

  • Self-improving agents that learn from experience
  • Playbook system for storing and evolving strategies
  • Three-role architecture: Generator, Reflector, and Curator
  • 100+ LLM providers support via LiteLLM (OpenAI, Anthropic, Google, etc.)
  • Async support for high-performance applications
  • Online and offline adaptation modes

Installation

pip install ace-framework

Quick Start

from ace import LiteLLMClient, OfflineAdapter, Playbook
from ace import Generator, Reflector, Curator

# Create your agent
client = LiteLLMClient(model="gpt-3.5-turbo")
adapter = OfflineAdapter(
    playbook=Playbook(),
    generator=Generator(client),
    reflector=Reflector(client),
    curator=Curator(client),
)

Notes