Changelog
Track the latest updates, improvements, and fixes to the Kayba framework.
Added
- Usage metering hook — `RecursiveConfig.usage_callback: (RequestUsage, model_id) -> None` fires once per pydantic-ai model request (orchestrator turns, sub-agent runs, tool-call follow-ups). Implemented via `ace.rr.MeteredModel`, a `pydantic_ai.models.wrapper.WrapperModel` subclass, so metering lives at the framework's own model boundary — one firing site, no per-call-site plumbing. Callback exceptions are caught and logged so metering never crashes the pipeline.
- Pre-built model instance support — `RRStep`, `create_rr_agent`, `create_sub_agent`, and `RecursiveConfig.subagent_model` now accept either a model-id string or a pre-built `pydantic_ai.models.Model` instance. Enables callers that need a custom provider (e.g. a Bedrock model carrying STS-assumed credentials) to inject a fully-configured model rather than resolving from a string.
- Sub-agent `model_settings` — `create_sub_agent` now threads an explicit `ModelSettings` parameter into its `PydanticAgent` constructor.
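To make the "one firing site, exceptions never crash the pipeline" behavior concrete, here is a minimal plain-Python sketch of the metering pattern. The class and field names mirror the changelog (`MeteredModel`, `RequestUsage`), but the bodies are illustrative assumptions, not the actual ACE implementation.

```python
# Sketch of the usage-metering wrapper pattern: every request flows through
# one wrapper method, which fires the callback once and swallows callback
# errors so metering can never take down the pipeline.
import logging

logger = logging.getLogger(__name__)

class RequestUsage:
    """Token counts for a single model request (illustrative)."""
    def __init__(self, input_tokens: int, output_tokens: int):
        self.input_tokens = input_tokens
        self.output_tokens = output_tokens

class MeteredModel:
    """Wraps an inner model; metering lives at this single boundary."""
    def __init__(self, inner, usage_callback=None):
        self.inner = inner
        self.usage_callback = usage_callback

    def request(self, prompt: str):
        response, usage = self.inner.request(prompt)
        if self.usage_callback is not None:
            try:
                self.usage_callback(usage, self.inner.model_id)
            except Exception:
                # Metering failures are logged, never re-raised.
                logger.exception("usage_callback raised; ignoring")
        return response
```

Because the wrapper sits at the model boundary, orchestrator turns, sub-agent runs, and tool-call follow-ups are all metered without any per-call-site plumbing.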
Back-compat
Existing `RRStep(model="...")` callers are unchanged. The widened type signatures are additive.
TypeScript SDK
What's new
- TypeScript tracing SDK (`@kayba_ai/tracing`) — instrument Node.js agents and send traces to Kayba, mirroring the Python SDK API
- Standalone Python tracing package (`kayba-tracing`) — can be installed independently without the full ace-framework; `ace.tracing` continues to work as before (re-exports from kayba-tracing)
- CI publishes all three packages (ace-framework, kayba-tracing, @kayba_ai/tracing) on release
- Kayba tracing SDK — `ace.tracing` module wraps MLflow tracing with Kayba-native configuration, folder organization, and input sanitization (`pip install ace-framework[tracing]`)
- Structured design docs — split ACE_DESIGN.md into architecture, reference, and decisions docs under docs/design/
- Simplified Skill model — removed unused tag counters (helpful/harmful/neutral) and TagStep from the pipeline
- Cleaner InsightSource provenance — restored error_identification and learning_text fields
What's changed
Added
- Insight source provenance — `InsightSource` typed model captures the origin of each skillbook update (trace ID, sample question, epoch/step, reflection summary, integration metadata); `AttachInsightSourcesStep` automatically enriches `UpdateBatch` operations with provenance and is wired into the default learning tail
- Claude SDK step — `ClaudeSDKStep` integration for running Claude Code sub-agents from within ACE pipelines
- RR sub-agent code execution — Recursive Reflector can now delegate to code-execution sub-agents at runtime
- RR raw trace batch helpers — `build_raw_trace_batches` and related runtime utilities for feeding raw traces directly into the RR pipeline
Fixed
- Logfire scrubbing — added scrubbing callback to stop Logfire over-redacting trace content (reasoning, answers, messages now visible in Logfire UI)
- RR combined-batch normalization — fixed ordering/deduplication of combined task batches in multi-sample runs
Docs
- Logfire query API guide clarifications
- MCP client setup guide and compatibility tests
- Design docs updated to reflect insight source provenance model
Fixed
- CLI packaging — include .md data files in the wheel so `kayba setup` and skill install work on pip/uv-installed packages
Added
- PydanticAI migration — ACE roles (Agent, Reflector, SkillManager) rebuilt on PydanticAI agents with structured output, replacing the legacy role system
- Recursive Reflector — PydanticAI-powered trace analysis agent with sandboxed code execution, sub-agent delegation, and working memory (`save_notes` tool)
- Kayba CLI — full hosted API client with trace upload/management, interactive run, insights, prompts, batch processing, materialization, and integration commands (`kayba` entry point)
- Thread `CancellationToken` through `TraceAnalyser.run()` for pipeline cancellation support
Introducing the Kayba CLI: automated agent self-improvement from your terminal
We built a CLI that plugs into Claude Code, Codex, or any coding agent and turns your agent's execution traces into improvements.
Upload traces → Kayba surfaces failure patterns → your coding agent proposes edits to your codebase. Pick what makes sense, implement, and repeat.
First test on tau2-bench: 34.3% improvement after a single cycle, auto-accepting all changes.
🚀 Try it free
7-day free trial (no credit card required) at kayba.ai:
- Automated agent self-improvement
- CLI for Claude Code, Codex & more
- Hosted dashboard & analytics
- Team collaboration
The core engine (ACE) stays open source and MIT licensed. Run `kayba setup` to get started.
Added
- Pipeline hooks & cancellation — `PipelineHook` protocol and `CancellationToken` for observing and controlling pipeline execution
- Kayba pipeline skills for Claude Code — 7-stage dynamic evaluation pipeline that generates custom benchmarks tailored to your agent's domain. Instead of static test suites, the skills analyze your API, build domain-aware metrics and rubrics, create action plans, and run human-in-the-loop validation — all as composable Claude Code skills
- `kayba setup` command — one command to install the full evaluation skill pipeline into your `.claude/skills/` directory, ready to use inside Claude Code out of the box
- Improved Opik trace naming — traces now display the question text (first 80 chars) instead of generic names like "ace_pipeline" or "rr_reflect"
- Thread ID support for Opik — `OpikStep` and `RROpikStep` accept an optional `thread_id` parameter for grouping related traces
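The cancellation half of the hooks feature can be pictured with a minimal plain-Python sketch. The `CancellationToken` name comes from the changelog; the method names and the cooperative check-between-steps loop are illustrative assumptions, not the real ACE API.

```python
# Cooperative cancellation sketch: steps poll a shared token between
# stages, so a long pipeline can be stopped cleanly from the outside
# (or from within a step) without killing threads mid-work.
class CancellationToken:
    def __init__(self):
        self._cancelled = False

    def cancel(self):
        self._cancelled = True

    @property
    def cancelled(self) -> bool:
        return self._cancelled

def run_pipeline(steps, token: CancellationToken):
    """Run callables in order, stopping before the next step once cancelled."""
    completed = []
    for step in steps:
        if token.cancelled:
            break  # cooperative check between stages
        completed.append(step())
    return completed
```

A `PipelineHook`-style observer would slot into the same loop, being notified before and after each step.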
What's New
- Kayba CLI — new `kayba` CLI for the hosted API with commands: `upload`, `insights generate/list/triage`, `prompts generate/list/pull`, `status`, `materialize`, `batch`, `setup`
- HTTP client — `KaybaClient` with Bearer auth for the Kayba hosted API
- Agent integration — `kayba setup` prints/appends coding agent instructions (CLAUDE.md, AGENTS.md, .cursorrules)
Full Changelog
https://github.com/kayba-ai/agentic-context-engine/compare/v0.8.5...v0.8.6
- Self-contained RR module (`ace_next/rr/`) — sandbox, subagent, trace_context, config extracted from `ace/reflector/`
- v5.6 prompt promoted as default — prompt evolution (v4 → v5.1–v5.6) for the RR pipeline
- `build_steps()` API — all runners gain a `build_steps()` classmethod for pipeline customization
- Shared `CallBudget` — single budget instance shared across RR pipeline steps
- ACE MCP server (optional) — stdio MCP server with tools: `ace.ask`, `ace.learn.sample`, `ace.learn.feedback`, `ace.skillbook.get/save/load`
- MCP packaging + CLI — optional `mcp` extra and `ace-mcp` entrypoint
- Composing pipelines guide — new `docs/guides/composing-pipelines.md`
- RR examples — `rr_demo.py`, `rr_opik_demo.py`, `compose_custom_pipeline.py`
- Opik made opt-in — moved from hard dependency to `observability` extra
What's New
- OpenClaw integration — learn from OpenClaw session transcripts (JSONL) via new `OpenClawToTraceStep` and `LoadTracesStep` pipeline steps (#86)
- ExportSkillbookMarkdownStep — export skillbook to markdown file
- OpenClaw example script and integration docs
- Pipeline engine — generic pipeline framework with branching, async boundaries, and parallel execution (#78)
- Trace passthrough — `_build_traces()` helper and raw trace data passed to RecursiveReflector sandbox
- RecursiveReflector None-response guard — gracefully handles empty/None LLM responses (e.g. from Gemini) with a retry prompt instead of crashing
- `LiteLLMClient.complete_messages()` — native multi-turn completion that preserves structured message lists
Insight Source Tracing
Track where every skill in your skillbook came from.
Added
- Insight source tracing — `InsightSource` dataclass tracks skill provenance (epoch, sample, trace refs, error identification, learning text)
- `Sample.id` promoted to first-class field with UUID auto-generation
- Skillbook query API — `source_map()`, `source_summary()`, `source_filter()` for skill lineage
- Insight sources wired through `OfflineACE`, `OnlineACE`, and async learning pipelines
- `UpdateOperation.learning_index` for linking operations to reflector learnings
- Bedrock e2e example (`examples/litellm/bedrock_insight_source_test.py`)
- `docs/INSIGHT_SOURCES.md` guide
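To illustrate the lineage-query idea, here is a self-contained sketch of what `source_map()` and `source_summary()` style queries do over per-skill provenance records. The `InsightSource` fields and both function bodies are illustrative assumptions; the real Skillbook API may differ.

```python
# Skill-lineage sketch: each skill optionally carries an InsightSource
# record; source_map filters to skills with provenance, source_summary
# aggregates skills per training epoch.
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class InsightSource:
    epoch: int
    sample_id: str
    trace_ref: str

def source_map(skills: dict[str, Optional[InsightSource]]) -> dict[str, InsightSource]:
    """Return skill_id -> provenance, skipping skills with none recorded."""
    return {sid: src for sid, src in skills.items() if src is not None}

def source_summary(skills: dict[str, Optional[InsightSource]]) -> Counter:
    """Count skills by the epoch that produced them."""
    return Counter(src.epoch for src in source_map(skills).values())
```

Queries like these make it cheap to answer "which epoch produced this skill, and from which sample?" when auditing a trained skillbook.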
What's New
- TAU-bench integration: Full benchmark framework for evaluating agents on TAU-bench tasks
- Recursive Reflector: New reflector module with sandbox execution, trace context, and sub-agent support
- Skillbook tools: Clean, consolidate, and merge skillbooks via new utility scripts
Release v0.7.3
What's New
Agentic System Prompting
New workflow to automatically optimize your agent's system prompts using your own data. Feed in past traces or conversations, and ACE analyzes what worked and what failed to generate actionable prompt suggestions.
Traces / Conversations → ACE → Prompt Suggestions
Each suggestion includes the recommended prompt text, justification for why it helps, and evidence from your actual traces. You review and decide what to implement.
See examples/agentic-system-prompting/ for the full workflow.
Other Changes
- Fix: Align test matrix with Python 3.12 requirement
- Fix: Use setup-uv action for Windows CI compatibility
Fix: Forward credentials (api_key, base_url, etc.) to Instructor client (#44)
This patch fixes an issue where custom API credentials weren't being forwarded to all internal LLM calls, causing authentication errors when using OpenAI-compatible endpoints.
⚠️ Breaking Changes
Complete terminology rename - Playbook → Skillbook, Bullet → Skill
| Old | New |
|---|---|
| Playbook | Skillbook |
| Bullet | Skill |
| Generator | Agent |
| Curator | SkillManager |
| OfflineAdapter | OfflineACE |
| OnlineAdapter | OnlineACE |
| DeltaOperation | UpdateOperation |
| DeltaBatch | UpdateBatch |
Migration:
```python
# Old
from ace import Playbook, Bullet, Generator, Curator, OfflineAdapter

# New
from ace import Skillbook, Skill, Agent, SkillManager, OfflineACE
```
JSON files: Change "bullets" key to "skills" in saved skillbooks.
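A one-off migration for saved files can be a few lines of stdlib Python. This is a sketch, assuming the saved skillbook is a single JSON object with a top-level `"bullets"` key as the note above describes; adapt paths to your setup.

```python
# Rename the legacy "bullets" key to "skills" in a saved skillbook file,
# leaving all other keys untouched. Idempotent: already-migrated files
# are left as-is.
import json
from pathlib import Path

def migrate_skillbook(path: Path) -> None:
    data = json.loads(path.read_text())
    if "bullets" in data:
        data["skills"] = data.pop("bullets")
        path.write_text(json.dumps(data, indent=2))
```

Run it once over each saved `.json` skillbook before loading with the renamed classes.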
Fixed
- Deduplication now properly applies consolidation operations
Summary
Async learning pipeline with parallel Reflectors, bullet deduplication, and Instructor integration.
🚀 Async Learning
Non-blocking background learning - answers return immediately while learning continues in background threads.
```python
agent.learn(samples, env, async_learning=True, max_reflector_workers=3)
```
🔍 Bullet Deduplication
Vector embedding-based duplicate detection prevents playbook bloat.
```python
agent = ACELiteLLM(model="gpt-4o-mini", dedup_config=DeduplicationConfig(similarity_threshold=0.80))
```
📋 Instructor Integration
Robust JSON parsing with Pydantic schema validation and automatic retries.
Other Changes
- Reorganized examples by integration type (litellm/, langchain/, local-models/)
- Fixed Claude temperature+top_p conflict
- Improved Curator prompt for better deduplication and imperative strategy format
- Increased default max_tokens from 512 to 2048 to prevent truncation
- Added comprehensive test suites (~1600 lines)
Tests
291 passed, 67% coverage
Bug Fixes
- Fixed Opik integration warnings for base installations
- Improved Opik configuration for local usage
⚠️ Breaking Changes
- Playbook format changed to TOON (Token-Oriented Object Notation)
- `Playbook.as_prompt()` now returns TOON format instead of markdown
- Reason: 16-62% token savings for improved scalability and reduced inference costs
- Migration: No action needed if using the playbook with Generator/Curator/Reflector
- Debugging: Use `playbook._as_markdown_debug()` or `str(playbook)` for human-readable output
- Details: Uses tab delimiters and excludes internal metadata (created_at, updated_at)
Added
- ACELiteLLM integration - Simple conversational agent with automatic learning
- ACELangChain integration - Wrap LangChain Runnables with ACE learning
- Custom integration pattern - Wrap ANY agentic system with ACE learning
- Base utilities in `ace/integrations/base.py` with `wrap_playbook_context()` helper
- Complete working example in `examples/custom_integration_example.py`
- Integration pattern: Inject playbook → Execute agent → Learn from results
- Integration exports - Import ACEAgent, ACELiteLLM, ACELangChain from the `ace` package root
- TOON compression for playbooks - 16-62% token reduction vs markdown
- Citation-based tracking - Strategies cited inline as `[section-00001]`, auto-extracted from reasoning
- Enhanced browser traces - Full execution logs (2200+ chars) passed to Reflector
- Test coverage - Improved from 28% to 70% (241 tests total)
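The inject → execute → learn pattern from the list above can be sketched in a few self-contained lines. The function names below (including a stand-in `wrap_playbook_context`) are illustrative assumptions about the shape of the pattern, not the ACE integration API itself.

```python
# Custom-integration sketch: prepend learned strategies to the prompt,
# run any agent callable, then feed the result back to a learner.
def wrap_playbook_context(playbook_text: str, question: str) -> str:
    """Inject the playbook's strategies ahead of the task prompt."""
    return f"Strategies:\n{playbook_text}\n\nTask:\n{question}"

def run_with_learning(agent, learner, playbook_text, question):
    prompt = wrap_playbook_context(playbook_text, question)
    answer = agent(prompt)        # 1. execute any agentic system
    learner(question, answer)     # 2. learn from the result
    return answer
```

Because `agent` is just a callable, the same loop wraps a LiteLLM call, a LangChain Runnable, or a fully custom system.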
Changed
- Renamed SimpleAgent → ACELiteLLM - Clearer naming for conversational agent integration
- `Playbook.__str__()` returns markdown (TOON reserved for LLM consumption via `as_prompt()`)
Fixed
- Browser-use trace integration - Reflector now receives complete execution traces
- Fixed initial query duplication (task appeared in both question and reasoning)
- Fixed missing trace data (reasoning field now contains 2200+ chars vs 154 chars)
- Fixed screenshot attribute bug causing AttributeError on step.state.screenshot
- Fixed invalid bullet ID filtering - hallucinated/malformed citations now filtered out
- Added comprehensive regression tests to catch these issues
- Impact: Reflector can now properly analyze browser agent's thought process
- Test coverage improved: 69% → 79% for browser_use.py
- Prompt v2.1 test assertions updated to match current format
- All 206 tests now pass (was 189)
Changes
- Fixed GitHub Actions workflow triggering
- Fixed all 46 mypy type checking errors
- Improved type annotations across codebase
- Python 3.11+ required
🚀 Highlights
We're excited to introduce experimental v2 prompts that bring state-of-the-art prompt engineering to ACE! This release adds confidence scoring, domain-specific optimizations, and comprehensive prompt management capabilities.
✨ What's New
Experimental v2 Prompts (Beta)
- 🎯 Confidence Scoring: Know when your AI is certain vs uncertain
- Bullet-level confidence (how applicable each strategy is)
- Answer-level confidence (overall certainty of the response)
- 📝 Enhanced Reasoning: 23% more detailed step-by-step explanations
- 🔧 Domain Optimization: Specialized prompts for math and code generation
- ✅ Better Structure: Based on analysis of 80+ production AI systems
Prompt Management System
- PromptManager class for version control and A/B testing
- Easy switching between v1 (stable) and v2 (experimental)
- Domain-specific prompt selection
- Usage tracking and statistics
Playbook Persistence
- Save trained playbooks with `playbook.save_to_file("model.json")`
- Load pre-trained playbooks with `Playbook.load_from_file("model.json")`
- Full JSON serialization support
Documentation & Examples
- 📚 Comprehensive prompt engineering guide (docs/PROMPT_ENGINEERING.md)
- 🔬 v1 vs v2 comparison script (examples/compare_v1_v2_prompts.py)
- 💡 Advanced v2 examples (examples/advanced_prompts_v2.py)
- 🎨 Mermaid flowchart visualization of ACE learning loop in README
🔄 Changes
- Enhanced docstrings with comprehensive examples throughout
- Improved README with visual diagrams and v2 prompts section
- Code formatting standardized with Black
🐛 Fixes
- Fixed Black formatting issues for CI/CD compliance
- Corrected README references to non-existent directories
- Fixed test badge URL in README
📊 v1 vs v2 Performance
| Feature | v1 | v2 (Experimental) |
|---|---|---|
| Token Usage | Baseline | +30-50% more |
| Confidence Scoring | ❌ | ✅ |
| Reasoning Detail | Basic | Enhanced (+23%) |
| Domain Variants | ❌ | ✅ Math, Code |
🚀 Quick Start with v2
```python
from ace.prompts_v2 import PromptManager
```
⚠ Important Notes
- v2 prompts are experimental and in active development
- They use 30-50% more tokens due to enhanced structure
- Test with your use case before production deployment
- v1 prompts remain the default for stability
Fixed Release - v0.1.1-alpha
This release fixes the GitHub Actions workflow for PyPI publishing.
Changes
- Updated artifact upload/download actions from v3 to v4
- Fixed deprecation errors preventing package publication
Installation
pip install ace-framework
All features remain the same as v0.1.0. This is an infrastructure fix only.
Initial Alpha Release of ACE Framework
This is the first alpha release of the Agentic Context Engine (ACE) framework, a Python implementation based on the paper "Agentic Context Engineering" from Stanford/SambaNova.
Alpha Status
This is an alpha release for early adopters and contributors. The API may change in future releases as we refine the framework based on community feedback.
Features
- Self-improving agents that learn from experience
- Playbook system for storing and evolving strategies
- Three-role architecture: Generator, Reflector, and Curator
- 100+ LLM providers support via LiteLLM (OpenAI, Anthropic, Google, etc.)
- Async support for high-performance applications
- Online and offline adaptation modes
Installation
pip install ace-framework
Quick Start
```python
from ace import (
    LiteLLMClient,
    OfflineAdapter,
    Playbook,
    Generator,
    Reflector,
    Curator,
)

# Create your agent
client = LiteLLMClient(model="gpt-3.5-turbo")
adapter = OfflineAdapter(
    playbook=Playbook(),
    generator=Generator(client),
    reflector=Reflector(client),
    curator=Curator(client),
)
```
Notes
- Requires Python 3.9+
- See README for detailed documentation
- Report issues at: https://github.com/Kayba-ai/agentic-context-engine/issues