Changelog
Track the latest updates, improvements, and fixes to the Kayba framework.
Added
- Usage metering hook — `RecursiveConfig.usage_callback: (RequestUsage, model_id) -> None` fires once per pydantic-ai model request (orchestrator turns, sub-agent runs, tool-call follow-ups). Implemented via `ace.rr.MeteredModel`, a `pydantic_ai.models.wrapper.WrapperModel` subclass, so metering lives at the framework's own model boundary — one firing site, no per-call-site plumbing. Callback exceptions are caught and logged so metering never crashes the pipeline.
- Pre-built model instance support — `RRStep`, `create_rr_agent`, `create_sub_agent`, and `RecursiveConfig.subagent_model` now accept either a model-id string or a pre-built `pydantic_ai.models.Model` instance. Enables callers that need a custom provider (e.g. a Bedrock model carrying STS-assumed credentials) to inject a fully-configured model rather than resolving from a string.
- Sub-agent `model_settings` — `create_sub_agent` now threads an explicit `ModelSettings` parameter into its `PydanticAgent` constructor.
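To make the "one firing site, exceptions never crash the pipeline" behavior concrete, here is a minimal plain-Python sketch of the metering pattern. The class and field names mirror the changelog (`MeteredModel`, `RequestUsage`), but the bodies are illustrative assumptions, not the actual ACE implementation.

```python
# Sketch of the usage-metering wrapper pattern: every request flows through
# one wrapper method, which fires the callback once and swallows callback
# errors so metering can never take down the pipeline.
import logging

logger = logging.getLogger(__name__)

class RequestUsage:
    """Token counts for a single model request (illustrative)."""
    def __init__(self, input_tokens: int, output_tokens: int):
        self.input_tokens = input_tokens
        self.output_tokens = output_tokens

class MeteredModel:
    """Wraps an inner model; metering lives at this single boundary."""
    def __init__(self, inner, usage_callback=None):
        self.inner = inner
        self.usage_callback = usage_callback

    def request(self, prompt: str):
        response, usage = self.inner.request(prompt)
        if self.usage_callback is not None:
            try:
                self.usage_callback(usage, self.inner.model_id)
            except Exception:
                # Metering failures are logged, never re-raised.
                logger.exception("usage_callback raised; ignoring")
        return response
```

Because the wrapper sits at the model boundary, orchestrator turns, sub-agent runs, and tool-call follow-ups are all metered without any per-call-site plumbing.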
Back-compat
Existing `RRStep(model="...")` callers are unchanged. The widened type signatures are additive.
TypeScript SDK
What's new
- TypeScript tracing SDK (`@kayba_ai/tracing`) — instrument Node.js agents and send traces to Kayba, mirroring the Python SDK API
- Standalone Python tracing package (`kayba-tracing`) — can be installed independently without the full ace-framework; `ace.tracing` continues to work as before (re-exports from kayba-tracing)
- CI publishes all three packages (ace-framework, kayba-tracing, @kayba_ai/tracing) on release
- Kayba tracing SDK — `ace.tracing` module wraps MLflow tracing with Kayba-native configuration, folder organization, and input sanitization (`pip install ace-framework[tracing]`)
- Structured design docs — split ACE_DESIGN.md into architecture, reference, and decisions docs under docs/design/
- Simplified Skill model — removed unused tag counters (helpful/harmful/neutral) and TagStep from the pipeline
- Cleaner InsightSource provenance — restored error_identification and learning_text fields
What's changed
Added
- Insight source provenance — `InsightSource` typed model captures the origin of each skillbook update (trace ID, sample question, epoch/step, reflection summary, integration metadata); `AttachInsightSourcesStep` automatically enriches `UpdateBatch` operations with provenance and is wired into the default learning tail
- Claude SDK step — `ClaudeSDKStep` integration for running Claude Code sub-agents from within ACE pipelines
- RR sub-agent code execution — Recursive Reflector can now delegate to code-execution sub-agents at runtime
- RR raw trace batch helpers — `build_raw_trace_batches` and related runtime utilities for feeding raw traces directly into the RR pipeline
Fixed
- Logfire scrubbing — added scrubbing callback to stop Logfire over-redacting trace content (reasoning, answers, messages now visible in Logfire UI)
- RR combined-batch normalization — fixed ordering/deduplication of combined task batches in multi-sample runs
Docs
- Logfire query API guide clarifications
- MCP client setup guide and compatibility tests
- Design docs updated to reflect insight source provenance model
Fixed
- CLI packaging — include .md data files in the wheel so `kayba setup` and skill install work on pip/uv-installed packages
Added
- PydanticAI migration — ACE roles (Agent, Reflector, SkillManager) rebuilt on PydanticAI agents with structured output, replacing the legacy role system
- Recursive Reflector — PydanticAI-powered trace analysis agent with sandboxed code execution, sub-agent delegation, and working memory (`save_notes` tool)
- Kayba CLI — full hosted API client with trace upload/management, interactive run, insights, prompts, batch processing, materialization, and integration commands (`kayba` entry point)
- Thread `CancellationToken` through `TraceAnalyser.run()` for pipeline cancellation support
Introducing the Kayba CLI: automated agent self-improvement from your terminal
We built a CLI that plugs into Claude Code, Codex, or any coding agent and turns your agent's execution traces into improvements.
Upload traces → Kayba surfaces failure patterns → your coding agent proposes edits to your codebase. Pick what makes sense, implement, and repeat.
First test on tau2-bench: 34.3% improvement after a single cycle, auto-accepting all changes.
🚀 Try it free
7-day free trial (no credit card required) at kayba.ai:
- Automated agent self-improvement
- CLI for Claude Code, Codex & more
- Hosted dashboard & analytics
- Team collaboration
The core engine (ACE) stays open source and MIT licensed. Run `kayba setup` to get started.
Added
- Pipeline hooks & cancellation — `PipelineHook` protocol and `CancellationToken` for observing and controlling pipeline execution
- Kayba pipeline skills for Claude Code — 7-stage dynamic evaluation pipeline that generates custom benchmarks tailored to your agent's domain. Instead of static test suites, the skills analyze your API, build domain-aware metrics and rubrics, create action plans, and run human-in-the-loop validation — all as composable Claude Code skills
- `kayba setup` command — one command to install the full evaluation skill pipeline into your `.claude/skills/` directory, ready to use inside Claude Code out of the box
- Improved Opik trace naming — traces now display the question text (first 80 chars) instead of generic names like "ace_pipeline" or "rr_reflect"
- Thread ID support for Opik — `OpikStep` and `RROpikStep` accept an optional `thread_id` parameter for grouping related traces
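The cancellation half of the hooks feature can be pictured with a minimal plain-Python sketch. The `CancellationToken` name comes from the changelog; the method names and the cooperative check-between-steps loop are illustrative assumptions, not the real ACE API.

```python
# Cooperative cancellation sketch: steps poll a shared token between
# stages, so a long pipeline can be stopped cleanly from the outside
# (or from within a step) without killing threads mid-work.
class CancellationToken:
    def __init__(self):
        self._cancelled = False

    def cancel(self):
        self._cancelled = True

    @property
    def cancelled(self) -> bool:
        return self._cancelled

def run_pipeline(steps, token: CancellationToken):
    """Run callables in order, stopping before the next step once cancelled."""
    completed = []
    for step in steps:
        if token.cancelled:
            break  # cooperative check between stages
        completed.append(step())
    return completed
```

A `PipelineHook`-style observer would slot into the same loop, being notified before and after each step.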
What's New
- Kayba CLI — new `kayba` CLI for the hosted API with commands: `upload`, `insights generate/list/triage`, `prompts generate/list/pull`, `status`, `materialize`, `batch`, `setup`
- HTTP client — `KaybaClient` with Bearer auth for the Kayba hosted API
- Agent integration — `kayba setup` prints/appends coding agent instructions (CLAUDE.md, AGENTS.md, .cursorrules)
Full Changelog
https://github.com/kayba-ai/agentic-context-engine/compare/v0.8.5...v0.8.6
- Self-contained RR module (`ace_next/rr/`) — sandbox, subagent, trace_context, config extracted from `ace/reflector/`
- v5.6 prompt promoted as default — prompt evolution (v4 → v5.1–v5.6) for the RR pipeline
- `build_steps()` API — all runners gain a `build_steps()` classmethod for pipeline customization
- Shared `CallBudget` — single budget instance shared across RR pipeline steps
- ACE MCP server (optional) — stdio MCP server with tools: `ace.ask`, `ace.learn.sample`, `ace.learn.feedback`, `ace.skillbook.get/save/load`
- MCP packaging + CLI — optional `mcp` extra and `ace-mcp` entrypoint
- Composing pipelines guide — new `docs/guides/composing-pipelines.md`
- RR examples — `rr_demo.py`, `rr_opik_demo.py`, `compose_custom_pipeline.py`
- Opik made opt-in — moved from hard dependency to `observability` extra
What's New
- OpenClaw integration — learn from OpenClaw session transcripts (JSONL) via new `OpenClawToTraceStep` and `LoadTracesStep` pipeline steps (#86)
- ExportSkillbookMarkdownStep — export skillbook to markdown file
- OpenClaw example script and integration docs
- Pipeline engine — generic pipeline framework with branching, async boundaries, and parallel execution (#78)
- Trace passthrough — `_build_traces()` helper and raw trace data passed to RecursiveReflector sandbox
- RecursiveReflector None-response guard — gracefully handles empty/None LLM responses (e.g. from Gemini) with a retry prompt instead of crashing
- `LiteLLMClient.complete_messages()` — native multi-turn completion that preserves structured message lists
Insight Source Tracing
Track where every skill in your skillbook came from.
Added
- Insight source tracing — `InsightSource` dataclass tracks skill provenance (epoch, sample, trace refs, error identification, learning text)
- `Sample.id` promoted to first-class field with UUID auto-generation
- Skillbook query API — `source_map()`, `source_summary()`, `source_filter()` for skill lineage
- Insight sources wired through `OfflineACE`, `OnlineACE`, and async learning pipelines
- `UpdateOperation.learning_index` for linking operations to reflector learnings
- Bedrock e2e example (`examples/litellm/bedrock_insight_source_test.py`)
- `docs/INSIGHT_SOURCES.md` guide
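To illustrate the lineage-query idea, here is a self-contained sketch of what `source_map()` and `source_summary()` style queries do over per-skill provenance records. The `InsightSource` fields and both function bodies are illustrative assumptions; the real Skillbook API may differ.

```python
# Skill-lineage sketch: each skill optionally carries an InsightSource
# record; source_map filters to skills with provenance, source_summary
# aggregates skills per training epoch.
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class InsightSource:
    epoch: int
    sample_id: str
    trace_ref: str

def source_map(skills: dict[str, Optional[InsightSource]]) -> dict[str, InsightSource]:
    """Return skill_id -> provenance, skipping skills with none recorded."""
    return {sid: src for sid, src in skills.items() if src is not None}

def source_summary(skills: dict[str, Optional[InsightSource]]) -> Counter:
    """Count skills by the epoch that produced them."""
    return Counter(src.epoch for src in source_map(skills).values())
```

Queries like these make it cheap to answer "which epoch produced this skill, and from which sample?" when auditing a trained skillbook.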
What's New
- TAU-bench integration: Full benchmark framework for evaluating agents on TAU-bench tasks
- Recursive Reflector: New reflector module with sandbox execution, trace context, and sub-agent support
- Skillbook tools: Clean, consolidate, and merge skillbooks via new utility scripts
Release v0.7.3
What's New
Agentic System Prompting
New workflow to automatically optimize your agent's system prompts using your own data. Feed in past traces or conversations, and ACE analyzes what worked and what failed to generate actionable prompt suggestions.
Traces / Conversations → ACE → Prompt Suggestions
Each suggestion includes the recommended prompt text, justification for why it helps, and evidence from your actual traces. You review and decide what to implement.
See examples/agentic-system-prompting/ for the full workflow.
Other Changes
- Fix: Align test matrix with Python 3.12 requirement
- Fix: Use setup-uv action for Windows CI compatibility
Fix: Forward credentials (api_key, base_url, etc.) to Instructor client (#44)
This patch fixes an issue where custom API credentials weren't being forwarded to all internal LLM calls, causing authentication errors when using OpenAI-compatible endpoints.
⚠️ Breaking Changes
Complete terminology rename - Playbook → Skillbook, Bullet → Skill
| Old | New |
|---|---|
| Playbook | Skillbook |
| Bullet | Skill |
| Generator | Agent |
| Curator | SkillManager |
| OfflineAdapter | OfflineACE |
| OnlineAdapter | OnlineACE |
| DeltaOperation | UpdateOperation |
| DeltaBatch | UpdateBatch |
Migration:
```python
# Old
from ace import Playbook, Bullet, Generator, Curator, OfflineAdapter

# New
from ace import Skillbook, Skill, Agent, SkillManager, OfflineACE
```
JSON files: Change "bullets" key to "skills" in saved skillbooks.
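A one-off migration for saved files can be a few lines of stdlib Python. This is a sketch, assuming the saved skillbook is a single JSON object with a top-level `"bullets"` key as the note above describes; adapt paths to your setup.

```python
# Rename the legacy "bullets" key to "skills" in a saved skillbook file,
# leaving all other keys untouched. Idempotent: already-migrated files
# are left as-is.
import json
from pathlib import Path

def migrate_skillbook(path: Path) -> None:
    data = json.loads(path.read_text())
    if "bullets" in data:
        data["skills"] = data.pop("bullets")
        path.write_text(json.dumps(data, indent=2))
```

Run it once over each saved `.json` skillbook before loading with the renamed classes.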
Fixed
- Deduplication now properly applies consolidation operations
Summary
Async learning pipeline with parallel Reflectors, bullet deduplication, and Instructor integration.
🚀 Async Learning
Non-blocking background learning - answers return immediately while learning continues in background threads.
```python
agent.learn(samples, env, async_learning=True, max_reflector_workers=3)
```
🔍 Bullet Deduplication
Vector embedding-based duplicate detection prevents playbook bloat.
```python
agent = ACELiteLLM(model="gpt-4o-mini", dedup_config=DeduplicationConfig(similarity_threshold=0.80))
```
📋 Instructor Integration
Robust JSON parsing with Pydantic schema validation and automatic retries.
Other Changes
- Reorganized examples by integration type (litellm/, langchain/, local-models/)
- Fixed Claude temperature+top_p conflict
- Improved Curator prompt for better deduplication and imperative strategy format
- Increased default max_tokens from 512 to 2048 to prevent truncation
- Added comprehensive test suites (~1600 lines)
Tests
291 passed, 67% coverage
Bug Fixes
- Fixed Opik integration warnings for base installations
- Improved Opik configuration for local usage
⚠️ Breaking Changes
- Playbook format changed to TOON (Token-Oriented Object Notation)
- `Playbook.as_prompt()` now returns TOON format instead of markdown
- Reason: 16-62% token savings for improved scalability and reduced inference costs
- Migration: No action needed if using the playbook with Generator/Curator/Reflector
- Debugging: Use `playbook._as_markdown_debug()` or `str(playbook)` for human-readable output
- Details: Uses tab delimiters and excludes internal metadata (created_at, updated_at)
Added
- ACELiteLLM integration - Simple conversational agent with automatic learning
- ACELangChain integration - Wrap LangChain Runnables with ACE learning
- Custom integration pattern - Wrap ANY agentic system with ACE learning
- Base utilities in `ace/integrations/base.py` with `wrap_playbook_context()` helper
- Complete working example in `examples/custom_integration_example.py`
- Integration pattern: Inject playbook → Execute agent → Learn from results
- Integration exports - Import ACEAgent, ACELiteLLM, ACELangChain from the `ace` package root
- TOON compression for playbooks - 16-62% token reduction vs markdown
- Citation-based tracking - Strategies cited inline as `[section-00001]`, auto-extracted from reasoning
- Enhanced browser traces - Full execution logs (2200+ chars) passed to Reflector
- Test coverage - Improved from 28% to 70% (241 tests total)
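The inject → execute → learn pattern from the list above can be sketched in a few self-contained lines. The function names below (including a stand-in `wrap_playbook_context`) are illustrative assumptions about the shape of the pattern, not the ACE integration API itself.

```python
# Custom-integration sketch: prepend learned strategies to the prompt,
# run any agent callable, then feed the result back to a learner.
def wrap_playbook_context(playbook_text: str, question: str) -> str:
    """Inject the playbook's strategies ahead of the task prompt."""
    return f"Strategies:\n{playbook_text}\n\nTask:\n{question}"

def run_with_learning(agent, learner, playbook_text, question):
    prompt = wrap_playbook_context(playbook_text, question)
    answer = agent(prompt)        # 1. execute any agentic system
    learner(question, answer)     # 2. learn from the result
    return answer
```

Because `agent` is just a callable, the same loop wraps a LiteLLM call, a LangChain Runnable, or a fully custom system.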
Changed
- Renamed SimpleAgent → ACELiteLLM - Clearer naming for conversational agent integration
- `Playbook.__str__()` returns markdown (TOON reserved for LLM consumption via `as_prompt()`)
Fixed
- Browser-use trace integration - Reflector now receives complete execution traces
- Fixed initial query duplication (task appeared in both question and reasoning)
- Fixed missing trace data (reasoning field now contains 2200+ chars vs 154 chars)
- Fixed screenshot attribute bug causing AttributeError on step.state.screenshot
- Fixed invalid bullet ID filtering - hallucinated/malformed citations now filtered out
- Added comprehensive regression tests to catch these issues
- Impact: Reflector can now properly analyze browser agent's thought process
- Test coverage improved: 69% → 79% for browser_use.py
- Prompt v2.1 test assertions updated to match current format
- All 206 tests now pass (was 189)
Changes
- Fixed GitHub Actions workflow triggering
- Fixed all 46 mypy type checking errors
- Improved type annotations across codebase
- Python 3.11+ required
🚀 Highlights
We're excited to introduce experimental v2 prompts that bring state-of-the-art prompt engineering to ACE! This release adds confidence scoring, domain-specific optimizations, and comprehensive prompt management capabilities.
✨ What's New
Experimental v2 Prompts (Beta)
- 🎯 Confidence Scoring: Know when your AI is certain vs uncertain
- Bullet-level confidence (how applicable each strategy is)
- Answer-level confidence (overall certainty of the response)
- 📝 Enhanced Reasoning: 23% more detailed step-by-step explanations
- 🔧 Domain Optimization: Specialized prompts for math and code generation
- ✅ Better Structure: Based on analysis of 80+ production AI systems
Prompt Management System
- PromptManager class for version control and A/B testing
- Easy switching between v1 (stable) and v2 (experimental)
- Domain-specific prompt selection
- Usage tracking and statistics
Playbook Persistence
- Save trained playbooks with `playbook.save_to_file("model.json")`
- Load pre-trained playbooks with `Playbook.load_from_file("model.json")`
- Full JSON serialization support
Documentation & Examples
- 📚 Comprehensive prompt engineering guide (docs/PROMPT_ENGINEERING.md)
- 🔬 v1 vs v2 comparison script (examples/compare_v1_v2_prompts.py)
- 💡 Advanced v2 examples (examples/advanced_prompts_v2.py)
- 🎨 Mermaid flowchart visualization of ACE learning loop in README
🔄 Changes
- Enhanced docstrings with comprehensive examples throughout
- Improved README with visual diagrams and v2 prompts section
- Code formatting standardized with Black
🐛 Fixes
- Fixed Black formatting issues for CI/CD compliance
- Corrected README references to non-existent directories
- Fixed test badge URL in README
📊 v1 vs v2 Performance
| Feature | v1 | v2 (Experimental) |
|---|---|---|
| Token Usage | Baseline | +30-50% more |
| Confidence Scoring | ❌ | ✅ |
| Reasoning Detail | Basic | Enhanced (+23%) |
| Domain Variants | ❌ | ✅ Math, Code |
🚀 Quick Start with v2
```python
from ace.prompts_v2 import PromptManager
```
⚠ Important Notes
- v2 prompts are experimental and in active development
- They use 30-50% more tokens due to enhanced structure
- Test with your use case before production deployment
- v1 prompts remain the default for stability
Fixed Release - v0.1.1-alpha
This release fixes the GitHub Actions workflow for PyPI publishing.
Changes
- Updated artifact upload/download actions from v3 to v4
- Fixed deprecation errors preventing package publication
Installation
pip install ace-framework
All features remain the same as v0.1.0. This is an infrastructure fix only.
Initial Alpha Release of ACE Framework
This is the first alpha release of the Agentic Context Engine (ACE) framework, a Python implementation based on the paper "Agentic Context Engineering" from Stanford/SambaNova.
Alpha Status
This is an alpha release for early adopters and contributors. The API may change in future releases as we refine the framework based on community feedback.
Features
- Self-improving agents that learn from experience
- Playbook system for storing and evolving strategies
- Three-role architecture: Generator, Reflector, and Curator
- 100+ LLM providers support via LiteLLM (OpenAI, Anthropic, Google, etc.)
- Async support for high-performance applications
- Online and offline adaptation modes
Installation
pip install ace-framework
Quick Start
```python
from ace import (
    LiteLLMClient,
    OfflineAdapter,
    Playbook,
    Generator,
    Reflector,
    Curator,
)

# Create your agent
client = LiteLLMClient(model="gpt-3.5-turbo")
adapter = OfflineAdapter(
    playbook=Playbook(),
    generator=Generator(client),
    reflector=Reflector(client),
    curator=Curator(client),
)
```
Notes
- Requires Python 3.9+
- See README for detailed documentation
- Report issues at: https://github.com/Kayba-ai/agentic-context-engine/issues