A Self-Evolving AI Workflow

Tags: claude-code, evolve, skills, automation

Every Claude Code session starts from scratch. The model has no memory of the patterns you've established, the shortcuts you've taught it, or the conventions your codebase uses. You're constantly re-explaining yourself.

The standard answer is a CLAUDE.md file — project-level context that loads every session. That helps. But it's static. You write it once, it drifts from reality, you forget to update it. It's documentation, not memory.

I wanted something that accumulates. A way to capture behavioral patterns from real sessions and promote them into structure that persists. The result is a two-command loop: /learn and /evolve.

/learn — Capturing Instincts

/learn runs after you solve a non-trivial problem. It looks at what just happened, extracts the generalizable pattern, and saves it as an instinct file:

~/.claude/instincts/[pattern-name].md

Each file includes YAML frontmatter — a domain, a trigger describing when the pattern applies, tags, and an observation count that increments if the same pattern is extracted again. They accumulate across sessions. I think of them as instincts: not rules, not commands, just lightweight context that shapes behavior when it's relevant.
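To make that concrete, here is what a file like ~/.claude/instincts/new-table-migration.md might look like (borrowing a name from the cluster example later in the post). The domain, trigger, and tags fields come straight from the description above; the exact key for the observation count and the body text are my guesses, not the literal schema /learn emits.

```markdown
---
domain: database
trigger: "creating a table that needs regenerated TypeScript types"
tags: [migrations, schema, typegen]
observations: 3   # hypothetical key name for the observation count
---

After a migration that adds a table, regenerate the ORM types before
writing code against the new columns, so the type checker catches
column-name typos immediately.
```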

After a few weeks of use, I had dozens of these. Patterns for how I handle database migrations, how I structure TypeScript modules, when to reach for a sub-agent instead of a long prompt. Each one captured in the moment it was useful, when the reasoning was fresh.

The problem: dozens of small files are nearly as hard to work with as no files at all.

/evolve — Clustering Into Structure

/evolve reads the learned instinct files and groups them by semantic similarity — domain overlap, trigger patterns, action sequences. Clusters with three or more related instincts are candidates for promotion.

Each cluster gets classified into one of three types:

| Type | When it applies |
| --- | --- |
| Skill — a knowledge doc loaded on demand | Instincts describe auto-triggered behaviors |
| Command — a user-invocable slash command | Instincts describe actions the user initiates |
| Agent — an autonomous sub-agent | Instincts describe complex multi-step processes |
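The post doesn't spell out how a cluster gets typed, so treat the following as a guess at the shape of that heuristic. The keyword cues are invented for illustration; only the three output labels are from the actual tool.

```python
# Hypothetical classifier sketch, not the actual instinct-cli.py logic.
# Each cluster is assumed to be a list of dicts parsed from frontmatter.

USER_CUES = ("when the user asks", "on request", "user runs")
PROCESS_CUES = ("multi-step", "pipeline", "orchestrate", "investigate")

def classify_cluster(cluster: list[dict]) -> str:
    triggers = " ".join(i["trigger"].lower() for i in cluster)
    if any(cue in triggers for cue in PROCESS_CUES):
        return "agent"    # complex multi-step processes
    if any(cue in triggers for cue in USER_CUES):
        return "command"  # actions the user initiates
    return "skill"        # default: auto-triggered knowledge, loaded on demand
```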

By default, /evolve previews what it would generate:

🧬 Evolve Analysis
==================

Found 3 clusters ready for evolution:

## Cluster 1: Database Migration Workflow
Instincts: new-table-migration, update-schema, regenerate-types
Type: Command
Confidence: 85% (based on 12 observations)

Would create: /new-table command
File: ~/.claude/homunculus/evolved/commands/new-table.md

Run `/evolve --execute` to create these files.

--execute writes the files. Each generated structure includes an evolved_from: field linking back to the source instincts — a full audit trail from raw observation to abstraction.
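For the cluster above, the generated command file's frontmatter might carry that trail like this. Only the evolved_from field is confirmed by the post; the other keys are illustrative.

```markdown
---
# ~/.claude/homunculus/evolved/commands/new-table.md
name: new-table        # hypothetical key
evolved_from:          # the audit trail back to source instincts
  - new-table-migration
  - update-schema
  - regenerate-types
---
```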

What This Actually Is

A few things worth being direct about.

/evolve is a Claude Code command backed by a Python CLI script. The command file tells Claude to invoke instinct-cli.py, which parses the YAML frontmatter from each instinct, groups by domain, scores trigger similarity using difflib.SequenceMatcher, and classifies clusters as commands, skills, or agents. No API calls — the clustering is purely mechanical, based on string similarity and heuristics.
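A rough sketch of that pipeline follows. The greedy grouping scheme and the 0.6 similarity cutoff are my assumptions; the post only confirms the ingredients (frontmatter parsing, domain buckets, SequenceMatcher, and the three-instinct threshold).

```python
from difflib import SequenceMatcher

def trigger_similarity(a: str, b: str) -> float:
    # Purely mechanical string similarity: no embeddings, no API calls.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def cluster_instincts(instincts: list[dict], cutoff: float = 0.6) -> list[list[dict]]:
    # Assumed greedy scheme: attach each instinct to the first cluster in
    # its domain whose seed trigger is similar enough, else start a new one.
    clusters: list[list[dict]] = []
    for inst in instincts:
        for cluster in clusters:
            seed = cluster[0]
            if seed["domain"] == inst["domain"] and \
               trigger_similarity(seed["trigger"], inst["trigger"]) >= cutoff:
                cluster.append(inst)
                break
        else:
            clusters.append([inst])
    # Only clusters with three or more instincts become promotion candidates.
    return [c for c in clusters if len(c) >= 3]
```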

Both commands are fully operational. /learn saves instinct files to disk. /evolve reads, clusters, and generates evolved structures. The output files land in a staging area (~/.claude/homunculus/evolved/) for review before promotion.

The system is less impressive than it sounds and more useful than it looks. You don't notice it working until you run /evolve after a few weeks and see that three slash commands you've been meaning to write are already half-written, clustered from instincts you saved in passing.

The Design Pattern

What I find durable about this approach is the layering.

Instincts are cheap to create. After solving a problem, running /learn takes 30 seconds. The bar is low enough that you actually do it.

Clustering is where value concentrates. Individual instincts are too granular to act on. Clusters reveal patterns you didn't consciously notice — that five separate instincts about error handling in async code all share the same trigger context, and together they're a skill worth loading.

Promotion is conservative. The three-instinct threshold means nothing gets promoted from a single observation. You need repeated evidence before the system structures it. This matters more than it seems — AI-generated structure that isn't grounded in actual usage is just noise.

The result is institutional memory that reflects how you actually work, built from evidence rather than intention. It won't replace a good CLAUDE.md. But it handles the layer below that — the tacit knowledge that never makes it into documentation because it's too small, too specific, or too hard to articulate until you've done it a dozen times.