Tags: technique, ai, claude-code, workflow, engineering, git, configuration, methodology, ai-coding

From Afterthought to Infrastructure: How AI Config Evolves in a Real Project

506 commits. 407 files. 8.6% of all repo activity. Here's the git archaeology of how AI configuration evolved over 9 months on a real production codebase, and why the pattern is probably universal.

13 min read · Updated Apr 14, 2026

TL;DR

What                     Details
Project                  Méthode Aristote EdTech platform, 9 months (Aug 2025 → Apr 2026)
Total AI config commits  506, 8.6% of all repo commits
Current scale            407 files in .claude/: 34 rules, 15 hooks, 36 skills, 57 commands, 17 agents
Distribution             23% in first 5 months, 77% in last 4 months
5 phases                 Afterthought → Documentation → Infrastructure → Engineering Practice → Compound Engineering
Context Diet             Apr 2026: always-on context 2,518L → 646L (-74%), ai:score 85 → 125/145
The big bang             Jan 6, 2026: 86 files created in a single day (commit 1421e863)
Key insight              You can’t design your AI configuration on day 1. You discover what it needs by building it.

This isn’t a guide to setting up a perfect CLAUDE.md, it’s a post-mortem of how AI configuration actually evolves when you use it seriously over 9 months on a production codebase.

The data is real: I ran git log --all --format="%ad" --date=format:"%Y-%m" -- CLAUDE.md .claude/ on the repository and categorized 506 commits touching AI configuration files. The pattern surprised me, and so did the final number: 8.6% of all commits in this repository touch AI config. CLAUDE.md alone has been committed 149 times.
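The aggregation half of that pipeline is worth seeing once. This is an illustrative TypeScript equivalent of what `sort | uniq -c` does to the `git log` output, not the actual script used:

```typescript
// Tally per-month commit counts from the output of
// `git log --all --format="%ad" --date=format:"%Y-%m" -- CLAUDE.md .claude/`,
// the same aggregation `sort | uniq -c` performs in the shell pipeline.
function tallyByMonth(gitLogOutput: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const raw of gitLogOutput.split("\n")) {
    const month = raw.trim();
    if (!/^\d{4}-\d{2}$/.test(month)) continue; // skip blanks and noise
    counts.set(month, (counts.get(month) ?? 0) + 1);
  }
  return counts;
}
```

Feeding it the dates of the 506 matching commits reproduces the monthly distribution in the next section.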


The data

AI-config commits per month (506 total, 9 months)

Aug 2025  ██                                                       5
Sep 2025  ████████                                                24
Oct 2025  ███████████                                             34
Nov 2025  █████████                                               27
Dec 2025  █████████                                               27
Jan 2026  ██████████████████████████████████████████████         138  ← Big Bang (86 files in 1 day)
Feb 2026  ███████████████████████████                             81
Mar 2026  ███████████████████████████████████████                116  ← ACE pipeline
Apr 2026  ██████████████                                          43  (partial)

          |←── 23% (117) ───→|←──────────── 77% (378) ────────────────|
                 Aug–Dec                       Jan–Apr
Month       AI-config commits   % of total
Aug 2025      5                  1.0%
Sep 2025     24                  4.7%
Oct 2025     34                  6.7%
Nov 2025     27                  5.3%
Dec 2025     27                  5.3%
Jan 2026    138                 27.3%
Feb 2026     81                 16.0%
Mar 2026    116                 22.9%
Apr 2026     43                  8.5% (partial)
Total       506                100%

Two spikes, not one. January 2026 (month 6) and March 2026 (month 8). Together they represent 50.2% of all AI config activity. Both were triggered by specific structural decisions, not by gradual accumulation.


Phase 1: Config as Afterthought (Aug–Sep, 29 commits)

The first AI config file: August 22, 2025. Commit a48d5017b. CLAUDE.md, 282 lines, included in the very first release.

The content was minimal. Project identity, the T3 stack (Next.js, tRPC, Prisma), basic conventions. A starting point, not a system.

Aug 22 ─── CLAUDE.md (282 lines) ──────── Born: AI config exists
Sep 05 ─── .claude/ directory ──────────── Dedicated structure
Sep 11 ─── .claude/commands/ ───────────── First slash commands

At this stage the mental model was basic: CLAUDE.md as a context file, you write down what the AI needs to know. Useful the way documentation is useful, better than nothing, but not yet a system.

What the AI “knew”: the project stack, the naming conventions, the basic architecture intent. What it didn’t know: the business domain, the specific patterns we’d established, why certain decisions were made, what it should refuse to do.

24 commits in September added the .claude/ directory structure and first slash commands. Basic namespacing: tech:commit, tech:PR, tech:review. Useful shortcuts for repetitive operations.
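A slash command in Claude Code is just a markdown file under .claude/commands/, namespaced by directory (.claude/commands/tech/commit.md becomes /tech:commit). The project’s actual command bodies aren’t in the git log, so this sketch is purely illustrative of the format:

```markdown
---
description: Stage changes and write a conventional-commit message
allowed-tools: Bash(git status:*), Bash(git diff:*), Bash(git add:*), Bash(git commit:*)
---

Review the current diff, group related changes, and create a commit
with a conventional-commit message. Extra instructions: $ARGUMENTS
```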

At this point, AI configuration is something one person maintains informally. It evolves when something is obviously missing. There’s no system, just a file.


Phase 2: Config as Documentation (Oct–Dec, 88 commits)

The shift in Phase 2: the AI starts needing business context, not just technical context.

Oct 15 ─── .claude/agents/ (v0.8.0) ────── First custom agents
Oct 23 ─── knowledge-base.md (v0.10.0) ─── Business rules codified
Nov 04 ─── MCP Serena (v0.14.0) ─────────── Persistent memory
Nov 21 ─── CLAUDE.md -50% tokens ────────── First optimization pass

October 23: doc/knowledge-base.md created. This is when the configuration started encoding business knowledge rather than just technical setup. Session mechanics (supervised vs autonomous, 15-minute tolerance, doublet/triplet offsets). User lifecycle rules. Tutor compensation logic. The glossary of French terms that appeared in the code.

Without this, the AI was technically capable but business-ignorant. It could write a repository method but didn’t know that a “session” in this codebase meant something specific (SUPERVISED: 1h with a tutor, or AUTONOMOUS: 30min solo), with a lifecycle of SCHEDULED → STARTED → COMPLETED.
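To make that concrete, here is roughly what the knowledge base taught the AI, expressed as types. The names are illustrative of the rules described above, not the project’s actual code:

```typescript
// Illustrative types for the session vocabulary (not the real schema).
type SessionType = "SUPERVISED" | "AUTONOMOUS";
type SessionStatus = "SCHEDULED" | "STARTED" | "COMPLETED";

// SUPERVISED: 1h with a tutor; AUTONOMOUS: 30min solo.
const SESSION_MINUTES: Record<SessionType, number> = {
  SUPERVISED: 60,
  AUTONOMOUS: 30,
};

// The lifecycle is linear: SCHEDULED → STARTED → COMPLETED, one step at a time.
function nextStatus(current: SessionStatus): SessionStatus | null {
  switch (current) {
    case "SCHEDULED": return "STARTED";
    case "STARTED": return "COMPLETED";
    case "COMPLETED": return null; // terminal state
  }
}
```

Encoding this once in the knowledge base beats re-explaining it at the top of every session.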

November 4 (v0.14.0): MCP Serena integration. Persistent memory across sessions. The AI could now remember architectural decisions made in previous sessions without restating them every time.

November 21 (v0.15.6): CLAUDE.md optimized, 50% token reduction. The file had grown organically and accumulated noise. First deliberate compression pass.

By Phase 2 the mental model had shifted: AI configuration as onboarding documentation, the kind you’d write for a new senior hire. Business rules, conventions, architectural decisions, the “why” behind the patterns.


Phase 3: Config as Infrastructure (Jan, 138 commits, 27.3%)

January 2026 is the first turning point. The configuration stops being a file and becomes a system, faster than any planned migration would have allowed.

January 6, 2026. Commit 1421e863. 86 files created in a single day.

Jan 06 ─── 12 agents + 5 hooks + settings.json ─── Big Bang (86 files, 1 day)
Jan 09 ─── .claude/rules/ ──────────────────────── Guardrails formalized (21 files)
Jan 16 ─── grepai MCP ──────────────────────────── Semantic code search
Jan 19 ─── Pre-push security hooks ─────────────── Defense at commit level
Jan 26 ─── Tasks API (450 lines doc) ────────────── Multi-session management
Jan 26 ─── Perplexity MCP, Jam.dev MCP ──────────── Expanded context sources
Jan 29 ─── SonarQube MCP ───────────────────────── Real-time quality analysis
Jan 29 ─── 283 tests added ─────────────────────── TDD enforcement in practice
Jan 29 ─── RTK enforcement hook ─────────────────── Token optimization mandatory

What triggered the explosion: the team was growing. Augustin was joining. The configuration that worked for one developer (me, on macOS, with a specific workflow) needed to work for multiple developers with different setups, different tools, different levels of experience.

One monolithic CLAUDE.md can’t serve that, but a system can.

Skills: 12 agents and the first skills created on January 6, growing to 36 skills today. Loaded on-demand rather than burning context permanently. TDD methodology, security playbooks, database patterns, accessibility rules, all available on trigger and silent otherwise.

Hooks: 5 hooks created on January 6 in a single commit: dangerous-actions-blocker.sh, security-gate.sh, activity-logger.sh, auto-format.sh, notification.sh. Pre-push security checks, token optimization enforcement (RTK mandatory for all CLI operations). These run automatically without requiring the developer to remember to run them.
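How those hooks attach is worth seeing once. In Claude Code they are wired through settings.json; the matcher and path below are an illustrative guess at how a blocker like this could be registered, not the project’s actual file:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": ".claude/hooks/dangerous-actions-blocker.sh"
          }
        ]
      }
    ]
  }
}
```

A PreToolUse hook that exits with the blocking exit code stops the tool call before it runs, which is what makes these guardrails enforcement rather than advice.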

Rules: First 3 rule files on January 9, growing to 34 today. Guardrails that fire during coding sessions. What the AI must refuse: silent catches, hidden fallbacks, unvalidated nullable access. What’s mandatory: failing test before implementation code (“Write code before the test? Delete it and start over.”).
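A sketch of what the silent-catch rule refuses versus requires, in illustrative code (the rule’s exact wording isn’t reproduced here):

```typescript
// Stand-in function; the point is the catch shape, not the fetch.
function fetchUser(id: string): { id: string } {
  if (!id) throw new Error("missing id");
  return { id };
}

// ❌ What the rule refuses: the error is swallowed and replaced by a hidden
// fallback, so callers can't distinguish "no user" from "something broke".
function loadUserSilent(id: string): { id: string } | null {
  try {
    return fetchUser(id);
  } catch {
    return null;
  }
}

// ✅ What it requires: let the failure propagate (or log it explicitly)
// so the caller's error handling actually sees it.
function loadUserExplicit(id: string): { id: string } {
  return fetchUser(id);
}
```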

6 MCP servers integrated in 3 weeks: Serena (persistent memory), grepai (semantic code search), Perplexity (web search with citations), Jam.dev (bug recording), SonarQube (code quality), Postgres read-only (direct production queries for context).
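For reference, MCP servers are registered declaratively, one entry per server in a .mcp.json (or settings) file. The shape below is the standard format; the launch commands are placeholders, since the actual entries aren’t in the git log:

```json
{
  "mcpServers": {
    "serena": {
      "command": "<serena-launcher>",
      "args": ["<args>"]
    },
    "postgres-readonly": {
      "command": "<postgres-mcp-launcher>",
      "args": ["<read-only-connection-string>"]
    }
  }
}
```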

By Phase 3, AI configuration is infrastructure in practice, not metaphor. It has its own PRs, its own review process, its own maintenance burden. You optimize it, test it, measure its impact.


Phase 4: Config as Engineering Practice (Feb, 81 commits, 16.0%)

81 commits in 28 days = 2.9 commits per day on AI configuration alone.

Feb 03 ─── .cursor/ config ──────────────── Cursor support (Augustin)
Feb 05 ─── profiles/ + modules/ YAML ────── Modular system
Feb 09 ─── Zod validation + CI ──────────── Config has tests
Feb 11 ─── Cross-editor sync ────────────── Claude + Cursor synchronized
Feb 13 ─── Memory compression -6.2K tokens ─ Ongoing optimization

February 5 (PR #598): The modular system. Instead of one CLAUDE.md that everyone reads, a generation pipeline:

  • 5 YAML profiles (one per developer)
  • 14 modules (composable content blocks)
  • A TypeScript pipeline that assembles them with Zod validation
  • Generated outputs: CLAUDE.md (703 lines for Florian, Claude Code, all modules) and .cursorrules (289 lines for Augustin, Cursor, minimal modules)

February 9 (PR #614): The generated outputs have their own tests. The pipeline validates that no placeholder remains unresolved. The CI catches configuration regressions.
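The placeholder check is the kind of test that is trivial to write and catches real regressions. A hand-rolled sketch, with the caveat that the {{name}} syntax and function names are my assumptions (the real pipeline uses Zod and its own templating):

```typescript
// Find template placeholders that survived generation, e.g. "{{developer}}".
function findUnresolvedPlaceholders(generated: string): string[] {
  return [...generated.matchAll(/\{\{\s*([\w.-]+)\s*\}\}/g)].map((m) => m[1]);
}

// The CI-facing assertion: a generated CLAUDE.md or .cursorrules with any
// leftover placeholder fails the build instead of reaching a developer.
function assertFullyResolved(generated: string): void {
  const leftovers = findUnresolvedPlaceholders(generated);
  if (leftovers.length > 0) {
    throw new Error(`Unresolved placeholders: ${leftovers.join(", ")}`);
  }
}
```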

The AI configuration now has the properties we expect from production code:

  • Version controlled (source files, not generated outputs)
  • Validated (Zod schema with 7 fields, 5 valid tone values, 6 valid feature modules)
  • Tested (pipeline tests, CI checks)
  • Reviewed (PRs for configuration changes, same process as feature PRs)
  • Documented (450 lines of documentation for the Tasks API alone)

Phase 5: Config as Compound Engineering (Mar, 116 commits, 22.9%)

March 2026 produced more AI config commits than January, which wasn’t in any plan.

Mar 04 ─── ACE pipeline + 12 ADRs ──────── Commit 3fc8c14f (43 files)
Mar 04 ─── Compound engineering patterns ── Architecture decisions codified
Mar ────── multi-agent-coordination.md ──── Agent orchestration rules
Mar ────── research-output.md ───────────── Structured research protocol
Mar ────── retex-review.md ──────────────── Post-task retrospective system

Commit 3fc8c14f: the complete ACE pipeline, 12 Architecture Decision Records, and compound engineering patterns, 43 files in one commit. The configuration had matured to the point where it could start encoding how to evolve itself.

The 5th phase isn’t just “more rules”. It’s the configuration capturing meta-patterns: how to coordinate agents, how to structure research before implementation, how to extract learnings after each task. The system started codifying its own methodology.


April 2026: The Context Diet

The next inflection wasn’t a spike in commits. It was a deliberate reduction.

By April 2026, the configuration had grown to 23 always-on rules loaded into every Claude session, regardless of what you were working on. 2,518 lines of guardrails firing unconditionally, whether you were touching a React component or a Prisma migration. The system worked, but it was eating 14% of a 200K context window before any code was read.

The branch fix-improve-context ran in 5 phases over a single week.

Phase 0 — Baseline. Built a scoring script (scripts/ai/score-ai-context.ts) before touching anything, so “better” would mean something measurable. 85/100, grade A. That’s the number you’re optimizing from, not a vague intuition.

Phase 1 — Triage. Classified every rule against a single test: does Claude need this without being asked? Three categories emerged: CONSTRAINT (always-on, non-negotiable), PROCEDURE (step-by-step workflows, load on demand), HYBRID (short directive + long protocol). Of 23 always-on rules, 13 were procedures or hybrids masquerading as constraints.

Phase 2 — Extraction. 9 procedural rules converted to skills. code-duplication.md became /tech:dupes. defensive-code-audit.md became /tech:audit. implementation-checklist.md became /tech:checklist. The always-on version was replaced by a 5-15 line stub: the directive stayed, the protocol moved to on-demand. Each extracted skill got a smart-suggest.sh pattern so Claude surfaces it automatically when relevant context appears, without loading it permanently.
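The stubs themselves aren’t quoted in the commit log; a plausible shape for one, to show the directive-stays, protocol-moves split (contents hypothetical):

```markdown
## Code duplication (always-on stub, hypothetical)

Before writing a new helper, check whether an equivalent already exists.
Never copy-paste a block of more than ~10 lines between modules.

Full detection protocol (scan, thresholds, refactor checklist): /tech:dupes.
Load it only when duplication is actually suspected.
```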

Phase 3 — Compression. Rules that were genuinely always-on but verbose got trimmed. rtk-enforcement.md deleted entirely because RTK is already in the global CLAUDE.md, making the rule redundant. debugging-methodology.md reduced from 112L to 43L, three more rules trimmed alongside it. Net result: 2,518L to 646L always-on, a 74% reduction.

Phase 3+ — Cursor parity. The .cursorrules file was a monolith at 703L. Converted to a 133L stub (metadata + pointers) with 23 .cursor/rules/*.mdc path-scoped rules carrying the actual content. Cursor now loads rules only when file paths match, the same economy applied to Claude.

Phase 4 — Machine-readable index. Three new generated files in machine-readable/:

  • ai-config.yaml (~270L): structured index of all 27 rules, 36 skills, 59 commands, 16 agents, 15 hooks, module list, profile list. One @ reference answers “what’s available?” in a new session without grep.
  • llms.txt (~50L): standard llms.txt format for LLM crawlers and context injection.
  • llms-full.txt (~4,500L): full content concat of all rules, modules, and skeletons, for offline/no-tools fallback.

All three generated by pnpm ai:sync. Three new canary checks (C18-C20) were added, worth +10pts to the quality scorer.

The final numbers:

Always-on context:   2,518L → 646L       (-74%)
.cursorrules:          703L → 133L        (-81%)
ai:score:              85/100 → 125/145   (+B grade, +10pts machine-readable)
Canary checks:         17/17  → 20/20

The pattern is the same one that shaped the January and March spikes: friction made the cost visible, measurement made the improvement verifiable, and the tooling (scoring script, canary checks, generated index) prevents the next drift from going undetected. What the scoring script adds isn’t just a better grade, it’s the maintenance loop that catches accumulation before it compounds into another cleanup sprint.


Why two spikes, not one gradual curve

The curve isn’t a planning failure, it’s the natural progression of building a system you’re using while building it.

You can’t design your AI configuration on day 1, you discover what it needs to be by running into the friction.

The January spike was team-driven: the rules that prevent silent catches emerged after seeing AI-generated code with silent catches. The TDD enforcement rule emerged after a period of shipping code without tests. The profile system emerged when a second developer with a different setup needed the same configuration that had been tuned for one person. The guardrails encode lessons learned, and you can’t write those down until you’ve actually run into the problem they’re solving.

The March spike was scale-driven: not new people joining, but the system itself becoming complex enough to require architectural governance. The 12 ADRs codified decisions that had been made implicitly over months. Compound engineering patterns emerged from observing what worked across 1,100 commits. The configuration caught up with the maturity of the codebase.

The investment curve also reflects team size. Phase 1-2 configuration works for a solo developer, Phase 3-4 is what you need when someone else joins and the monolithic setup doesn’t fit their machine or context. The shift to collaborative development forced the systematization that solo use never demanded.


What to take from this

Don’t try to build the Phase 3 system on day 1. You’ll build abstractions for problems you don’t have yet. Start with a CLAUDE.md that reflects what you actually know about your project: stack, conventions, the business domain you’ve figured out so far.

The configuration is a living artifact, treat it like the codebase. When something breaks because the AI made a predictable mistake, add a rule. When a team member joins, think about what their configuration needs to be different. When the context file bloats, compress it.

Measure it. Token counts matter (703 vs 289 lines is a real cost difference), hook execution rate matters, and how often your rules actually fire tells you whether they’re pulling any weight. You can’t optimize what you don’t track.

The 77% isn’t a warning sign, it’s evidence that the team invested when the complexity actually demanded it, when the project and team size required a real system rather than an informal file.

The real number is this: 8.6% of all commits in this repository touch AI configuration. On a 5,820-commit project, that’s 506 commits dedicated to improving how the AI works alongside the humans. More than most business logic files received. If that feels like a lot, consider that the alternative is a stale CLAUDE.md and an AI that drifts from the codebase it’s supposed to understand.

The question isn’t whether to invest in AI configuration but when, and this dataset gives a concrete answer: later than any upfront plan would suggest, and then much faster than expected once the team’s complexity makes it unavoidable.


Beyond the two-person team

This dataset is one project, two developers, one stack. The friction that shaped the 506 commits was scoped to that size: a second dev joining, one monorepo to cover, one set of conventions to encode.

At organizational scale, the friction shifts shape. Multi-repo means the same rule gets duplicated across N CLAUDE.md files and drifts the moment someone updates one without the others. Multi-stack means a rule about TypeScript Result types doesn’t translate to a Rust repo with anyhow::Result. Multi-team means profile proliferation: not 5 YAML profiles but 50, with legitimate reasons behind most of the variance.

The pattern likely to emerge: a dedicated AI config repository as the source of truth, consumed by downstream projects. A few shapes this can take:

  • Git submodule: the config repo is pinned at a commit, reproducible across machines, stack-agnostic. The cost is submodule UX, which is real but manageable with a wrapper script.
  • Published package (npm, cargo, pip): versioned with semver, distributed through existing release workflows. Works well when your org is stack-homogeneous, less well when a TypeScript team and a Rust team both need the same base rules.
  • Shared config CLI that fetches + assembles (think Terraform modules or a dotfiles manager): purpose-built, cross-stack, but it’s another tool to maintain.

Whichever shape, the mental model is the same: org modules + team modules + project modules, composed by a pipeline. The modular system that emerged at Phase 4 for one project is the same shape, scaled up one level.
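The composition itself is simple enough to sketch: layered maps where the nearest scope wins, which is all “org modules + team modules + project modules” means mechanically (names illustrative):

```typescript
// Each layer maps module name → content; later layers are closer to the
// project and override earlier ones.
type ModuleLayer = Record<string, string>;

function composeModules(...layers: ModuleLayer[]): ModuleLayer {
  return Object.assign({}, ...layers);
}

const org = { security: "org-wide security baseline", tdd: "org TDD policy" };
const team = { tdd: "team TDD policy (stricter)" };
const project = { security: "project-specific security exceptions" };

// The effective config keeps the org TDD override from the team layer and
// the security override from the project layer.
const effective = composeModules(org, team, project);
```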

I haven’t run this setup. The projection is based on the principle that made Phases 3-4 work: systematize when collaboration demands it, not before. At org scale, collaboration demands it across repo boundaries. The investment curve probably mirrors what happened here, a long slow accumulation followed by a spike when a second repo needs the same rules that had been tuned for one.

If you’re in that situation, the prediction is concrete: you’ll resist the central repo for months because your team-level configs feel fine, and then you’ll build it in two weeks because version drift across four repos became impossible to manage.

Single project (this article)      → CLAUDE.md + .claude/ scoped to one repo
Multi-repo org (logical next step) → Shared AI config repo + per-project override

The modular AI instruction system described here (YAML profiles, modules, generation pipeline with Zod validation) is documented in the Claude Code Ultimate Guide. Source data from the Méthode Aristote repository git history (5,820 commits, v0.1.0 → current). The archaeology command: git log --all --format="%ad" --date=format:"%Y-%m" -- CLAUDE.md .claude/ | sort | uniq -c.