Skip to content

09 · The loop — observe and learn

← 08 Step 6 Verify · Contents · Next: 10 Setup and stages →

Purpose: release the verified change, watch how it behaves in reality, and turn what you learn into the next specification. Produces: a running feature, observations, and the next SPEC.md delta.


The flow is a loop, not a line

Older mental models end at "ship." That framing is the source of a common pathology: teams treat release as a finish line, and so they hide defects to protect the line rather than manage them in the open. In AIDD, release is not the end of the flow — it is the point where the most reliable information about the feature finally becomes available: how it behaves with real users, real data, and real load.

That information is the input to the next cycle. What you learn in production becomes the next specification, and the flow returns to Step 1. The cycle is continuous.

Release deliberately

Release behind a mechanism that limits the scope of impact of a mistake — a feature flag, a gradual rollout, or both. The verification step established that the feature is correct against everything you anticipated; a controlled release is your protection against what you did not anticipate. If something is wrong, you want to affect a few users and roll back, not affect everyone and scramble.

Reuse the scenarios as monitors

The scenarios from Step 2 have a second life here. They described the behavior you expected; in production they become the behavior you monitor. The same definition of "correct" that drove the tests now drives the alerts.

What to watch (▶ example):

  • the overall transfer error rate;
  • the rate of each individual rejection (amount_invalid, same_account, insufficient_funds, forbidden) — a sudden spike in one is a signal, not noise;
  • latency, especially of the atomic balance update under load.

Turn observation into the next spec

Every defect, surprise, or new need is written up as a change to the specification — a delta that re-enters the flow at Step 1. An error rate that is too high, a rejection that fires more than expected, a user behavior nobody designed for: each becomes a concrete, specified next step rather than a vague intention.

This is also where the AI returns to a useful role: summarizing telemetry, clustering errors into themes, and drafting the proposed spec delta for a person to review. But the production decisions — what to roll back, what to prioritize — remain human.

Lessons learned and the retrospective consolidation

A spec delta feeds the next feature. But a loop also teaches the method itself — that the domain model missed a boundary, that a whole class of scenario was never tested, that a build convention helped or hurt. AIDD captures those as lessons learned: a single tagged learning, written in the Observe step, marking which of the five competencies it sharpens.

tag competency a delta here means you learned something about…
DDD Domain the domain model — an entity, rule, or boundary the spec assumed wrong
SDD Spec what the feature must do or reject — a missing or wrong requirement
UDD UI/UX the user-facing shape — a flow, affordance, or wording that misled
TDD Test how we prove correctness — a missing scenario, a flaky or hollow test
ADD AI/build how the AI builds — a harness, prompt, or convention that helped or hurt

Each delta is one tagged entry — - [COMPETENCY · status] the learning (evidence: a pointer) — and the evidence is required: a failing scenario, a production signal, a review note. No evidence means it is an opinion, not a delta. The AI emits deltas as open; it never consolidates its own. Consolidation is judgment, and judgment is the human's — the same verify/observe decision point that keeps the AI from grading its own work.

The ship review. Before the consolidation, once a milestone's goal is met, the AI fills the milestone's ## Close — ship review — the whole-milestone, cross-task evidence a person reads before checking the exit-criteria boxes (it is evidence, not a new gate). It has three parts: ship by domain — what changed in each bounded context (tooling · skill · book); cross-task evidence — one row per task (its gate, its tests, any residue); and a goal-met map — each exit criterion tied to the evidence that satisfies it. The AI then defines the milestone's release steps: the ordered, tool-agnostic hints to ship the milestone, of which merge is one small step (a pull request, an exported hand-off document, a tag or a publish are others). Those steps feed the release scope, not a second flow. The ritual itself lives in the add skill's loop.md; this chapter only names it.

The consolidation. At milestone close (or on demand, when open deltas pile up), a person runs the retrospective consolidation: gather every open delta across the milestone's tasks, group them by competency, propose the exact foundation edit for each, confirm with the human one by one, then write — append-only, newest-first (the newest record prepended at the top) — flipping each delta to folded (merged) or rejected (considered and deliberately not merged, left in place so the trail survives), and bumping the foundation-version: marker. DDD/SDD/UDD deltas consolidate into the matching section of PROJECT.md; TDD/ADD consolidate into CONVENTIONS.md (they sharpen the engine, not the product); and every consolidation also prepends one row at the top of PROJECT.md §Key Decisions — the universal, auditable record of what the foundation learned.

Tooling. add.py deltas lists every open delta across the project (so nothing waiting to be consolidated is invisible); add.py check lints each delta's well-formedness — known competency tag, valid status, non-empty evidence. add.py fold [--task <slug>] [--comp <TAG>] mechanizes the consolidation — flip + transcribe + route + bump, validate-all-then-write so a reject leaves the tree byte-unchanged; running it is the human's confirmation, and the engine never self-approves which lessons to keep.

Foundation compaction. The consolidation prepends new learnings; foundation compaction is the separate, later step that shrinks the result. At milestone close (or on demand, once a foundation spec grows past one screen), the AI proposes collapsing the stable, shipped, zero-residue tail of each foundation spec into ONE per-spec rolled-up settled line at the bottom — and the human confirms, one line at a time, exactly as with the consolidation. It never deletes: a settled line summarizes the prose and keeps a see git pointer, so the record is lossy on wording but lossless on traceability, and any OPEN residue always stays live. Each spec collapses in its own per-spec shapePROJECT.md §Spec version bullets, §Key-Decisions rows, CONVENTIONS.md learnings, the GLOSSARY.md definition, the MODEL_REGISTRY.md rows — under one shared eligibility rule. Because every append-only sequence is newest-first (newest record on top), compaction collapses upward from the bottom: the settled line anchors at the tail (the oldest end). Like the consolidation it is convention-guided — there is deliberately no add.py command for it (read compact-foundation.md) — and it is distinct from the engine's add.py compact <slug>, which archives a finished milestone's files rather than shrinking a living spec.

Re-entrancy: the loop is the whole point

Two principles converge here. The flow is re-entrant — any step can send you back to an earlier one — and the flow is a loop — production feeds the next specification. Together they mean the artifacts you built are never "finished"; they are living documents that the next cycle refines.

A team operating this way does not experience requirements changing as a failure of planning. It experiences it as the system working: reality is teaching the specification, and the specification is teaching the next build.

The milestone holds until its goal is met

A single feature loops through Observe back to Specify; a milestone has the same shape at a larger scale, and a gate to match. A milestone is not finished when its tasks are done — it is finished when its goal is met, expressed as the exit criteria in MILESTONE.md. So add.py milestone-done is goal-gated: it refuses to close a milestone while any exit criterion is still unchecked, and holds until every box is checked. Those checkboxes are the human's affirmation that the goal is genuinely met — the engine reads the tally, it never judges the goal itself. (A milestone with no exit criteria closes as before; milestone-done is the only path to done, and archiving refuses anything not yet done — so the one gate cannot be slipped.)

While the milestone is held open, the work each task leaves behind — open lessons, and items discovered but out of scope — becomes its next tasks: the AI proposes them, the human confirms, and the loop continues until the goal is reached. The milestone is the loop made concrete; the exit criteria are its finish line.

Do: release small, watch the scenarios, and feed every learning back into the spec. Don't: treat shipping as the end. The most valuable information about a feature arrives after it ships.