Warren Whipple

HITL.md – Context engineering for the human in the loop

I’m seeing this discussion flow by about cognitive debt and losing the plot when working with highly autonomous coding agents.

I’ve been experimenting with prompting entire new features into existence without reviewing their implementations and, while it works surprisingly well, I’ve found myself getting lost in my own projects.

Simon Willison

…the theory of the system, their shared understanding, had fragmented or disappeared entirely. They had accumulated cognitive debt faster than technical debt, and it paralyzed them.

Margaret-Anne Storey

I’m feeling similarly lost.

The last few weeks I’ve been deep in a multi-week brownfield migration + simultaneous refactor + need to keep everything running for users while working behind a feature flag. It’s big. It’s cross-repo. It’s tangled in microservices. It’s live. And I cannot wrap it all up in a neat little sandbox and let the agents go yolo.

I tried various harnesses, plan modes, socratic interviews, SDD waterfalls. I was making excruciatingly slow progress while drowning in an ocean of token-generated markdown specs, plans, design docs, roadmaps… All possibly accurate at some point but now rotting and drifting and too detailed to hold in my head. I was confused. My confusion was confusing the agents.

Then I stumbled into this tiny guarded human notes file. I was venting to the agents about being overwhelmed by the changes and context switching and rotting markdown. One AI suggested I keep a personal status scratchpad outside the repo.

Fine sure notes okay.

Then I moved my notes file into the repo so I wasn’t constantly copy-pasting slices into prompts. Then I had to prohibit the agents from editing it into another giant rotting spec.

Then it was drifting from the codebase. When I asked agents to check my notes, they’d predictably suggest a wall of text about what was missing or a mind melting log of what was just coded. Unhelpful. So the next prohibition was on suggesting additions. Just tell me what’s outright false, not what’s missing.

And then… I think… it worked?

I started plowing through the migration at a steady pace. Able to restart minutes after outside bug fix distractions or meetings or sleep or the weekend. Willing to bite off larger and larger chunks of my roadmap into longer autonomous agent runs each time.

I deleted nine migration spec files and one roadmap because neither of us was reading them anyway. Trading 1055 lines of agent-owned rotting plans for 80 lines of human-owned notes.

The constraints

I don’t know if this will work for anyone else. But if my frustration story sounds familiar, maybe give it a try. I’ve been honing some rules.

Bound to human scan time. Keep the lists terse enough that the whole file is readable in a few minutes. Continuously combine bullets, especially older ones. You want to hold this all in your head. Keep a “status at a glance” section for moments you were briefly away so you can scan a few lines to snap back into focus.

Human edits only. It’s okay to lift some agent-suggested exact phrasing. But typing it yourself forces mental engagement. If you feel the compulsion to copy-paste something the agent says, it’s probably too long, so rephrase and compress.

Agent review for falseness, not exhaustiveness. Lean on the agent to check when the codebase has drifted from your mental model. Agents forever want to generate more tokens. Don’t let them here. The gate is correctness in description and assumptions. Not coverage.
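If you want to mechanize the “human scan time” bound, a pre-commit check is easy to sketch. This is a hypothetical helper, not part of the workflow above; the words-per-minute rate and the three-minute ceiling are arbitrary assumptions you would tune to your own reading speed.

```python
# Hypothetical guard: warn when a HITL notes file outgrows "scan time".
# The 200 wpm reading rate and 3-minute ceiling are assumptions, not rules.

def scan_time_minutes(text: str, words_per_minute: int = 200) -> float:
    """Estimate how long the notes take to read, in minutes."""
    return len(text.split()) / words_per_minute

def within_scan_time(text: str, max_minutes: float = 3.0) -> bool:
    """True if the file is still small enough to hold in your head."""
    return scan_time_minutes(text) <= max_minutes
```

You could wire `within_scan_time` into a pre-commit hook that prints a nag instead of failing, since the point is a nudge toward compression, not another gate.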

The file

README.md is for human collaborators. AGENTS.md is for AI agents. HITL.md is for the human in the agentic coding loop.

```markdown
# HITL.md

This document captures my current mental model while I'm the human in the agentic coding loop.
I own this file completely, aggressively updating and ruthlessly compressing it as the project progresses.
This keeps me from losing the plot while away, and shows the agent where my head is at.
This file is NOT a spec, NOT a log, NOT exhaustive, and definitely NOT for the agent to edit.

## Agent Rules

When asked to check or review this file, flag ONLY claims that are now false or make false assumptions.
DO NOT edit this file.
DO NOT suggest adding missing information – this file is intentionally non-exhaustive.

## Goals

- Human scannable phrase
	- A sentence or two.
	- Why: Optional motivation
- Human scannable phrase
	- A sentence or two.
	- Why: Optional motivation

## Scope / Sub Goals

- Just the goals so far

## Boundaries / Accepted Tradeoffs

- None yet

## Done

- Nothing yet

## Status at a Glance

- Just started

## Next

- Decide what's first
```

Also on GitHub here.

I use my own name in my file: “This captures Warren’s mental model… Warren owns this…” And I scope it to the particular complex task I’m working on: docs/HITL-warren-migration.md

And yeah I know this sort of subverts a common use of “human-in-the-loop” as human permission gates. You can change your file name. But “HITL” kinda works. The irony has a long history. And I love typing @hi into my agent harness and seeing auto suggestion point directly where we need to start.

Review cadence

After the agent completes an autonomous run, before we commit the code, I edit my HITL.md and ask the agent to check @hi and tab complete. The agent usually suggests something, but not always. And sometimes I ignore the suggestions because they’re silly and nitpicky. But sometimes I iterate. Then I commit the HITL.md and code together.

If I’ve pulled down recent team changes that look like they might overlap, or I feel too lazy to check, I preemptively ask the agent to review my HITL.md. Since this is read only, I can let it run and start up another editing agent in a separate terminal instance. If your tooling lets you split worktrees easily, that might be worth doing for these parallel sessions. But honestly I haven’t needed worktrees for this yet. If you make a change to HITL.md that would affect your agentic coding run, you probably need to reset it anyway.

Failure modes

I duct taped this together. At first I was definitely doing it wrong. I probably still am. Here are a few things I’m learning.

Long lists turn into specs and logs. The high level goals do not capture every important detail. Sometimes you discover unexpected but unavoidable tasks. So you need a place for some requirements and legitimate scope creep. Hence the “scope / sub goals” section. But this quickly turns into a mini spec if left unchecked. And the “done” section turns into a mini changelog. I thought that was fine at first. I could just scan over old bullets because they were append only anyway. But it turns out both could subtly rot. Later changes would render earlier claims technically wrong. Each agent check could turn into an exercise in carefully tweaking old language just a little to better match reality. It seemed a waste of time. Better to compress and generalize rather than perfect.

Inbox turns into an issue tracker. Sometimes an agent would suggest a change I wanted to consider but postpone. Or a team meeting would produce a small requirement change. I added an “inbox / remember” section to the bottom of the file. This turned into a backlog of noise that distracted the agents. I thought about adding more rules about ignoring the inbox. But tiny out-of-loop changes were causing commit noise. So I decided these belong in todo lists or our issue tracker. Those are tools specifically designed for cognitive offloading, which is contra the design intent of a human mental model mirror.

Open questions

Global mental models. I’d already created docs/mental-models.md for confusing but stable domain entities and operations. It seemed too general for my migration scope specific file. I’m wondering if this belongs in a root HITL.md or maybe alongside it. But cataloguing every mental model in a vast codebase could get out of hand.

Greenfield repos. I’m also experimenting with pulling this template into fresh side projects. I’m not sure it’s appropriate there. We’ll see. Maybe it needs adjustment. Maybe it’s overkill or just the wrong fit.

Team dynamics. And I’m not sure how this fits into big team settings where multiple people are working on the same feature. How does it work with pull requests and human code review? Maybe big collaborative team folks can help see a way through.

Tell me if you find this useful. Tell me if you have improvements. Tell me if this is just another ceremonial yak shaving tool shaped object.


Feedback. I welcome discussion and critique. Public or private. w@warrenwhipple.com

Process. AI assistance in research and thinking. Writing is my own. AI assistance in editing. See my process and ethics statement.

AIs used. ChatGPT, Claude, Gemini, Grok.

Key influences first hand

Key influences via AI assistance