Several months ago I took up a monster of a project I had no clear idea how to tackle. What was abundantly clear was that I would need to gain a broad-reaching understanding of company policies, internal systems, and open source technologies to succeed. I had one quarter to deliver meaningfully. In an act of desperation, I reached for my coding agent as a research assistant.
The previous quarter’s effort to deliver a project with the coding agent was a disaster. The ease of letting it write code got the better of me. I let that happen. I started the requirements-gathering discussion from a position of what my ideal solution might look like: a self-moderating, stable system that would implement an operational policy built on feedback. The agent, a sycophantic LLM, happily obliged. I let it design what I asked for, something much larger than what we needed. I let it plan the delivery of what I asked for. We implemented in small, easy-to-review increments. We shipped thousands of lines of code together. It was a well-designed, well-built system, and it was essentially what I’d asked for.
After two months of incremental, tested additions, we arrived at a point of integration, and it didn’t work. The system worked as designed, but the design was wrong. I failed to account for how it would integrate with existing systems. I didn’t catch it and neither did the LLM. We would need to change a fundamental aspect of all the downstream systems.
My cybernetic centaur surgical team[^1] fell prey to all the same follies of enterprise software development, and I was burned by it. So reaching back for the LLM really was an act of desperation.
The fallout from salvaging the previous project was not without consequence. Over several hard conversations, my classical engineering manager[^2] helped me see how I had let things get away from me. We got back to basics with my delivery structure. I agreed to take on this next project, and I was determined not to make the same mistakes.
By this point, I’d read about the wisdom of Beads[^3] and about the fever dream future of Gas Town[^4]. I felt angry and ashamed that I let the agent get the better of me and made such an egotistical error. I had my Back To Basics checklist for an incremental approach to delivery and the determination to include feedback in my development cycle, not just in my system design. I also had a really ambitious project about which I felt not at all comfortable.
I reached again for the coding agent. This time, I started a research project and set the stage.
## Research Focus
This project researches $COMPANY internal and open source projects related to $TOPICS.
- Use sub-agents liberally for research tasks.
- File findings as tasks for other sessions to act on.
- Primary Goal: identify what's blocking the next incremental improvement.
I wanted to accomplish two things. I wanted to express my Back To Basics approach in a way that the agent would reinforce — essential requirements, minimal design, build something that can validate the requirements, and learn enough to identify the next questions. I also wanted to mold this agent into a thinking partner — no more sycophancy, no more accepting my claims at face value.
Over the next four months we experimented and refined our process into a system that helped me deliver broader scope with better results[^5] than if I had been working alone. It kept me focused on my intended process, tracked our progress automatically, and enabled me to both stay on top of our next incremental task and be mindful when I chose to deviate from that task. I focused on essential requirements with the goal of learning the next question, and the research assistant curated an incrementally expanding horizon of new requirements. This enabled me to learn my problem domain at pace with project delivery.
We built our collection of research documents out of questions. Everything I assumed or simply didn’t know became a research task. Questions I didn’t know how to ask became conversations that refined into questions and then into research tasks. We catalogued systems, defined processes, identified constraints, challenged assumptions, drew new conclusions, and speculated on future projects. Each time we reached a limit that couldn’t be solved by more research, we spun out a project to build, extend, or replace our system components. We learned from our experience interacting with the other systems and from what our users found useful or frustrating.
We used a task system for carving off context-sized pieces to hand to a coding agent. I got better at checkpointing an agent’s progress before its context was exhausted[^6]. Eventually I ran eight or ten agents concurrently — researching, coding, debugging. It was thrilling and addictive and felt so productive. Just one more prompt. All that context switching was absolutely a drag on quality. It was exhausting.
We settled into a method and structure of collaborative work supported by a small set of tools. The method is the practice I’ve already described, iterating from question to delivery and back to question[^7].
The structure is a curated notebook of documents of several types, each with a lightweight lifecycle process. That structure gives a schema to an otherwise free-form space, enabling both progressive-disclosure[^8] information retrieval by the LLM and more spontaneous connection discovery. The tools are a couple of simple scripts for parsing markdown frontmatter and agent skills that trigger at the moment an agent pursues a specific task.
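A frontmatter-parsing script of this kind can be sketched in a few lines. This is a minimal illustration, not the author’s actual tooling: the field names (`type`, `status`), the document schema, and the `scan` helper are all assumptions invented for the example.

```python
from pathlib import Path


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse a simple 'key: value' frontmatter block delimited by
    '---' lines at the top of a markdown document.

    Returns {} if no complete frontmatter block is present.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter: block complete
            return meta
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat as no frontmatter


def scan(root: str) -> dict[str, list[str]]:
    """Group the notebook's markdown documents by lifecycle status
    (hypothetical 'status' field), so an agent or human can see at a
    glance which questions are open, answered, or abandoned."""
    by_status: dict[str, list[str]] = {}
    for path in sorted(Path(root).rglob("*.md")):
        meta = parse_frontmatter(path.read_text(encoding="utf-8"))
        status = meta.get("status", "unknown")
        by_status.setdefault(status, []).append(path.name)
    return by_status
```

A document in the notebook would then carry a header such as `---`, `type: question`, `status: open`, `---`, and the scanner gives each session a cheap index into the notebook without loading every document into context.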
Originally I called it a research journal, but now I’ve taken to calling it a fieldbook. A place to observe and to articulate and test hypotheses. I think of it as my development flywheel.
A fieldbook is not Luhmann’s (or Ahrens’s) zettelkasten. Both are techniques for organizing information that enable figurative (and now literal) conversations with myself about what I think I know. With the zettelkasten, the writing is the thinking — permanent notes force me to articulate what I believe about an idea. The point of the zettelkasten is to identify ideas and connections between them. The point of the fieldbook is to identify the shape of a system that meets my needs.
A fieldbook is not Karpathy’s LLM wiki either. Both are collaborations between human and agent for organizing knowledge. The output of the LLM wiki is the wiki itself, a highly curated context built by an agent so that you can ask it questions. The fieldbook is an engine for refining requirements based on feedback from previous implementations. The knowledge artifacts are a side-effect and record of the process, not the goal.
What began as an act of desperation has become a system I rely on. Nothing about this fieldbook idea is new. It’s the same process of refining an idea that I’ve always done, or at least aimed to do. The fieldbook is an effort to document and structure that practice so that these new tools can help me to do it more reliably and with less toil. I’ve extracted the bones of this fieldbook into a project template. Try it and let me know if you find that some part of this process is repeatable.
[^1]: I’m not sure about the proper origins of either metaphor, but I attribute the surgical team concept to Mills via Brooks, the centaur and the cyborg much more recently from Mollick.
[^2]: He always called himself that, “classical engineering manager,” and I never sorted out why. Maybe until that moment.
[^3]: A task fights both my and the LLM’s limited short-term memory; it’s a place where spurious thoughts can be sidelined for later. It also keeps the session focused on the objective, less distracted by ancillaries.
[^4]: The dark factory may one day come, but I think the quality of implementation still matters more than the thought leaders give it credit for.
[^5]: Subjectively, of course. Software development is so very much more art than science.
[^6]: 50k–100k tokens is the sweet spot for effective work. Less than that and the agent is only running on training data and vibes. More than that and it loses the ability to follow your basic instructions. The wheels fall off.
[^7]: What every scientist, designer, and engineer already does. Why does this feel novel in software development?
[^8]: I already mentioned the challenge of managing context. “Progressive disclosure” is what we’re calling the practice of giving the LLM just enough additional information at just the right time.