entries

The agent principal-agent problem
David Crawshaw
I am building a cloud
David Crawshaw
Eight more months of agents
David Crawshaw
How I program with Agents
David Crawshaw
How I program with LLMs
David Crawshaw
jsonfile: a quick hack for tinkering
David Crawshaw
new year, same plan
David Crawshaw
log4j: between a rock and a hard place
David Crawshaw
Software I’m thankful for
David Crawshaw
Remembering the LAN
David Crawshaw
The asymmetry of Internet identity
David Crawshaw
Zero Trust Networks
David Crawshaw
Go 1.13: xerrors
David Crawshaw
Fast compilers for fast programs
David Crawshaw
UTF-7: a ghost from the time before UTF-8
David Crawshaw
One process programming notes (with Go and SQLite)
David Crawshaw
Reasoning with Regret
David Crawshaw
Searching the Creative Internet
David Crawshaw
Service Throughput Tradeoffs
David Crawshaw
Sharp-Edged Finalizers in Go
David Crawshaw
The Tragedy of Finalizers
David Crawshaw
Go and SQLite: when database/sql chafes
David Crawshaw
Experimentation Adrift
David Crawshaw
Leaving Google
David Crawshaw
Less cgo overhead in Go 1.8
David Crawshaw
BBR
David Crawshaw
Compiler Bomb
David Crawshaw
On recieving the News
David Crawshaw
Buried by the media
David Crawshaw
Smaller Go 1.7 binaries
David Crawshaw
Good business
David Crawshaw
Everyone a writer
David Crawshaw
2016-06-29
David Crawshaw
Transaction oriented collector
David Crawshaw
Machining under a microscope
David Crawshaw
Limits of Superintelligence
David Crawshaw
COPY Relocations
David Crawshaw
Atom Feed
David Crawshaw
2016-02-10
David Crawshaw
2016-01-23
David Crawshaw
2016-01-18
David Crawshaw
2016-01-15
David Crawshaw
2016-01-09
David Crawshaw
2016-01-07
David Crawshaw
2016-01-05
David Crawshaw
2016-01-04
David Crawshaw
2016-01-03
David Crawshaw
2016-01-02
David Crawshaw
2016-01-01
David Crawshaw
2015-12-29
David Crawshaw
Under the heel of the spirit
David Crawshaw
2015-12-27
David Crawshaw
2015-12-26
David Crawshaw
2015-12-20
David Crawshaw
2015-12-15
David Crawshaw
2015-12-04
David Crawshaw
2015-11-18
David Crawshaw
2015-11-16
David Crawshaw
2015-10-13
David Crawshaw
2015-08-07
David Crawshaw
2015-08-04
David Crawshaw
2015-07-27
David Crawshaw
2015-07-17
David Crawshaw
2015-07-15
David Crawshaw
2015-07-14
David Crawshaw
2015-07-07
David Crawshaw
2015-06-26
David Crawshaw
2015-06-24
David Crawshaw
2015-06-22
David Crawshaw
2015-06-01
David Crawshaw
2015-05-08
David Crawshaw
2015-05-07
David Crawshaw
2015-04-02
David Crawshaw
2015-03-10
David Crawshaw
2015-03-09
David Crawshaw
2015-03-01
David Crawshaw
2015-01-11
David Crawshaw
2015-01-10
David Crawshaw
2014-12-11
David Crawshaw
2014-07-28
David Crawshaw
2014-06-13
David Crawshaw
2014-05-14
David Crawshaw
2014-05-06
David Crawshaw
2014-04-18
David Crawshaw
2014-03-08
David Crawshaw
2014-01-17
David Crawshaw

The agent principal-agent problem

David Crawshaw

<h1>The agent principal-agent problem</h1> <p><em>2026-05-07</em></p> <p>Code review is broken.</p> <p>The industry-established code review process, review-then-commit, was a straightforward mechanism that allowed a relatively low-trust group of engineers to collaborate. It appears to have been initially developed for the Apache server OSS project in the 90s, corporatized by Google in the early 2000s, and popularized throughout the industry by several means, most notable of which was the GitHub PR.</p> <p>It was very simple:</p> <ol> <li>A human makes a change.</li> <li>This change is packaged up, sent to another human for commentary.</li> <li>Rounds of commentary and adjustments continue until the reviewer approves (LGTMs) it.</li> <li>The change is committed.</li> </ol> <p>This is not Michael Fagan's defect analysis work or the ticket-like processes used for critical systems changes in fields like aerospace. This will not catch your bugs. It will, however, communicate design changes to other engineers who maintain a mental model of the codebase, and reviewers can use the process to teach norms to contributors. It has advantages, and because there is a gate before the main branch changes, it does not require much trust. That makes it a great tool for scaling a company, because beyond ~10-12 engineers (the "two pizza" team, among other names), trust erodes rapidly. It is also great for scaling OSS. It puts work on reviewers, but there was work on the human making the change too. An imbalance existed but was often manageable.</p> <h2>The crisis of code review</h2> <p>Agents broke this. 
If you insert an agent into the existing process, your best possible outcome is:</p> <ol> <li>A human <strong>instructs a machine</strong> to make a change.</li> <li><strong>The human reviews the code, iterates with comments until they approve it.</strong></li> <li>This change is packaged up, sent to another human for commentary.</li> <li>Rounds of commentary and adjustments continue until the reviewer approves (LGTMs) it.</li> <li>The change is committed.</li> </ol> <p>This doubles the amount of review. But companies were already review-limited. In a really well-functioning team, a code review cycle could take a day. (Between two engineers who get on well and intimately know each other's work, you could shrink this to an hour.) But across the industry, before agents, getting a review merged took, optimistically, <a href="https://reviewnudgebot.com/blog/how-to-accelerate-code-reviews-with-nudges-insights-from-microsofts-study/">days</a>.</p> <p>Additionally, the whole reason engineers use agents is that they improve productivity. More total changes are generated. So we doubled review and increased the total changes. As you modify the old model, you run out of review bandwidth before you have extracted all the value you can from agents. 
(And anecdotally, you run out of bandwidth before you get even a fraction of the value of agents.)</p> <p>But things get worse, because no-one actually augments the old processes this way.</p> <h2>The agent principal-agent problem</h2> <p>What happens in reality are processes like this:</p> <ol> <li><strong>A human instructs a machine to make a change.</strong></li> <li>This change is lightly QA'd, packaged up, sent to another human for commentary.</li> <li>Rounds of commentary come back from the reviewer and <strong>are sent wholesale to the machine for adjustments</strong> until the reviewer approves (LGTMs) it.</li> <li>The change is committed.</li> </ol> <p>This is an example of what economists call the <a href="https://en.wikipedia.org/wiki/Principal%E2%80%93agent_problem">principal-agent problem</a>: the reviewer is the principal, the contributor is the agent, and code review only worked because the reviewer could cheaply infer effort from reading the code. Agents collapse that signal. This is what is killing OSS, and it is commonly being referred to as "slop PRs". There is no incentive for the human driving the agent to actually read the code or spend time thinking about what the reviewer says.</p> <p>The result is a radical imbalance. "Contributors" type a sentence or two, of the quality of a poor bug report, spend 5 minutes poking at the resulting program, and then generate serious review load for another engineer. You can do this with no understanding of the underlying project, its constraints, or the tools used to construct it. This is an unmanageable disaster. 
This does not even work in environments where the reviewer is paid to do the work, because they could be more productive by prompting the agent themselves.</p> <h2>Potential solutions</h2> <p>Small high-trust teams have an easy process they can adopt:</p> <ol> <li>A human instructs a machine to make a change.</li> <li>The human reviews the code, iterates with comments until they approve it.</li> <li>They push the change to production and deploy.</li> </ol> <p>There is still a human in the loop. There is still a reviewer who did not get deeply lost in the weeds of how a problem could be solved. Most importantly, there is no principal-agent problem, because the human driving the machine takes on the responsibility for its actions by owning the deployment.</p> <p>Anecdotal evidence suggests this works for small teams. With a team of nine at <a href="https://exe.dev">exe.dev</a> we have been able to make it work. We spend a lot more time writing integration tests and e2e tests, and building agent-based workflows that analyze commits for safety, performance, or usability bugs, all to minimize risk. This is a lot of machinery that teams traditionally do not develop until they are far larger and more mature; on the other hand, it is much easier to develop thanks to agents. We also have had to be very selective about our colleagues and be intentional in our communication. But we ship this way.</p> <p>This is not tenable in low-trust environments, i.e. large companies. You have to trust your co-workers to start a conversation about architectural changes before they make them. No-one at BigCo trusts their colleagues to make sweeping changes to a service they "own". And no-one at BigCo wants to be on the hook for a major outage without having coverage from a code review to smear the blame around. (Low-trust environments are awful places.)</p> <p>I am sure there are small isolated teams at big companies that have broken with standard practices and are getting real value out of agents. 
I am also sure there are ICs who have work that lets them maximize the value of an agent without involving their colleagues. (E.g. if you work in quality, agents can help you write and execute endless large-scale experiments that never need to get reviewed; you just send out what works.) But the vast majority of big company engineers cannot make changes, especially the cross-functional changes that agents do so well, without review eating all the productivity gains.</p> <h2>Some hints in the history books</h2> <p>As of writing this, I have not seen anyone describe a process that "scales" agent-driven development in a large company. There is, however, evidence from the past that it is possible. I would point to Microsoft in the 1990s, which did not have mandated review-before-commit practices. Some teams may have, but the company, while large, was organized as many independent teams constantly synchronized by QA processes. This is regarded as "old-fashioned" "cowboy" style development by proponents of the large-team processes that came before agents. But it did work. It created some of Microsoft's most long-lived successful products, like the Win32 API. (And yes, we could critique a 30-year-old API endlessly, but it is still there and significantly better than some of its "replacements" that were built with code review processes.) Little appears to be written about this period of Microsoft history. If you were there, I would love to hear or read about your experiences.</p> <p>Until someone develops robust processes for agent use in low-trust environments, small teams have a large force multiplier available to them that big teams do not. Ship while you can.</p>