The agent principal-agent problem

David Crawshaw

<p><em>2026-05-07</em></p> <p>Code review is broken.</p> <p>The industry-established code review process, review-then-commit, was a straightforward mechanism that allowed a relatively low-trust group of engineers to collaborate. It appears to have been initially developed for the Apache server OSS project in the 1990s, corporatized by Google in the early 2000s, and popularized throughout the industry by several means, the most notable of which was the GitHub PR.</p> <p>It was very simple:</p> <ol> <li>A human makes a change.</li> <li>This change is packaged up and sent to another human for commentary.</li> <li>Rounds of commentary and adjustments continue until the reviewer approves (LGTMs) it.</li> <li>The change is committed.</li> </ol> <p>This is not Michael Fagan's defect analysis work or the ticket-like processes used for critical systems changes in fields like aerospace. It will not catch your bugs. It will, however, communicate design changes to other engineers who maintain a mental model of the codebase, and reviewers can use the process to teach norms to contributors. Because there is a gate before the main branch changes, it also does not require much trust. That makes it a great tool for scaling a company, because beyond ~10-12 engineers (the "two pizza" team, among other names), trust erodes rapidly. It is also great for scaling OSS. It put work on reviewers, but it also put work on the human making the change. An imbalance existed, but it was often manageable.</p> <h2>The crisis of code review</h2> <p>Agents broke this. 
If you insert an agent into the existing process, the best possible outcome is:</p> <ol> <li>A human <strong>instructs a machine</strong> to make a change.</li> <li><strong>The human reviews the code and iterates with comments until they approve it.</strong></li> <li>This change is packaged up and sent to another human for commentary.</li> <li>Rounds of commentary and adjustments continue until the reviewer approves (LGTMs) it.</li> <li>The change is committed.</li> </ol> <p>This doubles the amount of review per change. But companies were already review-limited. In a really well-functioning team, a code review cycle could take a day. (Between two engineers who get on well and intimately know each other's work, you could shrink this to an hour.) But across the industry, before agents, getting a change reviewed and merged optimistically took <a href="https://reviewnudgebot.com/blog/how-to-accelerate-code-reviews-with-nudges-insights-from-microsofts-study/">days</a>.</p> <p>Additionally, the whole reason engineers use agents is that they improve productivity, so more total changes are generated. We doubled the review per change and increased the number of changes. If you bolt agents onto the old model, you run out of review bandwidth before you have extracted all the value you can from agents. 
(And anecdotally, you run out of bandwidth before you capture even a fraction of the value of agents.)</p> <p>But things get worse, because no-one actually augments the old process this way.</p> <h2>The agent principal-agent problem</h2> <p>What happens in reality is a process like this:</p> <ol> <li><strong>A human instructs a machine to make a change.</strong></li> <li>This change is lightly QA'd, packaged up, and sent to another human for commentary.</li> <li>Rounds of commentary come back from the reviewer and <strong>are sent wholesale to the machine for adjustments</strong> until the reviewer approves (LGTMs) it.</li> <li>The change is committed.</li> </ol> <p>This is an example of what economists call the <a href="https://en.wikipedia.org/wiki/Principal%E2%80%93agent_problem">principal-agent problem</a>: the reviewer is the principal, the contributor is the agent, and code review only worked because the reviewer could cheaply infer effort from reading the code. Agents collapse that signal. This is what is killing OSS, and the result is commonly referred to as "slop PRs". There is no incentive for the human driving the agent to actually read the code or spend time thinking about what the reviewer says.</p> <p>The result is a radical imbalance. "Contributors" type a sentence or two, of the quality of a poor bug report, spend five minutes poking at the resulting program, and then generate serious review load for another engineer. You can do this with no understanding of the underlying project, its constraints, or the tools used to construct it. This is an unmanageable disaster. 
This does not even work in environments where the reviewer is paid to do the work, because the reviewer could be more productive by prompting the agent themselves.</p> <h2>Potential solutions</h2> <p>Small high-trust teams have a simple process they can adopt:</p> <ol> <li>A human instructs a machine to make a change.</li> <li>The human reviews the code and iterates with comments until they approve it.</li> <li>They push the change to production and deploy it.</li> </ol> <p>There is still a human in the loop. There is still a reviewer who did not get lost deep in the weeds of how the problem could be solved. Most importantly, there is no principal-agent problem, because the human driving the machine takes responsibility for its actions by owning the deployment.</p> <p>Anecdotal evidence suggests this works for small teams. With a team of nine at <a href="https://exe.dev">exe.dev</a>, we have been able to make it work. We spend a lot more time writing integration tests and e2e tests, and building agent-based workflows that analyze commits for safety, performance, or usability bugs to minimize risk. This is a lot of machinery that teams traditionally do not develop until they are far larger and more mature; on the other hand, it is much easier to develop thanks to agents. We have also had to be very selective about our colleagues and intentional in our communication. But we ship this way.</p> <p>This is not tenable in low-trust environments, i.e. large companies. You have to trust your co-workers to start a conversation about architectural changes before they make them. No-one at BigCo trusts their colleagues to make sweeping changes to a service they "own". And no-one at BigCo wants to be on the hook for a major outage without having coverage from a code review to smear the blame around. (Low-trust environments are awful places.)</p> <p>I am sure there are small isolated teams at big companies that have broken with standard practices and are getting real value out of agents. 
I am also sure there are ICs whose work lets them maximize the value of an agent without involving their colleagues. (E.g. if you work in quality, agents can help you write and execute endless large-scale experiments that you never need to get reviewed; just send out what works.) But the vast majority of big-company engineers cannot make changes, especially the cross-functional changes that agents do so well, without review eating all the productivity gains.</p> <h2>Some hints in the history books</h2> <p>As of this writing, I have not seen anyone describe a process that "scales" agent-driven development in a large company. There is, however, evidence from the past that it is possible. I would point to Microsoft in the 1990s, which did not mandate review-before-commit. Individual teams may have used it, but the company, while large, was organized as many independent teams constantly synchronized by QA processes. This is regarded as "old-fashioned", "cowboy"-style development by proponents of the large-team processes that came before agents. But it did work. It created some of Microsoft's longest-lived successful products, like the Win32 API. (And yes, we could critique a 30-year-old API endlessly, but it is still there and significantly better than some of its "replacements" that were built with code review processes.) Little appears to have been written about this period of Microsoft history; if you were there, I would love to hear or read about your experiences.</p> <p>Until someone develops robust processes for agent use in low-trust environments, small teams have a large force multiplier available to them that big teams do not. Ship while you can.</p>