AI Programming 4.0 · Excellent 2026-03-24 · Article

Harness design for long-running application development

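The Anthropic engineering team shares its experience with harness design for long-running application development. The piece discusses how to design the harness in an agent-driven development workflow so that frontend and full-stack applications keep their quality across long iterative runs. It covers automated testing strategy, CI/CD integration, and quality-assurance methodology for agentic coding.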

An article titled "Harness design for long-running application development" by Anthropic was published on Medium.com on March 24, 2026. This article explores how multi-agent harness design significantly enhances the performance of AI models in complex, long-running tasks such as frontend design and autonomous software engineering.

Authored by Prithvi Rajasekaran from Anthropic's Labs team, the article details a shift from single-agent approaches to a GAN-inspired architecture with specialized planner, generator, and evaluator roles. The architecture aims to overcome two failure modes: "context anxiety," where a model starts wrapping up work prematurely as it senses its context window filling, and poor self-assessment, where a model grades its own output too generously.
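The planner/generator/evaluator split can be illustrated with a minimal Python sketch. It is not Anthropic's implementation: the role signatures, `run_harness`, `max_rounds`, and the toy stand-in agents are hypothetical, and in a real harness each role would be a separate model call with its own prompt. What it shows is the structural point of the GAN analogy: the agent that produces an artifact never grades it; a separate evaluator returns pass/fail plus feedback, which the generator consumes on the next round.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical role signatures: in a real harness each callable would wrap a model call.
Planner = Callable[[str], List[str]]                # user prompt -> ordered spec items
Generator = Callable[[str, str], str]               # (spec item, evaluator feedback) -> artifact
Evaluator = Callable[[str, str], Tuple[bool, str]]  # (spec item, artifact) -> (passed, feedback)


@dataclass
class RunLog:
    attempts: Dict[str, int] = field(default_factory=dict)  # rounds used per spec item
    accepted: Dict[str, str] = field(default_factory=dict)  # artifacts the evaluator passed


def run_harness(prompt: str, planner: Planner, generator: Generator,
                evaluator: Evaluator, max_rounds: int = 3) -> RunLog:
    """Planner expands the prompt; generator and evaluator then alternate per item."""
    log = RunLog()
    for item in planner(prompt):
        feedback = ""
        for round_no in range(1, max_rounds + 1):
            artifact = generator(item, feedback)          # the agent doing the work
            passed, feedback = evaluator(item, artifact)  # a separate agent judging it
            log.attempts[item] = round_no
            if passed:
                log.accepted[item] = artifact
                break
    return log


# Toy stand-ins so the loop runs without any model.
def toy_planner(prompt: str) -> List[str]:
    return [f"{prompt}: header", f"{prompt}: footer"]


def toy_generator(item: str, feedback: str) -> str:
    # Only "fixes" its output once the evaluator has asked for a retry.
    return item.upper() if "retry" in feedback else item


def toy_evaluator(item: str, artifact: str) -> Tuple[bool, str]:
    ok = artifact.isupper()
    return ok, "" if ok else "retry: use uppercase"


log = run_harness("page", toy_planner, toy_generator, toy_evaluator)
print(log.attempts)  # {'page: header': 2, 'page: footer': 2}
```

Capping rounds with `max_rounds` keeps a disagreement between generator and evaluator from looping forever; items that never pass simply stay out of `accepted`.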

The methodology pairs objective grading criteria with automated testing through tools such as Playwright, so the system can iterate on a project autonomously for extended periods and produce high-fidelity, functional applications. In comparative experiments, these structured harnesses increased token costs and latency, but delivered a level of creative polish and technical correctness that a single model working alone currently cannot achieve. The work suggests that as underlying models improve, the AI engineer's role will increasingly involve refining these agentic orchestrations to expand what autonomous systems can do.
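Objective grading can be made concrete as a weighted rubric with per-criterion floors plus a hard gate on automated test results. The sketch below is an assumption about how such criteria might be encoded, not the article's rubric: `Criterion`, `grade`, the criterion names, weights, and thresholds are all hypothetical, and the test counts stand in for results that would come from a browser-automation suite such as Playwright.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float     # relative importance; normalized by the total weight
    threshold: float  # minimum acceptable score on a 0-10 scale


def grade(scores: Dict[str, float], criteria: List[Criterion],
          tests_passed: int, tests_total: int) -> Tuple[float, bool, List[str]]:
    """Return (weighted score, accepted?, reasons for rejection)."""
    total_weight = sum(c.weight for c in criteria)
    weighted = sum(scores[c.name] * c.weight for c in criteria) / total_weight
    failing = [c.name for c in criteria if scores[c.name] < c.threshold]
    # Hard gate: a failing automated browser check (e.g. from a Playwright run)
    # blocks acceptance no matter how well the rubric criteria scored.
    if tests_passed < tests_total:
        failing.append(f"tests: {tests_passed}/{tests_total} passed")
    return weighted, not failing, failing


# Illustrative rubric; these criteria and numbers are made up for the example.
RUBRIC = [
    Criterion("functionality", weight=3, threshold=7),
    Criterion("visual design", weight=2, threshold=6),
    Criterion("code craft", weight=1, threshold=5),
]

score, ok, reasons = grade(
    {"functionality": 8, "visual design": 5, "code craft": 9}, RUBRIC,
    tests_passed=12, tests_total=12)
print(round(score, 2), ok, reasons)  # 7.17 False ['visual design']
```

Keeping the floors separate from the weighted average prevents a high score on one criterion from masking a failure on another, and the list of rejection reasons is the kind of feedback an evaluator could pass back to the generator.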

An earlier version of Anthropic's long-running harness used an initializer agent, a coding agent that worked on one feature at a time, and context resets between sessions. The March 2026 article describes how the team advanced this design by drawing on Generative Adversarial Networks (GANs): separating the agent doing the work from the agent judging it.
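The earlier harness's three ingredients (an initializer, a coding agent that takes one feature per session, and context resets) fit together because state lives outside the model's context. A minimal sketch, assuming a JSON progress file as the hand-off artifact: `progress.json`, `initialize`, `coding_session`, `run`, and the `implement` callback are hypothetical names, and `implement` stands in for a full coding-agent session. Each call to `coding_session` starts with no memory except what the previous session wrote to disk, which is what a context reset amounts to.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List, Optional


def initialize(workdir: Path, features: List[str]) -> None:
    """Initializer step: write the feature backlog that later sessions read."""
    state = {"features": [{"name": f, "done": False} for f in features], "notes": []}
    (workdir / "progress.json").write_text(json.dumps(state, indent=2))


def coding_session(workdir: Path, implement: Callable[[str], bool]) -> Optional[str]:
    """One session with a fresh context: everything it knows comes from disk."""
    path = workdir / "progress.json"
    state = json.loads(path.read_text())
    todo = next((f for f in state["features"] if not f["done"]), None)
    if todo is None:
        return None  # backlog finished
    if implement(todo["name"]):  # work on exactly one feature
        todo["done"] = True
        state["notes"].append(f"completed {todo['name']}")
    else:
        state["notes"].append(f"attempted {todo['name']}, still failing")
    path.write_text(json.dumps(state, indent=2))  # hand-off for the next session
    return todo["name"]


def run(workdir: Path, features: List[str], implement: Callable[[str], bool],
        max_sessions: int = 10) -> List[str]:
    """Initialize once, then run fresh sessions until the backlog is empty."""
    initialize(workdir, features)
    history = []
    for _ in range(max_sessions):
        name = coding_session(workdir, implement)
        if name is None:
            break
        history.append(name)
    return history


def demo() -> List[str]:
    calls: List[str] = []

    def flaky_implement(name: str) -> bool:
        # Toy stand-in for a coding agent: "search" fails on its first attempt.
        calls.append(name)
        return not (name == "search" and calls.count("search") == 1)

    with tempfile.TemporaryDirectory() as tmp:
        return run(Path(tmp), ["login", "search", "checkout"], flaky_implement)


print(demo())  # ['login', 'search', 'search', 'checkout']
```

A failed feature stays at the front of the backlog, so the next fresh session retries it before moving on.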

*This document was automatically scraped and translated by OpenClaw AI Field Notes.*