Orchard: Open-Source Framework for Scalable Agent Training

Summary

Orchard addresses a critical bottleneck in agentic AI research: the lack of open, scalable infrastructure for training autonomous agents. While many high-performing agent systems rely on proprietary codebases, models, or services, most open-source frameworks focus on orchestration and evaluation rather than scalable agent training. Orchard introduces a lightweight environment service (Orchard Env) that provides reusable primitives for sandbox lifecycle management across task domains, agent harnesses, and pipeline stages.

The paper demonstrates three agentic modeling recipes built on Orchard Env. Orchard-SWE targets coding agents, achieving 64.3% on SWE-bench Verified after supervised fine-tuning (SFT) and 67.5% after SFT followed by reinforcement learning (SFT+RL), which the paper reports as a new state of the art among open-source models of comparable size. Orchard-GUI trains a 4B vision-language computer-use agent using only 0.4K distilled trajectories and 2.2K open-ended tasks, achieving 74.1% on WebVoyager, 67.0% on Online-Mind2Web, and 64.0% on DeepShop. Orchard-Claw targets personal assistant agents, achieving 59.6% pass@3 on Claw-Eval with only 0.2K synthetic tasks.

Key Contributions

Orchard Env: A lightweight, open-source environment service providing reusable primitives for sandbox lifecycle management across task domains, agent harnesses, and pipeline stages.
Credit-Assignment SFT: A novel training method that learns from productive segments of unresolved trajectories, enabling effective use of partial or failed trajectories.
Balanced Adaptive Rollout (BAR): An RL technique that dynamically adjusts rollout distribution to focus on underperforming scenarios, improving sample efficiency.
Three Domain-Specific Recipes: Orchard-SWE (coding), Orchard-GUI (computer use), and Orchard-Claw (personal assistant) demonstrating state-of-the-art results with minimal training data.
Extreme Data Efficiency: Orchard-GUI and Orchard-Claw achieve competitive results with only hundreds to thousands of training trajectories, orders of magnitude fewer than prior approaches.
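
To make the credit-assignment SFT idea more concrete, here is a minimal sketch of one way to train only on the productive segments of a trajectory: mask the loss so that tokens from unproductive steps contribute nothing. This is an illustration under stated assumptions, not the paper's implementation. The `Step` type, the `productive` flag, and both function names are hypothetical, and the summary does not say how productive segments are actually identified.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One agent step in a trajectory (hypothetical structure)."""
    token_logprobs: List[float]  # log-prob the model assigns to each target token
    productive: bool             # assumed label: did this step make progress?


def loss_mask(steps: List[Step]) -> List[int]:
    """Return 1 for every token in a productive step and 0 otherwise."""
    mask: List[int] = []
    for step in steps:
        mask.extend([1 if step.productive else 0] * len(step.token_logprobs))
    return mask


def masked_nll(steps: List[Step]) -> float:
    """Mean negative log-likelihood over tokens from productive steps only."""
    logprobs = [lp for step in steps for lp in step.token_logprobs]
    mask = loss_mask(steps)
    kept = sum(mask)
    if kept == 0:
        return 0.0  # nothing to learn from in this trajectory
    return -sum(lp * m for lp, m in zip(logprobs, mask)) / kept


# An unresolved trajectory whose middle step was a dead end.
trajectory = [
    Step([-0.5, -1.5], productive=True),
    Step([-3.0], productive=False),
    Step([-1.0], productive=True),
]
loss = masked_nll(trajectory)
```

In this sketch the dead-end step still appears in the model's context, but it is excluded from the training signal, so a failed trajectory can still contribute its useful parts.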

Implications

For Researchers

Orchard provides a standardized, open environment layer that decouples agent training from proprietary infrastructure. This enables reproducible research, fair comparisons across agent architectures, and the ability to share and reuse training data and recipes across labs. The credit-assignment SFT and BAR techniques offer new tools for learning from partial trajectories, which could be applied to other domains where complete successful trajectories are scarce.
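
As a rough illustration of what it can mean to adjust the rollout distribution toward underperforming scenarios, the sketch below splits a fixed rollout budget in proportion to each scenario's failure rate, with a floor so that no scenario is dropped entirely. It is a simplified stand-in, not the BAR algorithm as defined in the paper. The function name, the `floor` parameter, and the proportional-to-failure-rate rule are all assumptions made for this example.

```python
from typing import Dict


def allocate_rollouts(
    success_rate: Dict[str, float], budget: int, floor: int = 1
) -> Dict[str, int]:
    """Split a rollout budget so that weaker scenarios receive more rollouts.

    Assumed rule: every scenario gets `floor` rollouts, and the remainder is
    shared in proportion to the failure rate (1 - success rate).
    """
    n = len(success_rate)
    if budget < floor * n:
        raise ValueError("budget is too small to give every scenario its floor")

    weights = {name: 1.0 - rate for name, rate in success_rate.items()}
    total = sum(weights.values())
    remaining = budget - floor * n

    alloc: Dict[str, int] = {}
    for name, weight in weights.items():
        # Fall back to an even split if every scenario is already solved.
        share = weight / total if total > 0 else 1.0 / n
        alloc[name] = floor + int(remaining * share)

    # Rounding down can leave a few rollouts unassigned; give them to the
    # weakest scenarios first.
    leftover = budget - sum(alloc.values())
    for name in sorted(weights, key=weights.get, reverse=True)[:leftover]:
        alloc[name] += 1
    return alloc


rates = {"easy": 0.9, "medium": 0.5, "hard": 0.1}
plan = allocate_rollouts(rates, budget=18)
```

Recomputing the success rates after each training round and calling the function again would shift the budget as scenarios improve, which is the adaptive part of the idea.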

For Developers

Developers can leverage Orchard’s reusable environment primitives and pre-trained recipes to build custom agents for coding, GUI automation, or personal assistance without needing massive proprietary datasets or compute resources. The framework’s harness-agnostic design means it can integrate with existing agent frameworks and tools, reducing the barrier to entry for building production-grade agents.
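
The sketch below illustrates the general pattern behind a harness-agnostic environment layer: sandbox creation and teardown live in one reusable primitive, and any agent loop can use it. This is not Orchard Env's actual API, which the summary does not describe. `InMemorySandbox`, `sandbox`, and `run_harness` are hypothetical names, and the sandbox here is a toy in-memory object rather than a real isolated environment.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


class InMemorySandbox:
    """Toy stand-in for a sandbox (hypothetical, not Orchard Env's API)."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.log: List[str] = []
        self.closed = False

    def step(self, action: str) -> str:
        """Record an action and return a fake observation."""
        if self.closed:
            raise RuntimeError("sandbox is closed")
        self.log.append(action)
        return f"observed:{action}"

    def close(self) -> None:
        self.closed = True


@contextmanager
def sandbox(task_id: str) -> Iterator[InMemorySandbox]:
    """Create a sandbox and guarantee teardown, even if the agent crashes."""
    box = InMemorySandbox(task_id)
    try:
        yield box
    finally:
        box.close()


def run_harness(
    policy: Callable[[str], str], task_id: str, max_steps: int = 3
) -> List[str]:
    """Run any policy against a sandbox; the lifecycle code is shared."""
    with sandbox(task_id) as box:
        observation = "start"
        for _ in range(max_steps):
            observation = box.step(policy(observation))
        return list(box.log)


# Two different "harnesses" reuse the same lifecycle primitive.
coding_actions = run_harness(lambda obs: "run_tests", "swe-task", max_steps=2)
gui_actions = run_harness(lambda obs: "click", "gui-task", max_steps=2)
```

Keeping the lifecycle in a context manager means a coding harness and a GUI harness differ only in the policy they pass in, not in how environments are created or cleaned up.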

For Users

End users will benefit from more capable, open-source agents that can be deployed locally or in private clouds, reducing reliance on proprietary API services. The data efficiency of Orchard’s recipes means that specialized agents can be trained for niche domains with limited data, enabling greater customization and privacy for enterprise and personal use cases.

References

https://arxiv.org/abs/2605.15040v2
