Formal Skill: Executable Runtime Skills for LLM Agents

Summary

Large Language Model (LLM) agents increasingly operate in real-world workspaces, but the skills they use to translate reasoning into action remain largely informal. Current approaches—Markdown skills, instruction packs, function calling, and MCP servers—either encode procedures as long natural-language documents or leave workflow state, policy enforcement, and completion discipline outside the skill itself. This paper introduces Formal Skill, a runtime-native abstraction that represents reusable capabilities through JSON metadata, action schemas, reliable Python executors, hook-governed control logic, and skill-local runtime state.

The authors implement Formal Skill in FairyClaw, an open-source event-driven runtime designed for executable, observable, and composable skills. By moving reusable procedures from repeated prompt text into executable state machines with hook policies, Formal Skill provides agents with a token-efficient and enforceable control surface. On the Harness-Bench benchmark, FairyClaw achieves highly competitive average scores while using substantially fewer tokens, with particularly strong results on tasks that benefit from structured workflow enforcement.

Key Contributions

Formal Skill abstraction: A runtime-native skill representation with JSON metadata, action schemas, Python executors, hook-governed control logic, and local runtime state.
FairyClaw runtime: An open-source, event-driven implementation that makes Formal Skills executable, observable, and composable.
Token efficiency: Substantially lower token usage on Harness-Bench while keeping average scores highly competitive.
Policy enforcement via hooks: A novel mechanism for embedding workflow state, policy enforcement, and completion discipline directly into the skill runtime.
Empirical validation: Competitive results on Harness-Bench, especially on tasks requiring structured multi-step workflows.
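To make the five components named above concrete, here is a minimal sketch of how JSON metadata, an action schema, Python executors, a hook, and skill-local state might fit together. It is an illustration only and does not reproduce FairyClaw's actual API: the Skill class, the call method, the hook signature, and the file_report skill are all hypothetical names chosen for this example.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

# Hypothetical JSON metadata with a tiny action schema (not FairyClaw's real format).
SKILL_JSON = """
{
  "name": "file_report",
  "description": "Store a short report, then confirm it.",
  "actions": {
    "write": {"required": ["path", "text"]},
    "confirm": {"required": []}
  }
}
"""


@dataclass
class Skill:
    """Hypothetical container: metadata + executors + hooks + local state."""
    meta: dict
    executors: dict                      # action name -> callable(args, state)
    pre_hooks: list = field(default_factory=list)
    state: dict = field(default_factory=dict)

    def call(self, action: str, args: dict) -> dict:
        schema = self.meta["actions"].get(action)
        if schema is None:
            return {"ok": False, "error": f"unknown action: {action}"}
        # Schema check: every required argument must be present.
        missing = [k for k in schema["required"] if k not in args]
        if missing:
            return {"ok": False, "error": f"missing arguments: {missing}"}
        # Hooks may veto the action by returning a reason string.
        for hook in self.pre_hooks:
            reason: Optional[str] = hook(action, args, self.state)
            if reason is not None:
                return {"ok": False, "error": reason}
        return {"ok": True, "result": self.executors[action](args, self.state)}


def write(args: dict, state: dict) -> str:
    # Kept in memory for the example; no real file I/O is performed.
    state["written"] = {"path": args["path"], "text": args["text"]}
    return f"stored {len(args['text'])} chars for {args['path']}"


def confirm(args: dict, state: dict) -> str:
    state["confirmed"] = True
    return "confirmed"


def require_write_before_confirm(action: str, args: dict, state: dict) -> Optional[str]:
    # Policy hook: 'confirm' is only allowed once 'write' has run.
    if action == "confirm" and "written" not in state:
        return "policy: 'write' must succeed before 'confirm'"
    return None


skill = Skill(
    meta=json.loads(SKILL_JSON),
    executors={"write": write, "confirm": confirm},
    pre_hooks=[require_write_before_confirm],
)

first = skill.call("confirm", {})                                  # vetoed by the hook
second = skill.call("write", {"path": "report.txt", "text": "hello"})
third = skill.call("confirm", {})                                  # now allowed
```

In this sketch the policy lives in code rather than in prompt text: the first confirm call is rejected by the hook regardless of what the model was told, and the same call succeeds once the skill's own state records a completed write.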

Implications

For Researchers

This work points the research focus away from prompt engineering and toward runtime-native skill design. The Formal Skill abstraction opens avenues for studying formal properties of agent skills, such as termination guarantees, policy compliance, and composability, that are difficult to achieve with informal prompt-based approaches. Researchers can also investigate how executable state machines with hook policies compare to traditional reinforcement learning or planning approaches for agent control.

For Developers

Developers building production agents with frameworks like LangChain, CrewAI, or AutoGen can adopt the Formal Skill pattern to reduce token costs and improve reliability. Instead of crafting lengthy prompt instructions for multi-step workflows, developers can encode these workflows as composable, enforceable skills with built-in state management and policy hooks. The open-source FairyClaw runtime provides a concrete starting point for experimentation and integration.
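As a rough illustration of encoding a multi-step workflow as an enforceable state machine instead of a long prompt, the sketch below uses a transition table and a completion check. The Workflow class, the step names, and the run_tests / commit / finish actions are hypothetical and are not taken from FairyClaw or from any of the frameworks named above.

```python
from enum import Enum


class Step(Enum):
    START = "start"
    TESTS_RUN = "tests_run"
    COMMITTED = "committed"
    DONE = "done"


# Hypothetical workflow: action -> (state required before, state after).
TRANSITIONS = {
    "run_tests": (Step.START, Step.TESTS_RUN),
    "commit": (Step.TESTS_RUN, Step.COMMITTED),
    "finish": (Step.COMMITTED, Step.DONE),
}


class Workflow:
    """Skill-local state machine that rejects out-of-order actions."""

    def __init__(self) -> None:
        self.step = Step.START
        self.log: list[tuple[str, str]] = []   # audit trail of every attempt

    def allowed_actions(self) -> list[str]:
        # The only actions the agent needs to be shown at this point.
        return [a for a, (src, _) in TRANSITIONS.items() if src is self.step]

    def apply(self, action: str) -> bool:
        entry = TRANSITIONS.get(action)
        if entry is None or entry[0] is not self.step:
            self.log.append((action, "rejected"))
            return False
        self.step = entry[1]
        self.log.append((action, "accepted"))
        return True

    def is_complete(self) -> bool:
        # Completion discipline: done only when the terminal state is reached.
        return self.step is Step.DONE


wf = Workflow()
early_finish = wf.apply("finish")        # rejected: tests have not run yet
options_at_start = wf.allowed_actions()
for action in ("run_tests", "commit", "finish"):
    wf.apply(action)
```

Under these assumptions, the runtime can surface only allowed_actions() at each turn rather than restating the whole procedure, and the log doubles as a record of what the agent attempted and what was refused.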

For Users

End users of AI agents stand to benefit from more reliable and cost-effective interactions. Formal Skill is designed to reduce the likelihood of agents deviating from required workflows or failing to complete tasks, and its lower token usage translates directly into lower operational costs. This is particularly valuable in enterprise settings where policy compliance and auditability are critical.

References

Formal Skill: Executable Runtime Skills for LLM Agents. arXiv:2605.19604v1. https://arxiv.org/abs/2605.19604v1
