arXiv · analysis signal

Formal Skill: Executable Runtime Skills for LLM Agents

Formal Skill: Programmable Runtime Skills for Efficient and Accurate LLM Agents

Signal thesis

Formal Skill signals a shift from prompt-based skill engineering to runtime-native, executable skill abstractions, promising more reliable and token-efficient LLM agents.

Why it matters

For agentk.it users building production agents, Formal Skill offers a path to lower token costs and better reliability by encoding workflows as executable state machines rather than fragile prompt text. The abstraction targets the brittleness of current MCP and function-calling approaches by embedding state, policy, and completion discipline in the skill runtime itself.

Original source

https://arxiv.org/abs/2605.19604v1

Key takeaways

Read this first.

  1. Formal Skill replaces informal prompt-based skills with executable, stateful runtime components that enforce policies via hooks.
  2. The FairyClaw runtime demonstrates that this approach reduces token consumption while maintaining or improving task accuracy.
  3. This abstraction is particularly impactful for multi-step workflows where state management and policy enforcement are critical.

Ecosystem impact

Where this changes the map.

For Researchers

Opens a new research direction in runtime-native agent skill design, moving beyond prompt engineering to executable state machines with formal properties.

For Developers

Provides a concrete implementation (FairyClaw) and design pattern for building more reliable, token-efficient agents. Developers can now encode complex workflows as composable, enforceable skills rather than relying on lengthy prompt instructions.

For Users

End users benefit from agents that are more accurate, use fewer tokens (lower cost), and can reliably enforce business policies and completion requirements.

Full English translation

Translated text.

Summary

Large Language Model (LLM) agents increasingly operate in real-world workspaces, but the skills they use to translate reasoning into action remain largely informal. Current approaches—Markdown skills, instruction packs, function calling, and MCP servers—either encode procedures as long natural-language documents or leave workflow state, policy enforcement, and completion discipline outside the skill itself. This paper introduces Formal Skill, a runtime-native abstraction that represents reusable capabilities through JSON metadata, action schemas, reliable Python executors, hook-governed control logic, and skill-local runtime state.
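To make the five ingredients named above concrete, here is a minimal Python sketch of how JSON metadata, an action schema, executors, a hook, and skill-local state might fit together. It is an illustration under stated assumptions, not FairyClaw's actual format or API: the `Skill` class, the JSON field names, and the `file_report` example are all hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

# Hypothetical JSON metadata with a tiny action schema.
# The field names are illustrative, not FairyClaw's real format.
SKILL_JSON = """
{
  "name": "file_report",
  "description": "Draft and submit a short report.",
  "actions": {
    "draft":  {"required": ["title"]},
    "submit": {"required": []}
  }
}
"""

Executor = Callable[[dict, dict], str]                 # (args, state) -> result
Hook = Callable[[str, dict, dict], Optional[str]]      # returns a veto reason or None


@dataclass
class Skill:
    metadata: Dict[str, Any]
    executors: Dict[str, Executor]
    pre_hooks: List[Hook] = field(default_factory=list)
    state: Dict[str, Any] = field(default_factory=dict)  # skill-local runtime state

    def call(self, action: str, args: dict) -> str:
        # 1. Validate the call against the action schema.
        schema = self.metadata["actions"].get(action)
        if schema is None:
            return f"error: unknown action {action!r}"
        missing = [key for key in schema["required"] if key not in args]
        if missing:
            return f"error: missing arguments {missing}"
        # 2. Let every hook veto the call before anything executes.
        for hook in self.pre_hooks:
            veto = hook(action, args, self.state)
            if veto is not None:
                return f"blocked: {veto}"
        # 3. Run the Python executor, which may update local state.
        return self.executors[action](args, self.state)


def draft(args: dict, state: dict) -> str:
    state["draft"] = args["title"]
    return f"drafted {args['title']!r}"


def submit(args: dict, state: dict) -> str:
    state["submitted"] = True
    return "submitted"


def require_draft_before_submit(action: str, args: dict, state: dict) -> Optional[str]:
    # Policy hook: submission is only allowed once a draft exists.
    if action == "submit" and "draft" not in state:
        return "submit requires a draft"
    return None


skill = Skill(
    metadata=json.loads(SKILL_JSON),
    executors={"draft": draft, "submit": submit},
    pre_hooks=[require_draft_before_submit],
)
```

In this sketch the ordering rule lives in a hook rather than in prompt text, so a model that calls `submit` too early gets a short rejection instead of silently breaking the workflow.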

The authors implement Formal Skill in FairyClaw, an open-source event-driven runtime designed for executable, observable, and composable skills. By moving reusable procedures from repeated prompt text into executable state machines with hook policies, Formal Skill provides agents with a token-efficient and enforceable control surface. On the Harness-Bench benchmark, FairyClaw achieves highly competitive average scores while using substantially fewer tokens, with particularly strong results on tasks that benefit from structured workflow enforcement.
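The idea of an executable state machine with a hook policy, driven by events, can be sketched as follows. This is a hedged illustration only: the transition table, the `WorkflowSkill` class, the `completion_hook` policy, and the `run` loop are assumptions made for the example and do not reproduce FairyClaw's runtime.

```python
from collections import deque

# Hypothetical workflow: the only transitions the skill will accept.
TRANSITIONS = {
    ("idle", "start"): "collecting",
    ("collecting", "add_item"): "collecting",
    ("collecting", "review"): "reviewed",
    ("reviewed", "finish"): "done",
}


class WorkflowSkill:
    def __init__(self):
        self.state = "idle"
        self.items = []
        self.log = []  # (event, result) pairs, kept for observability

    def completion_hook(self, event):
        """Policy hook: refuse to finish a workflow that collected nothing."""
        if event == "finish" and not self.items:
            return "cannot finish: no items were collected"
        return None

    def handle(self, event, payload=None):
        target = TRANSITIONS.get((self.state, event))
        if target is None:
            result = f"rejected: {event!r} not allowed in state {self.state!r}"
        else:
            veto = self.completion_hook(event)
            if veto is not None:
                result = f"rejected: {veto}"
            else:
                if event == "add_item":
                    self.items.append(payload)
                self.state = target
                result = f"ok: now {self.state}"
        self.log.append((event, result))
        return result


def run(skill, events):
    """Tiny event loop: feed queued (event, payload) pairs to the skill."""
    queue = deque(events)
    while queue:
        event, payload = queue.popleft()
        skill.handle(event, payload)
    return skill.state
```

Because the workflow rules are code, they do not need to be restated in every prompt; the agent only sees a short accept or reject message per event, and the `log` list gives an auditable trace of what was attempted.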

Key Contributions

  • Formal Skill abstraction: A runtime-native skill representation with JSON metadata, action schemas, Python executors, hook-governed control logic, and local runtime state.
  • FairyClaw runtime: An open-source, event-driven implementation that makes Formal Skills executable, observable, and composable.
  • Token efficiency: Demonstrated reduction in token usage while maintaining or improving task accuracy on Harness-Bench.
  • Policy enforcement via hooks: A novel mechanism for embedding workflow state, policy enforcement, and completion discipline directly into the skill runtime.
  • Empirical validation: Competitive results on Harness-Bench, especially on tasks requiring structured multi-step workflows.

Implications

For Researchers

This work shifts the research focus from prompt engineering to runtime-native skill design. The Formal Skill abstraction opens avenues for studying formal properties of agent skills—such as termination guarantees, policy compliance, and composability—that are difficult to achieve with informal prompt-based approaches. Researchers can now investigate how executable state machines with hook policies compare to traditional reinforcement learning or planning approaches for agent control.

For Developers

Developers building production agents with frameworks like LangChain, CrewAI, or AutoGen can adopt the Formal Skill pattern to reduce token costs and improve reliability. Instead of crafting lengthy prompt instructions for multi-step workflows, developers can encode these workflows as composable, enforceable skills with built-in state management and policy hooks. The open-source FairyClaw runtime provides a concrete starting point for experimentation and integration.
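One way to picture composition is a pipeline in which each skill unlocks only after the previous one completes, and the per-turn prompt lists just the actions that are valid right now. The sketch below is an assumption-laden illustration: `StepSkill`, `Pipeline`, and `tool_prompt` are hypothetical names, and nothing here is taken from FairyClaw, LangChain, CrewAI, or AutoGen.

```python
from typing import List, Optional


class StepSkill:
    """Hypothetical minimal skill: a fixed sequence of steps."""

    def __init__(self, name: str, steps: List[str]):
        self.name = name
        self.steps = steps
        self.position = 0

    @property
    def complete(self) -> bool:
        return self.position == len(self.steps)

    def available_actions(self) -> List[str]:
        return [] if self.complete else [self.steps[self.position]]

    def act(self, action: str) -> str:
        if action not in self.available_actions():
            return f"{self.name}: rejected {action!r}"
        self.position += 1
        return f"{self.name}: did {action!r}"


class Pipeline:
    """Compose skills so each one unlocks only after the previous completes.

    It exposes the same interface as StepSkill, so pipelines can be nested.
    """

    def __init__(self, skills: list):
        self.skills = skills

    def _active(self) -> Optional[object]:
        for skill in self.skills:
            if not skill.complete:
                return skill
        return None

    @property
    def complete(self) -> bool:
        return self._active() is None

    def available_actions(self) -> List[str]:
        active = self._active()
        return active.available_actions() if active else []

    def act(self, action: str) -> str:
        active = self._active()
        if active is None:
            return "pipeline: already complete"
        return active.act(action)


def tool_prompt(skill) -> str:
    """Short per-turn prompt listing only the currently valid actions."""
    actions = skill.available_actions()
    return "Valid actions: " + (", ".join(actions) if actions else "none (workflow complete)")
```

For example, `Pipeline([StepSkill("fetch", ["download", "verify"]), StepSkill("publish", ["upload"])])` would reject an `upload` call until both `fetch` steps have run, while `tool_prompt` stays a single short line instead of a full procedure description.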

For Users

End users of AI agents will benefit from more reliable and cost-effective interactions. Formal Skill reduces the likelihood of agents deviating from required workflows or failing to complete tasks, while the token efficiency translates directly to lower operational costs. This is particularly valuable in enterprise settings where policy compliance and auditability are critical.

What to watch next

Follow-up signals.

  • Adoption of Formal Skill patterns in mainstream agent frameworks and MCP implementations.
  • Extensions of the hook-governed control logic to support more complex policy enforcement and multi-agent coordination.

Source and permission

Trace the origin.

Original title
Formal Skill: Programmable Runtime Skills for Efficient and Accurate LLM Agents
Source
arXiv
Author
Xi Zhang
Original date
2026-05-19
Permission
Open license
Published
2026-05-24
Source URL
https://arxiv.org/abs/2605.19604v1