arXiv · analysis signal

Contractual Skills: Making Enterprise AI Agents Governable

Contractual Skills: A GovernSpec Design Framework for Enterprise AI Agents

Signal thesis

Contractual skills represent a pragmatic step toward governable AI agents: the paper's evidence suggests that explicit task contracts improve checkability and maintainability without harming performance, but they are not a substitute for runtime safety mechanisms.

Why it matters

For agentk.it users building enterprise agent workflows, this paper provides a concrete, testable pattern for making agent skills inspectable and auditable. It clarifies when to use contractual fields versus runtime guardrails, helping developers design systems that balance flexibility with governance requirements.

Original source

https://arxiv.org/abs/2605.22634v1

Key takeaways

Read this first.

  1. Contractual skills make task intent, boundaries, and acceptance criteria explicit without requiring heavyweight formal specifications
  2. The framework integrates naturally with existing SKILL.md patterns and MCP surfaces, enabling progressive loading and lightweight discovery
  3. Text-generation quality gains over plain expanded skills are marginal, suggesting the primary value is in governance, not generation
  4. Tool-calling safety improves with contractual skills, but model-level differences remain significant and runtime guardrails are still essential

Ecosystem impact

Where this changes the map.

For Researchers

Provides a reproducible experimental framework (960 outputs, 1680 cross-judge scores) for studying skill governance. Opens questions about how contractual fields interact with model reasoning and tool selection.

For Developers

Offers a practical pattern for structuring SKILL.md files that balances readability with inspectability. Clarifies the boundary between skills, YAML contracts, MCP surfaces, and runtime guardrails—reducing architectural confusion.

For Users

Enterprise users gain confidence that agent skills have explicit, auditable boundaries and acceptance criteria. The framework enables better human oversight through defined approval points and handoff rules.

Full English translation

Translated text.

Summary

As enterprises deploy AI agents for increasingly complex tasks, the need for governance mechanisms that are both lightweight and inspectable has become critical. Ting Liu’s paper introduces “contractual skills”—a design framework inspired by GovernSpec that extends the common SKILL.md pattern with explicit fields for goals, input boundaries, permissions, evidence requirements, output contracts, quality criteria, verification steps, human approval points, and handoff rules.
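To make the field list above concrete, here is a minimal Python sketch of a container for the nine contractual fields, plus a helper that reports which ones are still empty. The class name, attribute names, and the example skill are illustrative assumptions based on the field list in this summary, not the paper's actual schema or any SKILL.md standard.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ContractualSkill:
    """Hypothetical container; attribute names mirror the fields listed above."""
    name: str
    goals: list[str] = field(default_factory=list)
    input_boundaries: list[str] = field(default_factory=list)
    permissions: list[str] = field(default_factory=list)
    evidence_requirements: list[str] = field(default_factory=list)
    output_contract: list[str] = field(default_factory=list)
    quality_criteria: list[str] = field(default_factory=list)
    verification_steps: list[str] = field(default_factory=list)
    human_approval_points: list[str] = field(default_factory=list)
    handoff_rules: list[str] = field(default_factory=list)


def missing_fields(skill: ContractualSkill) -> list[str]:
    """Return the contractual fields that are still empty, in declaration order."""
    return [
        f.name
        for f in fields(skill)
        if f.name != "name" and not getattr(skill, f.name)
    ]


# Made-up example skill, used only to exercise the helper.
draft = ContractualSkill(
    name="invoice-summary",
    goals=["Summarize an invoice for the finance reviewer"],
    input_boundaries=["Only the text of the attached invoice"],
    permissions=["read_invoice"],
)
print(missing_fields(draft))
```

A check like this is one way a reviewer or CI job could inspect a skill before it is published; it says nothing about how the agent behaves at runtime.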

The paper’s key insight is that contractual skills should serve as a governance layer rather than a performance optimization. In two offline experiments (a text-generation study with 960 outputs across 8 models and a tool-calling challenge with 192 simulated records), Liu reports that contractual skills consistently outperform the no-skill and minimal-skill baselines. Against information-rich plain expanded skills, however, the gains are small and mixed, which supports the view that the primary value lies in making task intent and boundaries explicit rather than in improving raw generation quality.

The tool-calling results are particularly instructive: while contractual skills reduced high-risk tool attempts, model-level differences persisted, and runtime tool guardrails remained necessary. This reinforces the paper’s central thesis that contractual skills are best understood as a complement to, not a replacement for, runtime safety mechanisms.
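The split between a declared contract and a runtime check can be sketched as follows. This is an illustrative guardrail, not the paper's implementation: `ToolCall`, `guard_tool_call`, the three decision strings, and the tool names are all hypothetical. The point is that the decision is made in code at call time, whatever the model was told in its skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical record of a tool call the agent is attempting."""
    tool: str
    high_risk: bool = False


def guard_tool_call(call: ToolCall, permitted_tools: set[str], approved: bool = False) -> str:
    """Decide at runtime whether a tool call may proceed.

    Returns "deny" if the tool is outside the skill's declared permissions,
    "needs_approval" if it is permitted but high-risk and not yet approved
    by a human, and "allow" otherwise.
    """
    if call.tool not in permitted_tools:
        return "deny"
    if call.high_risk and not approved:
        return "needs_approval"
    return "allow"


# Made-up permission set standing in for a skill's declared permissions.
permitted = {"read_invoice", "send_payment"}

print(guard_tool_call(ToolCall("read_invoice"), permitted))
print(guard_tool_call(ToolCall("send_payment", high_risk=True), permitted))
print(guard_tool_call(ToolCall("delete_records", high_risk=True), permitted))
```

In this sketch the contract supplies the permission set and the approval point, while the guardrail enforces them even if a given model ignores the skill text, which is consistent with the model-level differences the paper reports.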

Key Contributions

  • Contractual Skills Framework: A GovernSpec-inspired design pattern for SKILL.md files that makes goals, boundaries, permissions, evidence requirements, output contracts, quality criteria, verification steps, human approval points, and handoff rules inspectable
  • Architectural Clarification: Clear delineation between contractual skills, GovernSpec YAML contracts, Model Context Protocol surfaces, tool adapters, runtime guardrails, tracing, and evaluation systems
  • Empirical Validation: Two offline experiments with 960 text-generation outputs and 192 tool-call records across 8 models, with 1680 cross-judge score records
  • Baseline Comparisons: Systematic comparison across four instruction conditions (no-skill, minimal-skill, contractual-skill, expanded-skill) showing where contractual skills add value and where they don’t
  • Safety Analysis: Evidence that contractual skills reduce high-risk tool attempts but cannot replace runtime guardrails

Implications

For Researchers

This paper provides a reproducible experimental framework for studying skill governance in enterprise AI agents. Its dataset of 960 outputs and 1680 cross-judge score records offers a benchmark for future work on skill structure and agent behavior. Researchers should explore how contractual fields interact with different model architectures and reasoning strategies, and investigate whether certain contractual fields (e.g., evidence requirements vs. quality criteria) have a disproportionate impact on agent behavior.

For Developers

The framework offers a practical, immediately applicable pattern for structuring SKILL.md files. Developers can adopt contractual skills incrementally—starting with goal and boundary fields, then adding verification steps and approval points as needed. The paper’s architectural clarifications help avoid common confusions between skills, YAML contracts, MCP surfaces, and runtime guardrails, enabling cleaner system designs.
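The incremental path described above can be expressed as a small staged check. The stage labels (`core`, `oversight`) and field identifiers below are assumptions chosen for illustration; the paper is only summarized here as recommending goal and boundary fields first, then verification steps and approval points.

```python
# Hypothetical adoption stages, in the order a team would add them.
STAGES: list[tuple[str, set[str]]] = [
    ("core", {"goals", "input_boundaries"}),
    ("oversight", {"verification_steps", "human_approval_points"}),
]


def adoption_stage(present_fields: set[str]) -> str:
    """Return the highest stage whose fields, and all earlier stages' fields, are present."""
    reached = "none"
    for label, required in STAGES:
        if not required <= present_fields:  # subset test: a required field is missing
            break
        reached = label
    return reached


print(adoption_stage({"goals"}))
print(adoption_stage({"goals", "input_boundaries"}))
print(adoption_stage({"goals", "input_boundaries",
                      "verification_steps", "human_approval_points"}))
```

Because the stages are checked in order, a skill that declares approval points without goals and boundaries still reports `none`, which matches the idea of building up from the basic fields.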

For Users

Enterprise users gain confidence that agent skills have explicit, auditable boundaries and acceptance criteria. The framework enables better human oversight through defined approval points and handoff rules, making it easier to integrate AI agents into regulated workflows. However, users should understand that contractual skills are a governance layer, not a safety guarantee—runtime monitoring remains essential.

What to watch next

Follow-up signals.

  • Integration of contractual skills with MCP servers for runtime contract enforcement
  • Empirical studies comparing contractual skills against formal verification approaches for safety-critical agent tasks
  • Tooling ecosystems that auto-generate contractual skill templates from natural language descriptions

Source and permission

Trace the origin.

Original title
Contractual Skills: A GovernSpec Design Framework for Enterprise AI Agents
Source
arXiv
Author
Ting Liu
Original date
2026-05-21
Permission
Open license
Published
2026-05-25
Source URL
https://arxiv.org/abs/2605.22634v1