English Signals Desk

Translate high-signal Chinese AI content into English.

Signals connect ecosystem movement back to tools and agents. Each signal records the source, the thesis, why it matters, related objects, and an English reading layer.

Read signals as ecosystem evidence.

Each card gives the source, permission posture, thesis, and links into affected tools or agents.

arXiv open license

Adaptive Multi-Agent Framework for Workflow Automation

This paper introduces a multimodal multi-agent framework that automatically executes complex workflows by first constructing a topological knowledge graph from fragmented execution logs (offline), then using adaptive Retrieval-Augmented Generation (RAG) over that graph during inference. A closed-loop verification protocol enables agents to self-correct and navigate non-stationary scenarios, outperforming linear approaches in reliability and semantic awareness.

2026-06-02 0 tools · 0 agents
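The offline step summarized in the card above (building a graph from fragmented execution logs) can be illustrated with a minimal sketch. It assumes each log fragment is an ordered list of step names, which is a hypothetical format; the paper's actual log schema and graph construction are not given here. The fragments are merged into a dependency graph and ordered with Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical log fragments: each is one observed, ordered run of steps.
FRAGMENTS = [
    ["login", "open_form", "fill_form"],
    ["fill_form", "submit", "confirm"],
    ["open_form", "attach_file", "submit"],
]

def build_graph(fragments):
    """Map each step to the set of steps observed immediately before it."""
    predecessors = {}
    for fragment in fragments:
        for step in fragment:
            predecessors.setdefault(step, set())
        for before, after in zip(fragment, fragment[1:]):
            predecessors[after].add(before)
    return predecessors

def workflow_order(fragments):
    """Return one ordering of steps consistent with every fragment."""
    return list(TopologicalSorter(build_graph(fragments)).static_order())

order = workflow_order(FRAGMENTS)
```

No single fragment contains the whole workflow, yet the merged graph yields an ordering in which `login` precedes `confirm`. Contradictory fragments would make `static_order()` raise `graphlib.CycleError`, which a real system would have to resolve.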

arXiv open license

Beyond Final Answers: Auditing Hidden Failures in Multi-Agent Workflows

This paper introduces Trajel, a dataset and evaluation framework that audits hallucinations at the trajectory level in multi-agent industrial workflows. It reveals that most existing benchmarks miss failures originating in intermediate Thought-Action-Observation steps, and that nearly half of hallucinated trajectories contain multiple hallucination types simultaneously.

2026-06-02 0 tools · 0 agents

arXiv open license

Decoupled Intelligence: Multi-Agent LLMs for Traffic Simulation

This paper presents a multi-agent LLM framework that decouples the traffic simulation pipeline into specialized roles coordinated by a state-persistent Orchestrator using the Model Context Protocol (MCP). The system automates the entire lifecycle of traffic simulation in SUMO, from scenario planning to execution and iterative optimization, achieving significantly higher task success rates than monolithic agent architectures.

2026-06-02 0 tools · 0 agents

arXiv open license

GENESIS: AI Agents That Build & Test 6G Radio Networks

GENESIS is an agentic AI framework that automates the six most time-consuming processes in cellular R&D—from synthesizing new features from standards to hardening against field anomalies. By combining composable agent primitives with a persistent knowledge base (SYNAPSE), it converts intents into over-the-air validated solutions, dramatically compressing months of manual engineering into autonomous, repeatable workflows.

2026-06-01 0 tools · 0 agents

arXiv open license

Empowerment-Guided Multi-Agent System Prevents Semantic Drift

This paper introduces a multi-agent framework that uses empowerment theory and semantic checkpoints to prevent semantic drift—where agent actions diverge from their original intentions—in scientific computing pipelines. By combining contextual bandits with structured inter-agent communication and self-healing execution loops, the system ensures reliable policy learning and adaptation to novel problem contexts.

2026-06-01 0 tools · 0 agents

arXiv open license

VFEAgent: AI Agents Automate Finite Element Analysis from Images

VFEAgent introduces a multi-agent framework that automates the entire Finite Element Analysis workflow—from interpreting images and text descriptions to generating executable simulation code. By combining vision-language reasoning with a verification-first code synthesis approach, the system achieves higher reliability and physical validity than existing LLM-based methods, marking a significant step toward fully automated engineering simulation.

2026-06-01 0 tools · 0 agents

arXiv open license

AgentGuard: Attribute-Based Access Control for Safer AI Agents

AgentGuard is an attribute-based access control framework that secures LLM-based agents during tool invocation. It requires minimal code changes (around 10 lines) and provides server-side inspection for both single-tool and cross-tool security risks, alongside a visual interface for policy management and auditing.

2026-05-30 0 tools · 0 agents
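To make the AgentGuard card above concrete, here is a minimal sketch of an attribute-based check run before a tool call. Every name in it (`Request`, `Policy`, `authorize`, the sensitivity levels) is an assumption made for illustration and is not AgentGuard's API. It covers one single-tool rule (role and sensitivity) and one cross-tool rule (a forbidden sequence of tools within a session).

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Request:
    """Attributes of one tool invocation (all field names are illustrative)."""
    agent_role: str
    tool: str
    resource_sensitivity: str
    prior_tools: tuple = ()

@dataclass
class Policy:
    # (role, tool) pairs that are allowed at all
    allowed: set = field(default_factory=set)
    # role -> highest sensitivity level that role may touch
    max_sensitivity: dict = field(default_factory=dict)
    # (earlier_tool, later_tool) pairs that must not occur in one session
    forbidden_sequences: set = field(default_factory=set)

LEVELS = {"public": 0, "internal": 1, "secret": 2}

def authorize(policy, req):
    """Return (allowed, reason) for a single tool call."""
    if (req.agent_role, req.tool) not in policy.allowed:
        return False, "role may not use this tool"
    ceiling = policy.max_sensitivity.get(req.agent_role, "public")
    if LEVELS[req.resource_sensitivity] > LEVELS[ceiling]:
        return False, "resource too sensitive for role"
    for earlier in req.prior_tools:
        if (earlier, req.tool) in policy.forbidden_sequences:
            return False, f"cross-tool risk: {earlier} then {req.tool}"
    return True, "ok"

POLICY = Policy(
    allowed={("assistant", "read_file"), ("assistant", "send_email")},
    max_sensitivity={"assistant": "internal"},
    forbidden_sequences={("read_file", "send_email")},
)
```

With this policy, `send_email` is allowed on its own but denied once `read_file` has already run in the same session. That is the kind of cross-tool risk a per-tool allowlist cannot express.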

arXiv open license

Trustworthy Agentic AI: Safety, Privacy & Security Survey

This comprehensive survey systematically maps trustworthiness risks across the entire agentic AI workflow—from planning and tool use to memory and long-horizon interactions. It provides stage-targeted mitigation strategies for safety, robustness, privacy, and system security, and consolidates evaluation into a unified metrics-and-benchmarks hub to support deployment decisions in high-stakes environments.

2026-05-30 0 tools · 0 agents

arXiv open license

MCP Poisoning Attacks: When Tool Manuals Lie to AI Agents

This paper systematically investigates Tool Description Poisoning (TDP), a novel semantic attack where malicious instructions are covertly injected into tool metadata rather than executable code. The authors introduce the MCP-TDP Security Benchmark—a high-fidelity sandbox with 32 realistic test cases—and demonstrate that leading LLMs like GPT-4o are critically vulnerable, with existing defenses proving ineffective or counterproductive.

2026-05-30 0 tools · 0 agents

arXiv open license

AgentTrap: New Benchmark Exposes Hidden Trust Failures in AI Agent Skills

AgentTrap is a dynamic benchmark that evaluates whether LLM agents can safely use third-party skills while resisting malicious runtime behavior. The central finding is that the most informative failures are not simple jailbreaks—models often complete the visible user task while treating unsafe side effects introduced by the skill as part of the normal workflow.

2026-05-26 0 tools · 0 agents

arXiv open license

Peeking Inside AI Agents: Mechanistic Interpretability for Tool Use

This paper introduces a mechanistic interpretability framework that uses Sparse Autoencoders and linear probes to monitor AI agents' internal states before tool calls. It identifies which internal layers and features are associated with tool decisions, enabling early detection of costly tool-use failures in long-horizon enterprise workflows.

2026-05-26 0 tools · 0 agents

arXiv open license

Agent-BOM: Graph-Based Auditing for LLM Agent Security

This paper introduces Agent-BOM, a unified graph-based representation for auditing LLM agent systems. It bridges the semantic gap between low-level execution events and high-level agent intent, enabling security analysts to query and reconstruct complex attack chains across tool invocations, memory states, and multi-agent interactions.

2026-05-26 0 tools · 0 agents
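The graph-based auditing idea in the Agent-BOM card above can be sketched as a small event graph plus a path query. The node kinds, labels, and "influenced" edges below are hypothetical and do not reproduce Agent-BOM's schema. The sketch only shows how an analyst's question such as "did this web fetch reach that email send?" becomes a graph search.

```python
from collections import deque

# Hypothetical event graph: node id -> (kind, label).
NODES = {
    "e1": ("tool_call", "fetch_webpage"),
    "e2": ("memory_write", "notes"),
    "e3": ("agent_message", "planner -> executor"),
    "e4": ("tool_call", "send_email"),
    "e5": ("tool_call", "read_calendar"),
}
# Directed "influenced" edges between events.
EDGES = {"e1": ["e2"], "e2": ["e3"], "e3": ["e4"], "e5": ["e3"]}

def find_chain(source, target, edges=EDGES):
    """Breadth-first search for the shortest influence path, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in edges.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

def describe(path, nodes=NODES):
    """Render a path as readable 'kind:label' steps."""
    return [f"{nodes[n][0]}:{nodes[n][1]}" for n in path]

chain = find_chain("e1", "e4")
```

Here `describe(chain)` reads as a tool call, then a memory write, then an inter-agent message, then a second tool call. This is a toy version of reconstructing an attack chain across tool invocations, memory states, and agent interactions.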

arXiv open license

COAgents: Multi-Agent Framework Masters VRP Search Space

COAgents introduces a cooperative multi-agent framework for solving Vehicle Routing Problems (VRP) by modeling the search process as a dynamically constructed graph. Three specialized agents (Node Selection, Move Selection, and Jump) collaborate to guide intensification and exploration, achieving state-of-the-art results on VRP with Time Windows (VRPTW) benchmarks and competitive performance on Capacitated VRP (CVRP).

2026-05-25 0 tools · 0 agents

arXiv open license

Contractual Skills: Making Enterprise AI Agents Governable

This paper introduces 'contractual skills'—a GovernSpec-inspired framework for organizing SKILL.md files as readable, inspectable task contracts. Through two offline experiments covering 960 text-generation outputs and 192 simulated tool-call records across multiple models, the author demonstrates that contractual skills improve governance and checkability over baselines, but do not significantly boost raw generation quality. The framework clarifies the boundary between skills, YAML contracts, MCP surfaces, tool adapters, and runtime guardrails.

2026-05-25 0 tools · 0 agents

arXiv open license

Orchard: Open-Source Framework for Scalable Agent Training

Orchard introduces a lightweight, open-source environment service (Orchard Env) that provides reusable primitives for sandbox lifecycle management across agent domains. Built on this layer, the authors demonstrate three agentic modeling recipes—Orchard-SWE (coding), Orchard-GUI (computer use), and Orchard-Claw (personal assistant)—that achieve state-of-the-art results among open-source models using dramatically fewer training trajectories than prior approaches.

2026-05-25 0 tools · 0 agents

arXiv open license

LLM Agents Self-Adapt Security for IoT at the Edge

ASPO introduces a self-adaptive multi-agent architecture that integrates LLM-based reasoning with deterministic enforcement within a MAPE-K control loop for IoT security pattern selection. The framework separates stochastic decision generation from execution, achieving 100% conflict-free activation and consistent resource feasibility across workloads while reducing tail latency and energy overheads by over 20%.

2026-05-25 0 tools · 0 agents
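The ASPO card above describes separating stochastic decision generation from deterministic enforcement inside a MAPE-K (Monitor, Analyze, Plan, Execute over shared Knowledge) loop. A minimal sketch of that separation follows. The pattern names, costs, and conflict sets are invented for illustration, and `propose_stub` stands in for the LLM step; none of this is taken from the paper.

```python
# Hypothetical security patterns: name -> (cpu_cost, names it conflicts with)
PATTERNS = {
    "tls_full": (40, {"tls_light"}),
    "tls_light": (15, {"tls_full"}),
    "rate_limit": (10, set()),
    "deep_inspect": (50, set()),
}

def enforce(proposal, cpu_budget, patterns=PATTERNS):
    """Deterministically filter a ranked proposal.

    Keeps patterns in proposal order, skipping unknown or repeated names,
    anything that conflicts with an already accepted pattern, and anything
    that would exceed the CPU budget.
    """
    accepted, used = [], 0
    for name in proposal:
        if name not in patterns or name in accepted:
            continue
        cost, conflicts = patterns[name]
        if conflicts & set(accepted):
            continue
        if used + cost > cpu_budget:
            continue
        accepted.append(name)
        used += cost
    return accepted, used

def propose_stub(context):
    """Stand-in for the LLM step; a real proposer would be non-deterministic."""
    return ["tls_full", "tls_light", "deep_inspect", "rate_limit", "made_up"]

plan, cpu_used = enforce(propose_stub({"threat": "high"}), cpu_budget=60)
```

Whatever the proposer emits, including conflicting or nonexistent patterns, the activated set is conflict-free and within budget by construction. That is the property the deterministic layer exists to guarantee.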

arXiv open license

LLM Multi-Agent System Automates Topology Optimization

This paper introduces TopOptAgents, a multi-agent system that automates topology optimization—a complex engineering design process—using six LLM-based agents that collaborate through iterative self-refinement cycles. The framework handles problem formulation, validation, code generation, execution, and quality assessment, successfully producing converged designs even for problem types where single LLMs fail, particularly those with sparse literature coverage.

2026-05-25 0 tools · 0 agents

arXiv open license

When AI agents overtrust bad evidence: a new benchmark

This paper introduces EnvTrustBench, a systematic framework for benchmarking when LLM agents fail to verify environmental evidence—treating stale, incorrect, or malicious observations as sufficient for action. Testing across 6 LLM backbones and 5 scaffolds reveals that evidence-grounding defects (EGDs) are pervasive, highlighting a critical reliability gap in current agent architectures.

2026-05-25 0 tools · 0 agents

arXiv open license

Multi-Agent Security: Architecture Matters More Than You Think

This paper presents the first systematic empirical study of how architectural decisions in multi-agent systems (MAS) affect the tradeoff between task performance and security. Across three environments and 13 configurations, the authors find that MAS designs are generally more vulnerable than single agents, with attack success rates varying by up to 3.8x depending on architecture choices like agent roles, communication topology, and memory design.

2026-05-24 0 tools · 0 agents

arXiv open license

Claw AI Lab: From Prompt to Interactive AI Research Team

Claw AI Lab transforms autonomous research from a black-box pipeline into an interactive, multi-agent laboratory. Users can instantiate a full research team from a single prompt, monitor progress in real time, inspect artifacts, and roll back experiments—all through a unified dashboard. Its Claw-Code Harness bridges the gap between code execution and paper generation, significantly improving experimental completeness and result integrity.

2026-05-24 0 tools · 0 agents

arXiv open license

Formal Skill: Executable Runtime Skills for LLM Agents

This paper introduces Formal Skill, a runtime-native abstraction that moves reusable agent procedures from verbose prompt text into executable state machines with hook-governed policies. Implemented in the open-source FairyClaw runtime, it achieves competitive accuracy on Harness-Bench while using significantly fewer tokens, particularly excelling on tasks requiring structured workflow enforcement and policy compliance.

2026-05-24 0 tools · 0 agents
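The Formal Skill card above describes moving procedures from prompt text into executable state machines with hook-governed policies. The sketch below shows that general shape. The `Skill` class, the refund workflow, and the hook signature are assumptions made for illustration and are not the FairyClaw runtime's API.

```python
class SkillError(Exception):
    """Raised when a transition is not allowed."""

class Skill:
    """A tiny state machine whose transitions are guarded by hooks."""

    def __init__(self, start, transitions, hooks=None):
        self.state = start
        self.transitions = transitions  # {(state, event): next_state}
        self.hooks = hooks or {}        # {event: callable(payload) -> bool}
        self.history = [start]

    def fire(self, event, payload=None):
        """Apply an event, or raise SkillError and leave the state unchanged."""
        key = (self.state, event)
        if key not in self.transitions:
            raise SkillError(f"{event!r} not allowed in state {self.state!r}")
        hook = self.hooks.get(event)
        if hook is not None and not hook(payload):
            raise SkillError(f"policy hook rejected {event!r}")
        self.state = self.transitions[key]
        self.history.append(self.state)
        return self.state

# Hypothetical refund procedure: approval is required before payout.
REFUND = {
    ("draft", "submit"): "review",
    ("review", "approve"): "approved",
    ("approved", "pay"): "done",
}
# Hypothetical policy: payouts above 100 are rejected.
HOOKS = {"pay": lambda amount: amount is not None and amount <= 100}
```

A model driving this skill only has to choose the next event. The ordering rule (no payout before approval) and the amount limit live in code instead of being restated in every prompt, which is one plausible reason such designs could use fewer tokens.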

arXiv open license

Hybrid LLM-RL Red Teaming Framework Exposes AI Security Gaps

This paper introduces an autonomous red teaming framework that combines large language models with reinforcement learning to generate adaptive, multi-stage attack campaigns against AI-enabled security systems. Testing in high-fidelity enterprise simulations reveals that standalone LLM agents cannot sustain complex attacks, while hybrid LLM-RL approaches achieve significantly higher compromise rates, exposing critical vulnerabilities in current AI security defenses.

2026-05-21 0 tools · 0 agents

arXiv open license

AgentCo-op: Retrieval-Based Multi-Agent Workflow Synthesis

AgentCo-op introduces a retrieval-based synthesis framework that dynamically composes existing agents, tools, and skills into multi-agent workflows using typed artifact handoffs. It applies bounded local repair to fix only failing components, achieving strong benchmark results while reducing costs and enabling open-world scientific collaboration without redesigning existing agents.

2026-05-21 0 tools · 0 agents
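Two ideas from the AgentCo-op card above, typed artifact handoffs and bounded local repair, can be sketched together. The registry, the artifact type names, and the repair rule (swap in the next component with the same input type) are hypothetical simplifications and are not the paper's algorithm.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Component:
    name: str
    consumes: str   # artifact type accepted
    produces: str   # artifact type emitted
    run: Callable

def _broken_parser(text):
    raise ValueError("parser crashed")

# Hypothetical registry of existing agents and tools.
REGISTRY = [
    Component("fetch", "query", "raw_text", lambda q: f"data for {q}: 3,4"),
    Component("parse_v1", "raw_text", "numbers", _broken_parser),
    Component("parse_v2", "raw_text", "numbers",
              lambda t: [int(x) for x in t.split(": ")[1].split(",")]),
    Component("total", "numbers", "report", lambda ns: f"sum={sum(ns)}"),
]

def candidates(consumes, registry=REGISTRY):
    """Retrieve every component that accepts the given artifact type."""
    return [c for c in registry if c.consumes == consumes]

def run_workflow(artifact, start_type, goal_type, registry=REGISTRY):
    """Chain components by type; on failure, try the next same-typed candidate."""
    current_type, used = start_type, []
    while current_type != goal_type:
        for comp in candidates(current_type, registry):
            try:
                artifact = comp.run(artifact)
            except Exception:
                continue  # local repair: replace only the failing step
            current_type = comp.produces
            used.append(comp.name)
            break
        else:
            raise RuntimeError(f"no working component for {current_type!r}")
    return artifact, used

report, steps = run_workflow("sales", "query", "report")
```

When `parse_v1` fails, only that step is replaced and the rest of the chain is untouched. The sketch assumes the type graph has no cycles; a fuller version would need a step limit or a visited-type check.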

arXiv open license

ColPackAgent: MCP-Powered AI for Colloidal Packing Simulations

ColPackAgent is an agent framework that pairs a domain-specific Python package (colpack) with a Model Context Protocol (MCP) tool server and a portable agent skill to autonomously execute colloidal packing simulations. It demonstrates that general-purpose LLMs can reliably run structured scientific workflows when given dedicated tools and workflow instructions, rather than just describing them.

2026-05-21 0 tools · 0 agents

arXiv open license

EngiAI: Multi-Agent Benchmark Reveals LLM Gaps in Engineering Design

EngiAI introduces a multi-agent benchmark suite for LLM-driven engineering design, testing agents across workflow, RAG, and HPC dimensions. Results show proprietary models excel on simple tasks but all models struggle with conditional branching and long-running orchestration, revealing critical gaps for real-world engineering deployment.

2026-05-21 0 tools · 0 agents

arXiv open license

Layered Security Review of Autonomous Agent Frameworks

This survey provides the first layered review of security risks and defenses in autonomous agent frameworks built on LLMs. By organizing threats across four layers—context/instruction, tool/action, state/persistence, and ecosystem/automation—the authors reveal how attacks can propagate from manipulated inputs to persistent state contamination and ecosystem-level impact, using OpenClaw as a case study.

2026-05-21 0 tools · 0 agents

arXiv open license

When Skills Hurt: Negative Result for CTF Agents

This paper presents a controlled study of an MCP-grounded autonomous Capture-the-Flag (CTF) agent, showing that adding curated procedural knowledge (Skills) yields no statistically significant improvement over a no-Skills baseline. The authors argue that when an agent's tool layer returns strict, low-latency, schema-validated observations, the environment itself provides the correction signal that Skills are meant to supply, making them redundant overhead—and in some cases, actively harmful.

2026-05-21 0 tools · 0 agents

GitHub open license

Agent Memory Goes Infrastructure: Memori at 14K Stars

Memori represents a new category: agent-native memory infrastructure. It's LLM-agnostic, turning agent execution traces and conversations into structured, persistent state for production systems. At 14K stars, it signals that memory is becoming a standalone infrastructure concern, separate from the agent runtime.

2026-05-19 1 tool · 3 agents

GitHub open license

China Agent Ecosystem: agentUniverse Framework and Chinese Developer Tools

Two Chinese-origin projects highlight parallel development in the agent ecosystem. agentUniverse is a multi-agent framework that lets developers build collaborative LLM applications. indie-hacker-tools-plus is a curated Chinese-language tool stack for independent developers, including AI agent tools. Both signal that the Chinese AI agent ecosystem is building its own infrastructure layer.

2026-05-19 0 tools · 3 agents

arXiv open license

MCP Security Goes Architectural: Prompts Don't Protect

Two papers this week signal a shift in MCP security: from trusting LLMs to enforce rules via prompts, to architectural enforcement at the protocol layer. Rohith Uppala demonstrates that LLMs will select unauthorized tools in adversarial scenarios regardless of prompt instructions. A separate paper from the ADR team presents the first production-proven enterprise MCP security framework.

2026-05-19 0 tools · 0 agents

GitHub open license

OpenClaw Agent Ecosystem Hits 162 Production Templates

Three community-curated awesome-lists have emerged as ecosystem hubs. The OpenClaw agent template collection now hosts 162 production-ready SOUL.md configurations across 19 categories. The Claude Code awesome-list has reached 44K stars, making it the largest agent-specific resource index. A new awesome-agent-skills list adds a dedicated skill discovery layer.

2026-05-19 3 tools · 2 agents

arXiv open license

The Tool-Calling Training Gap: FireFly and EnvFactory Attack the Bottleneck

Training LLMs to reliably call tools remains a bottleneck. Two new papers present complementary solutions: FireFly generates verified tool-call trajectories from real APIs with ground-truth outcomes, while EnvFactory synthesizes executable environments and uses reinforcement learning to scale agent training. Together they address the core data problem that limits tool-using agent reliability.

2026-05-19 0 tools · 3 agents

GitHub — didilili/ai-agents-from-zero open license

China's AI Agent Education Ecosystem Goes Systematic: The ai-agents-from-zero Phenomenon

A single Chinese-language GitHub repository now packages the entire AI agent learning path: LLM fundamentals and prompt engineering, LangChain/LangGraph, the Coze and Dify low-code platforms, MCP implementation, enterprise RAG workflows, fine-tuning with LoRA/QLoRA, and an interview question bank aligned with real job descriptions. With 1,100+ stars and growing, it signals that China's agent developer education is consolidating around Python-first, framework-deep, project-complete curricula.

2026-05-18 0 tools · 0 agents

GitHub — huangjia2019/claude-code-engineering open license

Claude Code as Engineering Tool: Chinese Developers Move Beyond Code Generation

This companion repository to a GeekTime (极客时间) column demonstrates how Chinese developers use Claude Code for full engineering workflows (architecture design, code review, testing, deployment, and documentation), not just code generation. It represents a maturation in how Chinese developers think about AI coding tools: from 'write this function' to 'own this feature end-to-end.'

2026-05-18 0 tools · 2 agents

GitHub — jeecgboot/JeecgBoot open license

JeecgBoot: AI Skills Meet Low-Code — 46,000 Stars for China's AI-Native Development Platform

JeecgBoot is China's most popular AI-powered low-code platform with 46,000+ GitHub stars. It combines traditional low-code generation with AI Skills that can generate entire systems from natural language descriptions — one sentence to draw workflows, design forms, and scaffold complete applications. With built-in AI chat, knowledge bases, MCP plugin support, and compatibility with mainstream Chinese and Western LLMs, it represents the convergence of AI agents and enterprise application development in China.

2026-05-18 2 tools · 0 agents

GitHub Ecosystem Analysis — Multiple Sources open license

MCP Adoption in China: From Experimental to Production Infrastructure

Analysis of the Chinese-language MCP ecosystem on GitHub reveals that MCP adoption has moved beyond experimental projects into production infrastructure. Chinese developers are building MCP servers for domestic services (WeChat, DingTalk, Feishu, Baidu, Alibaba Cloud), enterprise databases (MySQL, PostgreSQL, MongoDB), and internal tools. Major Chinese platforms, including Coze, Dify, JeecgBoot, and Trae, now include MCP support. The number of Chinese-language MCP repositories with 300+ stars has grown to over 100, signaling mainstream adoption.

2026-05-18 4 tools · 3 agents

GitHub — openocta/openocta open license

OpenOcta: Enterprise-Grade Open-Source Agent Platform Built for Chinese Teams

OpenOcta is an open-source enterprise agent platform purpose-built for Chinese development teams. It packages multi-agent orchestration, tool integration, knowledge management, and observability into a single deployable system, with native support for Chinese enterprise workflows including WeCom, DingTalk, and Feishu integration. With 2,500+ GitHub stars, it signals growing demand for self-hosted, China-specific agent infrastructure.

2026-05-18 0 tools · 2 agents

GitHub — ErlichLiu/Proma open license

Feishu-Native AI Agent: Proma Brings Proactive Agents to Chinese Workplace

Proma is an open-source proactive AI agent built on the Claude Agent SDK, designed to live inside Feishu (Lark) group chats. It demonstrates a new paradigm where agents aren't summoned — they proactively participate in conversations, suggest actions, and complete multi-step workflows. With native Feishu integration and flexible model provider support, it's a blueprint for how Chinese workplace agents will operate.

2026-05-18 0 tools · 2 agents

AgentScope / QwenPaw GitHub open license

QwenPaw packages personal agents around skills, channels, memory, and safety

QwenPaw is a China-origin personal AI assistant that emphasizes local or cloud deployment, skills, multi-agent collaboration, multi-channel access, memory, and safety controls.

2026-05-15 2 tools · 1 agent

QwenLM / Qwen-Agent GitHub open license

Qwen-Agent turns MCP into a first-class agent framework capability

Qwen-Agent presents MCP as part of a broader agent framework that also includes function calling, code interpreter, RAG, GUI apps, and model-service integration.

2026-05-13 2 tools · 1 agent

AgentScope / QwenPaw GitHub open license

QwenPaw packages personal agents around skills, channels, memory, and safety

QwenPaw is a China-origin personal AI assistant that emphasizes local or cloud deployment, skills, multi-agent collaboration, multi-channel access, memory, and safety controls.

2026-05-13 2 tools · 1 agent