Don't trust AI agents

English

When you're building with AI agents, they should be treated as untrusted and potentially malicious. Whether it's prompt injection, a model trying to escape its sandbox, or something nobody's thought of yet, you shouldn't be trusting the agent. The right approach isn't better permission checks or smarter allowlists. It's architecture that assumes agents will misbehave and contains the damage when they do.

That's the principle I built NanoClaw on.

Don't trust the process OpenClaw runs on the host machine by default. It has an opt-in sandbox mode, but most users never turn it on. Without it, security relies on application-level checks: allowlists, confirmation prompts, safe commands. These come from a place of implicit trust that the agent isn't going to do something wrong. Once you accept that an agent is potentially malicious, it's obvious application-level blocks aren't enough.

In NanoClaw, container isolation is core. Each agent runs in its own container, created fresh per invocation and destroyed afterward. The boundary is enforced by the OS, not by the application.

Don't trust other agents Even with OpenClaw's sandbox enabled, all agents share the same container. Your personal assistant and your work agent sit in the same environment, and information can leak between agents that should be accessing different data.

In NanoClaw, each agent gets its own container, filesystem, and session history. Your personal assistant can't see your work agent's data because they run in completely separate sandboxes.

Don't trust what you can't read OpenClaw has nearly half a million lines of code and 70+ dependencies. Nobody has reviewed it all. It was written in weeks with no proper review process. Complexity is where vulnerabilities hide.

NanoClaw is ~3,000 lines. A developer can review the entire codebase in an afternoon. New functionality comes through skills. You review exactly what code gets added, and you only add what you need. Every installation is a few thousand lines tailored to the owner's requirements.

With a 400K-line monolith, even if you only use two integrations, the rest is still loaded, still part of your attack surface. With skills, the boundary is obvious.

Design for distrust If a misbehaving agent can cause a security issue, the security model is broken. Security has to be enforced outside the agentic surface. Containers and filesystem isolation exist so that even when an agent does something unexpected, the blast radius is contained.

None of this eliminates risk. But the right response is to make trust as narrow and as verifiable as possible. Don't trust the agent. Build walls around it.

中文

当你使用 AI 智能体构建系统时，应该将它们视为不受信任且可能恶意的实体。无论是提示注入、模型试图逃脱沙盒，还是其他尚未想到的威胁，都不应该信任智能体。正确的方法不是更好的权限检查或更智能的允许列表，而是架构设计要假设智能体会行为不端并在发生时限制损害。

这就是我构建 NanoClaw 的原则。

不要信任过程 OpenClaw 默认在主机上运行。它有一个可选的沙盒模式，但大多数用户从不启用它。没有它，安全依赖于应用程序级别的检查：允许列表、确认提示、安全命令。这些都基于一种隐式信任，即智能体不会做错事。一旦你接受智能体可能存在恶意，很明显应用程序级别的阻止是不够的。

在 NanoClaw 中，容器隔离是核心。每个智能体在自己的容器中运行，每次调用时创建，之后销毁。边界由操作系统强制执行，而不是应用程序。

不要信任其他智能体即使启用了 OpenClaw 的沙盒，所有智能体都共享同一个容器。你的个人助手和工作助手坐在同一个环境中，信息可能在应该访问不同数据的智能体之间泄露。

在 NanoClaw 中，每个智能体都有自己的容器、文件系统和会话历史。你的个人助手无法看到你的工作助手的数据，因为它们在完全独立的沙盒中运行。

不要信任你无法读取的内容 OpenClaw 有近 50 万行代码和 70 多个依赖项。没有人审查过所有代码。它是在几周内编写的，没有适当的审查流程。复杂性是漏洞隐藏的地方。

NanoClaw 大约 3,000 行。一个开发人员可以在一个下午内审查整个代码库。新功能通过技能添加。你审查添加的确切代码，只添加你需要的内容。每个安装都是根据所有者的要求定制的几千行代码。

对于一个 40 万行的单体代码，即使你只使用两个集成，其余部分仍然被加载，仍然是你攻击面的一部分。有了技能，边界是明确的。

为不信任而设计如果一个行为不端的智能体可以造成安全问题，那么安全模型就被破坏了。安全必须在智能体表面之外执行。容器和文件系统隔离的存在是为了即使智能体做了意想不到的事情，爆炸范围也会被限制。

这些都不能消除风险。但正确的回应是使信任尽可能狭窄和可验证。不要信任智能体。在它周围建墙。

Don't trust AI agents

English

中文

继续阅读