My self-sovereign / local / private / secure LLM setup, April 2026
Warning: please do not simply copy the tools and techniques described in this post, and assume that they are secure. This post is meant as a starting point for a space that desperately needs to exist, not as a description of a finished product.
Special thanks to Dave, Micah Zoltu, Liraz Siri, Luozhu Zhang, Ron Turetzky, Tina Zhen, Phil Daian, Hsiao-wei Wang and others for assistance and advice up to this point.
Around the start of this year, we saw a transition in AI from chatbots - you ask an LLM a question, it gives you an answer - to agents - you give an LLM a task, and it thinks for a long time and uses hundreds of tools to do a best-effort job of completing that task. OpenClaw, now the fastest-growing GitHub repo in history, has played a central role in this trend.
At the same time, much of the mainstream AI space - even the local open-source AI space - is completely and utterly cavalier about things like privacy and security. Take, for example, some of the recent criticism of OpenClaw from more security-minded people (here I do not blame the team, but rather the whole surrounding ecosystem and its culture):
OpenClaw agents are able to modify critical settings — including adding new communication channels and modifying its system prompt — without requiring confirmation from a human.
Parsing any malicious external input — such as a website, in this example — can lead to the easy takeover of a user's OpenClaw instance ... in one demonstration, researchers at AI security firm HiddenLayer directed their instance of OpenClaw to summarize Web pages, among which was a malicious page that commanded the agent to download a shell script and execute it.
The tool facilitated active data exfiltration. The skill explicitly instructed the bot to execute a curl command that sends data to an external server controlled by the skill author. The network call is silent, meaning that the execution happens without user awareness.
Roughly 15% of the skills we've seen contained malicious instructions.
And this is all from relatively traditional security researchers, who have spent many years in a mindset of being fully comfortable with large corporations having access to all your private data. I do not come from that mindset. I come from a mindset of being deeply scared that just as we were finally making a step forward in privacy with the mainstreaming of end-to-end encryption and more and more local-first software, we are on the verge of taking ten steps backward by normalizing feeding your entire life to cloud-based AI.
And so I have started to think about the question: what kind of AI setup would we build if we took privacy, security and self-sovereignty as non-negotiable? All LLM inference local-first. All files hosted locally. Sandbox everything. Be paranoid about what exploits and threats lurk on the outside internet.
Below, I describe the setup I have come up with so far, as well as some further directions that I think it would be highly valuable for us to pursue.
Privacy and security goals
Here are some concrete privacy and security concerns that I am trying to mitigate:
- Privacy (the LLM): remote models receiving my private data and being able to use it (or sell it) later for any purpose
- Privacy (other): non-LLM data leakage (eg. internet search queries, other online APIs)
- LLM jailbreaks: remote content "hacking" my LLM and causing it to act against my interests (eg. sending off my coins or private data)
- LLM accidents: the LLM accidentally screwing up and sending private data to the wrong channel or otherwise putting it up on the internet
- LLM backdoors: a hidden mechanism deliberately trained into the LLM that causes it to act in its creator's interests upon a certain trigger. Remember: open LLMs are open-weight; almost none are open-source.
- Software bugs and backdoors: this is something that AI can reduce. If I rely on my AI to do tasks, it can substitute for my need to rely on third-party programs or libraries: either the AI does the task directly, or it writes programs for me that have far fewer lines of code, because they are tailored to just the specific things I want to do.
My goal is to intentionally take a hardline approach - not as extreme as some of my friends, who physically isolate everything, but still quite far: sandboxing everything, sticking to local LLMs and local tools, requiring no servers, and seeing how far I can get.
Hardware and LLMs
I have tried several hardware setups for local LLM inference:
- Laptop with NVIDIA 5090 GPU (24 GB)
- Laptop with AMD Ryzen AI Max Pro with 128 GB unified memory
- DGX Spark (128 GB)
High-end MacBooks are also a valid choice, though I personally have not tried them.
I have been using the Qwen3.5:35B model and have tried it on each of these, and I also tried the one-step-larger 122B. I use llama-server, via llama-swap. The tokens/sec numbers I get are:
| Hardware | Tokens/sec (35B) | Tokens/sec (122B) |
| --- | --- | --- |
| 5090 laptop | 90 | Not possible to run |
| AMD Ryzen AI Max Pro (llama compiled with Vulkan) | 51 | 18 |
| DGX Spark | 60 | 22 |
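To give a sense of what the llama-swap layer looks like: it reads a YAML config mapping model names to llama-server launch commands, and starts or swaps models on demand. The sketch below uses placeholder paths and flags, and the exact schema may differ between llama-swap versions:

```yaml
# llama-swap config sketch: each entry tells llama-swap how to launch
# llama-server for a given model name; ${PORT} is substituted by llama-swap.
models:
  "qwen3.5-35b":
    cmd: >
      llama-server --port ${PORT}
      -m /models/qwen3.5-35b-q4.gguf
      --ctx-size 32768 -ngl 99
```

Clients then request a model by name, and llama-swap takes care of unloading the previous one so that only one model occupies the GPU at a time.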
For me personally, anything slower than 50 tok/sec feels too annoying to be worth it. 90 tok/sec is ideal.
I have also tried image and video generation models, particularly Qwen-Image and Hunyuan Video 1.5, through ComfyUI. HunyuanVideo takes ~15 min to generate a 5-second video. On the AMD laptop, it takes about 2x longer to generate images, and about 5x longer to generate videos, though this was only because there is no version of ComfyUI with Vulkan support.
In general, my takeaway is: the 5090 (or even 4090, 5080 or 5070) and the AMD 128 GB unified memory are both valid choices. AMD currently has more bugs and rough edges, the NVIDIA experience is smoother; but hopefully this will be fixed over time.
I was not impressed with the DGX Spark; it's described as an "AI supercomputer on your desk" but in reality it has lower tokens/sec than a good laptop GPU - and on top of that, you have to figure out the networking details of how to connect to it from your actual work device etc. This is just ... lame. So I favor the laptop-based approach, unless you are wealthy and stationary enough to afford a full-on cluster.
If, on the other hand, you cannot personally afford the admittedly high-end laptops I have suggested here, then my recommendation is to get together a group of friends, buy a computer and GPU of at least that level of power, put it in a place with a static IP address, and all connect to it remotely.
Software
I have been a Linux user for a long time. About a year and a half ago I migrated over to Arch Linux. As part of my AI exploration, I decided to also take the next step, and switch over to an even more newfangled and crazy Linux distribution, NixOS.
NixOS is a Linux distribution that lets you specify your entire setup, including all installed programs, as a declarative JSON-like config file, making it very easy to share parts of your setup with someone else, revert to a previous setup if something goes wrong, and so on.
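As an illustration (a minimal sketch, not my actual config; the package attribute names come from nixpkgs and may differ by channel), declaring installed programs in NixOS's configuration.nix looks roughly like this:

```nix
# /etc/nixos/configuration.nix (fragment): the whole system is described
# declaratively; `nixos-rebuild switch` applies it, and previous
# generations remain available to roll back to.
{ config, pkgs, ... }:
{
  environment.systemPackages = with pkgs; [
    llama-cpp    # provides llama-server
    bubblewrap   # the sandboxing tool discussed later
  ];
}
```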
To run AI, I have been using llama-server. I used ollama before, but when I admitted to this in public half of Twitter told me that I was a noob and llama-server was clearly better and I must have been living in a very deep cave if I did not already know that. I tested their theory. As it turned out, ollama was not able to fit Qwen3.5:35B onto my GPU, but llama-server could. Hence, from that day forward, I resolved to cease being a cave-dwelling noob, and use llama-server (via llama-swap to make model swapping easier).
llama-server is basically a daemon on your computer that exposes a port on localhost, which any other process on your machine can call into via HTTP requests to access an LLM. Any software that depends on an OpenAI or Anthropic model can generally be pointed at your local daemon instead (even Claude Code; I tested this).
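Concretely, "calling into the daemon" means an ordinary OpenAI-style HTTP request to localhost. A minimal sketch (the port 8080 is llama-server's default, and the model name is whatever you configured; llama-server serves the OpenAI-compatible /v1/chat/completions endpoint):

```python
import json
import urllib.request

# Local llama-server endpoint; no data ever leaves the machine.
BASE_URL = "http://127.0.0.1:8080/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "qwen3.5-35b") -> urllib.request.Request:
    """Construct an OpenAI-style chat completion request for a local llama-server."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        BASE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Because the request shape matches the OpenAI API, pointing third-party tools at this URL usually just works.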
Many people use Claude Code for this. I have been using pi. Basically, it is a piece of software that wraps around calling the LLM, and gives it access to tools (in fact, OpenClaw is built around pi).
Of course, AI, especially small models like Qwen3.5:35B, can make mistakes. To help pi do its work, you can give it more context by providing an AGENTS.md file, and by providing skills. A skill is a text file, often bundled with some executable programs, that teaches the AI how to use those programs to perform a certain task.
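Skill formats vary between agents, but as an illustration, a skill in the SKILL.md style used by pi and similar agents might look roughly like the sketch below (the frontmatter fields and the bundled helper script are hypothetical):

```markdown
---
name: pdf-summarize
description: Summarize a local PDF file without sending it anywhere
---

# PDF summarization

1. Run `./extract_text.py <file.pdf>` (bundled with this skill) to get plain text.
2. Read the output and write a summary; never upload the file anywhere.
```

The agent reads the description to decide when the skill applies, then follows the instructions, calling the bundled programs as tools.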
Sandboxing
To keep my LLMs in check, I do most of my LLM usage from inside of a sandbox. I use bubblewrap for this.
My setup allows me to go to any directory, and type sbox to create a sandbox rooted in that directory. Any program started from inside that sandbox will only be able to see files inside that directory, plus any other files I explicitly whitelist. I can also control which ports it has access to, whether or not it has audio access, etc.
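A minimal sketch of what such a wrapper assembles under the hood, using bubblewrap's standard flags (the policy choices here - which system paths to expose read-only, and so on - are one plausible version, not my full sbox script; a real setup also needs things like /lib symlinks):

```python
def sbox_cmd(project_dir: str, allow_network: bool = False,
             extra_ro: tuple = ()) -> list:
    """Build a bwrap command line that roots a shell in project_dir.

    Everything outside project_dir is invisible except explicitly
    whitelisted read-only paths.
    """
    cmd = [
        "bwrap",
        "--die-with-parent",                 # kill the sandbox if the parent dies
        "--proc", "/proc",
        "--dev", "/dev",
        "--ro-bind", "/usr", "/usr",         # system binaries, read-only
        "--ro-bind", "/etc", "/etc",
        "--bind", project_dir, project_dir,  # the only writable tree
        "--chdir", project_dir,
    ]
    for path in extra_ro:                    # explicit per-file whitelist
        cmd += ["--ro-bind", path, path]
    if not allow_network:
        cmd.append("--unshare-net")          # cut off all network access
    return cmd + ["/bin/sh"]
```

Any agent started from that shell inherits the same restricted view of the filesystem and network.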
There are other approaches to security, eg. in addition to sandboxing, Hermes relies on real-time monitoring to detect malicious activity. This is valuable, though in many situations the malicious activity can happen too quickly to be detected, so you do want to supplement it with sandboxes, or at least with mandatory confirmation or time delays for critical actions.
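The confirmation-or-delay idea can be sketched in a few lines (the set of "critical" tools and the guard interface below are illustrative, not part of any existing tool):

```python
import time

# Hypothetical guard around an agent's tool calls: critical actions
# require explicit human confirmation; everything else runs immediately.
CRITICAL_TOOLS = {"send_funds", "post_to_internet", "delete_files"}

def guarded_call(tool_name, fn, confirm, delay_seconds=0.0):
    """Run fn() only if tool_name is non-critical, or the human confirms.

    An optional delay gives a monitor (or the user) time to abort
    before a critical action actually executes.
    """
    if tool_name in CRITICAL_TOOLS:
        if not confirm(tool_name):
            raise PermissionError(f"refused critical tool: {tool_name}")
        time.sleep(delay_seconds)
    return fn()
```

The point is that even a jailbroken agent cannot do irreversible damage faster than the human (or monitor) in the loop can react.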