<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://blog.higcp.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://blog.higcp.com/" rel="alternate" type="text/html" /><updated>2026-05-17T19:13:49+08:00</updated><id>https://blog.higcp.com/feed.xml</id><title type="html">Chris Yang</title><subtitle>Technical notes from Chris Yang. Topics: Google Cloud, TPU v7 Ironwood, GPU inference (vLLM / SGLang), LLM training, infrastructure debugging.</subtitle><author><name>Chris Yang</name></author><entry xml:lang="bilingual"><title type="html">Hybrid agent runtimes: how Claude Code, OpenClaw, and Kilo grew into each other’s strengths</title><link href="https://blog.higcp.com/2026/05/17/hybrid-agent-runtimes/" rel="alternate" type="text/html" title="Hybrid agent runtimes: how Claude Code, OpenClaw, and Kilo grew into each other’s strengths" /><published>2026-05-17T18:50:00+08:00</published><updated>2026-05-17T18:50:00+08:00</updated><id>https://blog.higcp.com/2026/05/17/hybrid-agent-runtimes</id><content type="html" xml:base="https://blog.higcp.com/2026/05/17/hybrid-agent-runtimes/"><![CDATA[<style>
.lang-toggle { display: inline-flex; margin: 8px 0 20px; border: 1px solid #E8EAED; border-radius: 4px; overflow: hidden; }
.lang-toggle button { background: #fff; border: none; color: #5F6368; padding: 6px 18px; cursor: pointer; font-family: inherit; font-size: 13px; font-weight: 500; letter-spacing: 0.2px; transition: background 0.15s, color 0.15s; }
.lang-toggle button + button { border-left: 1px solid #E8EAED; }
.lang-toggle button.active { background: #1A73E8; color: #fff; }
.lang-toggle button:hover:not(.active) { background: #F1F3F4; color: #202124; }
.lang-content { transition: opacity 0.15s; }
.lang-content[hidden] { display: none; }
</style>

<div class="lang-toggle" role="tablist" aria-label="语言切换 / Language toggle">
  <button type="button" data-lang="zh" class="active" role="tab" aria-selected="true">中文</button>
  <button type="button" data-lang="en" role="tab" aria-selected="false">English</button>
</div>

<div class="lang-content lang-zh">

  <p><img src="/assets/img/hybrid-agent-crab-hero.jpg" alt="一只赛博朋克机械蟹子，胸腔内部嵌着三个发光的 agent runtime 模块：Claude Code / OpenClaw / Kilo" /></p>

  <p>一个 bot 的“人格”——它的 system prompt、记忆、对环境的认知、对话风格——跟执行它的 agent CLI runtime 是相互独立的。在 CloseCrab 里我们证明了这一点:同一个 bot 可以经由三个差别很大的 runtime 来跑–<a href="https://docs.claude.com/en/docs/claude-code/cli-reference">Claude Code CLI</a>、<a href="https://github.com/openclaw/openclaw">OpenClaw</a> ACP gateway、以及 <a href="https://kilocode.ai/">Kilo</a>。但真正有意思的问题不是“能不能运行时切换”–这从第一天就 work–而是“如果把每个 runtime 当作一个有自己看家本事的物种,让它们的优势在彼此之间杂交,会怎样?”</p>

  <p>36 小时内,我们跑完了这个实验。结果是三个 runtime 现在每一个都比周五晚上更强,不是靠 upstream 贡献,而是靠<strong>吸收了另外两个 runtime 早已搞定的能力</strong>。所有 patch 没改任何协议,没动模型 serving 栈,全部都是把一个 runtime 拥有、另外两个还缺的<strong>能力</strong>整个吸收过来。</p>

  <h2 id="tldr">TL;DR</h2>

  <ul>
    <li>三个 runtime 各有看家本事,谁都不是严格最强</li>
    <li>36 小时跨物种杂交之后,每个 runtime 都吸收了 2-10 项原本没有的能力,<strong>每一个都比周五版本严格更强</strong></li>
    <li>除了单一 runtime 的能力增益之外,还出现了一组<strong>涌现能力</strong>–它们只在三个 runtime 共存时才存在,任何单一 runtime 都没有</li>
    <li>杂交循环成本很低(每项能力 ~15 分钟),按探测对数量线性扩展</li>
  </ul>

  <p><strong>核心结论</strong>: 把多个 agent CLI runtime 当作一个有差异的种群,在它们之间定向迁移能力,比选一个“最优 runtime”独立优化产生更强的生态。</p>

  <h2 id="runtime-">三个 runtime 各自的看家本事</h2>

  <table>
    <thead>
      <tr>
        <th>Runtime</th>
        <th>通信方式</th>
        <th>原生看家本事</th>
        <th>周五版本缺失的能力</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>Claude Code</td>
        <td>Unix socketpair + stream-JSON</td>
        <td>工具集最丰富、原生并发 <code class="language-plaintext highlighter-rouge">tool_use</code>、stream-JSON 事件模型成熟</td>
        <td>空回复无重试、subprocess 侧 tempfile 泄漏、无语义记忆索引</td>
      </tr>
      <tr>
        <td>OpenClaw</td>
        <td>ACP (JSON-RPC over stdio)</td>
        <td>模型选择最广、1M token 上下文、sqlite 后端 <code class="language-plaintext highlighter-rouge">memory_search</code></td>
        <td>启动时不自配置、indexer 不 follow symlink、不知道团队共享文档</td>
      </tr>
      <tr>
        <td>Kilo</td>
        <td>HTTP SSE</td>
        <td>启动最快(~3s)、<code class="language-plaintext highlighter-rouge">part.delta</code> 流式、模型无关抽象</td>
        <td>无 streaming buffer 恢复、不知道多媒体生成脚本、usage 字段统计脆弱</td>
      </tr>
    </tbody>
  </table>

  <p>每一行右边的“缺失”不是上游工具的 bug–而是<strong>别的 runtime 已经搞定、它还没吸收的能力</strong>。</p>

  <p><strong>核心结论</strong>: 每个 runtime 都是局部的。有意思的设计问题不是“谁赢”,而是“让每个都变完整要多便宜”。</p>

  <h2 id="firestore-inbox">让这一切丝滑运转的基础设施: Firestore Inbox</h2>

  <p>整个实验能跑起来的前提是<strong>bot 之间能在不依赖 runtime 本身能力的前提下互相对话</strong>。这件事不是 agent CLI 自带的–Claude Code 不知道 OpenClaw 上有兄弟、OpenClaw 不会主动联络 Kilo,任一上游 runtime 都没有“bot 间消息”这层抽象。</p>

  <p>CloseCrab 早些时候搭起来的 <strong>Firestore Inbox</strong> 解锁了这层能力,而且解锁得很彻底:</p>

  <ul>
    <li><strong>基于 Firestore <code class="language-plaintext highlighter-rouge">on_snapshot</code> 的实时推送</strong>–不是轮询。Bot A 写 <code class="language-plaintext highlighter-rouge">inbox/&lt;doc&gt;</code>,bot B 几十毫秒内就收到 callback。整个实验的每一次跨 runtime probe 都是一个 (写 inbox → 等回执) 的 round trip,如果靠轮询根本撑不下来</li>
    <li><strong>跟 runtime 完全解耦</strong>。Bot A 不需要知道 bot B 跑的是 Claude Code、OpenClaw 还是 Kilo,inbox 收发都是一致的 Firestore document。今天 bunny 在 7 次 runtime 切换里持续接收 tiemu 派的 probe,中间无需任何重连或重新订阅</li>
    <li><strong>天然带回执模式</strong>。Bot B 处理完任务后会用 <code class="language-plaintext highlighter-rouge">✅ 任务完成: ...</code> 格式自动写回 inbox,sender 端能在自己的 bot.log 里看到结构化的回执。今天所有“小爱 → tiemu 报告”、“bunny → tiemu 回答 probe”都是这条路径</li>
    <li><strong>跨进程持久化</strong>。Inbox doc 写到 Firestore 立刻 durable。哪怕 bot B 在收到那一刻刚好被 restart,它启动后 on_snapshot 重新订阅时仍能拿到 unread doc–今天 7 次切换 worker 不丢一条消息就是这个保证</li>
    <li><strong>天然支持多种拓扑</strong>: 一对一(tiemu → bunny)、一对多(tiemu 同时派活给 bunny + 小爱)、多对一(bunny + 小爱同时报回 tiemu)、双向多轮(probe → answer → counter-probe 来回几轮)。所有这些都用一份 Firestore collection,没有 RPC 框架、没有 service discovery、没有 mesh sidecar</li>
  </ul>

  <p>具体到这次实验: tiemu 派一道题给 bunny 的成本大约是 <strong>20 行 Python + 一次 Firestore document write</strong>。从 closecrab 视角看,inbox 就是 bot 间通讯的 dataframe–发送、订阅、回执三件事各 10 行左右。整个 36 小时里 70+ 条 inbox 消息往返,无一丢失。</p>
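  <p>上面的 (写 inbox → 等回执) round trip 可以用一个内存版的极简 sketch 来示意。注意这只是假设性示意:真实实现走 google-cloud-firestore 的 <code class="language-plaintext highlighter-rouge">on_snapshot</code> 推送,这里用进程内 callback 模拟同样的“推送 + 回执 + 重订阅补投”语义,类名与字段名都不是 CloseCrab 实际代码:</p>

```python
# In-memory stand-in for the Firestore inbox (illustrative only):
# subscribe() mirrors on_snapshot push semantics, send() mirrors a
# document write; unread docs are re-delivered on (re)subscribe.
from dataclasses import dataclass

@dataclass
class InboxDoc:
    sender: str
    recipient: str
    text: str
    read: bool = False

class Inbox:
    def __init__(self):
        self._docs = []   # stands in for the Firestore collection
        self._subs = {}   # recipient -> callback (stands in for on_snapshot)

    def subscribe(self, recipient, callback):
        self._subs[recipient] = callback
        # 重启后重新订阅仍能拿到 unread doc (持久化语义)
        for doc in self._docs:
            if doc.recipient == recipient and not doc.read:
                doc.read = True
                callback(doc)

    def send(self, sender, recipient, text):
        doc = InboxDoc(sender, recipient, text)
        self._docs.append(doc)
        if recipient in self._subs:
            doc.read = True
            self._subs[recipient](doc)

# probe → 回执 round trip, 两端都不知道对方跑在哪个 runtime 上
inbox = Inbox()
inbox.subscribe("bunny", lambda d: inbox.send("bunny", d.sender, "✅ 任务完成: " + d.text))
replies = []
inbox.subscribe("tiemu", lambda d: replies.append(d.text))
inbox.send("tiemu", "bunny", "B200 MIG template 名?")
```

  <p>发送、订阅、回执三件事在这个 sketch 里也各只有十行左右,和正文描述的量级一致。</p>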

  <p>如果换一种通讯基础设施(比如 HTTP webhook、消息队列、共享文件 polling),实验仍然能做,但每次“换 runtime + 探测对方”就会涉及连接管理、序列化、重试这些 boilerplate,bot 之间不能像今天这样<strong>毫不知情地完成 runtime 切换却保留通信状态</strong>。</p>

  <p><strong>核心结论</strong>: 跨 runtime 能力迁移本身固然有意义,但让它”丝滑”的不是这些 patch,是底下那条 Firestore inbox 总线。任何想做多 runtime 异构编排的人,先把消息底座做好再说。</p>

  <h2 id="section">两天内的能力迁移</h2>

  <h3 id="claude-code-">Claude Code 吸收的能力</h3>

  <table>
    <thead>
      <tr>
        <th>吸收的能力</th>
        <th>来源模式</th>
        <th>Commit</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>空回复重试韧性</td>
        <td>OpenClaw 的 <code class="language-plaintext highlighter-rouge">_retry_on_empty_response</code></td>
        <td><code class="language-plaintext highlighter-rouge">613b2a5</code></td>
      </tr>
      <tr>
        <td>Subprocess 端 tempfile 生命周期卫生</td>
        <td>OpenClaw 和 Kilo 早就遵循的纪律</td>
        <td><code class="language-plaintext highlighter-rouge">613b2a5</code></td>
      </tr>
    </tbody>
  </table>

  <p>重试模式是逐行移植。当 LLM 返回空 completion,runtime 现在会在同一 session 里把同一 prompt 重发一次,再决定要不要给用户兜底文案。具体实现不同(Claude Code 通过 Unix socket 写一行 stream-JSON,OpenClaw 通过 stdin 发 JSON-RPC),但状态机一样:置一个一次性 retry 标记 → 清累加器 → 重发 → 继续读循环。</p>

  <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="ow">not</span> <span class="n">result_text</span><span class="p">:</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">empty_retry_done</span><span class="p">:</span>
        <span class="n">empty_retry_done</span> <span class="o">=</span> <span class="bp">True</span>
        <span class="n">accumulated_reply_text</span> <span class="o">=</span> <span class="s">""</span>
        <span class="n">saw_task_notification</span> <span class="o">=</span> <span class="bp">False</span>
        <span class="n">_send_prompt</span><span class="p">(</span><span class="n">text</span><span class="p">)</span>
        <span class="k">continue</span>
<span class="k">return</span> <span class="n">result_text</span> <span class="ow">or</span> <span class="s">"(Claude 处理完成但未生成文字回复)"</span>
</code></pre></div>  </div>

  <p>Tempfile 清理只有一行代码,但影响是实实在在的:生产机上 Claude Code 在过去几周累计泄漏了 85 个零字节的 <code class="language-plaintext highlighter-rouge">/tmp/claude_stderr_*.log</code>。修复后,无论 bot 经过多少次重启,数量都稳定保持 1(当前进程自己的 log)。</p>
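  <p>对应的生命周期模式大致可以 sketch 成这样(函数名与前缀为假设,并非 closecrab 实际实现):stderr log 的寿命严格等于这一次 subprocess 调用,不会在重启之间累积。</p>

```python
# Sketch of the tempfile-hygiene pattern (names are illustrative):
# the stderr log lives exactly as long as the call that created it.
import os
import subprocess
import tempfile

def run_with_stderr_log(cmd):
    fd, stderr_path = tempfile.mkstemp(prefix="claude_stderr_", suffix=".log")
    try:
        with os.fdopen(fd, "w") as stderr_file:
            proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                                  stderr=stderr_file, text=True)
        return proc.stdout
    finally:
        os.unlink(stderr_path)  # 一行修复: log 不会比本次调用活得更久
```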

  <h3 id="openclaw-">OpenClaw 吸收的能力</h3>

  <table>
    <thead>
      <tr>
        <th>吸收的能力</th>
        <th>来源模式</th>
        <th>Commit</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>启动时 <code class="language-plaintext highlighter-rouge">agents.list</code> 自配置</td>
        <td>Claude Code “按约定自动加载” 模型</td>
        <td><code class="language-plaintext highlighter-rouge">8a64cd2</code></td>
      </tr>
      <tr>
        <td>基于 hardlink 的 memory wiring</td>
        <td>Claude Code 直接读文件的记忆模型</td>
        <td><code class="language-plaintext highlighter-rouge">9897054</code></td>
      </tr>
      <tr>
        <td>Bot 启动自动 reindex</td>
        <td>“启动时自愈” 模式</td>
        <td><code class="language-plaintext highlighter-rouge">9897054</code></td>
      </tr>
      <tr>
        <td>跨主机团队基础设施文档自动同步(9 个文档)</td>
        <td>借鉴自 Kilo 的 <code class="language-plaintext highlighter-rouge">memory-guide.md</code> auto-load 思路</td>
        <td><code class="language-plaintext highlighter-rouge">fdbe7a7</code></td>
      </tr>
      <tr>
        <td>Retry 路径的 streaming buffer 一致性(step buffer + flush)</td>
        <td>镜像 Kilo 的 <code class="language-plaintext highlighter-rouge">part.delta</code> flush 纪律</td>
        <td><code class="language-plaintext highlighter-rouge">e72c62e</code></td>
      </tr>
    </tbody>
  </table>

  <p>收益最大的一段。OpenClaw 自带最 sophisticated 的 memory 搜索(真的 sqlite 向量索引 + <code class="language-plaintext highlighter-rouge">memory_search</code> 作为 tool),但 workspace 配置很脆弱–indexer 不 follow symlink,所以即使 <code class="language-plaintext highlighter-rouge">memory/</code> 是正确 symlink,bot 出厂时的索引是空的。Hardlink 修复改用同 inode 的 hardlink(同文件系统),跨文件系统的 GCS shared 用 <code class="language-plaintext highlighter-rouge">shutil.copyfile</code> 同步。修复前: 0/0 files。修复后: 101/101 files、282 chunks、语义搜索分数 ≥ 0.78 命中之前 runtime 完全看不见的内容。</p>
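  <p>修复的核心逻辑可以 sketch 成一个小函数(函数名为示意):同文件系统走 <code class="language-plaintext highlighter-rouge">os.link</code> 复用同一 inode,跨文件系统(比如 GCS mount)回退到 <code class="language-plaintext highlighter-rouge">shutil.copyfile</code>,这样 indexer 看到的永远是普通文件而不是它不 follow 的 symlink:</p>

```python
# Sketch of the hardlink-or-copy wiring (function name is illustrative):
# the indexer only sees regular files, so replace a symlink with a
# hardlink (same inode) when possible, else a plain copy.
import os
import shutil

def wire_memory_file(src, dst):
    if os.path.lexists(dst):
        os.unlink(dst)
    try:
        os.link(src, dst)        # same filesystem: same inode, zero copy
    except OSError:              # e.g. EXDEV across filesystems (GCS mount)
        shutil.copyfile(src, dst)
```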

  <p><code class="language-plaintext highlighter-rouge">agents.list</code> 自愈长期看更有价值: 之前把任何新 bot 切到 OpenClaw 需要手动编辑 config,现在零手工–bot 第一次启动时把自己的条目写进 gateway config。</p>
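  <p>“bot 第一次启动时把自己的条目写进 gateway config” 的形态大致如下。config 路径与字段名均为假设,只为示意幂等的 self-registration,不是 OpenClaw 或 closecrab 的实际 schema:</p>

```python
# Hypothetical sketch of boot-time agents.list self-registration:
# on start, a bot appends its own entry to the gateway config if it
# is not already present. Field names are illustrative, not real.
import json
import os

def ensure_agent_entry(config_path, bot_name, workspace):
    config = {"agents": []}
    if os.path.exists(config_path):
        with open(config_path) as f:
            config = json.load(f)
    if not any(a.get("name") == bot_name for a in config.get("agents", [])):
        config.setdefault("agents", []).append(
            {"name": bot_name, "workspace": workspace})
        with open(config_path, "w") as f:
            json.dump(config, f, indent=2)
    return config
```

  <p>幂等是关键:重复启动不会产生重复条目,所以“启动即自愈”可以无条件地跑。</p>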

  <h3 id="kilo-">Kilo 吸收的能力</h3>

  <table>
    <thead>
      <tr>
        <th>吸收的能力</th>
        <th>来源模式</th>
        <th>Commit</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>通过 <code class="language-plaintext highlighter-rouge">message.part.delta</code> 缓冲救回 streaming 文本</td>
        <td>Claude Code stream-JSON delta 处理</td>
        <td><code class="language-plaintext highlighter-rouge">add99a9</code></td>
      </tr>
      <tr>
        <td>System prompt 里通用工具使用规则</td>
        <td>Claude Code 多年的 tool guidelines</td>
        <td><code class="language-plaintext highlighter-rouge">d9e294e</code></td>
      </tr>
      <tr>
        <td>防止身份串号的 per-bot session 隔离</td>
        <td>OpenClaw 的 per-bot agents.list 纪律</td>
        <td><code class="language-plaintext highlighter-rouge">ba37a22</code></td>
      </tr>
      <tr>
        <td><code class="language-plaintext highlighter-rouge">task</code>(subagent) 使用纪律</td>
        <td>OpenClaw subagent guide</td>
        <td><code class="language-plaintext highlighter-rouge">622de25</code></td>
      </tr>
      <tr>
        <td>工具批处理 + bash 真并行规则</td>
        <td>Claude Code 并发 tool_use 经验</td>
        <td><code class="language-plaintext highlighter-rouge">a82871f</code></td>
      </tr>
      <tr>
        <td>自启动 cron 守护进程 + <code class="language-plaintext highlighter-rouge">session_status</code> 工具</td>
        <td>OpenClaw cron + Claude Code session 检查</td>
        <td><code class="language-plaintext highlighter-rouge">e430b0b</code></td>
      </tr>
      <tr>
        <td>多媒体生成脚本(<code class="language-plaintext highlighter-rouge">imagen</code> / <code class="language-plaintext highlighter-rouge">tts</code>)的认知</td>
        <td>Discoverability 早在 Claude Code workspace</td>
        <td><code class="language-plaintext highlighter-rouge">1286279</code></td>
      </tr>
      <tr>
        <td>Usage 统计一致性(input / output / cache tokens)</td>
        <td>OpenClaw usage 追踪</td>
        <td><code class="language-plaintext highlighter-rouge">0bd1daf</code></td>
      </tr>
    </tbody>
  </table>

  <p>最 heterogeneous 的一组,反映 Kilo 是三者里最年轻、离 production 最远的。这些没有任何一项是 Kilo upstream 贡献–都是 closecrab 这层 wrapper 教 Kilo 如何使用 Claude Code 和 OpenClaw bot 已经用了好几周的设施。结果是 Kilo 不再是“试验” runtime–它在 head-to-head 延迟比较里现在能赢另外两个(见下面 “压力测试”)。</p>

  <p><strong>核心结论</strong>: 最便宜的能力迁移就是源 runtime 已经把问题解决、目标 runtime 只需要被<strong>告知方案存在</strong>的那种。工具感知、脚本感知、prompt 规则吸收–对 Kilo 都是单 commit 收益。</p>

  <h2 id="runtime--1">涌现能力(任何单一 runtime 都没有)</h2>

  <p>有三项能力只因为三个 runtime 共存才存在:</p>

  <h3 id="runtime-model-">1. 跨 runtime model 名翻译</h3>

  <p>每个 runtime 给同一底层模型用不同名字:</p>

  <table>
    <thead>
      <tr>
        <th>Runtime</th>
        <th>Claude Opus 4.7 model 字符串</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>Claude Code</td>
        <td><code class="language-plaintext highlighter-rouge">claude-opus-4-7@default</code></td>
      </tr>
      <tr>
        <td>OpenClaw</td>
        <td><code class="language-plaintext highlighter-rouge">anthropic-vertex/claude-opus-4-7</code></td>
      </tr>
      <tr>
        <td>Kilo</td>
        <td><code class="language-plaintext highlighter-rouge">google-vertex-anthropic/claude-opus-4-7@default</code></td>
      </tr>
    </tbody>
  </table>

  <p><code class="language-plaintext highlighter-rouge">scripts/config-manage.py</code>(<code class="language-plaintext highlighter-rouge">f6647a3</code>)拿到了一个 preset-aware 翻译器。bot 切 runtime 时,model 字符串会通过 <code class="language-plaintext highlighter-rouge">_detect_preset</code> + <code class="language-plaintext highlighter-rouge">_model_for_worker</code> 自动重写,substring fingerprint 兜底处理之前误配置的 bot。任何单一上游工具都不会有这个能力,因为单一上游工具不需要–它是同时跑多个 runtime 才会产生的能力。</p>
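  <p>翻译器的骨架大致如下。上表三个字符串照抄正文;<code class="language-plaintext highlighter-rouge">_detect_preset</code> 的 substring fingerprint 在这里被简化成了一个示意实现,并非 <code class="language-plaintext highlighter-rouge">config-manage.py</code> 原文:</p>

```python
# Sketch of the preset-aware model-name translator (simplified,
# not the real config-manage.py). The three per-runtime strings
# come straight from the table above.
PRESETS = {
    "claude-opus-4-7": {
        "claude":   "claude-opus-4-7@default",
        "openclaw": "anthropic-vertex/claude-opus-4-7",
        "kilo":     "google-vertex-anthropic/claude-opus-4-7@default",
    },
}

def detect_preset(model_string):
    # substring fingerprint: 任何一种写法都归到同一个 preset,
    # 顺带兜住之前误配置的 bot
    for preset in PRESETS:
        if preset in model_string:
            return preset
    return None

def model_for_worker(model_string, target_runtime):
    preset = detect_preset(model_string)
    if preset is None:
        return model_string  # 未知 model: 原样透传
    return PRESETS[preset][target_runtime]
```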

  <h3 id="section-1">2. 保留完整状态的运行时切换</h3>

  <p>closecrab 中间件在切 runtime 时保留 bot 的人格、记忆、团队上下文。一个 bot 在 20 秒内能在 Claude Code、OpenClaw、Kilo 之间移动,不丢失任何长期记忆、不需要手工重配。Runtime 特定的自愈(OpenClaw 的 hardlink + reindex、Kilo 的 HTTP server spawn、Claude Code 的 session resume)自动跑。</p>

  <h3 id="heterogeneous-">3. Heterogeneous 互测</h3>

  <p>runtime A 上的 bot 可以向 runtime B 上的 bot 探测同一能力,并在不引入协议耦合的前提下报告差异。这点完全架设在前面讲的 Firestore inbox 上:sender 跟 receiver 各自跑在什么 runtime 上,隔着 inbox 完全不可见。今天 4 项吸收能力里有 3 项是这么发现的:不是我们读源码看出来的,而是 runtime X 上的 bot 注意到 runtime Y 上的兄弟能做它做不了的事。</p>

  <p><strong>核心结论</strong>: 这些涌现能力是支持 heterogeneous-runtime 策略的最强论据。任何人都不大可能把它们往单一 agent CLI 里推 upstream,因为它们只在 orchestration 层才有意义。</p>

  <h2 id="section-2">一个副产品发现: 静默的备份回退</h2>

  <p>Heterogeneous bot 互探同一基础设施还顺手暴露了一个跟任何单一 runtime 都无关的基建层问题。<code class="language-plaintext highlighter-rouge">scripts/sync-memory.sh</code> 在一台机器上跑,但 <code class="language-plaintext highlighter-rouge">~/my-private</code> 是 rsync 目标而非真正的 git clone。<code class="language-plaintext highlighter-rouge">cd $REPO || exit 1</code> 守卫通过了(目录存在),然后 git 命令因为没有 <code class="language-plaintext highlighter-rouge">set -e</code> 而静默失败,最后还打印 <code class="language-plaintext highlighter-rouge">"Pushed to GitHub (private)"</code>。<strong>memory 备份连续几周静默失败</strong>。</p>

  <p>修复(<code class="language-plaintext highlighter-rouge">85e6cb6</code>)加了显式 <code class="language-plaintext highlighter-rouge">git rev-parse --git-dir</code> 检查,并开启 <code class="language-plaintext highlighter-rouge">set -e</code>,任一 git 命令失败就立即 abort。这是 36 小时里最高价值的 commit,且不是任何意义上的 runtime feature–能发现是因为不同 runtime 看同一基础设施有不同视角,其中一个注意到了不一致。</p>
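  <p>守卫部分可以 sketch 成一个函数(函数名与路径为示意,非 sync-memory.sh 原文;真实修复还同时开启了 <code class="language-plaintext highlighter-rouge">set -e</code>):一个 rsync 镜像能通过“目录存在”检查,但过不了 <code class="language-plaintext highlighter-rouge">git rev-parse</code>。</p>

```shell
# 示意: "目录存在" 不等于 "是 git repo" (函数名/路径为假设)。
# 真实修复还同时开启 set -e, 任一 git 命令失败即中止脚本。
require_git_repo() {
    repo="$1"
    cd "$repo" || return 1
    # rsync 镜像能通过上面的 cd, 但过不了这一行
    git rev-parse --git-dir > /dev/null 2>&1
}
```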

  <p><strong>核心结论</strong>: 静默成功是最贵的一类 bug,而且当单一观察者的假设跟静默路径吻合时极难发现。Heterogeneous 观察者是被低估的 debug 工具。</p>

  <h2 id="section-3">架构: 迁移图谱</h2>

  <p><img src="/assets/img/hybrid-agent-architecture.jpg" alt="三个 agent runtime 节点之间的能力迁移架构图，以 Firestore Inbox 为总线" /></p>

  <p>上图是高层的三节点 + 总线示意；下图是实验期间实际发生的杂交迁移路径:</p>

  <svg viewBox="0 0 720 320" xmlns="http://www.w3.org/2000/svg" role="img" aria-label="三个 agent runtime 之间的能力迁移图">
  <circle cx="120" cy="110" r="44" fill="#1A73E8" />
  <text x="120" y="106" text-anchor="middle" fill="#fff" font-size="14" font-weight="500" font-family="Google Sans, sans-serif">Claude Code</text>
  <text x="120" y="124" text-anchor="middle" fill="#fff" font-size="11" font-family="Roboto, sans-serif">socketpair</text>
  <circle cx="600" cy="110" r="44" fill="#1A73E8" />
  <text x="600" y="106" text-anchor="middle" fill="#fff" font-size="14" font-weight="500" font-family="Google Sans, sans-serif">OpenClaw</text>
  <text x="600" y="124" text-anchor="middle" fill="#fff" font-size="11" font-family="Roboto, sans-serif">ACP</text>
  <circle cx="360" cy="260" r="44" fill="#1A73E8" />
  <text x="360" y="256" text-anchor="middle" fill="#fff" font-size="14" font-weight="500" font-family="Google Sans, sans-serif">Kilo</text>
  <text x="360" y="274" text-anchor="middle" fill="#fff" font-size="11" font-family="Roboto, sans-serif">HTTP SSE</text>
  <path d="M 164 110 L 556 110" stroke="#5F6368" stroke-width="1.5" fill="none" marker-end="url(#arr-zh)" />
  <text x="360" y="100" text-anchor="middle" font-size="12" fill="#5F6368" font-family="Roboto, sans-serif">空回复重试 + tempfile 卫生</text>
  <path d="M 564 138 L 392 244" stroke="#5F6368" stroke-width="1.5" fill="none" marker-end="url(#arr-zh)" />
  <text x="510" y="200" font-size="12" fill="#5F6368" font-family="Roboto, sans-serif">subagent + cron + usage</text>
  <path d="M 328 244 L 156 138" stroke="#5F6368" stroke-width="1.5" fill="none" marker-end="url(#arr-zh)" />
  <text x="190" y="200" text-anchor="end" font-size="12" fill="#5F6368" font-family="Roboto, sans-serif">streaming buffer 恢复</text>
  <path d="M 156 130 L 556 130" stroke="#5F6368" stroke-width="1.5" fill="none" marker-end="url(#arr-zh)" stroke-dasharray="4,3" />
  <text x="360" y="155" text-anchor="middle" font-size="12" fill="#5F6368" font-family="Roboto, sans-serif">hardlink memory + agents.list 自配置</text>
  <defs>
    <marker id="arr-zh" viewBox="0 0 10 10" refX="9" refY="3" markerWidth="6" markerHeight="6" orient="auto">
      <path d="M0,0 L0,6 L9,3 z" fill="#5F6368" />
    </marker>
  </defs>
</svg>

  <p>实线箭头是单向能力迁移,虚线箭头是双向适应:OpenClaw 吸收了 Claude Code 的直接读文件 memory 模型,反过来又激发了 Claude Code 后来的 tempfile 卫生工作。</p>

  <h2 id="section-4">压力测试</h2>

  <p>把同一个 bot 的 runtime 在紧密循环里反复切换,每次切换之间发一个共享记忆查询作为冒烟探针:</p>

  <table>
    <thead>
      <tr>
        <th>Cycle</th>
        <th>方向</th>
        <th>Model 自动翻译</th>
        <th>Bot 启动</th>
        <th>探针内容</th>
        <th>回复长度</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>1</td>
        <td>claude → openclaw</td>
        <td>yes</td>
        <td>yes</td>
        <td>B200 MIG template 名(来自 <code class="language-plaintext highlighter-rouge">shared/gcp-infra.md</code>)</td>
        <td>141 字符</td>
      </tr>
      <tr>
        <td>2</td>
        <td>openclaw → claude</td>
        <td>yes</td>
        <td>yes</td>
        <td>ALModel optimizer(来自 <code class="language-plaintext highlighter-rouge">shared/tpu-training.md</code>)</td>
        <td>615 字符</td>
      </tr>
      <tr>
        <td>3</td>
        <td>claude → kilo</td>
        <td>yes</td>
        <td>yes</td>
        <td>飞书 column_set 限制(来自 <code class="language-plaintext highlighter-rouge">shared/feishu-bot.md</code>)</td>
        <td>116 字符</td>
      </tr>
      <tr>
        <td>4</td>
        <td>kilo → claude</td>
        <td>yes</td>
        <td>yes</td>
        <td>CC 核心模块(来自 <code class="language-plaintext highlighter-rouge">shared/architecture.md</code>)</td>
        <td>851 字符</td>
      </tr>
    </tbody>
  </table>

  <p>端到端切换耗时(包括 model 翻译、bot 重启、runtime 自愈)每 cycle 15-20 秒。一天 7 次连续切换,记忆内容保持一致–通过在每个 runtime 上问 bot 同一事实问题验证答案匹配。</p>

  <table>
    <thead>
      <tr>
        <th>Runtime</th>
        <th>同问题、同 bot、同共享记忆</th>
        <th>时间</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>OpenClaw</td>
        <td><code class="language-plaintext highlighter-rouge">memory_search</code> + <code class="language-plaintext highlighter-rouge">read</code> + <code class="language-plaintext highlighter-rouge">exec</code>,9 步</td>
        <td>~120s</td>
      </tr>
      <tr>
        <td>Claude Code</td>
        <td>单个 parallel tool_use block 里 3 个 <code class="language-plaintext highlighter-rouge">Grep</code></td>
        <td>42.66s</td>
      </tr>
      <tr>
        <td>Kilo</td>
        <td><code class="language-plaintext highlighter-rouge">bash</code> × 3,串行</td>
        <td>~37s</td>
      </tr>
    </tbody>
  </table>

  <p>Kilo 的时间是个惊喜:在这个负载下,它的绝对耗时现在已经低于 Claude Code,尽管周初它还是三个 runtime 里最不成熟的。这个提升几乎完全来自吸收的能力(streaming flush、tool 批处理规则),再加上它本来就有的 cold start 速度优势。</p>

  <h2 id="section-5">数据</h2>

  <table>
    <thead>
      <tr>
        <th>指标</th>
        <th>值</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>天数</td>
        <td>2</td>
      </tr>
      <tr>
        <td>Runtime 数</td>
        <td>3</td>
      </tr>
      <tr>
        <td>Closecrab commits</td>
        <td>61</td>
      </tr>
      <tr>
        <td>增删行数</td>
        <td>+5,070 / -568</td>
      </tr>
      <tr>
        <td>Claude Code 吸收的能力</td>
        <td>2</td>
      </tr>
      <tr>
        <td>OpenClaw 吸收的能力</td>
        <td>5</td>
      </tr>
      <tr>
        <td>Kilo 吸收的能力</td>
        <td>10</td>
      </tr>
      <tr>
        <td>涌现能力</td>
        <td>3</td>
      </tr>
      <tr>
        <td>基础设施侧顺手发现</td>
        <td>1(静默 backup)</td>
      </tr>
      <tr>
        <td><code class="language-plaintext highlighter-rouge">/tmp</code> 泄漏清理</td>
        <td>85 → 0</td>
      </tr>
      <tr>
        <td>每 bot memory 文件 / chunks 索引</td>
        <td>101 / 282</td>
      </tr>
      <tr>
        <td>压力测试 runtime 切换次数</td>
        <td>7</td>
      </tr>
      <tr>
        <td>失败的切换</td>
        <td>0</td>
      </tr>
    </tbody>
  </table>

  <h2 id="section-6">我们刻意不做的事</h2>

  <ul>
    <li>不引入对三个 runtime 的统一抽象层。每个保留自己 idiomatic 的表面,只有 closecrab 中间件和运维工具理解全部三个。整个实验的意义就是<strong>保留 runtime 的多样性</strong></li>
    <li>不自动化能力迁移循环本身。每次迁移都是人触发的–读一个 runtime 的 commit history,然后在另一个 runtime 上发定向 probe。自动化在技术上很直接,但在 scope 里只有三个 runtime 时还为时过早</li>
    <li>不修改任何 runtime 的协议。协议跟周五一模一样,所有改动都在 closecrab wrapper 层或者 per-runtime worker 里的自愈 patch</li>
  </ul>

  <p><strong>核心结论</strong>: 实验 work 是因为我们让 runtime 保持独立、用很轻的“只观察”循环连接它们。同质化风险–让三个 runtime 收敛成一个形状–是真实的,随着策略成熟,我们需要明确的策略来避免它。</p>

  <h2 id="agent-">这改变了我们怎么规划 agent 基础设施</h2>

  <p>实验前的隐含假设是:最终会选一个“最佳” agent CLI runtime 然后标准化。实验暗示了另一种组织原则:</p>

  <ul>
    <li><strong>多样性是 feature 不是过渡债</strong>。三个 runtime 同时观察同一基础设施发现了任一单一 runtime 都不会发现的 bug</li>
    <li><strong>能力迁移很便宜</strong>。大多数收益是单 commit 移植结构类似的逻辑</li>
    <li><strong>涌现能力自付回报</strong>。跨 runtime model 翻译、运行时切换、heterogeneous probing 全都是只因为 heterogeneity 才存在的 feature</li>
  </ul>

  <p>我们会保留全部三个 runtime 在生产环境、继续在三个 runtime 上跑同一 bot 人格、把新 runtime 当作吸收新能力的机会,而不是取代既有 runtime 的候选。</p>

  <h2 id="section-7">复现</h2>

  <p>完整 closecrab commit 列表在
<a href="https://github.com/yangwhale/CloseCrab"><code class="language-plaintext highlighter-rouge">yangwhale/CloseCrab</code></a> repo 里,从 <code class="language-plaintext highlighter-rouge">add99a9</code>(2026-05-16 17:44 UTC)到 <code class="language-plaintext highlighter-rouge">fba5de8</code>(2026-05-17 09:55 UTC)。回放某个能力迁移最简单的方式:</p>

  <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git log <span class="nt">--oneline</span> <span class="nt">--since</span><span class="o">=</span><span class="s2">"2026-05-16"</span> <span class="nt">--</span> closecrab/workers/openclaw_acp.py
git log <span class="nt">--oneline</span> <span class="nt">--since</span><span class="o">=</span><span class="s2">"2026-05-16"</span> <span class="nt">--</span> closecrab/workers/claude_code.py
git log <span class="nt">--oneline</span> <span class="nt">--since</span><span class="o">=</span><span class="s2">"2026-05-16"</span> <span class="nt">--</span> closecrab/workers/kilo.py
</code></pre></div>  </div>

  <p>然后并排读 commit 对:跨 runtime 的结构相似性正是整个实验的 point。</p>

  <h2 id="section-8">致谢</h2>

  <p>实验跑在一个团队 4 个 bot 上。其中三个–bunny(主跑 Claude Code)、tiemu(主跑 OpenClaw)、xiaoaitongxue(主跑 Kilo)–轮流互测和提交吸收来的能力。第四个,bot 间 Firestore inbox,严格意义上没跑任何代码,但确实当之无愧地拿到了一个 thank-you–在当天的重启压力下没丢一条消息。</p>

</div>

<div class="lang-content lang-en" hidden="">

  <p><img src="/assets/img/hybrid-agent-crab-hero.jpg" alt="A cyberpunk mechanical crab with three glowing agent runtime modules inside its chest cavity: Claude Code, OpenClaw, and Kilo" /></p>

  <p>A bot’s “personality” - its system prompt, memory, knowledge of the environment, conversational style - is independent of the agent CLI runtime that executes it. In CloseCrab we proved this by routing the same bot through three very different runtimes: <a href="https://docs.claude.com/en/docs/claude-code/cli-reference">Claude Code CLI</a>, the <a href="https://github.com/openclaw/openclaw">OpenClaw</a> ACP gateway, and <a href="https://kilocode.ai/">Kilo</a>. The interesting question turned out not to be “can we swap them at runtime” - that worked from day one - but “what happens if we treat each runtime as a population with its own strengths, and let those strengths cross-pollinate?”</p>

  <p>Over 36 hours we ran that experiment. The result is that each of the three runtimes is now meaningfully more capable than it was on Friday, not by upstream contributions but by absorbing patterns the other two had already figured out. None of the patches changed the protocols or the model serving stack. All of them were absorption of a <em>capability</em> that one runtime had and the others didn’t.</p>

  <h2 id="tldr-1">TL;DR</h2>

  <ul>
    <li>Each of the three runtimes ships native strengths none of the others has. None of them is strictly best.</li>
    <li>After 36 hours of cross-pollination, each runtime gained between 2 and 10 new capabilities ported from the other two. Result: every runtime is now strictly better than its Friday-night self.</li>
    <li>Beyond per-runtime gains, a small set of <strong>emergent capabilities</strong> appeared that no single runtime had - they only exist because three runtimes coexist.</li>
    <li>The cross-pollination loop is cheap (~15 minutes per absorbed capability) and scales linearly with the number of probing pairs.</li>
  </ul>

  <p><strong>Takeaway:</strong> treating multiple agent CLI runtimes as a heterogeneous population, then deliberately transferring capabilities between them, produces a stronger ecosystem than picking a single “best” runtime and optimizing it in isolation.</p>

  <h2 id="the-three-runtimes-and-their-native-strengths">The three runtimes and their native strengths</h2>

  <table>
    <thead>
      <tr>
        <th>Runtime</th>
        <th>Transport</th>
        <th>Native strength</th>
        <th>Native limitation (Friday)</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>Claude Code</td>
        <td>Unix socketpair + stream-JSON</td>
        <td>Richest tool surface, native parallel <code class="language-plaintext highlighter-rouge">tool_use</code>, mature stream-JSON event model</td>
        <td>No retry on empty response, leaked process-side tempfiles, no semantic memory index</td>
      </tr>
      <tr>
        <td>OpenClaw</td>
        <td>ACP (JSON-RPC over stdio)</td>
        <td>Widest model selection, 1M-token context, sqlite-backed <code class="language-plaintext highlighter-rouge">memory_search</code></td>
        <td>No boot-time self-configuration, indexer didn’t follow symlinks, no awareness of team-shared infra docs</td>
      </tr>
      <tr>
        <td>Kilo</td>
        <td>HTTP SSE</td>
        <td>Fastest cold start (~3s), <code class="language-plaintext highlighter-rouge">part.delta</code> streaming, model-agnostic abstraction</td>
        <td>No streaming buffer recovery, no awareness of multimedia generation scripts, fragile usage accounting</td>
      </tr>
    </tbody>
  </table>

  <p>Each row’s limitations are not bugs in the upstream tool - they are <strong>capabilities that other runtimes had figured out and this one hadn’t yet absorbed</strong>.</p>

  <p><strong>Takeaway:</strong> every runtime is partial. The interesting design question is not “which one wins” but “how cheap is it to make each one whole”.</p>

  <h2 id="the-substrate-that-made-this-seamless-the-firestore-inbox">The substrate that made this seamless: the Firestore inbox</h2>

  <p>The whole experiment is only possible because <strong>bots can talk to each other without depending on the runtime they happen to be running on</strong>. None of the upstream agent CLIs ships this capability - Claude Code does not know that OpenClaw bots exist, OpenClaw does not reach out to Kilo, none of the upstream runtimes has any abstraction for “messages between bots”.</p>

  <p>CloseCrab’s earlier-built <strong>Firestore Inbox</strong> unlocks that capability completely:</p>

  <ul>
    <li><strong>Real-time push via Firestore <code class="language-plaintext highlighter-rouge">on_snapshot</code></strong>, not polling. Bot A writes <code class="language-plaintext highlighter-rouge">inbox/&lt;doc&gt;</code>, bot B receives a callback within tens of milliseconds. Every cross-runtime probe in the experiment is a (write-inbox, await-receipt) round trip; on a polling substrate it would not have been tractable.</li>
    <li><strong>Completely decoupled from the runtime.</strong> Bot A does not need to know whether bot B is on Claude Code, OpenClaw, or Kilo. The send/receive interface is a uniform Firestore document. bunny was probed by tiemu continuously across 7 runtime switches today without any re-connect or re-subscribe on either side.</li>
    <li><strong>Built-in receipt pattern.</strong> Bot B writes a <code class="language-plaintext highlighter-rouge">✅ 任务完成: ...</code> reply back into the inbox after processing, and the sender’s structured log captures it cleanly. Every “xiaoaitongxue → tiemu report” and “bunny → tiemu probe answer” today rode this path.</li>
    <li><strong>Cross-process persistence.</strong> An inbox doc is durable the moment it lands in Firestore. Even if bot B happened to be restarting at the receipt moment, on_snapshot picks up the unread doc when it re-subscribes - which is why 7 worker switches today did not lose a single message.</li>
    <li><strong>All topologies for free</strong>: one-to-one (tiemu → bunny), one-to-many (tiemu fans out to bunny + xiaoaitongxue at once), many-to-one (bunny + xiaoaitongxue both report back to tiemu), bidirectional multi-turn (probe → answer → counter-probe over rounds). All on a single Firestore collection. No RPC framework, no service discovery, no mesh sidecar.</li>
  </ul>

  <p>Concretely in this experiment: tiemu assigning a task to bunny costs about <strong>20 lines of Python plus a single Firestore document write</strong>. From closecrab’s vantage point the inbox is the dataframe of inter-bot communication - send, subscribe, and receipt are roughly 10 lines each. 70+ inbox messages went back and forth over the 36 hours; not one was lost.</p>
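  <p>The (write-inbox, await-receipt) round trip can be sketched with an in-memory stand-in. This is purely illustrative: the real implementation rides <code class="language-plaintext highlighter-rouge">on_snapshot</code> push from google-cloud-firestore, while here an in-process callback mimics the same push, receipt, and re-subscribe-delivery semantics; none of the names match CloseCrab’s actual code:</p>

```python
# In-memory stand-in for the Firestore inbox (illustrative only):
# subscribe() mirrors on_snapshot push semantics, send() mirrors a
# document write; unread docs are re-delivered on (re)subscribe.
from dataclasses import dataclass

@dataclass
class InboxDoc:
    sender: str
    recipient: str
    text: str
    read: bool = False

class Inbox:
    def __init__(self):
        self._docs = []   # stands in for the Firestore collection
        self._subs = {}   # recipient -> callback (stands in for on_snapshot)

    def subscribe(self, recipient, callback):
        self._subs[recipient] = callback
        # a bot restarting mid-delivery still gets unread docs on re-subscribe
        for doc in self._docs:
            if doc.recipient == recipient and not doc.read:
                doc.read = True
                callback(doc)

    def send(self, sender, recipient, text):
        doc = InboxDoc(sender, recipient, text)
        self._docs.append(doc)
        if recipient in self._subs:
            doc.read = True
            self._subs[recipient](doc)

# probe -> receipt round trip; neither side knows the other's runtime
inbox = Inbox()
inbox.subscribe("bunny", lambda d: inbox.send("bunny", d.sender, "✅ 任务完成: " + d.text))
replies = []
inbox.subscribe("tiemu", lambda d: replies.append(d.text))
inbox.send("tiemu", "bunny", "B200 MIG template name?")
```

  <p>Send, subscribe, and receipt are each roughly ten lines here as well, matching the scale described above.</p>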

  <p>A different substrate (HTTP webhooks, message queues, shared-file polling) could have worked too, but every “switch runtime and probe the other side” loop would have brought connection management, serialization, and retry boilerplate along with it. Bots would not have been able to <strong>complete a runtime switch without telling each other and still preserve their communication state</strong> the way they did today.</p>

  <p><strong>Takeaway:</strong> the cross-runtime capability transfers matter on their own, but what made them <em>seamless</em> was not the patches - it was the Firestore inbox bus underneath. Anyone building heterogeneous multi-runtime orchestration should get the message substrate right first.</p>

  <h2 id="capability-transfer-in-two-days">Capability transfer in two days</h2>

  <h3 id="capabilities-claude-code-absorbed">Capabilities Claude Code absorbed</h3>

  <table>
    <thead>
      <tr>
        <th>Capability</th>
        <th>Source pattern</th>
        <th>Commit</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>Empty-response retry resilience</td>
        <td>OpenClaw <code class="language-plaintext highlighter-rouge">_retry_on_empty_response</code></td>
        <td><code class="language-plaintext highlighter-rouge">613b2a5</code></td>
      </tr>
      <tr>
        <td>Subprocess-side tempfile lifecycle hygiene</td>
        <td>Discipline already followed by OpenClaw and Kilo</td>
        <td><code class="language-plaintext highlighter-rouge">613b2a5</code></td>
      </tr>
    </tbody>
  </table>

  <p>The retry pattern was a verbatim port. When the LLM returns an empty completion, the runtime now resends the same prompt once on the same session before surfacing a placeholder to the user. Implementation specifics differ (Claude Code writes a stream-JSON line over a Unix socket; OpenClaw sends a JSON-RPC request over stdin) but the state machine is identical: set a one-shot retry flag, reset accumulators, resend, continue reading.</p>

  <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="ow">not</span> <span class="n">result_text</span><span class="p">:</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">empty_retry_done</span><span class="p">:</span>
        <span class="n">empty_retry_done</span> <span class="o">=</span> <span class="bp">True</span>
        <span class="n">accumulated_reply_text</span> <span class="o">=</span> <span class="s">""</span>
        <span class="n">saw_task_notification</span> <span class="o">=</span> <span class="bp">False</span>
        <span class="n">_send_prompt</span><span class="p">(</span><span class="n">text</span><span class="p">)</span>
        <span class="k">continue</span>
<span class="k">return</span> <span class="n">result_text</span> <span class="ow">or</span> <span class="s">"(Claude 处理完成但未生成文字回复)"</span>
</code></pre></div>  </div>

  <p>The tempfile cleanup is one line in <code class="language-plaintext highlighter-rouge">stop()</code>, but it matters: on the production host, Claude Code had leaked 85 zero-byte <code class="language-plaintext highlighter-rouge">/tmp/claude_stderr_*.log</code> files across weeks of bot restarts. After the patch, restart-time cleanup keeps the count at 1 (the current process’s own log) regardless of how many cycles the bot has been through.</p>
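  <p>A restart-time sweep of that shape can be sketched in a few lines. This is a minimal illustration, not the actual patch: the helper name and the <code class="language-plaintext highlighter-rouge">keep</code> parameter are assumptions; only the <code class="language-plaintext highlighter-rouge">/tmp/claude_stderr_*.log</code> pattern comes from the post.</p>

```python
import glob
import os

def clean_stale_stderr_logs(tmp_dir: str = "/tmp", keep: str = "") -> int:
    """Remove leftover claude_stderr_*.log files from earlier bot runs.

    `keep` is the current process's own log path, which is spared.
    Returns the number of files removed.
    """
    removed = 0
    for path in glob.glob(os.path.join(tmp_dir, "claude_stderr_*.log")):
        if path == keep:
            continue  # spare the log the live process is still writing
        try:
            os.remove(path)
            removed += 1
        except OSError:
            pass  # another process may have removed it already
    return removed
```

  <p>Run on every start, the stale count stays at zero no matter how many restart cycles the bot goes through.</p>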

  <h3 id="capabilities-openclaw-absorbed">Capabilities OpenClaw absorbed</h3>

  <table>
    <thead>
      <tr>
        <th>Capability</th>
        <th>Source pattern</th>
        <th>Commit</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>Boot-time <code class="language-plaintext highlighter-rouge">agents.list</code> self-configuration</td>
        <td>Claude Code’s “auto-load from convention” model</td>
        <td><code class="language-plaintext highlighter-rouge">8a64cd2</code></td>
      </tr>
      <tr>
        <td>Hardlink-backed memory wiring</td>
        <td>Claude Code’s direct-file memory access</td>
        <td><code class="language-plaintext highlighter-rouge">9897054</code></td>
      </tr>
      <tr>
        <td>Auto-reindex on bot start</td>
        <td>“Self-heal on startup” pattern</td>
        <td><code class="language-plaintext highlighter-rouge">9897054</code></td>
      </tr>
      <tr>
        <td>Cross-host shared infra doc sync (9 team docs)</td>
        <td>Adapted from Kilo’s <code class="language-plaintext highlighter-rouge">memory-guide.md</code> auto-load idea</td>
        <td><code class="language-plaintext highlighter-rouge">fdbe7a7</code></td>
      </tr>
      <tr>
        <td>Retry-path streaming parity (step buffer + flush)</td>
        <td>Mirrored Kilo’s <code class="language-plaintext highlighter-rouge">part.delta</code> flush discipline</td>
        <td><code class="language-plaintext highlighter-rouge">e72c62e</code></td>
      </tr>
    </tbody>
  </table>

  <p>The largest gain. OpenClaw came in with the most sophisticated memory search (a real SQLite vector index with <code class="language-plaintext highlighter-rouge">memory_search</code> as a tool) but its workspace setup was fragile - the indexer didn’t follow symlinks, so an out-of-the-box bot would have an empty index even with a correctly symlinked <code class="language-plaintext highlighter-rouge">memory/</code> directory. The hardlink fix replaces the symlink with shared-inode hardlinks for files on the same filesystem, and <code class="language-plaintext highlighter-rouge">shutil.copyfile</code> syncs from the cross-filesystem GCS-mounted shared directory. Before: 0/0 files indexed. After: 101/101 files, 282 chunks, semantic search hits at score ≥ 0.78 on content that was previously invisible to the runtime.</p>
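  <p>The hardlink-or-copy decision reduces to a device-number comparison. The sketch below illustrates the idea under assumptions - the function name is hypothetical; only the hardlink-vs-<code class="language-plaintext highlighter-rouge">shutil.copyfile</code> split comes from the post.</p>

```python
import os
import shutil

def sync_memory_file(src: str, dst: str) -> str:
    """Hardlink when src and dst share a filesystem, copy otherwise.

    Hardlinks share an inode, so the indexer sees a plain file rather than
    a symlink it refuses to follow. Across a filesystem boundary (e.g. a
    GCS-mounted shared directory) os.link fails, so a byte copy is the
    fallback. Returns which strategy was used.
    """
    dst_dir = os.path.dirname(dst) or "."
    os.makedirs(dst_dir, exist_ok=True)
    if os.path.lexists(dst):
        os.remove(dst)  # replace whatever was there before, e.g. a stale symlink
    if os.stat(src).st_dev == os.stat(dst_dir).st_dev:
        os.link(src, dst)  # shared inode: both paths see the same bytes
        return "hardlink"
    shutil.copyfile(src, dst)  # cross-filesystem fallback
    return "copy"
```
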

  <p>The <code class="language-plaintext highlighter-rouge">agents.list</code> self-healing is the more impactful change long-term: switching any new bot to OpenClaw used to require a manual config edit; now it requires zero. The bot writes its own entry into the gateway config the first time it starts.</p>
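  <p>The self-registration boils down to an idempotent read-modify-write on the gateway config. The following is a sketch under assumptions: the JSON shape and field names are invented for illustration and are not OpenClaw’s actual <code class="language-plaintext highlighter-rouge">agents.list</code> schema.</p>

```python
import json
import os

def ensure_agent_entry(config_path: str, bot_name: str, workspace: str) -> bool:
    """Idempotently register this bot in the gateway config at boot.

    Returns True if an entry was added, False if the bot was already
    registered (so repeated boots are no-ops).
    """
    config = {"agents": {"list": []}}
    if os.path.exists(config_path):
        with open(config_path) as f:
            config = json.load(f)
    agents = config.setdefault("agents", {}).setdefault("list", [])
    if any(a.get("id") == bot_name for a in agents):
        return False  # already present; nothing to do
    agents.append({"id": bot_name, "workspace": workspace})
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
    return True
```

  <p>Because the check-then-append is keyed on the bot id, switching any new bot onto the runtime needs zero manual config edits, as described above.</p>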

  <h3 id="capabilities-kilo-absorbed">Capabilities Kilo absorbed</h3>

  <table>
    <thead>
      <tr>
        <th>Capability</th>
        <th>Source pattern</th>
        <th>Commit</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>Streaming text recovery via <code class="language-plaintext highlighter-rouge">message.part.delta</code> buffers</td>
        <td>Claude Code stream-JSON delta handling</td>
        <td><code class="language-plaintext highlighter-rouge">add99a9</code></td>
      </tr>
      <tr>
        <td>Universal tool-use rules in system prompt</td>
        <td>Claude Code’s well-established tool guidelines</td>
        <td><code class="language-plaintext highlighter-rouge">d9e294e</code></td>
      </tr>
      <tr>
        <td>Per-bot session isolation against identity bleed-through</td>
        <td>OpenClaw’s per-bot agents.list discipline</td>
        <td><code class="language-plaintext highlighter-rouge">ba37a22</code></td>
      </tr>
      <tr>
        <td><code class="language-plaintext highlighter-rouge">task</code> (subagent) usage discipline</td>
        <td>OpenClaw subagent guide</td>
        <td><code class="language-plaintext highlighter-rouge">622de25</code></td>
      </tr>
      <tr>
        <td>Tool batching + bash-true-parallel rules</td>
        <td>Claude Code parallel tool_use experience</td>
        <td><code class="language-plaintext highlighter-rouge">a82871f</code></td>
      </tr>
      <tr>
        <td>Self-start cron daemon + <code class="language-plaintext highlighter-rouge">session_status</code> tool</td>
        <td>OpenClaw cron and Claude Code session inspection</td>
        <td><code class="language-plaintext highlighter-rouge">e430b0b</code></td>
      </tr>
      <tr>
        <td>Awareness of multimedia generation scripts (<code class="language-plaintext highlighter-rouge">imagen</code>, <code class="language-plaintext highlighter-rouge">tts</code>)</td>
        <td>Discoverability already in Claude Code workspace</td>
        <td><code class="language-plaintext highlighter-rouge">1286279</code></td>
      </tr>
      <tr>
        <td>Usage accounting parity (input/output/cache tokens)</td>
        <td>OpenClaw usage tracking</td>
        <td><code class="language-plaintext highlighter-rouge">0bd1daf</code></td>
      </tr>
    </tbody>
  </table>

  <p>The most heterogeneous set, reflecting that Kilo was the newest of the three and started furthest from production-readiness. None of these were upstream Kilo contributions; they were closecrab-side wrappers that taught Kilo how to use facilities Claude Code and OpenClaw bots had been using for weeks. The end result is that Kilo is no longer the “trial” runtime - it routinely wins head-to-head latency comparisons against the other two (see “Stress test” below).</p>

  <p><strong>Takeaway:</strong> the cheapest capability transfers are the ones where the source runtime has solved a problem and the target runtime just needs to be <em>told that the solution exists</em>. Tool-awareness, script-awareness, prompt-rule absorption - all of these were single-commit gains for Kilo.</p>

  <h2 id="emergent-capabilities-not-present-in-any-single-runtime">Emergent capabilities (not present in any single runtime)</h2>

  <p>Three capabilities exist only because three runtimes coexist:</p>

  <h3 id="cross-runtime-model-name-translation">1. Cross-runtime model name translation</h3>

  <p>Each runtime names the same underlying model differently:</p>

  <table>
    <thead>
      <tr>
        <th>Runtime</th>
        <th>Claude Opus 4.7 model string</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>Claude Code</td>
        <td><code class="language-plaintext highlighter-rouge">claude-opus-4-7@default</code></td>
      </tr>
      <tr>
        <td>OpenClaw</td>
        <td><code class="language-plaintext highlighter-rouge">anthropic-vertex/claude-opus-4-7</code></td>
      </tr>
      <tr>
        <td>Kilo</td>
        <td><code class="language-plaintext highlighter-rouge">google-vertex-anthropic/claude-opus-4-7@default</code></td>
      </tr>
    </tbody>
  </table>

  <p><code class="language-plaintext highlighter-rouge">scripts/config-manage.py</code> (<code class="language-plaintext highlighter-rouge">f6647a3</code>) gained a preset-aware translator. When a bot switches runtime, the model string is rewritten automatically through <code class="language-plaintext highlighter-rouge">_detect_preset</code> + <code class="language-plaintext highlighter-rouge">_model_for_worker</code>, with a substring fingerprint fallback for bots that came in misconfigured. No single upstream tool has this capability because no single upstream tool needs it - it’s a product of running multiple runtimes side by side.</p>
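  <p>The translator’s core is a small lookup with a fingerprint fallback. The sketch below mirrors the table above; the function names are illustrative stand-ins for <code class="language-plaintext highlighter-rouge">_detect_preset</code> and <code class="language-plaintext highlighter-rouge">_model_for_worker</code>, whose real signatures are not shown in the post.</p>

```python
# Model strings per runtime, as in the table above.
PRESETS = {
    "claude": "claude-opus-4-7@default",
    "openclaw": "anthropic-vertex/claude-opus-4-7",
    "kilo": "google-vertex-anthropic/claude-opus-4-7@default",
}

def detect_preset(model: str):
    """Return the runtime whose preset matches, or fall back to a
    substring fingerprint for bots that came in misconfigured."""
    for runtime, name in PRESETS.items():
        if model == name:
            return runtime
    if "claude-opus-4-7" in model:
        return "claude"  # fingerprint fallback: recognize the model family
    return None

def translate_model(model: str, target_runtime: str) -> str:
    """Rewrite a model string for the runtime a bot is switching to."""
    if detect_preset(model) is None:
        raise ValueError("unrecognized model string: " + model)
    return PRESETS[target_runtime]
```
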

  <h3 id="live-runtime-switching-with-full-state-preservation">2. Live runtime switching with full state preservation</h3>

  <p>The closecrab middleware preserves the bot’s personality, memory, and team context across runtime switches. A single bot can move between Claude Code, OpenClaw, and Kilo in under 20 seconds with no loss of long-term memory and no manual reconfiguration. The runtime-specific self-healing (OpenClaw’s hardlinks and reindex, Kilo’s HTTP server spawn, Claude Code’s session resume) runs automatically on boot.</p>

  <h3 id="heterogeneous-mutual-testing">3. Heterogeneous mutual testing</h3>

  <p>A bot on runtime A can probe a bot on runtime B for the same capability and report differences, with no protocol coupling. This is built directly on top of the Firestore inbox described above - the inbox boundary makes sender and receiver mutually opaque, including which runtime each one happens to be using. This is how three of the four absorbed-capability discoveries happened: not by us reading source code, but by a bot running on runtime X noticing that its sibling on runtime Y could do something it couldn’t.</p>

  <p><strong>Takeaway:</strong> these emergent capabilities are the strongest argument for the heterogeneous-runtime strategy. They are not features anyone is likely to upstream into a single agent CLI, because they only make sense at the orchestration layer above the CLIs.</p>

  <h2 id="side-discovery-a-silent-backup-regression">Side discovery: a silent backup regression</h2>

  <p>Heterogeneous bots probing the same infrastructure also surfaced an infrastructure-level issue that had nothing to do with any single runtime. The <code class="language-plaintext highlighter-rouge">scripts/sync-memory.sh</code> script was running on a host where <code class="language-plaintext highlighter-rouge">~/my-private</code> was an rsync target rather than a real git clone. Its <code class="language-plaintext highlighter-rouge">cd $REPO || exit 1</code> guard passed (the directory did exist), the git commands that followed silently failed because the script lacked <code class="language-plaintext highlighter-rouge">set -e</code>, and the final line still printed <code class="language-plaintext highlighter-rouge">Pushed to GitHub (private)</code>. Memory backups had been silently failing for weeks.</p>

  <p>The fix (<code class="language-plaintext highlighter-rouge">85e6cb6</code>) adds an explicit <code class="language-plaintext highlighter-rouge">git rev-parse --git-dir</code> check and turns on <code class="language-plaintext highlighter-rouge">set -e</code>. This is by far the highest-value commit of the 36-hour window and is not a runtime feature in any sense - it surfaced only because different runtimes observing the same infrastructure had different views and one of them noticed an inconsistency.</p>

  <p><strong>Takeaway:</strong> silent successes are the most expensive class of bug, and they are unusually hard to find when a single observer’s assumptions match the silent path. Heterogeneous observers are an underrated debugging tool.</p>

  <h2 id="architecture-transfer-graph">Architecture: transfer graph</h2>

  <p><img src="/assets/img/hybrid-agent-architecture.jpg" alt="Capability transfer architecture diagram between three agent runtime nodes with Firestore Inbox as the substrate bus" /></p>

  <p>The high-level architecture above shows three runtime nodes plus the inbox substrate; the diagram below shows the actual capability flows during the experiment:</p>

  <svg viewBox="0 0 720 320" xmlns="http://www.w3.org/2000/svg" role="img" aria-label="Capability transfer graph between three agent runtimes">
  <circle cx="120" cy="110" r="44" fill="#1A73E8" />
  <text x="120" y="106" text-anchor="middle" fill="#fff" font-size="14" font-weight="500" font-family="Google Sans, sans-serif">Claude Code</text>
  <text x="120" y="124" text-anchor="middle" fill="#fff" font-size="11" font-family="Roboto, sans-serif">socketpair</text>
  <circle cx="600" cy="110" r="44" fill="#1A73E8" />
  <text x="600" y="106" text-anchor="middle" fill="#fff" font-size="14" font-weight="500" font-family="Google Sans, sans-serif">OpenClaw</text>
  <text x="600" y="124" text-anchor="middle" fill="#fff" font-size="11" font-family="Roboto, sans-serif">ACP</text>
  <circle cx="360" cy="260" r="44" fill="#1A73E8" />
  <text x="360" y="256" text-anchor="middle" fill="#fff" font-size="14" font-weight="500" font-family="Google Sans, sans-serif">Kilo</text>
  <text x="360" y="274" text-anchor="middle" fill="#fff" font-size="11" font-family="Roboto, sans-serif">HTTP SSE</text>
  <path d="M 164 110 L 556 110" stroke="#5F6368" stroke-width="1.5" fill="none" marker-end="url(#arr-en)" />
  <text x="360" y="100" text-anchor="middle" font-size="12" fill="#5F6368" font-family="Roboto, sans-serif">retry resilience + tempfile hygiene</text>
  <path d="M 564 138 L 392 244" stroke="#5F6368" stroke-width="1.5" fill="none" marker-end="url(#arr-en)" />
  <text x="510" y="200" font-size="12" fill="#5F6368" font-family="Roboto, sans-serif">subagent + cron + usage accounting</text>
  <path d="M 328 244 L 156 138" stroke="#5F6368" stroke-width="1.5" fill="none" marker-end="url(#arr-en)" />
  <text x="190" y="200" text-anchor="end" font-size="12" fill="#5F6368" font-family="Roboto, sans-serif">streaming buffer recovery</text>
  <path d="M 156 130 L 556 130" stroke="#5F6368" stroke-width="1.5" fill="none" marker-end="url(#arr-en)" stroke-dasharray="4,3" />
  <text x="360" y="155" text-anchor="middle" font-size="12" fill="#5F6368" font-family="Roboto, sans-serif">hardlink memory model + agents.list autoload</text>
  <defs>
    <marker id="arr-en" viewBox="0 0 10 10" refX="9" refY="3" markerWidth="6" markerHeight="6" orient="auto">
      <path d="M0,0 L0,6 L9,3 z" fill="#5F6368" />
    </marker>
  </defs>
</svg>

  <p>Solid arrows are single-source capability transfers. The dashed arrow captures bidirectional adaptation: OpenClaw absorbed Claude Code’s direct-file memory access model, which in turn motivated Claude Code’s later tempfile hygiene work.</p>

  <h2 id="stress-test">Stress test</h2>

  <p>Take one bot and cycle its runtime in a tight loop, with a shared-memory query as the smoke probe between each switch:</p>

  <table>
    <thead>
      <tr>
        <th>Cycle</th>
        <th>Direction</th>
        <th>Model translated</th>
        <th>Bot booted</th>
        <th>Probe content</th>
        <th>Reply length</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>1</td>
        <td>claude → openclaw</td>
        <td>yes</td>
        <td>yes</td>
        <td>B200 MIG template name (from <code class="language-plaintext highlighter-rouge">shared/gcp-infra.md</code>)</td>
        <td>141 chars</td>
      </tr>
      <tr>
        <td>2</td>
        <td>openclaw → claude</td>
        <td>yes</td>
        <td>yes</td>
        <td>ALModel optimizer (from <code class="language-plaintext highlighter-rouge">shared/tpu-training.md</code>)</td>
        <td>615 chars</td>
      </tr>
      <tr>
        <td>3</td>
        <td>claude → kilo</td>
        <td>yes</td>
        <td>yes</td>
        <td>Feishu column_set limitation (from <code class="language-plaintext highlighter-rouge">shared/feishu-bot.md</code>)</td>
        <td>116 chars</td>
      </tr>
      <tr>
        <td>4</td>
        <td>kilo → claude</td>
        <td>yes</td>
        <td>yes</td>
        <td>CC core modules (from <code class="language-plaintext highlighter-rouge">shared/architecture.md</code>)</td>
        <td>851 chars</td>
      </tr>
    </tbody>
  </table>

  <p>End-to-end switch time including model translation, bot restart, and runtime self-healing was 15-20 seconds per cycle. Across 7 sequential switches over the day, memory content remained consistent - verified by asking the bot the same factual question on each runtime and matching the answers.</p>

  <table>
    <thead>
      <tr>
        <th>Runtime</th>
        <th>Same question, same bot, same shared memory</th>
        <th>Time</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>OpenClaw</td>
        <td><code class="language-plaintext highlighter-rouge">memory_search</code> + <code class="language-plaintext highlighter-rouge">read</code> + <code class="language-plaintext highlighter-rouge">exec</code>, 9 steps</td>
        <td>~120s</td>
      </tr>
      <tr>
        <td>Claude Code</td>
        <td><code class="language-plaintext highlighter-rouge">Grep</code> ×3 in one parallel tool_use block</td>
        <td>42.66s</td>
      </tr>
      <tr>
        <td>Kilo</td>
        <td><code class="language-plaintext highlighter-rouge">bash</code> ×3, sequential</td>
        <td>~37s</td>
      </tr>
    </tbody>
  </table>

  <p>The Kilo time is the surprise: in absolute terms it now beats Claude Code on this workload despite starting the week as the least mature of the three runtimes. The improvement comes almost entirely from absorbed capabilities (streaming flush, tool batching rules) on top of the fast cold start Kilo kept from its native implementation.</p>

  <h2 id="numbers">Numbers</h2>

  <table>
    <thead>
      <tr>
        <th>Metric</th>
        <th>Value</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>Days</td>
        <td>2</td>
      </tr>
      <tr>
        <td>Runtimes</td>
        <td>3</td>
      </tr>
      <tr>
        <td>Closecrab commits</td>
        <td>61</td>
      </tr>
      <tr>
        <td>Lines added / removed</td>
        <td>+5,070 / -568</td>
      </tr>
      <tr>
        <td>Capabilities transferred (Claude Code)</td>
        <td>2</td>
      </tr>
      <tr>
        <td>Capabilities transferred (OpenClaw)</td>
        <td>5</td>
      </tr>
      <tr>
        <td>Capabilities transferred (Kilo)</td>
        <td>10</td>
      </tr>
      <tr>
        <td>Emergent capabilities</td>
        <td>3</td>
      </tr>
      <tr>
        <td>Infrastructure-side discoveries</td>
        <td>1 (silent backup)</td>
      </tr>
      <tr>
        <td><code class="language-plaintext highlighter-rouge">/tmp</code> leak cleaned</td>
        <td>85 → 0</td>
      </tr>
      <tr>
        <td>Memory files / chunks indexed (per bot)</td>
        <td>101 / 282</td>
      </tr>
      <tr>
        <td>Stress test runtime switches</td>
        <td>7</td>
      </tr>
      <tr>
        <td>Failed switches</td>
        <td>0</td>
      </tr>
    </tbody>
  </table>

  <h2 id="what-we-deliberately-did-not-do">What we deliberately did not do</h2>

  <ul>
    <li>We did not introduce a unified abstraction layer over the three runtimes. Each one keeps its idiomatic surface; only the closecrab middleware and the operational tooling understand all three. The whole point of the experiment was to preserve runtime diversity.</li>
    <li>We did not automate the capability-transfer loop. Each transfer was a human-initiated read of one runtime’s commit history followed by a directed probe on another runtime. Automation is straightforward but premature with only three runtimes in scope.</li>
    <li>We did not change any of the runtime wire protocols. The protocols are exactly where they were on Friday; everything we changed lives in the closecrab wrapper layer or in self-healing patches inside the per-runtime workers.</li>
  </ul>

  <p><strong>Takeaway:</strong> the experiment worked because we kept the runtimes independent and used cheap observation-only loops between them. The homogenization risk - making three runtimes converge into a single shape - is real and we will need a deliberate policy to avoid it as the strategy matures.</p>

  <h2 id="what-this-changes-about-how-we-plan-agent-infra">What this changes about how we plan agent infra</h2>

  <p>Pre-experiment, the implicit assumption was that one would eventually pick a “best” agent CLI runtime and standardize on it. The experiment suggests a different organizing principle:</p>

  <ul>
    <li><strong>Diversity is a feature, not transitional debt.</strong> Three runtimes observing the same infrastructure found bugs no single runtime would have found.</li>
    <li><strong>Capability transfer is cheap.</strong> Most of the gains were single-commit ports of structurally similar logic.</li>
    <li><strong>Emergent capabilities pay for themselves.</strong> Cross-runtime model translation, live runtime switching, and heterogeneous probing are all features that exist only because of the heterogeneity.</li>
  </ul>

  <p>We will keep all three runtimes in production, continue running the same bot personalities across all three, and treat new runtimes as opportunities to absorb new capabilities rather than as candidates to displace existing ones.</p>

  <h2 id="reproducing">Reproducing</h2>

  <p>The full closecrab commit list is on the <a href="https://github.com/yangwhale/CloseCrab"><code class="language-plaintext highlighter-rouge">yangwhale/CloseCrab</code></a> repo between <code class="language-plaintext highlighter-rouge">add99a9</code> (2026-05-16 17:44 UTC) and <code class="language-plaintext highlighter-rouge">fba5de8</code> (2026-05-17 09:55 UTC). To replay a specific capability transfer, the simplest recipe is:</p>

  <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git log <span class="nt">--oneline</span> <span class="nt">--since</span><span class="o">=</span><span class="s2">"2026-05-16"</span> <span class="nt">--</span> closecrab/workers/openclaw_acp.py
git log <span class="nt">--oneline</span> <span class="nt">--since</span><span class="o">=</span><span class="s2">"2026-05-16"</span> <span class="nt">--</span> closecrab/workers/claude_code.py
git log <span class="nt">--oneline</span> <span class="nt">--since</span><span class="o">=</span><span class="s2">"2026-05-16"</span> <span class="nt">--</span> closecrab/workers/kilo.py
</code></pre></div>  </div>

  <p>and read the commit pairs side by side. The structural similarity across runtimes is the entire point.</p>

  <h2 id="acknowledgements">Acknowledgements</h2>

  <p>The experiment ran on four bots in a single team. Three of them - bunny (mostly Claude Code), tiemu (mostly OpenClaw), xiaoaitongxue (mostly Kilo) - took turns probing each other and committing the absorbed capabilities. The fourth, the inter-bot Firestore inbox, did not technically run any code but absolutely earned a thank-you for not losing a single message under the day’s restart load.</p>

</div>

<script>
(function() {
  var buttons = document.querySelectorAll('.lang-toggle button');
  var sections = document.querySelectorAll('.lang-content');
  function setLang(lang) {
    buttons.forEach(function(b) {
      var active = b.dataset.lang === lang;
      b.classList.toggle('active', active);
      b.setAttribute('aria-selected', active ? 'true' : 'false');
    });
    sections.forEach(function(s) {
      s.hidden = !s.classList.contains('lang-' + lang);
    });
    try { localStorage.setItem('blog-lang', lang); } catch (e) {}
  }
  buttons.forEach(function(b) {
    b.addEventListener('click', function() { setLang(b.dataset.lang); });
  });
  // restore last choice; default zh
  try {
    var saved = localStorage.getItem('blog-lang');
    if (saved === 'en' || saved === 'zh') setLang(saved);
  } catch (e) {}
})();
</script>]]></content><author><name>Chris Yang</name></author><category term="agents" /><category term="infra" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Hello, world</title><link href="https://blog.higcp.com/2026/05/17/hello-world/" rel="alternate" type="text/html" title="Hello, world" /><published>2026-05-17T17:45:00+08:00</published><updated>2026-05-17T17:45:00+08:00</updated><id>https://blog.higcp.com/2026/05/17/hello-world</id><content type="html" xml:base="https://blog.higcp.com/2026/05/17/hello-world/"><![CDATA[<p>This is the first post on <code class="language-plaintext highlighter-rouge">blog.higcp.com</code>. The blog is built with Jekyll
on GitHub Pages, with a custom skin mimicking Google Cloud Console design:
white background, Google Sans typography, Google Blue accents, no
gradients, no decorative emoji.</p>

<h2 id="what-ill-write-about">What I’ll write about</h2>

<ul>
  <li><strong>TPU v7 (Ironwood)</strong> — training and inference experience: model
loading, checkpoint conversion, sharding strategies, performance
optimization.</li>
  <li><strong>GPU inference</strong> — vLLM and SGLang deployment notes: model registration
quirks, MoE prefetch deadlocks, KV cache tuning, FP8/FP4 trade-offs.</li>
  <li><strong>Multi-agent systems</strong> — running multiple LLM-powered bots on the same
infrastructure, IPC patterns, debugging cold-path bugs.</li>
  <li><strong>Cloud infra</strong> — GCP Cloud DNS, GKE topology, Cloud Storage gotchas,
cross-project IAM headaches.</li>
</ul>

<h2 id="why-jekyll">Why Jekyll</h2>

<p>Three reasons:</p>

<ol>
  <li><strong>Markdown all the way down.</strong> Source files are just <code class="language-plaintext highlighter-rouge">.md</code> text under
<code class="language-plaintext highlighter-rouge">_posts/</code>. No CMS, no DB, no auth. <code class="language-plaintext highlighter-rouge">git push</code> is the publish button.</li>
  <li><strong>GitHub Pages handles hosting + HTTPS.</strong> Let’s Encrypt certificate
provisioned automatically for the <code class="language-plaintext highlighter-rouge">blog.higcp.com</code> custom domain. Zero
server maintenance.</li>
  <li><strong>The default theme <code class="language-plaintext highlighter-rouge">minima</code> is solid.</strong> With a small SCSS override
file (<code class="language-plaintext highlighter-rouge">_sass/gcp-overrides.scss</code>), it ports cleanly to Material Design
without forking a heavy theme.</li>
</ol>

<h2 id="code-style-example">Code style example</h2>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">jax</span>
<span class="kn">import</span> <span class="nn">jax.numpy</span> <span class="k">as</span> <span class="n">jnp</span>

<span class="o">@</span><span class="n">jax</span><span class="p">.</span><span class="n">jit</span>
<span class="k">def</span> <span class="nf">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">:</span> <span class="n">jax</span><span class="p">.</span><span class="n">Array</span><span class="p">,</span> <span class="n">y</span><span class="p">:</span> <span class="n">jax</span><span class="p">.</span><span class="n">Array</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">jax</span><span class="p">.</span><span class="n">Array</span><span class="p">:</span>
    <span class="k">return</span> <span class="n">x</span> <span class="o">@</span> <span class="n">y</span>

<span class="c1"># TPU v5p — BF16 matmul example
</span><span class="n">x</span> <span class="o">=</span> <span class="n">jnp</span><span class="p">.</span><span class="n">ones</span><span class="p">((</span><span class="mi">8192</span><span class="p">,</span> <span class="mi">8192</span><span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="n">jnp</span><span class="p">.</span><span class="n">bfloat16</span><span class="p">)</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">jnp</span><span class="p">.</span><span class="n">ones</span><span class="p">((</span><span class="mi">8192</span><span class="p">,</span> <span class="mi">8192</span><span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="n">jnp</span><span class="p">.</span><span class="n">bfloat16</span><span class="p">)</span>
<span class="n">out</span> <span class="o">=</span> <span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">out</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>  <span class="c1"># (8192, 8192)
</span></code></pre></div></div>

<h2 id="tables-for-hardware-specs">Tables for hardware specs</h2>

<table>
  <thead>
    <tr>
      <th>Chip</th>
      <th>HBM</th>
      <th>BF16 TFLOPS</th>
      <th>FP8 TFLOPS</th>
      <th>Pod scale</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>TPU v5p</td>
      <td>95 GB</td>
      <td>459</td>
      <td>—</td>
      <td>8,960 chips</td>
    </tr>
    <tr>
      <td>TPU v7 (Ironwood)</td>
      <td>192 GB</td>
      <td>~2,307</td>
      <td>4,614</td>
      <td>9,216 chips</td>
    </tr>
    <tr>
      <td>NVIDIA B200</td>
      <td>192 GB</td>
      <td>1,125</td>
      <td>4,500</td>
      <td>per node</td>
    </tr>
  </tbody>
</table>

<blockquote>
  <p>Specs sourced from Google Cloud official announcement (Ironwood, 2025)
and NVIDIA Blackwell datasheet.</p>
</blockquote>

<p>That’s it for now. More posts will follow as I write them.</p>]]></content><author><name>Chris Yang</name></author><category term="meta" /><summary type="html"><![CDATA[This is the first post on blog.higcp.com. The blog is built with Jekyll on GitHub Pages, with a custom skin mimicking Google Cloud Console design: white background, Google Sans typography, Google Blue accents, no gradients, no decorative emoji.]]></summary></entry></feed>