Why is OpenClaw heavier by default?
OpenClaw’s design logic was never “to create a lightweight agent for a quick chat,” but rather “to build up a long-term, persistent agent workbench.” This is stated very clearly in the official workspace documentation.
AGENTS.md, SOUL.md, and USER.md are loaded for every session, and other files such as IDENTITY.md, TOOLS.md, HEARTBEAT.md, BOOT.md, and MEMORY.md revolve around the same workspace. The documentation even specifically warns that HEARTBEAT.md should be kept very short to avoid token burn; that reminder alone shows OpenClaw knows its default context is on the thick side.
One thing does need to be made clear, though: OpenClaw does not keep everything permanently resident in context. Its token documentation states that when skills are added to the system prompt, they are by default just metadata, and the detailed instructions must be explicitly read in when needed. So the issue is not that it doesn't know how to economize; it's that the baseline it carries by default is simply larger.
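The metadata-first pattern the documentation describes can be sketched roughly like this. All names and paths here are hypothetical; this is a minimal illustration of the idea, not OpenClaw's actual implementation:

```python
# Hypothetical sketch of "skills as metadata by default":
# the system prompt carries only a one-line summary per skill, and the
# full instructions are read into context only when the skill is invoked.

SKILLS = {
    "pdf-report": {
        "summary": "Generate a PDF report from a data file.",
        "instructions_path": "skills/pdf-report/SKILL.md",  # full text stays on disk
    },
    "calendar": {
        "summary": "Read and create calendar events.",
        "instructions_path": "skills/calendar/SKILL.md",
    },
}

def system_prompt_skill_section() -> str:
    # Only metadata lands in the base prompt: a cheap, fixed overhead.
    return "\n".join(f"- {name}: {meta['summary']}" for name, meta in SKILLS.items())

def load_skill(name: str) -> str:
    # Detailed instructions are pulled in explicitly, only when needed.
    with open(SKILLS[name]["instructions_path"]) as f:
        return f.read()

print(system_prompt_skill_section())
```

The base prompt cost then scales with the number of skills times a one-line summary, not times the full instruction files.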
This is actually not difficult to understand. OpenClaw aims to solve the issues of “long-term online presence” and “multi-message surface visibility.”
If you want an agent to live simultaneously in Telegram, Discord, Slack, and WhatsApp, it needs identity, routing, boundaries, and delegation. Many of those rules cannot be bolted on ad hoc: the agent first has to know who it is, how it should speak, whom it is facing, what the constraints of the current workspace are, where to find its skills, and which memories to carry or leave behind.
So, the token consumption for OpenClaw is more like a large fixed overhead.
Every time you send a message, you are not just saying one sentence to a model; you are activating an entire pre-configured assistant environment. That environment is genuinely useful, but the cost is a thicker base prompt, and a lot of context gets placed there up front even if this particular round never touches it.
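A toy calculation makes the "fixed overhead" point concrete. All numbers below are invented for illustration, not measured from either tool:

```python
# Toy comparison of per-turn input tokens under two assembly styles.
# Every number here is invented for illustration only.

def turn_cost(base_prompt: int, loaded_context: int, history: int, message: int) -> int:
    """Input tokens billed for one turn: the whole prefix is resent each time."""
    return base_prompt + loaded_context + history + message

# "Resident environment" style: thick base, little loaded on demand.
resident = turn_cost(base_prompt=8000, loaded_context=500, history=2000, message=200)

# "On-demand" style: thin base, more context pulled in when relevant.
on_demand = turn_cost(base_prompt=2000, loaded_context=1500, history=2000, message=200)

print(resident, on_demand)
```

Because the base prompt is resent on every turn, a thick base dominates the bill over a long session even when each individual round is simple.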
Why Hermes Looks More Restrained
Regarding Hermes, the most interesting part of the documentation is that it keeps emphasizing two things: on-demand loading and prompt cache preservation.
First, look at the context files. At session start, Hermes loads only the project context files that match the current working directory; candidates like AGENTS.md, CLAUDE.md, and .cursorrules are resolved first-match-wins, so they are never all loaded at once. More importantly, an AGENTS.md in a subdirectory is not read at startup; it is discovered and injected incrementally only when you actually navigate into that directory, read that file, or otherwise reach that path.
The official documentation clearly writes out the benefits of this design:
- no system prompt bloat
- prompt cache preservation
The taste here is very different from OpenClaw's. Hermes' position seems to be: keep the system prompt short, defer any context that can be deferred, and don't front-load the first round with things that only matter at specific moments.
The skills follow the same logic. The Hermes skills documentation explicitly mentions progressive disclosure, with the goal of minimizing token usage. In other words, skills are not resident in full text by default; the model first sees a lightweight index and only expands the detailed content when it is truly needed.
Its memory system is combinatorial as well. The official documentation strictly caps the built-in memory: MEMORY.md at roughly 800 tokens and USER.md at roughly 500, a fixed footprint of about 1,300 tokens. SOUL.md carries the fixed identity, and USER.md and memory are included in the system prompt, while session search, the memory provider, and Honcho feel more like external layers. This structure doesn't automatically make the prompt short, but the default posture is much easier to keep under cost control.
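A fixed budget like that is easy to express. This sketch uses the token figures from the documentation but a crude, hypothetical token estimator; real tokenizers count differently:

```python
# Hypothetical sketch of a fixed memory budget: MEMORY.md capped near
# 800 tokens, USER.md near 500, about 1300 tokens total in every prompt.

BUDGETS = {"MEMORY.md": 800, "USER.md": 500}

def rough_tokens(text: str) -> int:
    # Crude heuristic (~4 characters per token); real tokenizers differ.
    return max(1, len(text) // 4)

def fits_budget(name: str, text: str) -> bool:
    return rough_tokens(text) <= BUDGETS[name]

total_budget = sum(BUDGETS.values())
print(total_budget)  # a fixed, predictable slice of every prompt
```

The point is not the exact numbers but the shape: memory cost is bounded up front instead of growing with everything the agent has ever learned.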
So, from an architectural perspective, I would be more inclined to categorize Hermes as “default restraint, gradually increasing weight.”
It’s not about who is more advanced, but where the cost is placed
Many comparison articles like to reduce token consumption to a single verdict, which I think is wrong.
OpenClaw is heavier, but that doesn’t mean it’s poorly designed. On the contrary, it intentionally front-loads a lot of features. If you want an assistant that is long-term online, cross-platform, has personality, has a workspace, and has routing, then these contexts will eventually cost something. It just chooses to bring all these things together on the default path first.
Hermes is more restrained, but that doesn't mean it has no overhead. Once you stack a long SOUL.md, project AGENTS.md files, multiple skills, MCP, a memory provider, sub-agents, and long tool outputs together, token usage can still climb very quickly. Especially when the tool output itself is long, or when you send it searching back and forth through a complex codebase, no architecture label will save you.
So, a more accurate way to say it is:
- OpenClaw moves the cost into "assistant environment residency."
- Hermes moves the cost into "capability on demand."
Neither approach is absolutely superior, but their billing structures are completely different.
What truly makes the difference is not just the system prompt
There are a few details that are quite easy to overlook.
First, Hermes' context-file design is built to keep the prompt prefix stable. Subdirectory hints are appended to the results of relevant tool calls rather than being continuously fed into the system prompt. That not only saves tokens; it also deliberately leaves room for the provider's prompt cache to keep hitting.
Second, tools like Hermes' execute_code are designed to return only the script's final printed result to the model; the intermediate results of RPC tool calls never enter the context. In complex workflows, this structure eliminates a large amount of meaningless tool noise.
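The "only the final print returns" idea can be illustrated with a toy executor. This is a hypothetical sketch of the pattern, not Hermes' implementation, and `exec` here stands in for a real sandbox:

```python
# Hypothetical sketch: intermediate work happens inside the execution
# environment, and only captured stdout is handed back into the context.

import contextlib
import io

def execute_code(script: str) -> str:
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        # A real agent would run this in a sandbox; exec() is purely
        # illustrative here and unsafe for untrusted input.
        exec(script, {})
    return buf.getvalue()  # only what the script printed enters the context

# A thousand rows of intermediate data never reach the model;
# only the one-line summary does.
out = execute_code(
    "rows = [{'n': i, 'sq': i * i} for i in range(1000)]\n"
    "total = sum(r['sq'] for r in rows)\n"
    "print(total)\n"
)
print(out)
```

Instead of three tool-call round trips each dumping their payload into the transcript, the model pays for one short result string.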
Third, OpenClaw's workspace philosophy means it is more like "bringing everything with you." Even if many individual files are short, once there are many kinds of files, many layers of persona and rules, and many levels of skills and memory, the foundational context gradually thickens. That thickness is not a bug; it is part of the product's design trade-offs.
Fourth, how each side handles sub-agents also affects the total ledger. Hermes' designs emphasize local context and local execution, while OpenClaw emphasizes organizing multiple agent identities, routes, and delegations. One is more like context switching; the other is more like maintaining a standing organization. When the token bill arrives, the shapes are naturally different.
If you want to calculate it, measure it, don't argue it
If you actually plan to use one of these long-term, don't just take someone's word that "this one is more token-efficient"; look directly at the data the tools provide.
OpenClaw has commands like /context detail and /usage tokens, and the documentation also tells you which files are injected at the start of each session. That makes it easy to see how thick the baseline really is.
For Hermes, the official documentation offers clues in the token budget, the progressive disclosure of skills, the context-file injection mechanism, and the Honcho-related features. Combine hermes insights, session storage, and the actual provider bill to see the real picture.
To put it plainly, architecture only determines how you spend, not how much you spend. What actually determines the bill is how long your SOUL.md is, how much memory you've stuffed in, how many skills you've enabled, whether tool output is kept contained, and how expensive the model itself is.
I’m still on this side
Looking only at the default architecture and not at extreme configurations, I still think Hermes is better at keeping token consumption within a relatively restrained range.
Not because it is stronger, but because its prompt design has avoided "meaningless persistent context bloat" from the start. OpenClaw is the opposite: it would rather carry a bit more to make sure the identity, workspace, and message boundaries of a long-term online assistant don't fall apart.
So if you are particularly concerned about token costs, and the workflow mainly happens in local, CLI, code repository, or skill knowledge base scenarios, Hermes’ approach will generally be more convenient.
If you want a long-term assistant that is truly available across various messaging platforms, then spending more tokens with OpenClaw feels like a normal cost. You can’t expect it to be online long-term like a person while also requiring it to be as lightweight as a one-off tool every turn.
Ah, this matter ultimately comes back to the old question.
Are you raising an online assistant, or are you raising a local agent core? Figure that out, and the token accounting won’t be so hard to understand.
References
- Previous Post: Mistaking Hermes for OpenClaw Replacement, Might Have Been Biased From the Start
- Hermes Agent Features Overview
- Hermes Agent Context Files
- Hermes Agent Personality & SOUL.md
- Hermes Agent Skills
- Hermes Agent Built-in Tools Reference
- Hermes Agent Honcho Memory
- OpenClaw Agent Workspace
- OpenClaw Token Use and Costs
- OpenClaw Memory
- OpenClaw Multi-Agent Routing
- OpenClaw Context Reference
Writing Notes
Original Prompt
Prompt: Hermes and OpenClaw write another article about their token consumption. Since the architectures are different, the consumption must be different.
Writing Idea Summary
- Continue the judgment from the previous article, but narrow the focus to token consumption, and no longer repeat the overall architecture comparison.
- Focus on comparing the default context assembly methods of both sides, rather than just talking about “who is more efficient.”
- Hermes focuses on progressive disclosure, on-demand discovery, and prompt cache preservation.
- OpenClaw focuses on workspace persistent files, long-term online assistant environments, and fixed overhead.
- The conclusion retains the core judgment provided by the user, but adds a layer of practical reminder: to truly see the bill, one must combine official usage/context tools with actual provider billing.