Tags

Codex

Fewer tokens, so why is GPT-5.5 in Codex actually more expensive?

Stunned.

The official ChatGPT side doesn't make it easy to track tokens and costs directly, so I found a third-party platform and ran a round of similar tasks through Codex with GPT-5.4 and GPT-5.5, with thinking mode set to high. The result was clear-cut: on simple questions the gap was mild, with GPT-5.5 costing about 30% more than GPT-5.4; but on complex tasks the cost jumped to 2.6 times, with both the request count and token consumption rising together.
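The arithmetic behind that gap can be sketched in a few lines. The prices and token counts below are made-up placeholders (not real GPT-5.x rates); the point is only that on a multi-call task, total cost is driven by the whole chain of requests, not just the per-token unit price.

```python
# Hypothetical back-of-envelope cost model: total cost of a task is the sum
# over all its API calls of (input tokens * input price + output tokens * output price).
# All prices and token counts are illustrative placeholders.

def task_cost(requests, in_price, out_price):
    """requests: list of (input_tokens, output_tokens), one tuple per API call."""
    return sum(i * in_price + o * out_price for i, o in requests)

# A "simple" task: one short call per model; only the unit price differs (+30%).
simple_a = task_cost([(2_000, 500)], in_price=1.0e-6, out_price=4.0e-6)
simple_b = task_cost([(2_000, 500)], in_price=1.3e-6, out_price=5.2e-6)

# A "complex" task: the newer model issues more calls and burns more tokens per call.
complex_a = task_cost([(8_000, 2_000)] * 3, in_price=1.0e-6, out_price=4.0e-6)
complex_b = task_cost([(10_000, 3_500)] * 4, in_price=1.3e-6, out_price=5.2e-6)

print(f"simple ratio:  {simple_b / simple_a:.2f}x")   # tracks the unit-price gap
print(f"complex ratio: {complex_b / complex_a:.2f}x")  # amplified by the call chain
```

With these placeholder numbers the simple-task ratio stays at the 1.3x unit-price gap, while the complex-task ratio lands around 2.6x, matching the shape of what I observed: the multiplier comes from extra requests and extra tokens, not the price sheet alone.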

My current assessment is equally straightforward: this can't be settled by the statement "5.5 has a higher unit price." On simple tasks the cost is dominated by unit price; on complex tasks, what is actually expensive is the entire call chain. Looked at another way, though, 5.5 genuinely absorbs your rework costs: the model is more willing to think through multiple steps, take more actions, and check its work more thoroughly. In the end you are billed not for a single answer but for a complete set of actions, which also cuts down the back-and-forth required from the human user.

Codex defaults to medium, but I later switched to high.

During my time with Codex, one thing always felt a bit off: the default thinking level is medium, yet everyone discussing GPT-5.4 online talks it up in strong terms. In actual use, what exactly separates medium, high, and xhigh? The official documentation doesn't offer a particularly clear comparison. My current conclusion: for daily coding, start directly at high. Medium isn't unusable; it's fine for quick tasks, minor tweaks, or exploring a direction. But for multi-file changes and ambiguous requirements, where you have to make judgment calls while reading the code, medium easily spends compute in the wrong places. I rarely use xhigh, saving it for genuinely hard problems where I get stuck.

Skill is not a new prompt, it is the job manual for the agent.

These past few days, while reading about AI programming, I saw people first discussing MCP and then, almost immediately, Skill. Many who see the term for the first time instinctively treat it as yet another new protocol, or yet another fancy prompt.

My judgment is simple: Skill isn't here to replace MCP; it's more like a job manual for the agent. MCP solves the problem of "letting the agent connect to the external world," while Skill solves the problem of "how to reliably get the job done once connected." The two aren't substitutes; they are complementary layers, one building on the other.

Simply put, MCP gives the agent hands and feet, and Skill tells the agent not to mess around.
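That division of labor can be made concrete with a small sketch. This is illustrative pseudostructure, not any real agent SDK: MCP-style tools show up as callables the agent can invoke, while a skill shows up as procedural instructions injected into the agent's context.

```python
# Illustrative sketch (not a real SDK): MCP supplies capabilities ("hands and
# feet") as callable tools; a skill supplies operating procedure as context
# that constrains HOW those tools are used.

from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    tools: dict[str, Callable] = field(default_factory=dict)  # from MCP servers
    instructions: list[str] = field(default_factory=list)     # from skills

    def add_tool(self, name: str, fn: Callable) -> None:
        self.tools[name] = fn

    def load_skill(self, manual: str) -> None:
        # A skill is essentially a job manual, prepended to the agent's context.
        self.instructions.append(manual)

    def system_prompt(self) -> str:
        return "\n".join(self.instructions)

agent = Agent()
agent.add_tool("query_db", lambda sql: f"rows for {sql!r}")  # capability (MCP)
agent.load_skill("Run read-only queries first; never DROP tables.")  # procedure (Skill)

print(sorted(agent.tools))
print(agent.system_prompt())
```

Removing the skill leaves the agent just as capable but far less predictable, which is exactly the "don't mess around" role described above.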

The End of Low-Cost API Gateways: Large Model Experiences and the Impossible Triangle in March

Throughout March, I kept testing various large-model API hubs. They are indeed cheap: for a small monthly fee you can try foreign models like ChatGPT, Claude, and Gemini, which at first glance looks like an extremely cost-effective deal. But the more I used them, the more I felt this path is constrained by an impossible triangle: quality, stability, and affordability are hard to achieve all at once. By last weekend the situation became quite clear. Over the two days from 2026-03-28 to 2026-03-29, I felt a noticeable tightening of risk controls on ChatGPT channels, and Claude was no different; many previously usable low-cost relays suddenly became unstable or failed outright. For me, this basically marked the temporary end of the low-cost API relay model.

Command-line AI Coding Interaction

  • As usual, I opened Trae and prepared to start coding, and a notification arrived: the Claude models have been shut off and can no longer be used, most likely for good. The official team offered a compensation plan, adding 300 to the usage quota (as of January).
  • I checked, and sure enough: Anthropic, following US regulations, is barring domestic companies from continuing to use the Claude series models. I joined the Trae Discord community and saw many people complaining about the shutdown; after all, most people came here for Claude. The signs had appeared earlier: Claude 4.5 was never synced to Trae and simply didn't launch there.

Attempt

As a last resort, I experimented with the other models still supported, including OpenAI's gpt-3.5-turbo and gpt-4, and Google's Gemini Pro.
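Trying the remaining models one by one is really just a fallback loop. A minimal sketch, with a hypothetical `call(model)` interface standing in for a real API client (here stubbed out so the logic runs without network access):

```python
# Minimal fallback sketch: try each remaining model in order and return the
# first successful response. `call(model)` is a hypothetical client interface;
# in practice it would wrap a real API call that raises on failure.

def first_working(models, call):
    """Return (model, response) for the first model whose call succeeds."""
    errors = {}
    for model in models:
        try:
            return model, call(model)
        except Exception as exc:  # broad catch is fine for a demo
            errors[model] = exc
    raise RuntimeError(f"all models failed: {errors}")

# Stubbed call simulating one unavailable model; a real client goes here.
def fake_call(model):
    if model == "gpt-3.5-turbo":
        raise TimeoutError("relay unavailable")
    return f"ok from {model}"

model, reply = first_working(["gpt-3.5-turbo", "gpt-4", "gemini-pro"], fake_call)
print(model, reply)  # gpt-4 ok from gpt-4
```

Collecting the per-model errors before giving up makes it easy to tell an unstable relay (timeouts) apart from a hard shutdown (authorization errors).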