<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Gpt-5.4 on Uncle Xiang&#39;s Notebook</title>
        <link>https://ttf248.life/en/tags/gpt-5.4/</link>
        <description>Recent content in Gpt-5.4 on Uncle Xiang&#39;s Notebook</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en</language>
        <lastBuildDate>Thu, 09 Apr 2026 15:45:31 +0800</lastBuildDate><atom:link href="https://ttf248.life/en/tags/gpt-5.4/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>Codex defaults to medium, but I later switched to high.</title>
        <link>https://ttf248.life/en/p/codex-default-medium-vs-high/</link>
        <pubDate>Wed, 08 Apr 2026 22:57:47 +0800</pubDate>
        
        <guid>https://ttf248.life/en/p/codex-default-medium-vs-high/</guid>
<description>&lt;p&gt;During my time using Codex, one thing always felt a bit awkward: the default thinking level is &lt;code&gt;medium&lt;/code&gt;, yet whenever &lt;code&gt;GPT-5.4&lt;/code&gt; comes up online, everyone raves about it. So in actual use, what exactly separates &lt;code&gt;medium&lt;/code&gt;, &lt;code&gt;high&lt;/code&gt;, and &lt;code&gt;xhigh&lt;/code&gt;? The official documentation hasn&amp;rsquo;t provided a particularly straightforward comparison.
My current conclusion is clear: for daily coding, I prefer to start directly at &lt;code&gt;high&lt;/code&gt;. &lt;code&gt;medium&lt;/code&gt; isn&amp;rsquo;t unusable; it&amp;rsquo;s fine for quick tasks, minor tweaks, or exploring directions. But for multi-file changes, ambiguous requirements, or work that means reading code and making judgment calls, &lt;code&gt;medium&lt;/code&gt; too easily skimps on compute in exactly the wrong places. I rarely use &lt;code&gt;xhigh&lt;/code&gt;; I save it for genuinely difficult tasks where I get stuck.&lt;/p&gt;
&lt;h2 id=&#34;clarifying-medium&#34;&gt;Clarifying &amp;ldquo;Medium&amp;rdquo;
&lt;/h2&gt;&lt;p&gt;The most confusing part of this topic is that &lt;code&gt;medium&lt;/code&gt; has more than one meaning.
As of &lt;code&gt;2026-04-08&lt;/code&gt;, in OpenAI&amp;rsquo;s public documentation, &lt;code&gt;GPT-5.4&lt;/code&gt;&amp;rsquo;s &lt;code&gt;reasoning.effort&lt;/code&gt; supports &lt;code&gt;none&lt;/code&gt;, &lt;code&gt;low&lt;/code&gt;, &lt;code&gt;medium&lt;/code&gt;, &lt;code&gt;high&lt;/code&gt;, and &lt;code&gt;xhigh&lt;/code&gt;, with a default of &lt;code&gt;none&lt;/code&gt;. However, within the same documentation, &lt;code&gt;verbosity&lt;/code&gt; also has &lt;code&gt;low&lt;/code&gt;, &lt;code&gt;medium&lt;/code&gt;, and &lt;code&gt;high&lt;/code&gt;, and the default value for &lt;code&gt;GPT-5.4&lt;/code&gt;&amp;rsquo;s &lt;code&gt;verbosity&lt;/code&gt; is &lt;code&gt;medium&lt;/code&gt;.
So, if you see something online that says &amp;ldquo;the default is medium,&amp;rdquo; don&amp;rsquo;t immediately assume it refers to the &amp;ldquo;thinking level.&amp;rdquo; Often, they are talking about completely different things.
If you are using it directly in Codex and see that the default is &lt;code&gt;medium&lt;/code&gt;, I am more inclined to read it as a preset chosen by the product layer rather than the underlying default from the model documentation. If we don&amp;rsquo;t draw this distinction, any discussion that follows will keep contradicting itself.&lt;/p&gt;
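&lt;p&gt;To keep the two &lt;code&gt;medium&lt;/code&gt;s apart, it helps to see where each knob lives in a request. Below is a minimal sketch assuming the Responses-style request shape described in the documentation above; it only builds the payload dictionary and sends nothing, and the model name is simply the one discussed in this post:&lt;/p&gt;

```python
# Sketch of a Responses-style request body. The two "medium"s live in
# two independent fields, each with its own default.
def build_request(prompt, effort="high", verbosity="medium"):
    return {
        "model": "gpt-5.4",
        "input": prompt,
        # Thinking level: none / low / medium / high / xhigh
        "reasoning": {"effort": effort},
        # Output length and detail: low / medium / high -- a different "medium"
        "text": {"verbosity": verbosity},
    }

# Daily-coding setup from this post: think hard, answer tersely.
payload = build_request("Refactor the session cache", effort="high", verbosity="low")
```

&lt;p&gt;The point is simply that &lt;code&gt;reasoning.effort&lt;/code&gt; and &lt;code&gt;text.verbosity&lt;/code&gt; are independent fields with independent defaults, so &amp;ldquo;the default is medium&amp;rdquo; is ambiguous until you say which field you mean.&lt;/p&gt;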
&lt;h2 id=&#34;the-official-documentation-doesnt-fully-explain-the-gap&#34;&gt;The Official Documentation Doesn&amp;rsquo;t Fully Explain the Gap
&lt;/h2&gt;&lt;p&gt;Let&amp;rsquo;s look at the official documentation.
The public documents currently confirm a few things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;gpt-5.4&lt;/code&gt; is the officially recommended default model for general coding tasks.&lt;/li&gt;
&lt;li&gt;In the code generation guide, the examples provided by the official team for &lt;code&gt;gpt-5.4&lt;/code&gt; directly use &lt;code&gt;reasoning: high&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Codex-oriented models like &lt;code&gt;gpt-5.3-codex&lt;/code&gt; explicitly support &lt;code&gt;low&lt;/code&gt;, &lt;code&gt;medium&lt;/code&gt;, &lt;code&gt;high&lt;/code&gt;, and &lt;code&gt;xhigh&lt;/code&gt; on their public pages.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gpt-5.4-pro&lt;/code&gt; is another line; it&amp;rsquo;s not simply a matter of turning up the dial on regular &lt;code&gt;gpt-5.4&lt;/code&gt;. It is an independent model designed for &amp;ldquo;thinking longer with more compute power.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, the official documentation hasn&amp;rsquo;t provided a particularly useful chart, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Exactly how much lower the success rate is when using &lt;code&gt;medium&lt;/code&gt; compared to &lt;code&gt;high&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;How much extra time or tokens are spent using &lt;code&gt;xhigh&lt;/code&gt; compared to &lt;code&gt;high&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;In coding scenarios, what kind of tasks are worth jumping straight to &lt;code&gt;xhigh&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words, the official team gave you the knobs, but they didn&amp;rsquo;t draw out the experience curve for you.&lt;/p&gt;
&lt;h2 id=&#34;whats-actually-useful-is-how-the-tiers-are-separated-on-the-leaderboard&#34;&gt;What&amp;rsquo;s Actually Useful is How the Tiers are Separated on the Leaderboard
&lt;/h2&gt;&lt;p&gt;It then occurred to me to check the code leaderboard on Arena, and things became much clearer.
The code leaderboard on &lt;code&gt;arena.ai&lt;/code&gt; separates the tiers. The page update time is &lt;code&gt;2026-04-01&lt;/code&gt;, and as of when I write this article:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;gpt-5.4-high (codex-harness)&lt;/code&gt; ranks &lt;code&gt;6th&lt;/code&gt; with a score of &lt;code&gt;1457&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gpt-5.4-medium (codex-harness)&lt;/code&gt; ranks &lt;code&gt;16th&lt;/code&gt; with a score of &lt;code&gt;1427&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gpt-5.3-codex (codex-harness)&lt;/code&gt; ranks &lt;code&gt;18th&lt;/code&gt; with a score of &lt;code&gt;1407&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Looking at these numbers together makes the meaning very direct.
For the same &lt;code&gt;GPT-5.4&lt;/code&gt;, the difference between &lt;code&gt;high&lt;/code&gt; and &lt;code&gt;medium&lt;/code&gt; isn&amp;rsquo;t just a &amp;ldquo;slight experience gap&amp;rdquo;; it is a noticeable tier separation. If you only read the statement &amp;ldquo;GPT-5.4 is very strong,&amp;rdquo; the information is insufficient, because the leaderboard itself lists &lt;code&gt;high&lt;/code&gt; and &lt;code&gt;medium&lt;/code&gt; as two distinct entries. When people say it is &amp;ldquo;very strong,&amp;rdquo; they are most likely describing the performance of the high-thinking tier, not speaking for the medium tier.&lt;/p&gt;
&lt;p&gt;Of course, the leaderboard isn&amp;rsquo;t the absolute truth for your project. It measures agentic coding + harness scenarios, not your one local repository. But the direction is very clear: when it comes to coding, the reasoning tier genuinely changes the results, not just the speed.&lt;/p&gt;
&lt;h2 id=&#34;how-should-i-choose-now&#34;&gt;How Should I Choose Now
&lt;/h2&gt;&lt;p&gt;Simply put, my current usage is very straightforward.
Use &lt;code&gt;medium&lt;/code&gt; for these scenarios:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Changing a few small files&lt;/li&gt;
&lt;li&gt;Fixing obvious bugs&lt;/li&gt;
&lt;li&gt;Getting the model to spit out a draft first&lt;/li&gt;
&lt;li&gt;When speed is key and I don&amp;rsquo;t want to wait too long&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Use &lt;code&gt;high&lt;/code&gt; as the daily default:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Modifying multiple interconnected files&lt;/li&gt;
&lt;li&gt;When the requirements are slightly vague&lt;/li&gt;
&lt;li&gt;When I need to read the code before making changes&lt;/li&gt;
&lt;li&gt;When judgment is required, not just code completion&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I reserve &lt;code&gt;xhigh&lt;/code&gt; for the tough nuts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;High-risk refactoring&lt;/li&gt;
&lt;li&gt;Troubleshooting long chains of issues&lt;/li&gt;
&lt;li&gt;Architectural changes&lt;/li&gt;
&lt;li&gt;When &lt;code&gt;high&lt;/code&gt; fails to solve the problem after two rounds&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The most crucial point here is not how amazing &lt;code&gt;xhigh&lt;/code&gt; is, but rather not treating &lt;code&gt;medium&lt;/code&gt; as a cure-all. The real problem with &lt;code&gt;medium&lt;/code&gt; isn&amp;rsquo;t that it&amp;rsquo;s weak, but that on complex tasks it too easily gives you the illusion of &amp;ldquo;good enough.&amp;rdquo; The result is saving a little time in the first round and doing much more rework later.&lt;/p&gt;
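&lt;p&gt;The lists above collapse into a tiny picker. This is purely my own heuristic from this post, not an official rule, and the parameter names are my own invention:&lt;/p&gt;

```python
# Personal effort picker mirroring the lists above (not an official rule).
def pick_effort(multi_file=False, vague_requirements=False, needs_judgment=False,
                high_risk=False, failed_high_rounds=0):
    # Tough nuts: high-risk work, or "high" already failed two rounds.
    if high_risk or failed_high_rounds >= 2:
        return "xhigh"
    # Daily default: anything that needs reading code or real judgment.
    if multi_file or vague_requirements or needs_judgment:
        return "high"
    # Quick tasks, obvious bugs, first drafts.
    return "medium"
```

&lt;p&gt;Note that the fallthrough is &lt;code&gt;medium&lt;/code&gt;, not &lt;code&gt;high&lt;/code&gt;: in this framing you have to positively decide a task is trivial before dropping below the daily default.&lt;/p&gt;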
&lt;h2 id=&#34;back-to-gpt-54-what-level-is-actually-powerful&#34;&gt;Back to GPT-5.4, What Level is Actually Powerful?
&lt;/h2&gt;&lt;p&gt;So, finally back to that question: When people say &amp;ldquo;GPT-5.4&amp;rdquo; online is very powerful, what level are they referring to?
My judgment: if the tier isn&amp;rsquo;t specified, &amp;ldquo;GPT-5.4 is powerful&amp;rdquo; more reliably refers to the higher thinking levels. At least in coding scenarios, don&amp;rsquo;t read it as &lt;code&gt;medium&lt;/code&gt;. If the other party is talking about &lt;code&gt;gpt-5.4-pro&lt;/code&gt;, that&amp;rsquo;s an entirely different matter; that is a separate, more computationally intensive line.
I previously wrote about &lt;a class=&#34;link&#34; href=&#34;https://ttf248.life/en/p/command-line-ai-coding-interaction/&#34; &gt;AI Coding Interaction Based on Command Line&lt;/a&gt;, which was more about changes in interaction methods. Looking back now, the change in interaction is one thing, but what level the model is actually running at has become another, more practical issue.
I am very clear on this now: &lt;code&gt;high&lt;/code&gt; is sufficient for daily use; if that fails, try &lt;code&gt;xhigh&lt;/code&gt;. This balance point between speed, cost, and success rate seems more correct.&lt;/p&gt;
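&lt;p&gt;In practice I just pin this in Codex&amp;rsquo;s config so every session starts at &lt;code&gt;high&lt;/code&gt;. The key names below are an assumption based on my reading of the Codex CLI configuration docs; verify them against your installed version:&lt;/p&gt;

```toml
# ~/.codex/config.toml -- key names assumed from Codex CLI docs, verify locally
model = "gpt-5.4"
model_reasoning_effort = "high"
```

&lt;p&gt;With the default pinned, bumping a stuck task to &lt;code&gt;xhigh&lt;/code&gt; becomes a deliberate, per-session decision instead of a setting you forget to flip back.&lt;/p&gt;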
&lt;h2 id=&#34;references&#34;&gt;References
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://developers.openai.com/api/docs/guides/latest-model&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Using GPT-5.4 | OpenAI API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://developers.openai.com/api/docs/guides/code-generation&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Code generation | OpenAI API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://developers.openai.com/api/docs/models/gpt-5.4&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;GPT-5.4 Model | OpenAI API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://developers.openai.com/api/docs/models/gpt-5.4-pro&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;GPT-5.4 pro Model | OpenAI API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://developers.openai.com/api/docs/models/gpt-5.3-codex&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;GPT-5.3-Codex Model | OpenAI API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://arena.ai/leaderboard/code&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Code AI Leaderboard - Best AI Models for Coding&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;writing-notes&#34;&gt;Writing Notes
&lt;/h2&gt;&lt;h3 id=&#34;original-prompt&#34;&gt;Original Prompt
&lt;/h3&gt;&lt;pre&gt;&lt;code class=&#34;language-text&#34;&gt;While using $blog-writer codex, I have a question: the default thinking level is medium. What is the difference between the remaining &#39;high&#39; and &#39;xhigh&#39; capabilities? Which one should I use for daily use? The official documentation doesn&#39;t provide clear instructions, and online sources say that GPT-5.4 is very powerful—which level of thinking are they referring to? Suddenly, I remembered the large model ranking: https://arena.ai/leaderboard/code. Here, it clearly states the thinking levels of large models. It seems that gpt-5.4-high (codex-harness) ranks sixth. Using &#39;high&#39; by default should be sufficient. If it still can&#39;t handle it, I can try &#39;xhigh&#39; to balance cost and speed.
&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id=&#34;writing-approach-summary&#34;&gt;Writing Approach Summary
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Use &amp;ldquo;daily use high, xhigh as fallback&amp;rdquo; as the main judgment point, rather than creating a tier encyclopedia.&lt;/li&gt;
&lt;li&gt;Separate &lt;code&gt;reasoning&lt;/code&gt; and &lt;code&gt;verbosity&lt;/code&gt; to avoid confusing the two &lt;code&gt;medium&lt;/code&gt; levels mentioned in public documentation.&lt;/li&gt;
&lt;li&gt;Official materials are mainly used to confirm supported tiers, default values, and code generation examples; do not fabricate an ability gap table that is not provided by the official sources.&lt;/li&gt;
&lt;li&gt;The Arena leaderboard uses the rankings and scores from the &lt;code&gt;2026-04-01&lt;/code&gt; page to provide factual anchors supporting the claim that &amp;ldquo;high is clearly superior to medium.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Structurally, first explain the source of confusion, then define the boundaries according to official statements, and finally conclude with practical daily selection advice.&lt;/li&gt;
&lt;/ul&gt;</description>
        </item>
        
    </channel>
</rss>
