<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Agent on Uncle Xiang&#39;s Notebook</title>
        <link>https://ttf248.life/en/tags/agent/</link>
        <description>Recent content in Agent on Uncle Xiang&#39;s Notebook</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en</language>
        <lastBuildDate>Sun, 14 Jun 2026 07:36:00 +0800</lastBuildDate><atom:link href="https://ttf248.life/en/tags/agent/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>Codex goal embeds the completion criteria within the task itself</title>
        <link>https://ttf248.life/en/p/codex-goal-command-explained/</link>
        <pubDate>Wed, 27 May 2026 20:11:41 +0800</pubDate>
        
        <guid>https://ttf248.life/en/p/codex-goal-command-explained/</guid>
        <description>&lt;p&gt;&lt;code&gt;/goal&lt;/code&gt; is easily misinterpreted as a command to &amp;ldquo;let the agent work for a bit longer.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;That is only the surface. Give Codex a goal and it keeps working toward that objective instead of stopping after a single round of answers. What is truly noteworthy is not how long it &amp;ldquo;runs,&amp;rdquo; but that it turns &amp;ldquo;what counts as done&amp;rdquo; from a reminder you repeat ad hoc into part of the task itself.&lt;/p&gt;
&lt;p&gt;A standard prompt describes what needs to happen next. A &lt;code&gt;goal&lt;/code&gt; is more like attaching an acceptance checklist to the agent: What is the objective? Where are the boundaries? Which validations must pass? What conditions must be met for it to be considered complete?&lt;/p&gt;
&lt;h2 id=&#34;goal-is-not-the-continue-button&#34;&gt;goal is not the continue button
&lt;/h2&gt;&lt;p&gt;Judged by its command form alone, &lt;code&gt;/goal&lt;/code&gt; looks like an enhanced version of &amp;ldquo;continue until complete.&amp;rdquo; That reading puts the focus in the wrong place.&lt;/p&gt;
&lt;p&gt;The hardest part of long tasks isn&amp;rsquo;t whether the model wants to continue, but who decides whether or not to proceed after each round.&lt;/p&gt;
&lt;p&gt;Ask an agent to migrate a frontend project, and it might stop right after adjusting the routes, believing the job is done. Ask it to fix a test, and it might make only the currently failing test case pass and then quit. Ask it to rewrite a batch of articles, and it might tweak a few high-risk drafts and conclude that it has handled everything critical.&lt;/p&gt;
&lt;p&gt;These stopping points are not necessarily wrong, but they often differ from the acceptance criteria a human would apply.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;goal&lt;/code&gt; exists to solve this: define the acceptance criteria up front, so that every subsequent round can be judged against them.&lt;/p&gt;
&lt;p&gt;A broad goal looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-text&#34;&gt;/goal Help me migrate the frontend to Next.js
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The problem isn&amp;rsquo;t that the goal is too short, but that it lacks a definite stopping condition. Codex could rewrite multiple pages, restructure components along the way, and keep filling in whatever it deems necessary.&lt;/p&gt;
&lt;p&gt;A more usable version looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-text&#34;&gt;/goal Migrate the order management backend from React Router to Next.js App Router.
The visual behavior of the login page, order list, order details, and checkout page must be consistent with the old version.
Do not change the API contracts or database schemas.
After completing a batch of pages, run npm run build, npm test, and Playwright critical paths.
Only when all these validations pass can it be considered complete.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The extra details in this version are not filler; they form four control points:&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Element&lt;/th&gt;
          &lt;th&gt;Role&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Goal&lt;/td&gt;
          &lt;td&gt;What is the final desired outcome?&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Boundary&lt;/td&gt;
          &lt;td&gt;Which interfaces, data, files, or behaviors must not be touched?&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Validation&lt;/td&gt;
          &lt;td&gt;What evidence proves that it was truly completed?&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Termination&lt;/td&gt;
          &lt;td&gt;Which conditions must be met before it can stop?&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;These four things are what make a &lt;code&gt;goal&lt;/code&gt; hold up.&lt;/p&gt;
&lt;h2 id=&#34;why-does-it-run-for-a-long-time&#34;&gt;Why does it run for a long time?
&lt;/h2&gt;&lt;p&gt;Codex keeps making progress on a goal, but not because a single answer has been stretched out.&lt;/p&gt;
&lt;p&gt;The actual workflow resembles a loop: plan, execute, observe tool results, revise, and then decide whether or not to continue. Build failures, test failures, screenshot inconsistencies, lint errors, or failed evaluation samples will all send the task back to the next iteration.&lt;/p&gt;
&lt;p&gt;If the goal specifies a verification method, the agent cannot simply declare it &amp;ldquo;should be fine&amp;rdquo; based on intuition. It must obtain evidence. If the evidence hasn&amp;rsquo;t returned, further investigation continues; if the evidence fails, remediation is required; only when all evidence passes can the work be signed off.&lt;/p&gt;
&lt;p&gt;This is why &lt;code&gt;goal&lt;/code&gt; suits tasks such as migration, refactoring, batch revision, prompt evaluation, and troubleshooting long pipelines. What they have in common is that they cannot be completed in a single pass, and their completion cannot rest on subjective judgment alone.&lt;/p&gt;
&lt;p&gt;By contrast, a goal like this is dangerous:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-text&#34;&gt;/goal Think of a more advanced product solution
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It has no boundaries, no validation, and no stopping condition. The agent might run for a long time, but running for a long time is not the same as being useful. At a minimum, you must state how many candidate solutions to produce, which constraints they must cover, which criteria are used to filter them, and when to stop.&lt;/p&gt;
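&lt;p&gt;As a rough illustration, a bounded version of the same request could look like the block below. The feature name, the count, and the scoring criteria are hypothetical placeholders rather than anything taken from the official examples; the point is only that each of the four elements is stated explicitly.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-text&#34;&gt;/goal Propose 3 candidate redesigns of the order export feature.
Each candidate must keep the existing API contracts and fit the current database schema.
Score every candidate against the same criteria: implementation effort, migration risk, and user-visible benefit.
Stop once all 3 candidates are written up with scores; do not start implementing any of them.
&lt;/code&gt;&lt;/pre&gt;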
&lt;h2 id=&#34;claude-code-is-also-handling-the-same-thing&#34;&gt;Claude Code is tackling the same problem
&lt;/h2&gt;&lt;p&gt;Claude Code also has &lt;code&gt;/goal&lt;/code&gt;. Its official documentation puts it more directly: the user sets a completion condition, and Claude keeps working across turns until the condition is met.&lt;/p&gt;
&lt;p&gt;The Claude Code documentation also mentions that at the end of each round, it checks whether the completion condition is met; if not, it proceeds to the next round. This point is critical because it takes the decision to proceed out of the model&amp;rsquo;s own subjective conclusion and turns it into a separate condition check.&lt;/p&gt;
&lt;p&gt;The two vendors&amp;rsquo; implementations need not be treated as equivalent, but they point in the same direction: the terminal agent is moving from &amp;ldquo;executing the next instruction&amp;rdquo; towards &amp;ldquo;continuously progressing toward verifiable goals.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The related capabilities can be roughly categorized as follows:&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Capability&lt;/th&gt;
          &lt;th&gt;What problem it addresses&lt;/th&gt;
          &lt;th&gt;Suitable Scenarios&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;/goal&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Defines explicit completion criteria and advances to verifiable results across multiple rounds&lt;/td&gt;
          &lt;td&gt;Migration, Refactoring, Batch Fixing, Long-running tasks&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;/loop&lt;/code&gt; or looping capabilities&lt;/td&gt;
          &lt;td&gt;Allows the same task to execute repeatedly based on count or condition&lt;/td&gt;
          &lt;td&gt;Retries, Generating candidates, Batch exploration&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;hooks&lt;/td&gt;
          &lt;td&gt;Automatically executes rules on fixed events&lt;/td&gt;
          &lt;td&gt;Formatting, Testing, Notification, Logging&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Sub-agent/Multi-agent view&lt;/td&gt;
          &lt;td&gt;Decomposes tasks for observation and progression by different worker threads&lt;/td&gt;
          &lt;td&gt;Parallel analysis, Modular implementation, Long-term background tasks&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Memory/Project README file&lt;/td&gt;
          &lt;td&gt;Solidifies long-term constraints and repository rules&lt;/td&gt;
          &lt;td&gt;Team standards, Code style, Tool entry points&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In this table, the position of &lt;code&gt;goal&lt;/code&gt; is clear: it replaces neither hooks nor memory. It governs &amp;ldquo;what counts as done for this task.&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;goals-should-be-written-like-acceptance-criteria&#34;&gt;Goals should be written like acceptance criteria
&lt;/h2&gt;&lt;p&gt;I now write a goal as four lines:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-text&#34;&gt;Objective: What final user-visible result must appear.
Boundaries: Which files, interfaces, data, visual elements, or behaviors must not be modified.
Verification: What commands, tests, screenshots, evaluations, or manual checks serve as evidence.
Stopping Condition: Stop when all conditions are met; pause when a decision requires specific permissions, missing facts, or a product judgment.
&lt;/code&gt;&lt;/pre&gt;
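&lt;p&gt;Applied to the test-fixing case mentioned earlier, the four lines might be filled in as follows. This is a sketch under assumptions: the test name and the restrictions are hypothetical, and &lt;code&gt;npm test&lt;/code&gt; and &lt;code&gt;npm run build&lt;/code&gt; simply reuse the commands from the migration example.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-text&#34;&gt;Objective: The failing order-total test passes, and the order total shown to users is correct.
Boundaries: Do not edit or delete existing test assertions; do not change API contracts or database schemas.
Verification: npm test passes in full, not just the previously failing case; npm run build succeeds.
Stopping Condition: Stop when both commands pass; pause and ask if the fix would require changing expected values in a test.
&lt;/code&gt;&lt;/pre&gt;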
&lt;p&gt;This is very different from a regular prompt.&lt;/p&gt;
&lt;p&gt;A regular prompt describes the next action, while a goal describes the completion criteria. It&amp;rsquo;s not about removing the human from the process; it&amp;rsquo;s about moving human judgment to the front. You no longer have to remind it round after round that &amp;ldquo;this isn&amp;rsquo;t complete&amp;rdquo;; instead, you write down what counts as complete as mandatory conditions from the very beginning.&lt;/p&gt;
&lt;p&gt;Therefore, the more autonomously you want the agent to run, the more narrowly you must define its goal.&lt;/p&gt;
&lt;p&gt;The less you want to watch the process, the more rigorously you must write the validation.&lt;/p&gt;
&lt;p&gt;The less you want it to drift, the more clearly you must write the boundaries.&lt;/p&gt;
&lt;p&gt;This is where &lt;code&gt;goal&lt;/code&gt; truly deserves attention. The point is not one more command, nor how long it can run; it is that the terminal agent is starting to make &amp;ldquo;who decides when the work is done&amp;rdquo; an explicit question.&lt;/p&gt;
&lt;h2 id=&#34;references&#34;&gt;References
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://developers.openai.com/codex/use-cases/follow-goals&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Follow a goal | Codex use cases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://developers.openai.com/codex/cli/slash-commands&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Slash commands in Codex CLI | OpenAI Developers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://developers.openai.com/blog/run-long-horizon-tasks-with-codex&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Run long horizon tasks with Codex | OpenAI Developers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://code.claude.com/docs/en/goal&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Keep Claude working toward a goal | Claude Code Docs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;authors-notes&#34;&gt;Author&amp;rsquo;s Notes
&lt;/h2&gt;&lt;h3 id=&#34;original-prompt&#34;&gt;Original Prompt
&lt;/h3&gt;&lt;pre&gt;&lt;code class=&#34;language-text&#34;&gt;$blog-writer Detail Codex&#39;s newly released goal command: what is its working principle, why does it run for such a long time, and what are the official examples? Does Claude Code have similar naming/commands? Additionally, compile the useful and popular new features recently released by two terminal tools into a table.
&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id=&#34;writing-idea-outline&#34;&gt;Writing Idea Outline
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Keep the original main judgment: The core of &lt;code&gt;goal&lt;/code&gt; is not the command name, but the completion condition.&lt;/li&gt;
&lt;li&gt;Reinsert the Claude Code comparison and terminal function table required in the original prompt.&lt;/li&gt;
&lt;li&gt;Eliminate &amp;ldquo;official document recitation&amp;rdquo; and focus on how to write a usable goal.&lt;/li&gt;
&lt;/ul&gt;</description>
        </item>
        
    </channel>
</rss>
