Codex goal embeds the completion criteria within the task itself

/goal is not a continue button

Judged by its command form alone, /goal looks like an enhanced version of “continue until complete.” That reading misses the point.

The hardest part of a long task isn’t whether the model is willing to continue, but who decides, after each round, whether to proceed.

Ask an agent to migrate a frontend project, and it might stop right after adjusting the routes, believing the job is done. Ask it to fix a test, and it might make the one failing test case pass and quit. Ask it to rewrite a batch of articles, and it might tweak a few high-risk drafts and conclude that it has handled everything critical.

These stopping points are not necessarily wrong, but they often differ from the human’s acceptance criteria.

/goal exists to solve this: define the acceptance criteria up front, so that every subsequent round can be judged against them.

A vague goal looks like this:

/goal Help me migrate the frontend to Next.js

The problem isn’t that it is too short, but that it has no definite stopping condition. Codex can rewrite multiple pages, restructure components along the way, and keep filling in whatever it deems necessary.

A more usable version looks like this:

/goal Migrate the order management backend from React Router to Next.js App Router.
The visual behavior of the login page, order list, order details, and checkout page must be consistent with the old version.
Do not change the API contracts or database schemas.
After completing a batch of pages, run npm run build, npm test, and Playwright critical paths.
Only when all these validations pass can it be considered complete.

The extra details in this paragraph are not filler; they constitute four control planes:

Element	Role
Goal	What is the final desired outcome?
Boundary	Which interfaces, data, files, or behaviors must not be touched?
Validation	What evidence proves that it was truly completed?
Termination	After fulfilling what conditions can it stop?

These four things are what make a goal hold up.
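The validation and termination rows can be sketched as a small gate script. This is only an illustration under stated assumptions, not how Codex itself evaluates a goal: `run_checks`, `is_complete`, and `PROJECT_CHECKS` are hypothetical names, and `npx playwright test` is my assumed way of running the “Playwright critical paths” mentioned in the goal text.

```python
import subprocess
from typing import Dict, List, Sequence

# Commands taken from the goal text above. The Playwright invocation is an
# assumption; substitute whatever your project actually uses.
PROJECT_CHECKS: List[List[str]] = [
    ["npm", "run", "build"],
    ["npm", "test"],
    ["npx", "playwright", "test"],
]


def run_checks(commands: Sequence[Sequence[str]]) -> Dict[str, bool]:
    """Run each command and record whether it exited with status 0."""
    results: Dict[str, bool] = {}
    for cmd in commands:
        try:
            completed = subprocess.run(list(cmd), capture_output=True, text=True)
            passed = completed.returncode == 0
        except OSError:
            # A command that cannot even be started counts as failed evidence.
            passed = False
        results[" ".join(cmd)] = passed
    return results


def is_complete(results: Dict[str, bool]) -> bool:
    """Complete only if there is at least one check and every check passed."""
    return bool(results) and all(results.values())
```

Calling `is_complete(run_checks(PROJECT_CHECKS))` inside the project directory returns `True` only when all three commands succeed. An empty result set is deliberately treated as incomplete, since having no evidence is not the same as having passing evidence.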

Why does it run for so long?

Codex keeps making progress on a goal, and not because a single answer has been stretched out.

The actual workflow resembles a loop: plan, execute, observe tool results, revise, and then decide whether or not to continue. Build failures, test failures, screenshot inconsistencies, lint errors, or failed evaluation samples will all send the task back to the next iteration.

If the goal specifies a verification method, the agent cannot simply declare it “should be fine” based on intuition. It must obtain evidence. If the evidence hasn’t returned, further investigation continues; if the evidence fails, remediation is required; only when all evidence passes can the work be signed off.
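The three evidence cases above can be sketched as a decision function. This is a minimal sketch under my own assumptions: the names `Evidence`, `NextStep`, `decide`, and `run_rounds` are hypothetical, and checking failures before pending evidence is a design choice of the sketch, not documented Codex behavior.

```python
from enum import Enum
from typing import Dict, Iterable, List


class Evidence(Enum):
    PENDING = "pending"   # the check has not returned a result yet
    FAILED = "failed"     # the check ran and did not pass
    PASSED = "passed"     # the check ran and passed


class NextStep(Enum):
    INVESTIGATE = "investigate"
    REMEDIATE = "remediate"
    SIGN_OFF = "sign off"


def decide(evidence: Dict[str, Evidence]) -> NextStep:
    """Map the current evidence to the next action for this round."""
    if not evidence:
        # No verification results at all: nothing can be signed off.
        return NextStep.INVESTIGATE
    states = list(evidence.values())
    if any(state is Evidence.FAILED for state in states):
        return NextStep.REMEDIATE
    if any(state is Evidence.PENDING for state in states):
        return NextStep.INVESTIGATE
    return NextStep.SIGN_OFF


def run_rounds(rounds: Iterable[Dict[str, Evidence]], max_rounds: int = 10) -> List[NextStep]:
    """Replay per-round evidence and stop at the first sign-off or at max_rounds."""
    history: List[NextStep] = []
    for index, evidence in enumerate(rounds):
        if index >= max_rounds:
            break
        step = decide(evidence)
        history.append(step)
        if step is NextStep.SIGN_OFF:
            break
    return history
```

The `max_rounds` cap is there because a loop driven only by evidence can otherwise run indefinitely when a check never passes.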

This is why /goal suits tasks such as migration, refactoring, batch revision, prompt evaluation, and troubleshooting long pipelines. What they have in common is that they cannot be completed in a single pass, and their completion cannot rest on subjective judgment alone.

Conversely, a goal like this is dangerous:

/goal Think of a more advanced product solution

It has no boundaries, no validation, and no stopping condition. The agent might run for a long time, but running for a long time is not the same as being useful. At a minimum, you must state how many candidate solutions to produce, which constraints they must cover, what criteria to filter them by, and when to stop.

Claude Code is tackling the same problem

Claude Code also has /goal. The official documentation explains it more directly: the user sets a completion condition, and Claude keeps working across turns until the condition is met.

The Claude Code documentation also mentions that at the end of each round, it checks whether the completion condition is met; if not, it proceeds to the next round. This point is critical because it takes the decision to continue out of the model’s own subjective conclusion and turns it into a separate condition check.

The two companies’ implementations do not need to be treated as equivalent, but their direction is the same: the terminal agent is moving from “executing the next instruction” toward “making continuous progress on verifiable goals.”

The related capabilities can be roughly categorized as follows:

Capability	Problem it addresses	Suitable scenarios
`/goal`	Defines explicit completion criteria and advances toward verifiable results across multiple rounds	Migration, refactoring, batch fixes, long-running tasks
`/loop` or looping capabilities	Lets the same task run repeatedly based on a count or condition	Retries, candidate generation, batch exploration
Hooks	Automatically run rules on fixed events	Formatting, testing, notifications, logging
Sub-agent/multi-agent view	Splits a task so that different workers can observe and advance it	Parallel analysis, modular implementation, long-running background tasks
Memory/project README file	Persists long-term constraints and repository rules	Team standards, code style, tool entry points

In this table, the position of /goal is clear: it replaces neither hooks nor memory. It governs “what counts as done for this task.”

Goals should be written like acceptance criteria

I now write a goal as four lines:

Objective: What final user-visible result must appear.
Boundaries: Which files, interfaces, data, visual elements, or behaviors must not be modified.
Verification: What commands, tests, screenshots, evaluations, or manual checks serve as evidence.
Stopping Condition: Stop when all conditions are met; pause when specific permissions, facts, or product decisions are needed.
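One way to keep myself honest about those four lines is to treat them as a structure that refuses to render when a line is missing. The sketch below is purely illustrative: `GoalSpec`, `missing_parts`, and `render` are hypothetical names, and the rendered layout is just the four-line shape above prefixed with `/goal`, not a format required by Codex or Claude Code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GoalSpec:
    """The four lines of a goal: objective, boundaries, verification, stop."""
    objective: str = ""
    boundaries: List[str] = field(default_factory=list)
    verification: List[str] = field(default_factory=list)
    stopping_condition: str = ""

    def missing_parts(self) -> List[str]:
        """Return the names of the lines that are still empty."""
        missing: List[str] = []
        if not self.objective.strip():
            missing.append("Objective")
        if not self.boundaries:
            missing.append("Boundaries")
        if not self.verification:
            missing.append("Verification")
        if not self.stopping_condition.strip():
            missing.append("Stopping Condition")
        return missing

    def render(self) -> str:
        """Render the goal as text, refusing to render an incomplete goal."""
        missing = self.missing_parts()
        if missing:
            raise ValueError("goal is missing: " + ", ".join(missing))
        return "\n".join([
            "/goal " + self.objective.strip(),
            "Boundaries: " + "; ".join(self.boundaries),
            "Verification: " + "; ".join(self.verification),
            "Stopping Condition: " + self.stopping_condition.strip(),
        ])
```

For example, `GoalSpec(objective="Help me migrate the frontend to Next.js").missing_parts()` reports that boundaries, verification, and the stopping condition are all absent, which is exactly what makes that goal vague.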

This is very different from a regular prompt.

A regular prompt is more like a next-step action, while a goal is more like a set of completion criteria. The point is not to remove the human from the process, but to embed human judgment up front. You no longer have to remind the agent round after round that “this isn’t complete”; instead, you write down what counts as complete as mandatory conditions from the very beginning.

Therefore, the more autonomously you want the agent to run, the more narrowly you must define its goal.

The less you want to monitor the process, the more rigorously the validation must be written down.

The less you want it to drift, the more clearly you must write the boundaries.

This is why /goal truly deserves attention. The point is not that there is one more command, nor how long it can run; it is that the terminal agent is starting to make “who decides the work is done” an explicit question.

Author’s Notes

Original Prompt

$blog-writer Detail Codex's newly released goal command: what is its working principle, why does it run for such a long time, and what are the official examples? Does Claude Code have similar naming/commands? Additionally, compile the useful and popular new features recently released by two terminal tools into a table.

Writing Idea Outline

Keep the original main judgment: The core of goal is not the command name, but the completion condition.
Reinsert the Claude Code comparison and terminal function table required in the original prompt.
Eliminate “official document recitation” and focus on how to write a usable goal.
