Which OpenAI model should I use, and when?

If you only want one default, start with GPT-5.4. Push routine work to mini or nano, and save pro or Codex for the passes that actually earn the extra weight.

Last updated: 2026-04-15 · Tested against OpenAI Docs model lineup, GPT-5.4 guidance, reasoning best practices, and code-generation guidance

If you are asking "Which OpenAI model should I use?", the shortest honest answer is this: start with gpt-5.4, push repeated work down to mini or nano, and only reach for pro or Codex when the workflow actually earns it. Most teams do the opposite. They pick the biggest thing first, then spend the next week building guardrails around a stack that was too heavy for the job.

I am not trying to turn this into a benchmark roundup. I want to hand you a sane default and a clean reason to move off it.

Which OpenAI model should I use if I only want one default?

Start with gpt-5.4. OpenAI's current model docs position it as the default place to begin for broad general-purpose work, and the code-generation guide points to it as the default for most API code generation (Models, Code generation).

That is where I start too, but only on the part of the workflow where judgement actually matters. A default is useful because it removes hesitation. It becomes expensive when you let that same default sit in every formatting pass, every retry loop, and every tiny support task long after the hard thinking is over.

This is also where people overbuild. If the workflow is still fuzzy, a bigger model stack usually makes the confusion look sophisticated instead of solving it. That is the same trap behind a lot of unnecessary agent talk, which is why You Don't Need an AI Agent is a useful gut check before you add more moving parts.

When should I move down to mini or nano?

This is the part people tend to skip. They find a model that works once, then refuse to split the workload by task shape.

OpenAI's current docs frame gpt-5.4-mini around high-volume coding, computer use, and subagents, while gpt-5.4-nano is aimed at simple high-volume classification, extraction, and ranking (GPT-5.4 mini, GPT-5.4 nano).

I think about the lineup like this:

| Option | Best fit | I would avoid it when |
| --- | --- | --- |
| gpt-5.4 | Ambiguous work, final answers, difficult debugging, messy planning | The task is repetitive and easy to verify |
| gpt-5.4-mini | Repeated coding loops, structured transforms, support tasks inside a workflow | The step needs deep business judgement |
| gpt-5.4-nano | Classification, extraction, ranking, simple routing | The output needs open-ended synthesis |
| gpt-5.4-pro | Slow, expensive second opinions on genuinely hard work | The user is waiting on every request |

That table is not about prestige. It is about where the failure cost lives. In our experience, mini is the workhorse tier. Nano is the utility knife. The mistake is not choosing a smaller model. The mistake is choosing a smaller tier for a step that still needs a human to reread everything line by line.

The decision tree below is the practical version I would use before I start debating pricing, snapshots, or architecture.

Flowchart

flowchart TD
  A["New task"] --> B{"Is the step ambiguous,<br />high-stakes, or multi-step?"}
  B -- "Yes" --> C["Start with gpt-5.4"]
  B -- "No" --> D{"Is it repeated coding,<br />computer use, or subagent work?"}
  D -- "Yes" --> E["Use gpt-5.4-mini"]
  D -- "No" --> F{"Is it classification,<br />extraction, or ranking?"}
  F -- "Yes" --> G["Use gpt-5.4-nano"]
  F -- "No" --> H["Stay on gpt-5.4 or test mini"]
  C --> I{"Need a slower,<br />deeper pass?"}
  I -- "Yes" --> J["Escalate to gpt-5.4-pro"]
  C --> K{"Need tool use,<br />files, tests, and diffs?"}
  K -- "Yes" --> L["Use Codex"]

If a team can explain its workflow in that shape, the model choice usually stops being dramatic.
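The decision tree above can also be sketched as a small routing function. This is a minimal sketch, not an official API: the `Task` fields and the returned model names simply mirror this article's lineup and flowchart, and any real system would replace the booleans with its own task metadata.

```python
from dataclasses import dataclass

@dataclass
class Task:
    ambiguous: bool = False        # fuzzy, high-stakes, or multi-step
    repeated_coding: bool = False  # coding loops, computer use, subagents
    simple_labeling: bool = False  # classification, extraction, ranking
    needs_deep_pass: bool = False  # worth a slow, expensive second opinion
    agent_shaped: bool = False     # needs files, tests, diffs, tool calls

def pick_model(task: Task) -> str:
    """Route a task to a tier, mirroring the flowchart above."""
    if task.ambiguous:
        if task.agent_shaped:
            return "codex"
        if task.needs_deep_pass:
            return "gpt-5.4-pro"
        return "gpt-5.4"
    if task.repeated_coding:
        return "gpt-5.4-mini"
    if task.simple_labeling:
        return "gpt-5.4-nano"
    # Shape still unclear: stay on the default, then test mini.
    return "gpt-5.4"
```

For example, `pick_model(Task(simple_labeling=True))` lands on nano, while the same task with `ambiguous=True` stays on the default tier.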

When should I use pro or a reasoning-style workflow?

This is where I slow down and get specific. I would not put gpt-5.4-pro in front of every user message or every background task. I would treat it as the expensive second opinion for the narrow pass where a miss would be painful.

The current GPT-5.4 guidance and reasoning best-practices docs are useful here because they frame the tradeoff plainly: move toward the slower, stronger path when reliability matters more than speed or cost, especially on harder multi-step work (Using GPT-5.4, Reasoning best practices).

In practice, that means I use the stronger pass late. Planning a difficult implementation. Reviewing a risky answer. Pressure-testing a final recommendation before it leaves the system. That is the same pattern we care about in AI Review Agents Content Pipeline: spend your expensive judgement where it compounds, not where it just pads the stack.
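The "use the stronger pass late" pattern is easy to express in code. A hedged sketch with stub callables standing in for the actual model calls; the function names and the high-stakes flag are illustrative, not part of any SDK:

```python
from typing import Callable

def answer_with_late_review(
    question: str,
    draft_model: Callable[[str], str],   # e.g. a gpt-5.4 call
    review_model: Callable[[str], str],  # e.g. a gpt-5.4-pro call
    high_stakes: bool,
) -> str:
    """Draft with the default tier; spend the expensive pass only on
    high-stakes answers, and only once, at the end of the workflow."""
    draft = draft_model(question)
    if not high_stakes:
        return draft
    # The pro-tier pass reviews the finished draft, not every step.
    return review_model(f"Review this answer and correct it if needed:\n{draft}")
```

The point of the shape is that the expensive model never sits in the hot loop: it sees one finished artifact per task, not every intermediate message.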

Should I use Codex or call a model directly?

This one gets cleaner once you stop treating Codex as just another model name. OpenAI's code-generation guide separates the jobs pretty well: gpt-5.4 is the default for most API code generation, while Codex is the fit for agentic software engineering (Code generation).

If you want one response, a structured transform, or a normal chat-style coding assist, call a model directly. If the job needs file edits, tool calls, tests, diffs, retries, and a system that can keep working through a task, that is where Codex starts to make more sense.

I would not reach for Codex just to feel advanced. I would reach for it when the work itself has become agent-shaped. Until then, a direct model call is often easier to reason about, easier to audit, and easier to replace later.
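One way to keep that judgement honest is to write the "agent-shaped" test down. A rough heuristic, assuming the signal names and the two-signal threshold as this article's rule of thumb rather than any official cutoff:

```python
def needs_codex(requirements: set) -> bool:
    """True when the job is agent-shaped: it needs an environment,
    not just an answer."""
    AGENT_SIGNALS = {"file_edits", "tool_calls", "tests", "diffs", "retries"}
    # One-shot answers and structured transforms stay on a direct call;
    # two or more environment signals tip the job toward Codex.
    return len(requirements & AGENT_SIGNALS) >= 2
```

A task that only needs a diff rendered in text stays on a direct call; one that needs file edits plus a test run crosses the line.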

Frequently asked questions

Which model should I start with if I only want one default?

Start with gpt-5.4, but test it on a small set of real tasks from your workflow, not on generic chat prompts. Five honest examples from the actual job will teach you more than fifty abstract prompts because you can see where the model is reasoning, stalling, or over-answering.

When is it worth moving from gpt-5.4 to mini or nano?

Move down when the output shape is stable enough that software or a quick human scan can catch failures. If the cheaper step still creates a long manual review loop, you probably downgraded too early and turned price savings into cleanup work.
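The "software can catch failures" test can be made concrete with an escalation wrapper: try the cheap tier first, validate the output shape mechanically, and only fall back to the stronger tier on failure. A minimal sketch with stub callables in place of real model calls; the JSON-shape check is one example of a mechanical validator, not the only option:

```python
import json
from typing import Callable

def run_with_escalation(
    prompt: str,
    cheap_model: Callable[[str], str],   # e.g. a gpt-5.4-nano call
    strong_model: Callable[[str], str],  # e.g. a gpt-5.4 call
    required_keys: set,
) -> dict:
    """Try the cheap tier first; escalate only when the output fails
    a mechanical shape check (valid JSON with the required keys)."""
    for model in (cheap_model, strong_model):
        raw = model(prompt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output: escalate to the next tier
        if isinstance(parsed, dict) and required_keys <= parsed.keys():
            return parsed
    raise ValueError("No tier produced a well-formed result")
```

If the cheap tier passes the check most of the time, the downgrade was earned; if the wrapper escalates constantly, the price savings were never real.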

Should I use Codex, or just call a model directly?

Use Codex when the task needs an environment, not just an answer. If the system has to touch files, run checks, inspect diffs, and keep working across several steps, the product shape matters as much as the underlying model.

What should I recheck before I publish internal model guidance?

I would recheck the model chooser page, the current GPT-5.4 family guidance, and the code-generation docs before publishing.

Pick the smallest model that can honestly carry the job

That is the whole rule.

Start with gpt-5.4 when the task is still fuzzy or the judgement is hard. Push the repeated middle work down to mini. Use nano when the output is tight and easy to verify. Bring in pro when a slower second opinion is genuinely worth the wait. Use Codex when the job has crossed from answering into doing.

That path keeps the work legible. It also stops the choice from turning into a bigger engineering story than the job deserves.


Ready to apply this to a real build? Start with You Don't Need an AI Agent if the workflow still feels heavier than the job, then read AI Review Agents Content Pipeline if you want a concrete example of where the stronger model should sit.

