
Tool-Using Reasoning Models
In April 2025, OpenAI released o3 and o4-mini—reasoning models that can use tools during their thinking process. Unlike previous reasoning models (o1, o3-mini) that could only think internally, o3 and o4-mini can browse the web, execute code, analyze files, and generate images as part of their chain-of-thought reasoning.
This is a fundamental architectural shift. Previous models had to complete their reasoning before using tools. o3 interleaves thinking and tool use, enabling complex multi-step workflows that were previously impossible in a single model call.
How Tool-Integrated Reasoning Works
Traditional reasoning models follow a think-then-act pattern. o3 introduces think-act-think cycles:
```
Traditional (o1):
Think → Think → Think → Answer → [Tools available after]

o3 Architecture:
Think → Use Tool → Think about results → Use another tool →
Think more → Generate code → Analyze output → Answer
```
Practical example:
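The think-act-think cycle can be sketched as a simple loop: the model either requests a tool call (whose result is fed back into its context) or emits a final answer. This is an illustrative stub, not the real API; `model_step` and `run_tool` are stand-ins for the model and tool executor:

```python
def model_step(history):
    """Stand-in for the reasoning model: request a tool or answer."""
    if not any(step.startswith("TOOL_RESULT") for step in history):
        return {"action": "tool", "name": "search", "query": "GDP data"}
    return {"action": "answer", "text": "Final answer based on tool results"}

def run_tool(name, query):
    """Stand-in tool executor."""
    return f"results for {query!r}"

def reasoning_loop(question, max_steps=8):
    history = [f"QUESTION: {question}"]
    for _ in range(max_steps):
        step = model_step(history)
        if step["action"] == "tool":
            # Tool output is appended to the context, so subsequent
            # "thinking" steps can condition on it.
            history.append(f"TOOL_RESULT: {run_tool(step['name'], step['query'])}")
        else:
            history.append(f"ANSWER: {step['text']}")
            return history
    return history

trace = reasoning_loop("Which country has higher GDP growth?")
```

The key design point is that tool results re-enter the reasoning context mid-flight, rather than tools only being available after reasoning completes.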
```
User: "Which country has higher GDP growth - India or Vietnam?
       Show me a chart of the last 5 years."

o3's reasoning chain:
1. [THINK] I need current GDP data for both countries
2. [BROWSE] Search for India GDP growth 2020-2024
3. [THINK] Found India data: 7.2%, -6.6%, 8.7%, 7.2%, 6.8%
4. [BROWSE] Search for Vietnam GDP growth 2020-2024
5. [THINK] Found Vietnam data: 2.9%, 2.6%, 8.0%, 5.1%, 6.5%
6. [CODE] Generate matplotlib chart comparing both
7. [THINK] Vietnam has the higher average; India is more volatile
8. [ANSWER] Comprehensive analysis with chart
```
Benchmark Performance
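The comparison in the chain above can be checked directly. This snippet uses the growth figures quoted in the example (illustrative numbers from the chain, not verified statistics):

```python
from statistics import mean, stdev

# Growth figures quoted in the reasoning chain above (2020-2024, %)
india = [7.2, -6.6, 8.7, 7.2, 6.8]
vietnam = [2.9, 2.6, 8.0, 5.1, 6.5]

print(f"India:   mean {mean(india):.2f}%, stdev {stdev(india):.2f}")
print(f"Vietnam: mean {mean(vietnam):.2f}%, stdev {stdev(vietnam):.2f}")
# On these figures Vietnam averages slightly higher,
# while India's year-to-year swings are much larger.
```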
o3 represents a significant leap in reasoning benchmarks:
| Benchmark | o3 | o1 | GPT-4o | Claude 3.5 |
|---|---|---|---|---|
| AIME 2025 | 88.9% | 79.2% | 26.7% | 32.1% |
| GPQA Diamond | 87.7% | 78.0% | 53.6% | 65.0% |
| SWE-bench Verified | 69.1% | 48.9% | 33.2% | 49.0% |
| Codeforces (Elo) | 2727 | 1891 | 900 | 1200 |
| ARC-AGI (semi-private) | 87.5% | 32.0% | 5.0% | 21.0% |
The SWE-bench Verified score of 69.1% is remarkable—o3 can autonomously resolve nearly 7 out of 10 real GitHub issues, including understanding codebases, writing fixes, and running tests.
o4-mini: Efficient Reasoning
o4-mini offers a compelling efficiency trade-off:
| Aspect | o3 | o4-mini | GPT-4o |
|---|---|---|---|
| Speed | Slow | Fast | Fast |
| Cost | $$$ | $ | $$ |
| AIME 2025 | 88.9% | 92.7% | 26.7% |
| MATH-500 | 98.6% | 98.0% | 94.3% |
| Coding | Excellent | Excellent | Good |
| Tool Use | Yes | Yes | Yes |
Surprisingly, o4-mini outperforms o3 on AIME 2025 (92.7% vs 88.9%), suggesting that for math and competition problems, the smaller model's focused reasoning is more effective.
Codex Integration
Alongside these models, OpenAI launched Codex, a cloud-based software engineering agent powered by a fine-tuned version of o3:
- Autonomous coding: Assign tasks and Codex works independently
- Git-integrated: Creates branches, writes code, runs tests, submits PRs
- Sandboxed execution: Each task runs in an isolated environment
- Multi-file understanding: Navigates entire repositories
- Test-driven: Runs existing test suites to verify changes
```
# Codex task example
# User assigns: "Add pagination to the /api/users endpoint"
#
# Codex autonomously:
# 1. Reads existing API code and tests
# 2. Implements cursor-based pagination
# 3. Adds query parameters (limit, cursor)
# 4. Updates tests
# 5. Runs test suite
# 6. Creates PR with description
```
Pricing and Access
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Reasoning Tokens |
|---|---|---|---|
| o3 | $10.00 | $40.00 | Included in output |
| o4-mini | $1.10 | $4.40 | Included in output |
| GPT-4o | $2.50 | $10.00 | N/A |
o4-mini at $1.10/$4.40 per million tokens offers frontier-level reasoning at a fraction of o3's cost, making it the practical choice for most applications.
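As a rough illustration of the table above, a small helper can estimate per-request cost. Rates are hard-coded from the table; since reasoning tokens are billed as output, the output count should include them and can dwarf the visible answer:

```python
# Per-1M-token rates from the pricing table above: (input $, output $)
RATES = {
    "o3": (10.00, 40.00),
    "o4-mini": (1.10, 4.40),
    "gpt-4o": (2.50, 10.00),
}

def request_cost(model, input_tokens, output_tokens):
    """Estimate USD cost; output_tokens must include reasoning tokens."""
    in_rate, out_rate = RATES[model]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A 5k-token prompt producing 20k output + reasoning tokens:
print(f"o3:      ${request_cost('o3', 5_000, 20_000):.4f}")
print(f"o4-mini: ${request_cost('o4-mini', 5_000, 20_000):.4f}")
```

On this workload o3 costs roughly nine times as much as o4-mini, which is the arithmetic behind treating o4-mini as the default choice.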
Impact on AI Development
The o3/o4-mini release signals several trends:
- Tool use is becoming native: Future models will think and act simultaneously
- Reasoning is the differentiator: Raw knowledge matters less than the ability to reason through problems
- Agentic AI is here: Models that can autonomously navigate codebases, browse the web, and execute multi-step workflows
- Efficiency gains: o4-mini shows smaller models can match or exceed larger ones on reasoning tasks
- Software engineering transformation: a 69.1% SWE-bench Verified score suggests AI can already handle a large share of routine software engineering tasks
Developer Integration
Using o3's tool capabilities via the API:
```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="o3",
    input="Analyze the latest Python 3.13 release notes and create a summary",
    tools=[
        {"type": "web_search"},
        {"type": "code_interpreter"},
        {"type": "file_search"}
    ]
)
```
The unified tool interface means developers don't need to orchestrate tool calls manually—the model decides when and how to use tools as part of its reasoning process.
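To see which tools the model actually invoked, the response's output items can be inspected after the call. This sketch assumes each output item carries a `type` field with tool calls labeled by a `_call` suffix (e.g. `web_search_call`); the sample data is mocked rather than a real API response:

```python
from collections import Counter
from types import SimpleNamespace

def summarize_tool_calls(output_items):
    """Count tool-call items (types ending in '_call') in a response's output."""
    return Counter(
        item.type for item in output_items if item.type.endswith("_call")
    )

# Mocked items standing in for `response.output` (assumed shape, not live data):
mock_output = [
    SimpleNamespace(type="reasoning"),
    SimpleNamespace(type="web_search_call"),
    SimpleNamespace(type="web_search_call"),
    SimpleNamespace(type="code_interpreter_call"),
    SimpleNamespace(type="message"),
]
print(summarize_tool_calls(mock_output))
```

Logging a summary like this is useful for cost tracking, since each browse or code-execution step adds billable reasoning and output tokens.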
Sources: OpenAI o3/o4-mini, OpenAI Codex, OpenAI API Docs
Conclusion
The o3 and o4-mini release marks a pivotal moment in AI development: the transition from models that simply generate text to models that actively interact with the world while reasoning. Tool-integrated reasoning is the foundation for truly autonomous AI agents—systems that can research, code, analyze, and create without constant human guidance.
For developers, the practical implication is clear: build applications that leverage tool use, not just text generation. The most valuable AI applications in 2025 and beyond will be those that combine reasoning depth with real-world action capability.
Sources: OpenAI o3/o4-mini, OpenAI API, OpenAI Research


