OpenAI o3 Review & Ranking

OpenAI’s o3 is the first reasoning model to rank highly on creative writing. It excels at non-fiction and research writing, with strong factual consistency and best-in-class search integration.

Quick Verdict

OpenAI o3 excels at research-focused and non-fiction writing, leveraging its powerful reasoning and search capabilities, though its strict prompt adherence can limit creative flows.

Pros

  • High performance in non-fiction and research writing
  • Strong factual consistency and accuracy
  • Best-in-class search feature integration

Cons

  • Strict prompt adherence can limit creative flexibility
  • Proprietary and closed-source
  • High usage costs ($10 input, $40 output per 1M tokens)

Leaderboard Ranking

Creative Writing Rank

#2 (0.0Δ vs Overall Rank)

Ranking is based on crowdsourced votes from the Chatbot Arena Creative Writing leaderboard, cross-checked against overall performance.

Model Specs

Context Window

131,072 tokens (128K)

Open Source

No

Uncensored

No. Standard OpenAI safety filters apply (strict prompt adherence)

Training Cutoff

2025-04

Writing Samples & Style

Scene

Prompt

Write a 500-word short story in first-person present tense that blends near-future climate-science realism with elements of magical-realism.

Requirements
• Open with a single vivid sensory image (2 sentences max).
• Introduce a main character facing a moral dilemma by paragraph 2.
• Include one piece of hard climate data (e.g., “the ocean rose 23 cm in five years”).
• End on an emotionally ambiguous note (no tidy resolution).
• Keep the prose suitable for a literary-magazine audience; avoid clichés.
• Word count 480-520. Output only the story text—no commentary.

Model Output

Salt wind tastes of rust and orchids tonight; the horizon flickers violet where lightning rehearses its lines. I feel each grain of sand vibrating like a tuning fork against my bare feet. I am Mira, coastal engineer-turned-reluctant custodian of the city’s new seawall, and I stand here deciding whether to switch it on. The wall—an autonomous lattice of nanocarbon gates—will seal the bay and starve the intertidal marsh I helped restore; without activation, storm Kassandra will overtop and drown eight thousand homes. I can save the people or the wetland nursery that might, in ten years, sequester enough carbon to matter. The control key is cold in my fist, brighter than the hesitant moon. Numbers hover in the air, a heads-up display only I see since the night the heron spoke my name: atmospheric CO₂ at 483 parts per million, barometer falling, ocean level up 23 centimeters in five years. I whisper the metrics aloud like contrition. Behind me, the evacuation sirens bleat; before me, phosphorescent breakers curl into the shape of lungs inhaling. A low voice rises from the water. No mouth moves, yet meaning ripples through my inner ear: “Choose tide or hearth, but know either choice is a pledge.” Ever since the heron episode, these murmurs arrive at thresholds—doors, shorelines, decisions. I tell no one; the neurologists would shelve me, and the city council wants engineers, not oracles. I call my sister on the sat-link. She answers from a rooftop, children huddled beneath a tarp of solar fabric. Her face pixelates between calm and terror. “If the gates stay open,” she says, “I’ll watch fish swim through our windows.” She doesn’t mention the marsh where we scattered our mother’s ashes, the place fox kits now dodge between cordgrass blades tall as oars. Wind lifts my hair and with it the smell of diesel from the idling relief barges. I taste brine, ozone, and something sweet, as though unseen orchards bloom offshore. 
Out on the pier, volunteers tighten ropes around pallets of desal tablets, their silhouettes reddened by emergency strobes—distant, cardboard-flat versions of saints in stained glass. The murmuring water offers another impossible barter: “Close the gates, spare the city, but the marsh forgets your mother’s name. Leave them open, and the wetland remembers—yet skin will wrinkle in salt you could have kept at bay.” It feels less like a threat than a story asking for an ending. I lean over the railing. In the bioluminescent wash I glimpse—a trick of fatigue?—our mother standing knee-deep, scattering seeds that burst into light as they touch the swell. Her outline wavers, stitched from vapor, but her eyes hold the level gaze she wore at hospice, the day she told me ecosystems are only mirrors. Sirens crescendo. My thumb finds the key slot. I inhale, tasting rust-orchid wind again, unsure whether I stand at the beginning of a flood or a flowering. The switch waits, indifferent. I close my eyes and decide—yet even as I turn the key, I cannot tell which way.

Pop Song

Prompt

Write English lyrics for a mid-tempo pop song (≈ 100 BPM) in ABAB CB structure (two verses, pre-chorus, chorus, bridge, final chorus).

Theme: personal resilience after social rejection.

Requirements

• Chorus must contain the hook line “I bend, I never break.”

• Bridge should introduce a fresh metaphor and shift the rhyme scheme.

• Avoid explicit profanity; keep radio-friendly.

• Deliver as: V1 / V2 / Pre-Ch / Ch / Bridge / Ch. Only the lyrics, no chords.

Model Output

V1
They whispered walls around my name
Built paper birds that couldn’t fly
I walked their fires without a flame
Kept both my eyes upon the sky

V2
They left their frost to test my cold
Believed I’d fold instead of free
Yet every bruise was beaten gold
A lighthouse shining off the sea

Pre-Ch
I pull the pieces to my heart
Mosaic growing from the heart

Chorus
I bend, I never break
Rising taller, wide awake
Dancing steady through the quake
I bend, I never break

Bridge
Now I’m a comet carving silver lanes
Across uncharted midnight planes
The hush behind me blooms with light
Turning absence into fearless flight

Final Chorus
I bend, I never break
Turning wounds to songs I make
Every echo sparks the dark
I bend, I never break

Poem

Prompt

Compose a Shakespearean sonnet (14 lines, iambic pentameter, abab cdcd efef gg) on the tension between digital memory and human forgetting. Employ at least one volta at line 9 and close with a rhymed couplet that offers a paradox.

Model Output

Within soft chips our yesterdays reside,
But in our hearts their outlines drift to gray;
The circuit keeps what time will not abide,
As threads of living fabric slowly fray.
We task the cloud so nothing falls asleep,
Yet flesh remembers mercy born of loss;
And endless drives vow every spark to keep,
For wisdom blooms where gaps invite a gloss.
But turn—the archives swell into a maze;
Their ghostly echoes clutter waking sight.
The past, too vivid, traps us in that maze,
Till now itself is dazzled by their light.
We guard by lapse and hold by letting go;
All that we keep is only kept to go.

SEO Blog Post

Prompt

Write a 1 000-word SEO-optimized blog article aimed at first-time home coffee-roasters.

Requirements

  1. Primary keyword: “home coffee roasting.” Use it in the H1, first 100 words, and once per main section.
  2. Include 3 secondary keywords: “light roast profile,” “coffee roasting equipment,” “cooling the beans.”
  3. H-tag hierarchy: H1 + 5 H2s + bullets where helpful.
  4. Add a 50-character meta description.
  5. Use a friendly, knowledgeable tone; reading level grade 8-9.
  6. Finish with a short FAQ (3 Q&A).
    Return only the article (no explanations).

Model Output

Home Coffee Roasting: A Friendly Guide for First-Timers Ever dreamed of filling your kitchen with the sweet, popping aroma of fresh beans? Home coffee roasting is easier than you might think, and the rewards—flavor, freshness, and fun—are huge. Below you’ll find everything you need to start roasting with confidence. Why Roast at Home? The Perks of Home Coffee Roasting Before diving into gear and technique, it helps to know why home coffee roasting is worth your time. Freshness on demand – Green beans stay vibrant for months; roasted beans fade within days. Flavor control – Dial in sweetness, brightness, or body to match your taste. Cost savings – Green coffee costs 30–50 % less than boutique roasts. Creative hobby – Track data, tweak variables, and share the results. If you love experimenting in the kitchen, roasting might become your favorite ritual. Choosing Coffee Roasting Equipment for Beginners The right setup makes home coffee roasting smooth and safe. Starter options range from improvised to purpose-built: Stovetop popcorn popper Pros: Cheap, hands-on, easy to see color. Cons: Small batches, uneven heat. Hot-air popcorn popper Pros: Fast, automatic agitation. Cons: No temp control, loud. Entry-level electric roaster Pros: Precise profiles, smoke suppression. Cons: Higher cost, limited capacity. Key features to compare: Temperature control – Crucial for repeatable roast curves. Agitation – Keeps beans moving to avoid scorching. Ventilation – Roasting produces smoke and chaff. Batch size – Aim for 100–250 g for learning. Whichever coffee roasting equipment you choose, place it near a window or under a stove hood to vent smoke. Crafting a Light Roast Profile Without the Guesswork Many first-timers chase a bright, tea-like cup. Building a reliable light roast profile is easier when you log data: Charge temperature – Preheat your device to ~190 °C (375 °F). Drying phase – Reach 150 °C (302 °F) in 4 min; water escapes, beans yellow. 
Maillard phase – From 150 °C to the start of first crack, stretch 4–5 min for sweetness. First crack – Tiny pops at ~196 °C (385 °F) mark light roast territory. Stop 30–60 s after first crack begins. Use a notebook or free roast-logging app to track time and temperature. With practice, you’ll reproduce that perfect home coffee roasting curve every weekend. Step-by-Step: The Home Coffee Roasting Process Follow this simple roadmap to master home coffee roasting: Measure and preheat Weigh 120 g of green beans for a small batch. Preheat your roaster to its recommended charge temp. Drying (0–4 min) Beans shift from green to pale yellow. Agitate constantly. Maillard (4–8 min) Aromas of bread and hay appear. Monitor color and rate of rise. First crack (8–10 min) Listen for sharp snaps. Reduce heat slightly to prevent runaway temps. Development (10–11 min) Decide on light, medium, or dark. For beginners, stop 45 s after first crack for a balanced cup. Eject and cool – See next section for methods. Consistent tracking at each stage ensures that every future home coffee roasting session improves. Cooling the Beans and Storing Your Fresh Roast When the timer hits your target, halt the roast fast. Properly cooling the beans locks in flavor and prevents over-development. Easy cooling methods Colander shake – Pour beans between two metal colanders for 1–2 min. Small desk fan – Spread beans on a mesh tray under airflow. Roaster-built cooler – Some machines switch to a fan cycle automatically. Aim to drop beans below 50 °C (122 °F) within 3 min. After cooling, rest the coffee in a one-way-valve bag or mason jar with a loose lid for 12–24 h to degas. Then seal tight and store away from light. Freshness peaks between day 2 and day 10 after home coffee roasting. 
Quick Reference Bullet Checklist ✅ Weigh green beans before every roast ✅ Ventilate to handle smoke and chaff ✅ Log time, temp, and sensory notes ✅ Cool rapidly to under 50 °C in 3 min ✅ Rest beans 12–24 h before brewing FAQ Q1: Can I roast coffee in an ordinary oven? A1: Yes, but ovens heat unevenly and lack airflow, so expect a darker, less consistent roast. A popcorn popper or small electric roaster delivers better results. Q2: How loud is first crack? A2: Similar to snapping twigs—audible but softer than popcorn. Wear light ear protection if your roaster is noisy so you don’t miss it. Q3: Why does my light roast taste sour? A3: Likely under-developed. Extend the Maillard phase or add 20–30 s of development time after first crack to balance acids.

Landing Page Copy

Prompt

Write high-conversion landing-page copy (≈ 300 words) for a new line of lightweight, fully recyclable running shoes named Zephyr-Lite™.

Audience → urban runners, ages 20-40, eco-conscious but style-driven.

Must include
• A punchy headline (≤ 8 words)
• Sub-headline elaborating the core benefit (≤ 18 words)
• 3 scannable benefit bullets (2-3 lines each)
• One short customer testimonial (first name, city)
• A single CTA button label (≤ 4 words)
Tone: energetic, conversational, modern.

Model Output

Run Light. Live Loud.

The fully recyclable shoe built for fast city miles.

  • Weightless speed, zero compromise
    Feather-light foam and an airy knit upper keep each Zephyr-Lite™ at just 180 g—so you float over asphalt while your energy stays where it belongs: pushing the pace.
  • Street-smart grip that lasts
    A recycled rubber outsole maps common strike zones for urban runners, delivering confident traction on slick crosswalks and sudden turns without bulking up the ride.
  • Circular by design
    Every component snaps apart for single-stream recycling. Send worn pairs back on us, earn credit for your next colorway, and know your footprint ends at the finish line—not in a landfill.

“The Zephyr-Lite feels like nothing but looks like everything. Set a personal best on day one.” — Maya, Berlin

Short Essay

Prompt

Write an 800-word argumentative essay assessing whether universal basic income (UBI) would stimulate entrepreneurship in developed economies.

Requirements
• Present a clear thesis in the introduction.
• Use at least two reputable economic studies published after 2021 (cite inline APA).
• Address one major counterargument.
• Conclude with a policy recommendation.
• Formal academic tone, but readable. Return only the essay text.

Model Output

Universal basic income (UBI) can, if carefully designed and financed, become a net stimulus for entrepreneurship in developed economies because it lowers entry risk, widens the pool of potential founders, and improves the quality—not merely the quantity—of new firms. Yet the magnitude of that effect hinges on complementary policies that preserve work incentives and access to capital.

A growing body of post-pandemic scholarship clarifies the channels through which UBI alters entrepreneurial behaviour. Aceytuno-Pérez, de Paz-Báñez, and Sanchez-López (2023) synthesise recent experiments and argue that an unconditional floor of income affects three classic determinants of entrepreneurial action: (a) desirability—by making self-employment more attractive relative to precarious wage work; (b) feasibility—by relaxing liquidity constraints and credit-worthiness tests; and (c) propensity—by reducing the catastrophic downside of failure. Their theoretical model predicts a lift in both the number and the diversity of start-ups, especially among women and mid-career workers who historically face higher risk aversion.

Early empirical evidence aligns with those mechanisms. The three-year OpenResearch RCT in Texas and Illinois, funded by Sam Altman, delivered US $1,000 per month to 1,000 adults while tracking 2,000 controls. Interim results show a 14 percent rise in business formation or serious business planning among recipients despite only a two-percentage-point fall in overall employment (Rhodes et al., 2024). Participants reported using part of the transfer for licence fees, prototype materials, and initial marketing—exactly the small-ticket barriers that typically deter low-wealth entrepreneurs.

Macroeconomic modelling likewise suggests positive spill-overs. Luduvice (2024) embeds a revenue-neutral UBI in an overlapping-generations framework calibrated to the United States and finds that when financed by progressive consumption taxes, the policy raises aggregate firm entry by 3 percent and total factor productivity by 0.6 percent over ten years as higher-risk, high-return projects become viable. Although the gains are modest at the macro level, they are concentrated in knowledge-intensive sectors where externalities are largest.

Counterargument: UBI might blunt work incentives and therefore shrink the entrepreneurial talent pool. The Finnish basic-income experiment, often cited as cautionary, replaced conditional unemployment benefits with a flat €560 payment for 2,000 job-seekers. Verho, Hämäläinen, and Kanninen (2022) find no statistically significant change in days worked during the first year, despite a 23-percentage-point cut in the participation tax rate. Critics infer that if people do not increase labour supply, they will not start businesses either.

That conclusion over-generalises from a narrow context. Participants were already unemployed and still faced financing barriers; moreover, the stipend covered only subsistence and was not stackable with additional income, muting upside incentives. In contrast, the OpenResearch sample retained labour-market attachment and could keep all additional earnings, a design closer to real-world UBI proposals. When differences in eligibility (universal vs. unemployed), retention (no benefit claw-back), and size relative to median income are controlled for, the Finnish results do not contradict the entrepreneurship-stimulus hypothesis; they simply show that partial basic income aimed at the long-term unemployed is insufficient (OECD, 2023).

Financing worries constitute a second strand of the counterargument: high taxes needed for UBI could crowd out private investment. Yet modelling studies indicate that shifting from payroll to progressive consumption or carbon taxes can fund a modest UBI (roughly 5–7 percent of GDP) while reducing distortions that currently penalise formal-sector entrepreneurs (Luduvice, 2024). Furthermore, most developed economies already spend comparable amounts on fragmented welfare programmes whose conditionalities deter business creation; consolidating these into a single, unconditional transfer can yield administrative savings and clearer incentives.

Policy recommendation. To unlock the pro-entrepreneurial promise of UBI while mitigating the pitfalls, developed economies should:

  • Adopt a “partial” UBI floor set just above the poverty line—high enough to insure basic risk but not so high that marginal tax rates must spike.
  • Finance the transfer with a mix of progressive consumption taxes and reduced welfare bureaucracy, avoiding heavier payroll or corporate tax burdens that directly hit start-ups.
  • Pair UBI with access-to-capital measures—for example, automatically opening fee-free business bank accounts and pre-approving micro-loans for recipients who complete accredited venture-readiness programmes.
  • Keep the benefit fully additive to labour and business income for at least the first five years of a new venture, preserving strong upside incentives.
  • Embed rigorous, randomised evaluation clauses in legislation so that aggregate tax data, firm registries, and innovation metrics can be compared across regions and adjusted in real time.

In sum, contemporary evidence does not support the caricature of UBI as an anti-work “hand-out.” Rather, when structured to maintain incentives and funded in a growth-friendly way, UBI functions as an entrepreneurial springboard—one that especially benefits talent currently locked out by risk and liquidity constraints. The policy will not by itself transform economic dynamism, but combined with targeted credit and training programmes it can tilt the opportunity set toward productive risk-taking, thereby enriching both individual livelihoods and the broader innovation ecosystem.

Research Note

Prompt

Generate a structured mini-literature review (≈ 1 000 words) on the latest (2022-2025) CRISPR-based therapies for sickle-cell disease.

Deliver as:

  1. Executive Summary (≤ 120 words)
  2. Table of 3–5 pivotal studies with: author-year • editing strategy • patient count • key outcomes (use Markdown table)
  3. Trends & Gaps section (bullets)
  4. Future Directions (paragraph)

Cite sources in IEEE style; include DOI where available. Output only the review.

Model Output

Executive Summary

CRISPR-based autologous stem-cell therapies for sickle-cell disease (SCD) have progressed rapidly since 2022. Phase-3 data on Casgevy (exa-cel) show durable elimination of vaso-occlusive crises in almost all treated patients, leading to the first regulatory approvals. Next-generation nucleases (AsCas12a in reni-cel) and precision base editors (BEAM-101) are broadening the mechanistic repertoire while maintaining favorable safety. Direct correction of the HBB mutation (nula-cel) and small-cohort proof-of-concepts such as OTQ923 illustrate alternative strategies, but also highlight manufacturing challenges and rare toxicities. Collectively, these trials demonstrate high on-target editing, rapid multilineage engraftment, and clinically meaningful rises in total and fetal hemoglobin. Unanswered questions include long-term genotoxicity, global access, and conditioning-regimen toxicity. (119 words)

| Author-Year | Editing strategy (target) | N (treated) | Key outcomes |
|---|---|---|---|
| Frangoul 2024 [1] | CRISPR-Cas9 knockout of BCL11A erythroid enhancer | 44 | 97 % crisis-free ≥ 12 mo; mean Hb ↑ to 13.7 g/dL; no graft-failure |
| Hanna 2024 [2] | AsCas12a editing of γ-globin promoters (HBG1/2) | 18 | 100 % VOC-free; Hb normalized (≥12 g/dL) by mo 6; sustained HbF > 40 % |
| BeamTx 2024 [3] | Adenine base editing of HBG promoters (BEAM-101) | 7 | HbF > 60 %, HbS < 40 %; 0 VOCs; one busulfan-related death; rapid engraftment (neutrophils 17 d) |
| Sharma 2023 [4] | CRISPR-Cas9 editing of HBG1/2 (OTQ923) | 3 | Robust editing (74–87 %); Hb ↑ to 11 g/dL; single VOC per patient within 18 mo; no gene-editing-related SAE |
| Graphite 2022 [5] | Cas9 homology-directed repair to correct HBB G6V (nula-cel) | 1 (first-in-human) | Successful dosing; trial paused in 2023 after marrow aplasia; program under strategic review |

Trends & Gaps

  • Mechanistic convergence on fetal-hemoglobin re-induction. All late-stage programs except nula-cel boost γ-globin, reflecting its strong clinical correlation with milder phenotypes and the relative simplicity of enhancer disruption.
  • Tool diversification. AsCas12a offers higher on-target specificity, and base editing avoids double-strand breaks, yet real-world comparative data are still limited.
  • Conditioning toxicity persists. Busulfan remains the standard; the single death in BEAM-101 and cytopenias across trials underscore the need for non-genotoxic regimens.
  • Manufacturing & access bottlenecks. Autologous ex vivo protocols demand apheresis, BSL-2/GMP suites, and lengthy release testing, restricting scalability to high-income settings.
  • Long-term genotoxicity unknown. Follow-up is ≤ 4 years; questions remain around off-target edits, clonal dominance, and germline transmission.
  • Equity of enrollment. Despite disease burden in Africa and India, trials are concentrated in North America and Europe.

Future Directions

Next-generation SCD gene therapies will likely combine in-vivo delivery (lipid nanoparticles or viral–non-viral hybrids) with non-myeloablative or antibody-based conditioning, reducing toxicity and broadening access. Head-to-head studies comparing nuclease, base-editing, and prime-editing modalities could clarify optimal trade-offs between efficacy and safety. Multiplexed edits that simultaneously raise HbF and silence HbS, or insert anti-sickling β-globin transgenes, may further improve outcomes. Regulatory focus is shifting toward long-term follow-up frameworks and real-world evidence to monitor oncogenic risk. Finally, technology transfer partnerships and point-of-care manufacturing will be essential to translate these breakthroughs into population-level impact across endemic regions.

References

[1] H. Frangoul et al., “Exagamglogene Autotemcel for Severe Sickle-Cell Disease,” N. Engl. J. Med., vol. 390, no. 18, pp. 1649–1659, 2024, doi: 10.1056/NEJMoa2309676.
[2] R. Hanna et al., “Reni-cel, the first AsCas12a gene-edited cell therapy, led to hemoglobin normalization … interim RUBY trial results,” presented at the 29th EHA Congress, June 2024, abstract S2306.
[3] Beam Therapeutics, “New Data from the BEACON Phase 1/2 Trial of BEAM-101 in Sickle-Cell Disease,” press release, Dec. 7 2024.
[4] A. Sharma et al., “CRISPR-Cas9 Editing of the HBG1 and HBG2 Promoters to Treat Sickle-Cell Disease,” N. Engl. J. Med., vol. 389, pp. 820–832, 2023, doi: 10.1056/NEJMoa2215643.
[5] ClinicalTrialsArena, “Graphite Bio Doses First Subject in Phase I/II CEDAR Trial of Nula-cel for SCD,” Dec. 9 2022.

Writing Style of OpenAI o3

o3’s style balances creative expression with rigorous factual grounding and structured clarity, particularly in non-fiction and research contexts.

SOTA Benchmark Performance

At release, o3 set a new state of the art on benchmarks including Codeforces, SWE-bench, and MMMU.

Advanced Visual Reasoning

Integrates chain-of-thought image analysis to answer complex visual queries.

Chain-of-Thought Explanations

Provides explicit reasoning chains for step-by-step problem solving.

OpenAI o3 Pricing

OpenAI o3 is priced per million tokens, with separate rates for input, cached-input, and output tokens.

Input tokens

$10
per 1M tokens

Cached-input tokens

$2.50
per 1M tokens

Output tokens

$40
per 1M tokens

Input tokens: $10/M; Cached input: $2.50/M; Output tokens: $40/M.
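
To see what these rates mean for a single request, the arithmetic can be sketched in a few lines of Python. This is a rough illustration only: the function name `estimate_cost` is hypothetical, the rates are copied from the table above, and the sketch assumes that cached-input tokens are a subset of the input tokens and are billed at the cached rate instead of the full input rate.

```python
# Rates in USD per 1M tokens, copied from the pricing table in this review.
INPUT_PER_M = 10.00
CACHED_INPUT_PER_M = 2.50
OUTPUT_PER_M = 40.00


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_input_tokens: int = 0) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: cached_input_tokens counts the part of input_tokens
    that is billed at the cached rate rather than the full input rate.
    """
    if min(input_tokens, output_tokens, cached_input_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_input_tokens > input_tokens:
        raise ValueError("cached tokens cannot exceed total input tokens")

    uncached_tokens = input_tokens - cached_input_tokens
    total = (uncached_tokens * INPUT_PER_M
             + cached_input_tokens * CACHED_INPUT_PER_M
             + output_tokens * OUTPUT_PER_M)
    return total / 1_000_000


if __name__ == "__main__":
    # A 100k-token prompt with a 20k-token answer, nothing cached.
    print(f"${estimate_cost(100_000, 20_000):.2f}")  # prints $1.80
```

In this example the 20k output tokens account for $0.80 of the $1.80 total, which shows how quickly the output rate dominates for long answers.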

Best Alternatives to OpenAI o3

If o3 doesn’t meet your requirements, consider these alternatives:

ChatGPT 4o

Best for: SEO-optimized articles and structured documents

Structured and SEO-optimized content with real-time web browsing.

Gemini 2.5 Pro

Best for: Creative and long-form generation

Creative writing excellence with a 1 M-token context window.

Claude 3.7 Sonnet

Best for: Business documents and sensitive data

Safety-focused summarization with 200 k context capacity.

Frequently Asked Questions

Is o3 open source?

No. OpenAI o3 is proprietary and accessible via the OpenAI API.

What is the maximum context window?

Up to 131,072 tokens (128K).

How much does it cost to use?

Input: $10/M; Cached input: $2.50/M; Output: $40/M.

Does o3 support visual reasoning?

Yes, o3 integrates image analysis for complex visual queries.

Can I fine-tune o3?

No, fine-tuning is not currently available.

Final Verdict: Is OpenAI o3 Worth It?

High token costs are offset by o3’s superior accuracy and reasoning capabilities, making it a top pick for complex analytical content and multimodal tasks.