LLM Writing Leaderboard & Comparison

Here we rank today’s LLMs on real writing capability through the Chatbot Arena ranking & specs.

Model	Creative Rank	Δ Rank	Context (tokens)	Input $/M	Output $/M	Cache $/M

How We Ranked the LLM Models

To spot true writing talent, we combined objective signals with hands‑on tasks.

Chatbot Arena – Creative board
We start with LMSYS crowd‑votes for creative writing. This leaderboard captures real user preferences.
Delta vs. overall rank
A big positive gap shows a specialised writing strength worth noting.
Seven professional writing tasks
From SEO blog to scientific note, we test structure, style, and accuracy in the wild.

#1 Gemini 2.5 Pro

Creative Rank: #1 (Δ 0)
Context: up to 1 M tokens
Input: $1.25 / M (<200k)
Output: $10 / M

Blends climate data with glowing saplings in one breath—equally strong in fiction and research notes.

#2 ChatGPT‑4o

Creative Rank: #2 (Δ 0)
Context: 128k tokens
Input: $5 / M
Output: $20 / M

SEO king—hit 9/9 keywords yet still reads like a human. Fiction paragraphs flawless.

#3 Grok 3

Creative Rank: #2 cluster (Δ +1)
Context: ≈ 131k tokens
Input: $3 / M
Output: $15 / M

Raw, edgy voice—great for dystopian fiction and gritty copy. Rhyme still safe.

Why One‑Shot Prompts Aren’t Enough for Quality Writing

LLMs models are designed to spit out the most likely words and phrases. They say only what people want to hear.

These models draw from a vast, generalized dataset, which cannot align with a very distinct style or subject matter.

You might also notice that LLMS’ text feels repetitive and robotics. The AI often falls into patterns, using similar phrases and structures across different pieces of content. This makes your content sound monotonous and predictable.

As a result, your content’s engagement will suffer if you get out of the loop of the writing process. Readers today expect content that delivers unique expertise & style. They want to hear from strong experts and characters and learn about personal stories and experiences.

How to Prompt LLMs for Quality Writing

Give it editorial guidelines – share your publication’s voice, must‑hit arguments, and preferred tone.
Provide a writing sample – paste 2‑3 standout paragraphs so the model can mimic cadence and phrasing.
Make it build an outline first – ask for structure before prose to lock in logic and flow.
Infuse it with your ideas – inject anecdotes, data points, and opinions only you can bring.
Remove repetitions and add variety – run a final prompt to shorten cliché phrases, swap sentence lengths, and diversify openings.

Get the free e‑book

Frequently Asked Questions

Why care about creative rank?

Creative rank on Chatbot Arena is crowdsourced—real humans judging story quality, imagery, and flow. It’s a quick proxy for narrative skill.

Does a bigger context window always help?

Up to a point. Longer windows reduce chop when you paste large docs, but can increase latency and cost.

What’s a cache price?

Some providers discount repeat calls within minutes. If you loop over similar prompts, you pay the cache rate.