OpenAI Has Entered Its Most Dangerous Moment

OpenAI and Anthropic are now competing over frontier model intelligence

I used to think OpenAI did not have much to fear from other model companies.

The reason was simple: OpenAI still owned the most important asset in this market, model intelligence.

As long as the highest intelligence stayed in OpenAI's hands, many of its product decisions could still be read as commercial strategy. It could make public-facing models faster, cheaper, terser, or even leave heavy users wondering whether the models had become a little worse. It could move capability across price tiers, route different tasks to different models, and bring out the stronger systems when it really needed to.

But if a competitor truly catches up at the top end of intelligence, the situation changes.

After using Fable 5, I felt that shift for the first time.

OpenAI may have entered the most dangerous moment in its history.

OpenAI's Old High Intelligence Was Almost Unusable, and Intelligence Was Nearly All It Had

I have always been impressed by o1 and o3.

They were genuinely smart, especially for code. They had a rare ability: they could read code without running it, spot bugs from the structure alone, and often be right.

Most models need execution, logs, tests, and repeated trial and error. o1 and o3 felt more like very strong engineers. They could trace the call chain, follow state changes, and infer where the fault was likely to be. That ability is precious. It is not just "being able to write code." It is judgment.

But the weakness was just as obvious.

They were absurdly expensive, painfully slow, and not especially good at lower-level implementation work, such as making a series of precise code edits according to instructions.

Anthropic's models felt like the opposite at the time.

They did not always have the extreme intelligence of o1 and o3. On tasks that required deep reasoning, or finding a bug purely by reading code, they were not as astonishing as OpenAI's best models. But when they understood the task, they worked quickly, steadily, and productively.

That matters enormously in real coding workflows.

Developers are not doing math olympiad problems every day. Most software tasks do not require the absolute highest intelligence. What you usually need is a model that can read context reliably, edit code safely, avoid basic mistakes, and keep working with you over a long session.

So even when OpenAI had the more intelligent model, many real workflows still moved toward Anthropic. The highest intelligence matters, but the highest intelligence is not the same thing as the highest productivity.

Speed, stability, price, editing ability, and context-following all matter. Together, they decide whether a model can actually get work done.

Why Fable 5 Feels Different

Fable 5 does not feel like a normal Claude upgrade.

It feels a little like the old o1 and o3, but not only that.

The old o1 and o3 were high-intelligence analysts. They could diagnose, judge, and explain. But they were not naturally suited to large-scale code modification, long editing sessions, or pushing a real project forward.

What surprised me about Fable 5 is that it keeps Claude's editing strength while making a qualitative jump in code analysis.

The way I described it to a friend was roughly this: Fable 5 feels like the old o1 and o3, except that o1 and o3 were better at analysis than at editing, while Fable 5 can do both. It can understand the problem and then make the change. Its code intelligence may already be close to that of o1 and o3, and in some scenarios it may be better. At the same time, it inherits Claude's ability to handle long edits and execute steadily.

That combination is the key.

A model that can only analyze is closer to a consultant.

A model that can analyze, edit, and keep moving through a project starts to become a real engineering agent.

OpenAI's historical advantage was extreme intelligence. Anthropic's advantage was execution. If Fable 5 starts combining those two, OpenAI has a problem.

Because that does not attack a peripheral capability.

It attacks the root.

Codex Is Both a Blessing and a Trap

The more interesting part is that this dangerous moment has arrived right when OpenAI Codex has become one of OpenAI's biggest successes.

Codex was a very important step. It pulled the model out of the chat window and into a real engineering environment. It made the model read projects, edit files, run commands, fix tests, and deliver tasks. That is the move from "answering questions" to "doing work."

In theory, this should be a blessing for OpenAI.

But there is an old Chinese saying: fortune contains the seed of misfortune, and misfortune contains the seed of fortune.

The more successful Codex becomes, the more exposed OpenAI becomes to a harsher battlefield. Real developers will compare it directly with Claude, Fable, DeepSeek, and many other agent tools.

In the chat-model era, users compared whether an answer sounded smart.

In the coding-agent era, users compare whether the agent actually fixes the code, whether it can keep moving, whether it wastes money, and whether it creates rework.

That is where Fable 5 makes Codex's success feel awkward.

If Fable 5 is close to, or locally better than, o1 and o3 in code intelligence, while also being much better than the old o-series at editing code, then Codex has brought OpenAI into the exact arena where its model advantage is now being challenged.

The blessing is that Codex finally moved OpenAI into real productivity.

The trap is that the model layer now looks vulnerable. GPT-5.5 currently feels clearly less intelligent than Fable 5, while the intelligence advantage the old o-series once held is being approached from the other side.

Frontier Models Are Still a Scale Game

Scale still matters at the frontier.

The first form of scale is parameter count.

OpenAI and Anthropic have not publicly disclosed the parameter counts of flagship models such as GPT-5.5 and Fable 5, so any figures here should not be treated as official facts. But external estimates commonly place this class of closed frontier model above 6 trillion parameters. For comparison, DeepSeek V4 Pro is around 1.6 trillion parameters.

Those estimates may not be precise. Different methods can disagree by a lot. But the direction is clear: the flagship models from OpenAI and Anthropic are no longer tens-of-billions-parameter systems. They are competing in the trillion-scale era, with scale continuing to show up in parameters, post-training, and inference-time compute.

And scale is not just parameter count.

Training data, post-training, inference-time compute, context length, safety evaluation, and serving reliability all add to the cost.

Scaling-law research had already shown an empirical power-law relationship between model performance and model size, data scale, and training compute. Chinchilla later clarified that, under compute-optimal training, model size and training tokens should grow in roughly equal proportion. Today we have MoE, RL, inference-time compute, and many system optimizations. But these do not eliminate the logic of scale. They change its shape.
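The Chinchilla point can be stated with that paper's parametric loss fit. This is my paraphrase, not something from the sources above: N is parameter count, D is training tokens, C is training compute, E, A, B, α, β are fitted constants, and C ≈ 6ND is the common rule-of-thumb approximation for dense transformers. Minimizing the loss under a fixed compute budget gives:

```latex
\begin{aligned}
L(N, D) &= E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}, \qquad C \approx 6ND \\
N_{\mathrm{opt}}(C) &\propto C^{\beta/(\alpha+\beta)}, \qquad D_{\mathrm{opt}}(C) \propto C^{\alpha/(\alpha+\beta)}
\end{aligned}
```

When α and β are of similar size, both exponents land near 1/2, which is the sense in which parameters and training tokens have to grow together as compute grows.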

So when Fable 5 is priced at twice Claude Opus 4.8, the commercial signal is consistent with the technical story: frontier intelligence keeps getting more expensive because it depends on more compute, more training, more post-training, and more service cost.

This is where OpenAI's danger becomes clear.

If the highest intelligence can only be produced by OpenAI, OpenAI has pricing power even when the model is expensive. But if Anthropic can also produce an o1/o3-like model that can find problems just by reading code, and if that model also keeps Claude's editing and execution strengths, then OpenAI's intelligence moat starts to loosen.

DeepSeek Is Pressuring the Other End of the Market

OpenAI's pressure is not only coming from the high end.

The low and middle ends of the market are being compressed by cost-performance models.

On June 11, 2026, I opened the OpenRouter leaderboard with agent-browser and also captured the frontend API call to /api/frontend/rankings/models. The "This Week LLM Leaderboard" showed the following (rows 6 to 8 omitted):

Rank	Model	Weekly token usage
1	DeepSeek V4 Flash	4.34T tokens
2	Hy3 preview	3.79T tokens
3	MiniMax M3	3.38T tokens
4	MiMo-V2.5	2.89T tokens
5	DeepSeek V4 Pro	2.06T tokens
9	Claude Opus 4.8	1.32T tokens

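On the same page, the model-author market share view looked like this. Its totals do not line up with the weekly model table (DeepSeek's author total is smaller than V4 Flash's weekly figure alone), so it likely covers a different time window: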

Rank	Model author	Token usage	Share
1	deepseek	4.07T	17.3%
2	anthropic	3.83T	16.3%
7	openai	1.65T	7.0%

The level of aggregation matters. The number-one model on OpenRouter was not "DeepSeek V4" in general. It was DeepSeek V4 Flash. DeepSeek V4 Pro ranked fifth. But by model author, DeepSeek was number one.

That is not a slogan.

That is usage.

Users can say they love the strongest model, but token bills make people honest.

On OpenRouter, DeepSeek V4 Flash was priced at $0.0983 per million input tokens and $0.1966 per million output tokens. DeepSeek V4 Pro was $0.435 input and $0.87 output. Fable 5 was $10 input and $50 output.

Comparison	Fable 5 input price multiple	Fable 5 output price multiple
Versus DeepSeek V4 Flash	About 102x	About 254x
Versus DeepSeek V4 Pro	About 23x	About 57x
Versus Claude Opus 4.8	2x	2x
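The multiples are plain ratios of the quoted per-million-token prices. A minimal Python sketch that recomputes them from the OpenRouter prices listed above (the Claude Opus 4.8 row is left out because its prices are not quoted here):

```python
# Per-million-token prices in USD (input, output), as quoted above from OpenRouter.
PRICES = {
    "Fable 5": (10.0, 50.0),
    "DeepSeek V4 Flash": (0.0983, 0.1966),
    "DeepSeek V4 Pro": (0.435, 0.87),
}


def price_multiples(target: str, baseline: str) -> tuple[float, float]:
    """Return how many times more `target` costs than `baseline` per input and per output token."""
    target_in, target_out = PRICES[target]
    base_in, base_out = PRICES[baseline]
    return target_in / base_in, target_out / base_out


for baseline in ("DeepSeek V4 Flash", "DeepSeek V4 Pro"):
    in_x, out_x = price_multiples("Fable 5", baseline)
    print(f"Fable 5 vs {baseline}: {in_x:.1f}x input, {out_x:.1f}x output")
```

This prints about 101.7x and 254.3x against V4 Flash, and 23.0x and 57.5x against V4 Pro.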

That gap is too large to ignore.

If a team only consumes tens of thousands of tokens a day, the difference may not feel dramatic. But if it runs tens of millions of tokens a day, a 10x price gap becomes a budget issue. A 100x price gap becomes a business-model issue.
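To make the budget point concrete, here is a rough calculator. It assumes a hypothetical team sending 30 million tokens a day, split 80% input and 20% output (both numbers are illustrative, not taken from any report), priced at the per-million rates quoted above:

```python
# Per-million-token prices in USD (input, output), as quoted above from OpenRouter.
PRICES = {
    "Fable 5": (10.0, 50.0),
    "DeepSeek V4 Flash": (0.0983, 0.1966),
    "DeepSeek V4 Pro": (0.435, 0.87),
}


def daily_cost(model: str, tokens_per_day: int, input_share: float = 0.8) -> float:
    """Estimate daily spend in USD for a workload with the given input/output token split."""
    in_price, out_price = PRICES[model]
    millions = tokens_per_day / 1_000_000
    return millions * (input_share * in_price + (1 - input_share) * out_price)


TOKENS_PER_DAY = 30_000_000  # hypothetical workload
for model in PRICES:
    per_day = daily_cost(model, TOKENS_PER_DAY)
    print(f"{model}: ${per_day:,.2f}/day, about ${per_day * 30:,.0f}/month")
```

Under these assumptions Fable 5 comes to about $540 a day (roughly $16,200 a month), against about $3.54 a day for V4 Flash and $15.66 a day for V4 Pro.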

So DeepSeek V4 Flash ranking first does not prove that it is the smartest model in the world. It proves something else: users are starting to divide intelligence by price.

That is also pressure on OpenAI.

At the high end, Fable 5 is closing the intelligence gap. At the low end, DeepSeek is pushing prices down. In the middle, enterprise customers are starting to calculate ROI.

OpenAI has to fight all three battles at once.

Enterprises Are Starting to Count Token Costs

Another clear shift in 2026 is that the AI industry is moving from tokenmaxxing to token budgeting.

In the previous phase, the default mindset was to use as much AI as possible. The more a company used AI, the more advanced it appeared. Internal leaderboards, AI coding enthusiasm, and long-running agents all pushed token consumption upward.

Then the bills arrived.

Business Insider has reported several representative examples.

Coinbase CEO Brian Armstrong said the company is routing suitable prompts to cheaper models so that, even as token usage grows exponentially, cost can stay roughly flat. He also predicted that over the next 12 to 18 months, 80% of workloads will run on models that are 99% cheaper, while the newest models will be reserved for "IQ maxing" scenarios.
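The arithmetic behind that prediction is worth spelling out. A minimal sketch, under the simplifying assumption (mine, not Armstrong's) that today every workload runs on the newest model at the same cost per token:

```python
def blended_cost_ratio(cheap_share: float, discount: float) -> float:
    """Spend relative to running every workload on the frontier model.

    cheap_share: fraction of workloads moved to cheaper models (e.g. 0.8).
    discount: how much cheaper those models are (e.g. 0.99 for "99% cheaper").
    """
    return cheap_share * (1 - discount) + (1 - cheap_share) * 1.0


ratio = blended_cost_ratio(0.8, 0.99)
print(f"blended cost: {ratio:.1%} of all-frontier spend")
print(f"usage can grow about {1 / ratio:.1f}x before spend returns to the all-frontier level")
```

In this simplified model, spend drops to about 20.8% of the baseline, so token volume can grow nearly fivefold while cost stays flat, and almost all of the remaining bill comes from the 20% of work kept on frontier models.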

Another report said that in the first half of 2026, OpenAI, Anthropic, and GitHub were moving more customers away from near-unlimited monthly plans and toward token-based billing. Companies such as Walmart, Amazon, Uber, Salesforce, and Coinbase were all starting to pay closer attention to budgets, limits, output, and ROI.

Consumer AI companies face an even more direct problem. Inworld's CEO said that for many consumer AI applications, inference costs can consume 70% to 90% of the operating budget. The more users love the product, the worse the margin can become.

AI commercialization is entering a new stage.

At first, people asked whether the model was smart.

Then they asked whether employees were actually using it.

Next, they will ask whether each dollar of token cost produces enough return.

Once the market reaches that point, launches and benchmarks are no longer enough.

For OpenAI, this rational phase is not easy. It has to preserve the highest intelligence, control cost, and support the massive usage of Codex, ChatGPT, and the API.

Above it, Anthropic is chasing intelligence with Fable 5.

Below it, DeepSeek is attacking usage with price.

In the middle, enterprise customers are counting ROI.

That is the real discomfort.

The Endgame Is Model Routing

The real dividing line for AI products will not be which company connects to the strongest model.

It will be which company knows how to route models.

Simple summarization, classification, rewriting, and extraction should use cheap models.

Ordinary coding, ordinary documentation, and ordinary research should use mid-tier models.

Complex codebase migrations, long-running autonomous agents, high-risk contract review, enterprise strategy analysis, and scientific reasoning should call high-end models such as Fable 5, Opus, or GPT-5.5 Pro.

High-end models should not be the default for every request, but they must have a place in the system. Cheap models should not be dismissed either, because the largest token volume often comes from repetitive, low-risk, ordinary tasks.

Mature AI products will schedule models the way cloud platforms schedule compute resources. Small tasks use small models. Big tasks use big models. Low-value tasks optimize for cost. High-value tasks preserve quality.
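The tiering described above can be sketched as a rule-based router. Everything here is hypothetical: the task labels paraphrase the categories in the text, the model identifiers are placeholders rather than real API names, and production routers usually add signals such as budgets, latency targets, and fallback on failure.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    CHEAP = "cheap"
    MID = "mid"
    HIGH = "high"


# Hypothetical task labels, paraphrasing the categories in the text.
CHEAP_TASKS = {"summarize", "classify", "rewrite", "extract"}
HIGH_TASKS = {
    "codebase_migration",
    "autonomous_agent",
    "contract_review",
    "strategy_analysis",
    "scientific_reasoning",
}

# Placeholder model identifiers; substitute whatever your gateway actually exposes.
TIER_MODELS = {
    Tier.CHEAP: "cheap-model",
    Tier.MID: "mid-tier-model",
    Tier.HIGH: "frontier-model",
}


@dataclass
class Task:
    kind: str
    high_risk: bool = False  # e.g. legal or production-critical work


def route(task: Task) -> Tier:
    """Map a task to a tier: high risk or hard task kinds escalate, known simple kinds go cheap."""
    if task.high_risk or task.kind in HIGH_TASKS:
        return Tier.HIGH
    if task.kind in CHEAP_TASKS:
        return Tier.CHEAP
    # Ordinary coding, documentation, research, and anything unrecognized.
    return Tier.MID


for task in (Task("summarize"), Task("documentation"), Task("codebase_migration"), Task("extract", high_risk=True)):
    print(task.kind, "->", TIER_MODELS[route(task)])
```

Sending unrecognized work to the mid tier, rather than to the cheapest or most expensive one, is one reasonable default: it keeps high-end models in reserve without gambling quality on the cheapest option.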

That is why OpenRouter, Vercel AI Gateway, model-routing layers, and token observability tools are becoming more important.
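Token observability can start as something as simple as a per-model ledger. This hypothetical sketch (the class and field names are mine, not any real tool's API) records usage, attributes cost from per-million-token prices, and compares each model's share of tokens with its share of spend:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageRecord:
    model: str
    input_tokens: int
    output_tokens: int


class TokenLedger:
    """In-memory ledger that attributes token volume and USD cost to each model."""

    def __init__(self, prices_per_million: dict[str, tuple[float, float]]):
        self.prices = prices_per_million
        self.tokens: dict[str, int] = defaultdict(int)
        self.cost: dict[str, float] = defaultdict(float)

    def record(self, rec: UsageRecord) -> None:
        in_price, out_price = self.prices[rec.model]
        self.tokens[rec.model] += rec.input_tokens + rec.output_tokens
        self.cost[rec.model] += (rec.input_tokens * in_price + rec.output_tokens * out_price) / 1_000_000

    def shares(self) -> dict[str, tuple[float, float]]:
        """Return {model: (share of tokens, share of cost)}; empty if nothing has been recorded."""
        total_tokens = sum(self.tokens.values())
        total_cost = sum(self.cost.values())
        if total_tokens == 0 or total_cost == 0:
            return {}
        return {m: (self.tokens[m] / total_tokens, self.cost[m] / total_cost) for m in self.tokens}


# Prices quoted earlier in the article; the usage volumes below are made up for illustration.
ledger = TokenLedger({"DeepSeek V4 Flash": (0.0983, 0.1966), "Fable 5": (10.0, 50.0)})
ledger.record(UsageRecord("DeepSeek V4 Flash", input_tokens=9_000_000, output_tokens=1_000_000))
ledger.record(UsageRecord("Fable 5", input_tokens=800_000, output_tokens=200_000))
for model, (token_share, cost_share) in ledger.shares().items():
    print(f"{model}: {token_share:.0%} of tokens, {cost_share:.0%} of cost")
```

With these made-up volumes, the cheap model carries about 91% of the tokens but only about 6% of the cost, which is the pattern a routing layer needs to see before it can decide anything.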

People once thought the core of AI applications was the prompt.

Then they discovered it was the agent.

Looking further ahead, the core may be the scheduling system.

Whoever can coordinate models across different prices, capabilities, latencies, and risk profiles will have a better chance of surviving at the AI application layer.

Final Thought

One year in AI feels like a century in the human world. It is easy to forget that only three years have passed since 2023.

OpenAI's danger is not that it has suddenly become weak. It is that the market is no longer one-dimensional.

The frontier is now being measured by intelligence, execution, cost, and routing at the same time.

For years, OpenAI could win by owning the smartest model.

Now it has to win the whole system.

That is why this moment is so dangerous.