Breaking News: GPT-4.5, the New Kid on the Block

How it Differs from Other Language Models

Unlike reasoning models such as o1 and o3, which work through an answer step by step, conventional large language models like GPT-4.5 spit out the first response they come up with. But GPT-4.5 is more general-purpose. Tested on SimpleQA, a general-knowledge quiz that OpenAI developed last year with questions on topics ranging from science and technology to TV shows and video games, GPT-4.5 scores 62.5%, compared with 38.6% for GPT-4o and 15% for o3-mini.

A Step Ahead in Accuracy

What’s more, OpenAI claims that GPT-4.5 responds with far fewer made-up answers (known as hallucinations). On the same test, GPT-4.5 made up answers 37.1% of the time, compared with 59.8% for GPT-4o and 80.3% for o3-mini.

A Mixed Bag of Results

But SimpleQA is just one benchmark. On other tests, including MMLU, a more common benchmark for comparing large language models, GPT-4.5 beat OpenAI’s previous models by a smaller margin. And on standard science and math benchmarks, GPT-4.5 scores worse than o3-mini.

Conversational Skills: The New Frontier

GPT-4.5’s special charm seems to be its conversational skill. Human testers employed by OpenAI say they preferred GPT-4.5 to GPT-4o for everyday queries, professional queries, and creative tasks such as writing poems.

For example, tell it that you’re going through a rough patch and GPT-4.5 might offer a few words of sympathy before saying: “Want to talk about what happened, or do you just need a distraction? I’m here either way.” GPT-4o is worse at reading social cues and might try to fix the problem whether or not you asked it to, hitting you with a bulleted list of ways to cheer yourself up.

The Verdict

Waseem Alshikh, cofounder and CTO of Writer, a startup that develops large language models for enterprise customers, says, “The focus on emotional intelligence and creativity is cool for niche use cases like writing coaches and brainstorming buddies. But GPT-4.5 feels like a shiny new coat of paint on the same old car. Throwing more compute and data at a model can make it sound smoother, but it’s not a game-changer.”

He concludes, “The juice isn’t worth the squeeze when you consider the energy costs and the fact that most users won’t notice the difference in daily use. I’d rather see them pivot to efficiency or niche problem-solving than keep supersizing the same recipe.”

Conclusion

GPT-4.5, OpenAI’s latest language model, improves on its predecessors in general-knowledge accuracy and conversational skill, at least by OpenAI’s own measures. The gains are uneven, though: it beats earlier models by a smaller margin on benchmarks such as MMLU and trails o3-mini on standard science and math tests. Critics such as Waseem Alshikh argue that it is a refinement rather than a game-changer. Whether its benefits justify the energy costs will depend on how much users value its improvements in everyday use.

FAQs

* What is GPT-4.5?
+ GPT-4.5 is a large language model developed by OpenAI.
* How does it differ from other language models?
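+ Unlike reasoning models such as o1 and o3, it does not work through answers step by step, but it is more general-purpose. According to OpenAI, it also produces fewer made-up answers (hallucinations) than GPT-4o and o3-mini.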
* What are its strengths and weaknesses?
+ Strengths: conversational skills and accuracy on general-knowledge quizzes such as SimpleQA. Weaknesses: only modest gains on broader benchmarks such as MMLU, and lower scores than o3-mini on standard science and math benchmarks.
* What do experts think about GPT-4.5?
+ Reactions are mixed. OpenAI’s human testers preferred it to GPT-4o, but critics such as Waseem Alshikh, CTO of Writer, view it as a refinement of existing technology rather than a game-changer.