GPT-5 is here... Can it win back programmers?

Fireship

8 Aug 2025 · 04:19

Summary

TLDR: In a critical review of OpenAI's GPT-5, the video weighs the hype against the new model's limitations. GPT-5 offers a unified model that routes between tasks and comes at a low price, but it draws criticism for flawed benchmark graphs and for hallucinations in coding tasks. Although it outperforms some previous models, GPT-5 fails to beat Grok on the ARC-AGI benchmark, and its actual impact on jobs is questioned. The video concludes that GPT-5 is impressive, but its real value shows only when it is combined with existing tech tools.

Takeaways

😀 GPT-5 is framed as the end of the human intelligence monopoly, with OpenAI claiming it outperforms humans on certain benchmarks, but is it truly revolutionary?
🤔 OpenAI's GPT-5 was touted as a major advancement, but it actually ranks fifth on the ARC-AGI benchmark and failed to beat Grok.
💸 Despite the hype, GPT-5 underperformed in betting markets, with OpenAI no longer the favorite to have the best model in 2025.
📊 GPT-5's benchmark charts had major issues, including a misleading y-axis, raising questions about OpenAI's accuracy and transparency.
🧠 GPT-5 focuses on unifying different models for fast reasoning and routing, allowing it to choose the best tool for each task without user input.
💡 GPT-5 is seen as a consolidation and cost-reduction effort, rather than an exponentially smarter or larger model compared to its predecessors.
💰 GPT-5's pricing is highly competitive at $10 per million output tokens, far cheaper than Claude Opus 4.1, which costs $75 per million output tokens.
📉 Despite claims of improved performance, GPT-5 made errors such as using Svelte runes incorrectly when generating code, producing output that looked right but failed at runtime.
👨‍💻 GPT-5 generated code for a Svelte 5 app that worked once fixed, but it still hallucinated rules about where runes may be used.
🎮 GPT-5 was tested on creating a flight simulator game with Three.js, but the result was subpar, despite praise from some industry professionals for its potential.
🛠 GPT-5 is seen as a valuable tool when integrated with existing technologies like DreamFlow, which enables easier full-stack AI development with cross-platform app support.

Q & A

What makes GPT-5 different from previous versions of GPT?
-GPT-5 is not just a larger or smarter model; it unifies several models for fast responses, reasoning, and routing. This lets GPT-5 choose the best tool for a given task without the user having to think about it, which makes it more efficient and versatile.
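
The routing idea in the answer above can be illustrated with a toy sketch. This is only an assumption about what "routing" might look like in principle; the model names, the keyword hints, and the word-count threshold are all hypothetical and do not describe how OpenAI actually routes requests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str   # hypothetical model label, not a real API model name
    reason: str  # why the router picked it


# Hypothetical keywords suggesting a prompt needs slower, deliberate reasoning.
REASONING_HINTS = ("prove", "debug", "step by step", "refactor")


def route(prompt: str, max_fast_words: int = 40) -> Route:
    """Pick a model for a prompt so the user does not have to choose one."""
    text = prompt.lower()
    if any(hint in text for hint in REASONING_HINTS):
        return Route("reasoning-model", "prompt contains a reasoning hint")
    if len(text.split()) > max_fast_words:
        return Route("reasoning-model", "prompt is long")
    return Route("fast-model", "short prompt with no reasoning hint")


if __name__ == "__main__":
    for prompt in ("What is the capital of France?", "Debug this stack trace for me"):
        chosen = route(prompt)
        print(f"{prompt!r} -> {chosen.model} ({chosen.reason})")
```

A real router would presumably be learned rather than keyword-based; the sketch only shows the shape of the decision: one entry point, several back-end models, and a rule that picks between them.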
Why was GPT-5's benchmark score controversial?
-The benchmark results were controversial because GPT-5 reportedly ranked fifth on some closely watched benchmarks, such as ARC-AGI, instead of taking the top spot as initially suggested. In addition, some of OpenAI's charts were criticized for misleading y-axis scales.
What is the SimpleBench benchmark and how did GPT-5 perform on it?
-SimpleBench is a benchmark that evaluates AI models across a range of tasks. GPT-5 was initially claimed to beat human performance on benchmarks, but it later turned out that GPT-5 did not top the SimpleBench leaderboard.
What is the significance of GPT-5's pricing model?
-GPT-5 is priced at $10 per million output tokens, which is considered highly affordable compared to competitors like Claude Opus 4.1, which costs $75 per million output tokens. This pricing model makes GPT-5 accessible for a wider range of users, including those in development and research.
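
To make the price gap concrete, here is a small arithmetic sketch using only the two output-token prices quoted above. The model keys and the 250,000-token workload are hypothetical, and input-token pricing is left out because the summary does not state it.

```python
# USD per one million output tokens, as quoted in this summary.
PRICE_PER_MILLION_OUTPUT = {
    "gpt-5": 10.0,
    "claude-opus-4.1": 75.0,
}


def output_cost(model: str, output_tokens: int) -> float:
    """Return the output-token cost in USD for the given number of tokens."""
    return PRICE_PER_MILLION_OUTPUT[model] * output_tokens / 1_000_000


if __name__ == "__main__":
    tokens = 250_000  # hypothetical workload size
    gpt5 = output_cost("gpt-5", tokens)
    opus = output_cost("claude-opus-4.1", tokens)
    print(f"GPT-5: ${gpt5:.2f}")            # GPT-5: $2.50
    print(f"Claude Opus 4.1: ${opus:.2f}")  # Claude Opus 4.1: $18.75
    print(f"Ratio: {opus / gpt5:.1f}x")     # Ratio: 7.5x
```

At the quoted prices the ratio is 7.5x regardless of workload size, since both costs scale linearly with the number of output tokens.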
How did OpenAI's graphics and charts impact the perception of GPT-5?
-OpenAI's graphics and charts were criticized for presenting data with a misleading y-axis, which raised doubts about the transparency and accuracy of the company's claims. This caused skepticism about GPT-5's true capabilities.
What was the main issue when GPT-5 generated code for a Svelte 5 app with runes?
-GPT-5 generated code that looked promising, but when run, it produced a 500 error because a rune was used in the template, where it isn't allowed. Although GPT-5 used the correct syntax, it "hallucinated" its own rules for rune usage, which led to the error.
How did GPT-5 handle error correction in the generated code?
-After identifying the error in the generated code, GPT-5 was able to fix the issue by itself. This demonstrated its ability to self-correct, which is a key feature of its design to minimize hallucinations.
What are some of the practical applications of GPT-5's capabilities?
-GPT-5's ability to code efficiently and fix its own errors makes it useful for software development, especially in generating functional apps with clean UI and handling complex tasks like game development, as shown in the example of a flight simulator game.
What was the result when GPT-5 generated a flight simulator game with Three.js?
-The flight simulator game that GPT-5 generated using Three.js turned out poorly. This shows that while GPT-5 excels at certain tasks, its performance can still be inconsistent, especially on creative or highly complex tasks.
What does the commentary suggest about the future impact of AI like GPT-5 on jobs?
-The commentary suggests that while GPT-5 is a powerful tool, it's not likely to replace human jobs or lead to an AI-driven apocalypse. The real value of GPT-5 comes from combining it with existing technologies, which enhances productivity without fully automating human roles.