GPT-5 Fails. AGI Cancelled. It's all over...

Wes Roth

8 Aug 2025 16:24

Summary

TLDR: GPT-5's release has sparked mixed reactions, with some praising its coding and game development abilities, while others express disappointment due to issues like hallucinations and poor math performance. The model's routing system, intended to direct tasks to the appropriate sub-model, has faced criticism for misdirecting requests, but OpenAI is working on improvements. Despite its shortcomings, GPT-5 has proven impressive in creating custom software solutions and handling complex coding tasks, with future optimizations expected to enhance its performance. While not the AGI breakthrough some hoped for, it’s an incremental but powerful advancement in AI capabilities.

Takeaways

😀 GPT-5's launch has sparked mixed reactions, with some experts like Gary Marcus expressing disappointment, arguing that it's far from AGI and falls short of expectations.
😀 Despite high expectations, GPT-5 has faced criticism for inconsistent performance, including model-routing problems and occasional failures on basic tasks.
😀 A common frustration is the model’s failures in simple reasoning tasks and math problems, such as incorrect calculations like '69 equals 30'.
😀 GPT-5 uses an auto-routing system to determine which model best handles a user's query, but this system was initially broken, causing subpar performance in some cases.
😀 Many users have had varying experiences with GPT-5, with some highlighting its impressive coding abilities, especially for tasks like developing games or creating small software tools.
😀 GPT-5’s improved coding assistance, such as building a 3D city-building game, has been praised, with some calling it a huge step forward in software development tools.
😀 Users continue to debate whether GPT-5's most impressive results depend on using the best models (e.g., GPT-5 Pro), which offer superior reasoning compared to lower-cost options.
😀 While GPT-5 excels at creating small, bespoke applications (such as 3D simulations or expense analysis tools), it struggles with more basic cognitive tasks like simple arithmetic or logical reasoning.
😀 Performance depends heavily on which variant handles the task, with GPT-5 Pro outperforming cheaper variants, particularly on high-complexity tasks.
😀 OpenAI has acknowledged some issues with GPT-5's routing system and has promised improvements to enhance model selection and transparency, aiming for a more streamlined experience going forward.
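
To make the routing idea in the takeaways more concrete, here is a minimal Python sketch of how a query router might choose between a cheaper tier and a more capable one. It is purely illustrative: the tier names, the cue words, the length threshold, and the `route` function are all hypothetical, and nothing here reflects how OpenAI's router is actually implemented.

```python
from dataclasses import dataclass

# Hypothetical tier names; these are NOT OpenAI's real model identifiers.
FAST_TIER = "fast-tier"
REASONING_TIER = "reasoning-tier"

# Hypothetical cue words suggesting a query needs deeper reasoning.
REASONING_CUES = ("prove", "debug", "step by step", "calculate", "refactor")


@dataclass
class RoutingDecision:
    tier: str
    reason: str


def route(query: str, router_healthy: bool = True) -> RoutingDecision:
    """Pick a tier for a query with a toy keyword-and-length heuristic."""
    if not router_healthy:
        # Mimics the reported failure mode: when routing is broken, every
        # request falls through to the cheaper tier regardless of difficulty.
        return RoutingDecision(FAST_TIER, "router unavailable, default tier")
    text = query.lower()
    for cue in REASONING_CUES:
        if cue in text:
            return RoutingDecision(REASONING_TIER, f"matched cue '{cue}'")
    # Assumed threshold: treat long prompts as likely to need more reasoning.
    if len(text.split()) > 40:
        return RoutingDecision(REASONING_TIER, "long query")
    return RoutingDecision(FAST_TIER, "no reasoning cue found")


if __name__ == "__main__":
    for q in ("What's the capital of France?", "Debug this recursive function"):
        decision = route(q)
        print(f"{q!r} -> {decision.tier} ({decision.reason})")
```

Under these assumptions, a broken router (`router_healthy=False`) sends even a hard request to the cheaper tier, which is the kind of mismatch users complained about after launch.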

Q & A

What is the main critique of GPT-5 according to Gary Marcus?
-Gary Marcus calls GPT-5 a disappointment and more marketing hype than a significant step towards AGI. He says it does not live up to expectations and that OpenAI might be falling behind in the AI race.
How did GPT-5 perform with math-related tasks?
-GPT-5's math performance was criticized as unreliable. It was expected to beat the human baseline on SimpleBench, but it ranked fifth. Additionally, there were strange outputs in basic math, such as '69 equals 30' and '69 is less than 52.'
What is the issue with GPT-5's model routing?
-The issue with GPT-5's model routing is that it often directs users to cheaper, less capable models instead of the more advanced ones. This was attributed to cost-saving measures, and it led to disappointing results for many users. OpenAI acknowledged the problem and said it would be fixed soon.
What are the advantages of GPT-5 when using the max reasoning model?
-When using the maximum reasoning model in GPT-5, the results were reported as impressive. Tasks that required complex reasoning, such as code development and more intricate inquiries, saw better performance when the model was set to its highest capacity.
How does GPT-5's ability to create software compare to its performance in reasoning tasks?
-GPT-5 excels at creating bespoke software solutions, such as building tools or automating tasks. For tasks like generating code for creating charts or 3D buildings, it performs very well. However, its performance in reasoning tasks, particularly in verbal or word-based inquiries, is not as strong.
Why are some users disappointed with GPT-5's performance in everyday tasks?
-Users were disappointed with GPT-5's performance in everyday tasks because it often failed at basic requests or gave incorrect responses, especially when not directed to the best model. Some users feel that the AI does not live up to the advanced promises made during its release.
What was the issue with GPT-5's auto-switching feature?
-GPT-5's auto-switching feature, which determines which model to use for a specific task, was broken for a period after release. This led to subpar performance and caused confusion, as the wrong model would often be used for complex tasks.
What is the 'vibe coding' concept mentioned in the transcript?
-'Vibe coding' refers to the quick-paced and iterative process of coding with GPT-5. It allows developers to test and update their projects rapidly, receiving real-time feedback as they work, leading to a fluid and efficient development experience.
How does GPT-5 compare to Claude 4.1 and Gemini in coding tasks?
-GPT-5 has been reported to outperform Claude 4.1 and Gemini in coding tasks, particularly when used with the Pro version or maximum reasoning effort. It excels at developing tools, automating tasks, and generating complex code quickly, which is essential for developers looking to streamline their work.
What are some potential applications of GPT-5's capabilities in coding and software development?
-GPT-5's ability to create custom software solutions for tasks like 3D modeling, game development, and business analytics is highly valuable. It can build applications for creating 3D cities, generating Gantt charts, and automating calculations, making it a useful tool for developers, analysts, and businesses.