My GPT-5 First Reaction - It's smarter but there's a few things missing...

David Shapiro
7 Aug 2025, 13:31

Summary

TLDR: In this first-reaction review of the GPT-5 launch live stream, the speaker expresses both excitement and disappointment. The model shows significant improvements in reasoning, context handling, and task autonomy, but the expected focus on multimodal capabilities (video and images) and agentic behavior was largely absent. The live stream, aimed at a general audience, concentrated heavily on coding. Despite these shortcomings, it highlighted GPT-5's progress on complex tasks and its reduced hallucination rate, offering a glimpse of its potential impact on industries such as software development and project management.

Takeaways

  • 😀 GPT-5 is available in the API, but not yet in the main interface.
  • 😀 The live stream was heavily focused on coding, with little attention given to multimodal capabilities or agentic behavior.
  • 😀 Despite high expectations, GPT-5 did not prioritize multimodality (voice, video, images) or autonomous agentic behavior.
  • 😀 GPT-5 shows a significant leap in reasoning and context size, along with reduced hallucination, making it more reliable for autonomous tasks.
  • 😀 The model's success rate on autonomous tasks has increased significantly, with GPT-5 reaching up to 95% success on longer tasks.
  • 😀 GPT-5's hallucination rate is below 1%, a significant improvement over earlier models like GPT-3, which had rates of 5-7%.
  • 😀 GPT-5 has a 400,000-token context window, improving its ability to handle large bodies of information and complex tasks.
  • 😀 The speaker found the live stream overly general and unexciting, describing it as more of a PR event for a broad audience.
  • 😀 GPT-5's coding capabilities are improved but not groundbreaking compared to previous models, even though it can handle more complex problems.
  • 😀 The PLE (Post-Labor Economics) bench showed GPT-5 providing insightful, high-level strategic thinking, suggesting potential for abstract and conceptual problem-solving.

Q & A

  • What was the most significant disappointment mentioned in the live stream of GPT-5's release?

    -The most significant disappointment was the lack of focus on multimodality, as it wasn't even a major topic in the presentation. The live stream focused heavily on coding and did not discuss advancements in voice, video, or image capabilities.

  • Was GPT-5 expected to be an agentic or multimodal model, and how does it compare to those expectations?

    -GPT-5 was predicted to be either a multimodal or agentic model, but it did not live up to those expectations. While it showed improvements in reasoning, context size, and hallucination reduction, it was not framed as an agentic or multimodal-first model.

  • What is the major improvement of GPT-5 compared to previous models, such as GPT-3?

    -The major improvement of GPT-5 is a significant reduction in hallucinations, down to less than 1% on some measurements, compared with roughly 5 to 7% for GPT-3. This improvement, combined with better instruction following and a larger context window of 400,000 tokens, makes the model more usable.

  • How did GPT-5 perform in autonomous task completion, and what are the implications?

    -GPT-5 showed a significant improvement in completing autonomous tasks, reaching a 95% success rate on tasks lasting from 9 minutes up to 1.06 hours, a trend the speaker describes as exponential. If the trend continues, models could autonomously complete tasks lasting more than a day within a year.

  • What impact is GPT-5 expected to have on software development productivity?

    -GPT-5 is expected to dramatically increase the amount of code written by AI, potentially bringing AI's share to about two-thirds of all code written globally, up from roughly one-third today. The speaker expects this trend to keep growing exponentially.

  • What is the significance of the 400,000-token context window in GPT-5?

    -The 400,000-token context window is a substantial leap from previous models, enabling GPT-5 to process larger bodies of information. This larger context window, paired with better attention mechanisms, allows the model to tackle more complex tasks and produce more accurate responses.

  • Why did the live stream seem 'watered down' and unexciting, according to the transcript?

    -The live stream was criticized for being overly focused on coding, with much of the content being too basic and not offering much new excitement. The presentation felt like a PR exercise aimed at general audiences, lacking a deeper exploration of innovative features or groundbreaking advancements.

  • How did GPT-5's performance in reasoning and context size compare to other models?

    -GPT-5 showed a major leap in reasoning and context size. Compared with earlier models such as GPT-3, it handles more complex problems and gives more accurate results, and its larger context window improves performance on tasks that require understanding long sequences of information.

  • What were some of the key insights provided by GPT-5 when tested on the PLE (Post-Labor Economics) framework?

    -When tested with the PLE framework, GPT-5 offered valuable feedback on what was missing from the current version of the framework, such as a single normative anchor, institutional designs, and a more detailed geopolitical framework. It provided strategic and abstract suggestions, highlighting areas like governance, innovation, and the balance between abundance and equity.

  • What is the significance of the reduction in sycophancy in GPT-5?

    -The reduction in sycophancy is significant because it means GPT-5 is less likely to provide overly flattering or excessively agreeable responses. This leads to more reliable, fact-based answers, which is important for building trust and making the model's output more useful in real-world applications.
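The autonomous-task answer projects growth from a 1.06-hour task horizon to tasks lasting more than a day within a year. The short Python sketch below works out the growth rate that projection implies. It assumes clean exponential growth and treats "more than a day" as 24 hours; both are simplifying assumptions for illustration, not figures from the live stream, and the function names are hypothetical.

```python
import math


def implied_doubling_time(current_hours: float, target_hours: float, window_months: float) -> float:
    """Months per doubling needed to grow a task horizon from current_hours
    to target_hours within window_months, assuming clean exponential growth."""
    doublings = math.log2(target_hours / current_hours)
    return window_months / doublings


def projected_horizon(current_hours: float, doubling_months: float, months_ahead: float) -> float:
    """Task horizon after months_ahead months if it doubles every doubling_months months."""
    return current_hours * 2 ** (months_ahead / doubling_months)


if __name__ == "__main__":
    # Figures from the summary: a 1.06-hour horizon now, an assumed 24-hour target in 12 months.
    doubling = implied_doubling_time(1.06, 24.0, 12.0)
    print(f"Needs {math.log2(24.0 / 1.06):.2f} doublings, one every {doubling:.1f} months")
    # Round-trip check: growing 1.06 hours at that rate for 12 months should give about 24 hours.
    print(f"Check: {projected_horizon(1.06, doubling, 12.0):.1f} hours after 12 months")
```

With these inputs the horizon would need about 4.5 doublings in a year, or one doubling roughly every 2.7 months, which is the pace the one-year projection implicitly assumes.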
