Has OpenAI Secretly Released GPT 4.5? (Writing Test)

The Nerdy Novelist
1 May 2024 13:40

TLDR: In this video, Jason, a novelist and AI writing expert, discusses the sudden appearance of a new chatbot on the LMSYS platform, which is speculated to be an updated version of the GPT models, potentially GPT 4.5 or even GPT 5. The chatbot, labeled as GPT2, has demonstrated significantly improved reasoning and math skills, leading to widespread speculation about its true identity. Jason shares his experience testing the chatbot on writing-related tasks and notes that it outperforms GPT 4 in many respects. He walks viewers through how to access and test the chatbot themselves, either through direct chat on LMSYS or through the Arena Battle feature. Jason also shares a document containing his prompts and the chatbot's responses, highlighting its ability to generate detailed, specific story outlines and prose, which he found to be of higher quality than output from other models. He concludes by encouraging viewers to share their experiences and thoughts on the chatbot's performance.

Takeaways

  • 🤖 A new chatbot, labeled as GPT2, has mysteriously appeared on a platform called LMSYS, which is used for comparing language models.
  • 🧐 This GPT2 chatbot has shown significantly improved reasoning and math skills, leading to speculation that it might actually be GPT 4.5 or even GPT 5.
  • 🔥 Sam Altman, CEO of OpenAI, tweeted that he has a soft spot for GPT2, fueling the speculation about the new model's identity.
  • 📈 The original GPT-2 is considered an older, less capable model, and has been largely superseded by GPT 3.5.
  • 🚀 The new GPT2 model demonstrated superior performance in writing-related activities, providing more depth and consistency in its responses.
  • 💻 To access the new GPT2 model, users can visit chat.lmsys.org and select 'gpt2-chatbot' under the direct chat section.
  • 🕹️ An alternative way to access the model is through Arena Battle on the same website, which allows for blind testing of different models.
  • 📝 The chatbot was tested with various writing prompts, including brainstorming ideas for a Sci-Fi Beach Romance and creating an outline based on the 'Save the Cat' beats.
  • 📚 The responses from the chatbot were detailed and showed a better understanding of story structure and character development compared to other models.
  • ✍️ When tasked with writing the first 500 words of a scene, the chatbot provided a narrative with a strong sense of setting and character perspective, although it was somewhat verbose.
  • 🔍 The chatbot's performance in writing prose was found to be slightly better in terms of showing versus telling and understanding conflict depth, compared to GPT-4.
  • 🔄 The final assessment of the chatbot's capabilities will have to wait until it is fully released, allowing for more comprehensive testing and comparisons.

Q & A

  • What is the topic of discussion in the video?

    -The video discusses the possibility that a new chatbot, labeled as GPT2, might be an updated version of the GPT models, possibly GPT 4.5 or even GPT 5, and its capabilities in writing-related tasks.

  • What is LMSYS used for?

    -LMSYS is a platform primarily used to compare language models against each other, so that users can judge objectively which one is better at a specific task.

  • Why is there speculation that the new GPT2 chatbot might be GPT 4.5?

    -The speculation arises because the new GPT2 chatbot has demonstrated significantly better reasoning and math skills compared to the original GPT2, leading people to believe it could be an updated version.

  • What did Sam Altman's tweet about GPT2 suggest to people?

    -Sam Altman's tweet, in which he mentioned having a soft spot for GPT2, fueled speculation that the new GPT2 chatbot might be an indication of something more, possibly an updated version of the model.

  • How can one access the new GPT2 chatbot for testing?

    -To access the new GPT2 chatbot, one can visit the website chat.lmsys.org, go to direct chat, and select the GPT2 chatbot from the list of models. Alternatively, one can use Arena Battle to blind test different models, including the new GPT2.

  • What was the first writing-related task the video presenter used to test the new GPT2 chatbot?

    -The first writing-related task was a brainstorming prompt to generate 10 ideas for a Sci-Fi Beach Romance.

  • How did the new GPT2 chatbot perform in the brainstorming task?

    -The new GPT2 chatbot performed well, providing ideas with inherent conflict and depth, which were more consistent and story-like compared to typical outputs from other models.

  • What was the presenter's chosen idea from the brainstorming session?

    -The presenter chose the idea titled 'Sand Castles of Time', where a couple discovers a beach where building sand castles can alter reality, transporting them to different historical epochs and parallel universes.

  • How did the new GPT2 chatbot handle the outline prompt based on the 'Sand Castles of Time' idea?

    -The chatbot provided a detailed outline following Blake Snyder's 'Save the Cat' beats, with specific scenes and character interactions that were more concrete and less generic than other models.

  • What was the task given to the new GPT2 chatbot in the first scene writing prompt?

    -The task was to write the first 500 words of the first scene in a Sci-Fi Beach romance book, focusing on the protagonist Lily's point of view in the first person, with an emphasis on showing rather than telling.

  • What were the presenter's observations about the depth and quality of the prose generated by the new GPT2 chatbot?

    -The presenter found that the prose generated by the new GPT2 chatbot had a better grasp of the depth of conflict and story, with a more intuitive understanding of what makes a good scene, including a balance of showing versus telling.

  • What is the general consensus on the new GPT2 chatbot's performance in writing tasks compared to GPT 4?

    -The new GPT2 chatbot, suspected to be GPT 4.5, showed improvements in reasoning, math skills, and writing tasks, with more depth and specificity in its responses compared to GPT 4.

Outlines

00:00

🤖 Introduction to the New GPT Model

The video introduces a chatbot that appeared without announcement and is speculated to be an updated version of the GPT models, possibly GPT 4.5 or GPT 5. The host, Jason, is a novelist who teaches writers to combine AI with writing principles. He discusses the platform LMSYS, which is used for comparing language models. The new 'gpt2' chatbot has shown better reasoning and math skills, leading to speculation that it could be GPT 4.5. The video also mentions Sam Altman's tweet, which has fueled further speculation. Jason describes his attempts to access the model on the LMSYS website and suggests Arena Battle as an alternative route with a better chance of reaching the model. He also shares his findings from testing the model on writing-related activities.

05:01

📚 Creative Writing Prompts and Outlines

Jason provides a detailed account of using the new chatbot for creative writing tasks. He starts with a brainstorming prompt for a Sci-Fi Beach Romance and finds that the chatbot's responses are more consistent and contain inherent conflict, making them story-like rather than generic. He selects one idea, 'Sand Castles of Time,' and asks the chatbot to expand it into a full outline using Blake Snyder's 'Save the Cat' beats. The outline provided by the chatbot is detailed and specific, showing a good understanding of story structure. Jason then asks the chatbot to write the first 500 words of the first scene from the outline, focusing on the protagonist's point of view. The resulting text is rich in detail and emotional depth, although it contains some AI-isms. Jason concludes that while the prose quality is not significantly better than GPT 4, the chatbot demonstrates a better grasp of story and conflict.

10:01

🌐 Testing and Future Prospects

The video concludes with Jason's reflections on testing the new chatbot and its potential. He notes that the chatbot's responses often had more weight and a better sense of conflict compared to other models. He acknowledges that the chatbot's performance could improve with better prompt adjustments. However, he emphasizes the need to wait for the full release of the model to conduct thorough tests. Jason invites viewers to share their thoughts and experiences with the new chatbot in the comments section and expresses hope that the video was informative.

Keywords

GPT 4.5

GPT 4.5 refers to a hypothetical update or intermediate version of OpenAI's language models, positioned between GPT-4 and a potential GPT-5. In the video script, the speaker investigates rumors about a new model that exhibits superior capabilities in reasoning and mathematics, which users speculate could be GPT 4.5. This reflects the continuous evolution and improvement in AI language models that OpenAI might be testing.

LMSYS

LMSYS is presented as a platform used to compare different language models objectively. It allows users to assess which model performs better in various tasks. The speaker uses LMSYS to test a new chatbot which is speculated to be an advanced version of the GPT models, highlighting the platform's role in facilitating direct interaction with and evaluation of different AI models.

Chatbot

A chatbot is a software application designed to simulate conversation with human users, especially over the Internet. In the script, the speaker tests a chatbot labeled as GPT2 on the LMSYS platform, discovering its unexpectedly high performance, which leads to speculation that it might be a more advanced model disguised under an old label.

Arena Battle

Arena Battle is described as a feature within the LMSYS platform where users can conduct blind tests between two different models by entering a prompt and comparing the responses. Because the model names are hidden while the user judges, the assessment rests on the quality of the responses rather than on the model's identity.
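The blind comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how LMSYS implements Arena Battle: the function names `blind_pair` and `reveal` and the model labels `model-x` and `model-y` are hypothetical.

```python
import random


def blind_pair(responses, rng=None):
    """Shuffle two labelled responses and hide which model wrote which.

    `responses` maps a model name to its answer for the same prompt.
    Returns (shown, key): `shown` is what the tester sees, and `key`
    maps the anonymous slots "A" and "B" back to the model names.
    """
    rng = rng or random.Random()
    items = list(responses.items())
    rng.shuffle(items)
    shown = {"A": items[0][1], "B": items[1][1]}
    key = {"A": items[0][0], "B": items[1][0]}
    return shown, key


def reveal(key, vote):
    """Return the model name behind the slot the tester voted for."""
    return key[vote]


# Hypothetical answers from two models to the same writing prompt.
responses = {
    "model-x": "The tide pulled at my ankles as the sky split open.",
    "model-y": "It was a sunny day at the beach.",
}
shown, key = blind_pair(responses, random.Random(0))
winner = reveal(key, "A")  # the tester preferred the text shown in slot A
```

The tester reads only `shown`, picks a slot, and the model name is looked up afterwards, so the vote cannot be swayed by knowing which model produced which text.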

Direct Chat

Direct Chat is a feature of the LMSYS platform mentioned in the video script. It allows users to interact directly with various language models by selecting one and starting a chat session. The speaker ran into problems with this feature because the servers were overloaded, which illustrates the high demand for and interest in testing new AI capabilities.

Save the Cat

Save the Cat is a screenwriting method by Blake Snyder that structures a narrative into specific plot beats. The speaker asks the chatbot, speculated to be GPT 4.5, to expand a story idea using these beats. The chatbot's response, which follows the structure closely, underscores its advanced narrative understanding and generation capabilities.

Show vs Tell

Show vs Tell is a writing technique discussed in the video. The speaker evaluates the chatbot's ability to 'show' rather than 'tell' in its prose, emphasizing descriptive and immersive writing. This evaluation is part of testing the chatbot's sophistication in handling nuanced writing styles, which is crucial for engaging storytelling.

Conflict

In storytelling, conflict is a fundamental element that drives the narrative. The speaker notes that the new GPT model exhibits an intuitive grasp of creating inherent conflict in its responses, which makes the generated stories more compelling and story-like compared to other models.

Prose

Prose refers to written or spoken language in its ordinary form, without structured verse. The speaker analyzes the prose generated by the chatbot, particularly in crafting the first scene of a sci-fi beach romance story. He assesses its depth, use of metaphor, and emotional impact, noting improvements over previous models in creating evocative and meaningful text.

Sand Castles of Time

Sand Castles of Time is a story idea generated by the chatbot during a brainstorming session in the video. It involves a couple who discovers that building sand castles can alter reality. This concept exemplifies the chatbot's creative capabilities, as it not only suggests an imaginative plot but also integrates thematic depth and complexity, showcasing the advanced potential of the speculated GPT 4.5 model.

Highlights

A new chatbot has appeared, possibly an updated version of the GPT models, speculated to be GPT 4.5 or GPT 5.

The platform LMSYS is used for comparing language models to determine which is better at specific tasks.

The new GPT2 chatbot has demonstrated improved reasoning and math skills, outperforming GPT 4 in several benchmarks.

Sam Altman's tweet about having a soft spot for GPT2 has fueled speculation about the new model's identity.

The original GPT2 is considered an older, less capable model not in widespread use.

The new GPT2 chatbot is superior to GPT 4 in many ways, including writing-related activities.

The website chat.lmsys.org allows users to test different models, including the new GPT2 chatbot.

Due to high demand, the site was overwhelmed, making it difficult to get responses from the new model.

Arena Battle is an alternative method to access and compare the new GPT2 model against others.

The Llama 3 70B-parameter model performed well, often chosen over GPT 4 in various tests.

The new GPT2 provided detailed and imaginative story prompts for a Sci-Fi Beach Romance.

The model's responses were more consistent and contained inherent conflict, indicating a better grasp of storytelling.

An outline generated from the GPT2 model using Blake Snyder's 'Save the Cat' beats was detailed and specific.

The GPT2 model's prose writing delved deeper into the protagonist's point of view, showing more than telling.

The model demonstrated an understanding of the depth of conflict and story, providing weight to the narrative.

Despite some issues, the GPT2 model showed potential for improvement with refined prompts.

The full capabilities of the model will not be known until it is fully released for testing.

The video invites viewers to share their experiences and findings with the new GPT2 model.