New ChatGPT o1 VS GPT-4o VS Claude 3.5 Sonnet - The Ultimate Test
Summary
TL;DR: In this video, the presenter compares the new OpenAI ChatGPT o1 model with the GPT-4o model across 10 different prompts. They also test a custom GPT built with Chain of Thought prompting and a Claude project by Anthropic. The test aims to see if o1 can outperform not only GPT-4o but also these other AI models. The video includes tests on letter counting, logical reasoning, and coding challenges. The o1 model shows promising results, particularly in coding and logical reasoning, suggesting it may be superior to GPT-4o and the other models tested.
Takeaways
- The video compares the new ChatGPT o1 model from OpenAI with the older GPT-4o model.
- The test includes 10 different prompts to evaluate the models' performance.
- The creator also built a custom GPT using Chain of Thought prompting to replicate the o1 model's capabilities.
- The test incorporates prompts from OpenAI and Matthew Berman's video for a comprehensive comparison.
- The first prompt asks about the number of 'R's in 'strawberry', which all models answered correctly.
- The 'chicken or the egg' question was used to test the models' ability to provide scientific explanations.
- A math question comparing two numbers (9.11 vs. 9.9) was used to assess the models' numerical reasoning.
- A logic puzzle about a marble and a glass cup was used to test the models' spatial reasoning.
- A word-count test was used to evaluate the models' ability to perform simple counting tasks.
- A 'hallucination test' was conducted to see if the models would make up information about a non-existent mango cultivar.
- A coding test, creating a game of chess in Python, was used to assess the models' programming capabilities.
- The o1 model outperformed GPT-4o, the custom GPT, and Claude in the overall test.
Q & A
What is the main focus of the video?
-The main focus of the video is to compare the performance of the new OpenAI ChatGPT o1 model with the GPT-4o model and other AI models on various prompts.
How many different prompts were used in the test?
-The video mentions that 10 different prompts were used in the test.
What is the purpose of testing against a custom GPT model built by the video creator?
-The purpose of testing against a custom GPT is to see if it can replicate the Chain of Thought prompting that the o1 model is believed to use, and to compare its performance.
Which AI model is also tested in the video besides the custom GPT and GPT-4o?
-In addition to the custom GPT and GPT-4o, the video also tests a Claude project powered by Claude 3.5 Sonnet.
What is the first test question mentioned in the video?
-The first test question is 'How many R's are in the word strawberry?'
What is the significance of the chicken or the egg question in the video?
-The chicken or the egg question is used to test the AI models' ability to provide scientifically accurate answers and their reasoning capabilities.
How does the video creator improve the test to make it more scientific?
-The video creator improves the test by using prompts from OpenAI and Matthew Berman's video, which are designed to compare the models effectively.
What is the outcome of the marble in the glass cup test?
-The o1 model correctly identifies that the marble is left on the table when the glass is moved to the microwave, while GPT-4o and the custom GPT incorrectly place the marble inside the microwave.
Which model performs the best in the coding test of creating a game of chess in Python?
-The o1 model performs the best in the coding test, providing a functional chess game that is closer to a complete game than the other models' attempts.
What is the final verdict of the video regarding the performance of the AI models?
-The final verdict is that the o1 model outperforms GPT-4o, the custom GPT, and the Claude project in the tests conducted.
What additional information does the video provide about updates to the AI course and community platform?
-The video mentions that updates are being made to the AI course and community platform to include information related to the new GPT model, with over 20 courses and an active community for questions.
Outlines
AI Model Comparison: ChatGPT o1 vs. GPT-4o
The video begins with the host introducing a comparison test between the new ChatGPT o1 model from OpenAI and the existing GPT-4o model. The test involves 10 different prompts to evaluate performance. Additionally, the host has created a custom GPT with a Chain of Thought prompting system and a Claude project using Claude 3.5 Sonnet, both given the same system prompt to replicate the o1 model's capabilities. The test aims to determine whether o1 can outperform not only GPT-4o but also the custom-built models. The host also mentions using prompts from OpenAI and Matthew Berman's video to make the test more comprehensive and scientific.
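The Chain of Thought setup described here amounts to a fixed system prompt prepended to every conversation. Below is a minimal sketch of how such a setup could be expressed as a chat-style message list; the prompt wording is paraphrased from the instructions shown in the video, and the dictionary structure follows the common chat-completions message format rather than anything the video itself provides.

```python
# Sketch: a Chain of Thought system prompt packaged as a chat message list.
# The prompt text is paraphrased from the video; the structure is the
# standard {"role": ..., "content": ...} chat message format.
COT_SYSTEM_PROMPT = (
    "You are an AI assistant designed to think through problems step by step "
    "using Chain of Thought prompting. "
    "1) Understand the problem: carefully read the user's question. "
    "2) Break down the reasoning process and explain each step. "
    "3) Arrive at the final answer after completing all the steps. "
    "4) Review the thought process."
)

def build_messages(user_question: str) -> list[dict]:
    """Build the message list a chat-completions-style API expects."""
    return [
        {"role": "system", "content": COT_SYSTEM_PROMPT},
        {"role": "user", "content": user_question},
    ]

messages = build_messages("How many R's are in the word 'strawberry'?")
print(messages[0]["role"])  # prints system
```

The same message list can be reused for every test prompt, which is what makes the custom GPT and the Claude project directly comparable in the video's setup.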
Counting 'R's in 'Strawberry' and the Chicken-or-Egg Conundrum
The first test is a simple question about the number of 'R's in the word 'strawberry'. Both the o1 model and GPT-4o correctly identify that there are three. The host then discusses the 'chicken or the egg' question, where both models give the scientifically grounded answer that the egg came first, due to an evolutionary mutation. The custom GPT and the Claude project also give comprehensive responses, matching the o1 model's performance. The host notes that all models pass this round of testing.
Solving Logical Puzzles and Hallucination Tests
The video continues with a logic puzzle about a marble and a glass cup. The o1 model correctly deduces the marble's location, as does the Claude project, while regular GPT-4o and the custom GPT clone fail to provide the correct reasoning. A hallucination test follows, in which the o1 model successfully avoids fabricating information about a non-existent mango cultivar, unlike GPT-4o, which hallucinates details. The custom GPT also hallucinates, while Claude shows only slight hallucination and maintains a more cautious approach.
Coding Challenge: Creating a Chess Game in Python
The host presents a coding challenge, asking the models to write a game of chess in Python. GPT-4o fails to produce a functional game, while the o1 model provides a near-complete game, missing only features like check handling, castling, and endgame logic. The Claude project also produces a mostly functional chess game, although it lacks the visual piece assets because it cannot provide web links, and it crashes during play. The o1 model's coding performance is highlighted as superior to both GPT-4o and Claude 3.5 Sonnet in the host's early testing. The video concludes with the host announcing updates to their AI course and community platform, emphasizing the practical applications of the new GPT models in various fields.
Keywords
ChatGPT o1
Chain of Thought prompting
Claude project
Matthew Berman
'R's in 'strawberry'
Chicken or the egg
9.11 or 9.9
Glass and marble
Word count
Coding test
Hallucination test
Highlights
Testing the new ChatGPT o1 model from OpenAI against the GPT-4o model.
Conducting a comprehensive test with 10 different prompts.
Comparing the new model with a custom GPT built with Chain of Thought prompting.
Using the same system prompt for a Claude project powered by Claude 3.5 Sonnet.
The first test question: how many 'R's are in the word 'strawberry'?
All models correctly identified that there are three 'R's in 'strawberry'.
The question 'Which came first, the chicken or the egg?' was answered scientifically by the models.
The custom GPT and Claude provided in-depth answers to the chicken-and-egg question.
A test to determine which number is bigger, 9.11 or 9.9, with all models getting it right.
A logic puzzle about a marble in a glass cup was correctly solved by the o1 model.
GPT-4o and the custom GPT failed to correctly answer the marble-in-the-glass logic puzzle.
Claude correctly identified the marble's location in the logic puzzle.
The o1 model outperformed GPT-4o in a word-count test.
A hallucination test was conducted, with the models describing mango cultivars.
The o1 model avoided hallucination by admitting a lack of information on a mango cultivar.
GPT-4o exhibited hallucination by inventing details about a non-existent mango cultivar.
Claude showed a slight tendency to hallucinate but avoided completely making up details.
A logic question about killers in a room was answered correctly by all models.
The o1 model produced a functional chess game in Python, surpassing GPT-4o's attempt.
Claude's chess game crashed, indicating a limitation without web access for assets.
o1 emerged as the winner of the comprehensive test, outperforming GPT-4o and Claude.
Updates to the AI course and community platform to include new GPT model applications.
Transcripts
In today's video, I'm going to take the ChatGPT o1-preview model, the new model from OpenAI, and test it against the ChatGPT GPT-4o model. We're going to do 10 different prompts, and I'm also going to test it against a couple of other things I put together. One is a custom GPT that I built with my own set of instructions to try to replicate what the o1 model is doing in the background, which to some extent is Chain of Thought prompting. I'll explain how I built this in a second and give you the exact prompt for it; I did cover this in a previous video as well. I also created a Claude project, powered by Claude 3.5 Sonnet, with the same exact system prompt that I gave to this custom GPT. So this should be a very comprehensive test to see if the o1 model can outperform not only GPT-4o, which I'm assuming it will, but also the custom GPT, which I covered in a different video with IQ and math tests (I think I have some better questions this time around), and the Claude project I've put together here. Now, this time, to improve the test and make it a bit more scientific, I found a couple of resources for prompts. One was directly from OpenAI, with a few examples that I thought would do a good job comparing this model against the previous ones. I also went to this video right here from Matthew Berman; I'm sure you probably follow his channel, but he has a great test that he runs every time a new model comes out, so I took a few of his questions as well that I think do a really great job. I'll link to the video where I got the prompts from, where he compared it and got fantastic results from o1. Okay, the
first test is going to be: how many R's are in 'strawberry'? This is the very first question they have, and I'm going to send it out. Okay, o1 (I'll keep the orientation the same, so this is always going to be on the right) says there are three R's in the word 'strawberry', which is right, and GPT-4o even got this one right. Let me actually run it again, because sometimes it doesn't get it right. Wow, it got it right again; in my previous experience, a lot of the time GPT-4o didn't know how to count letters in a word. Okay, we also have my GPT clone, and we have our Claude project with the same set of instructions. I'll show you the instructions here, and I'll put this in the description if you want to build your own; I'll make this one publicly available too, with a link where you can test it out: 'You are an AI assistant designed to think through problems step by step using Chain of Thought prompting.' Now, this is all I give it. The prompt is actually not even that long; it just has a few different steps: understand the problem (carefully read and understand the user's question); break down the reasoning process (explain each step); arrive at the final answer (after completing all the steps, provide the final answer and solution); and review the thought process. So again, you can copy and paste this and create your own project or your own GPT; I have a ton of videos on this channel about creating both of these, and they are my favorite AI tools available right now. Okay, here's the answer from both, and as you can see, the answers are much more comprehensive than you would get straight out of Claude or straight out of GPT-4o, because of that system prompt. 'There are three R's in strawberry' from the GPT clone, and 'three R's in strawberry' from Claude. Okay, everybody got this one right.
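As an aside, the letter count behind this test is trivial to verify outside an LLM; a quick sanity check in Python:

```python
# Verify the letter count the models were asked about
word = "strawberry"
print(word.count("r"))  # prints 3
```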
It's a pass. Okay, the next one is another OpenAI question. This one says: which came first, the chicken or the egg? o1 answers: scientifically speaking, the egg came first, but it's still a fun question to think about; the egg came first because the first true chicken likely evolved from a mutation in an egg laid by another type of bird. GPT-4o says: long before the chicken existed, other egg-laying animals were producing eggs, and it again points to genetic mutation. So, the same answer from both. Okay, let's see our custom GPT and Claude project here. Wow, these answers are again a lot more in depth. Let's see what we got at the end: 'Conclusion: the egg came first. This is because the first chicken would have hatched from an egg laid by another bird.' Great, and the same thing with Claude; it says the egg came first, and the egg was laid by a very close ancestor of the modern chicken. Okay, here is one from
Matthew's video: which number is bigger, 9.11 or 9.9? Again, this is a problem LLMs often get wrong; it's very obvious for us, but for an LLM it has always been challenging. Okay, I got the answer right away out of GPT-4o: 9.9 is bigger than 9.11. This one also says 9.9 is greater than 9.11, so they both got the answer; this one did take 19 seconds, and I think the other took about two seconds, but they both got it right. Okay, and with our Claude project and GPT clone: 9.9 is bigger than 9.11, same thing. So a pass again for all four.
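The comparison itself is unambiguous in code; part of why LLMs stumble here is that a version-number reading of "9.11" conflicts with decimal ordering:

```python
# Decimal comparison: 9.9 is numerically larger than 9.11
print(9.9 > 9.11)  # prints True

# One source of confusion: read as version components, 11 > 9,
# which is the ordering LLMs sometimes (wrongly) apply to decimals.
print((9, 11) > (9, 9))  # prints True under version-style ordering
```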
The next one: a marble is put in a glass cup; the glass is turned upside down and put on a table; then the glass is picked up and put in a microwave. Where is the marble? Explain your reasoning step by step. This is from Matthew as well, but OpenAI actually had a very close version; it looks like they took it from his videos and added it to their platform. Let's get to the conclusions. o1 says: location of the marble, on the table where the inverted glass was initially placed; the marble was left behind when the glass was picked up and moved. And over here, GPT-4o says: when the glass is picked up from the table, the marble falls to the bottom of the glass, so in the microwave, the marble is at the base of the glass, touching the bottom of the microwave. Now, the actual answer is the first one; the o1 model got it right. The marble is left behind on the table, not inside the microwave, so here o1 gets one point over GPT-4o. Now let's try the custom GPT and the Claude project. Our Claude project says the most likely location of the marble is on the table, which is correct, but our custom GPT didn't quite give me an answer: 'the marble is inside the glass cup,' without saying whether that's on the table or in the microwave. I'll do one quick follow-up. Okay, it's still not giving me an exact answer; it says the marble is at the bottom of the cup, but I want to know whether it's inside the microwave or on the table, like the other ones told me, and it also thinks it's inside the microwave. So the GPT clone did not improve on regular GPT-4o; I got the same wrong response. Claude got it right here. I also want to test this in the regular Claude chat, without a custom project, to see whether the custom project helped. Okay, in this case, Claude's conclusion: the marble is on the table, where the glass was originally placed. So Claude got it right both in the regular chat and in our project, GPT-4o and our custom GPT both got it wrong, and o1 got it right. Okay, this
next one: again I'm going to use o1 here and GPT-4o. How many words are in your response to this prompt? I'm going to send it; this is from Matthew's video as well. This is something these models just can't do; they don't know how to count words correctly. I usually use Microsoft Word to get the word count. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10: oh, it was close, but definitely not 11, and I guess it's counting a number as a word too. This one says the response contains five words: 1, 2, 3, 4, 5. Okay, again, o1 got it right this time. I guess with our custom GPT this is not going to work very well, because as part of its response it has to give us the step-by-step thinking, whereas the o1 model does that behind the scenes. But as I'm looking at this, let's just take this part: 'now I will count the words: 1, 2, 3, 4, 5, 6...'; it says 16, and I counted 15, so maybe it's counting a comma as a word, but it was 15 here. So again, I don't think the custom GPTs are going to do a good job on this one. Let's try the Claude project; it's probably going to have the same exact problem because, yep, it's going to think out loud with the Chain of Thought prompting. Okay, again, not a very useful answer. So for this kind of thing, o1 is actually the first model that has been doing a good job, from all the tests that I've seen.
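A deterministic word count, like the Microsoft Word check used here, can be sketched in a couple of lines; whitespace splitting is a rough approximation of how word processors count:

```python
def word_count(text: str) -> int:
    """Count whitespace-separated tokens as a simple word count."""
    return len(text.split())

print(word_count("The response contains five words"))  # prints 5
```

Note that conventions differ on edge cases (hyphens, numbers, punctuation), which is exactly the ambiguity the host runs into when tallies disagree by one.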
Okay, this next one is a hallucination test, to see if Chain of Thought prompting, or however this o1 model is working in the background, is going to solve the hallucination problem. I saw this in the comment section of that same video I've been referring to; someone asked for a hallucination test: describe each of the following mango cultivars. Here are four real ones, and this one is not real, so let's see if the models hallucinate and tell us more about it. This is a good hallucination test, actually. Okay, the o1 model tells us, for this one right here, that it doesn't have information about it, so it might be a newer or less widely known variety. So it did give us an answer, but it didn't make anything up; this is the right answer, because its knowledge cutoff means it doesn't have that information. But look at what GPT-4o did; this is an example of hallucination: 'a relatively newer variety, the Lemon Cream mango has a distinctive sweet-tart flavor.' It's just totally making something up that shouldn't be there, given that prompt; the more accurate answer is 'I don't have information on that,' but this time it made it up. Okay, inside our Claude project, it says 'I'm less certain about this one,' so it didn't totally make it up, 'but I believe it's also from Florida.' Okay, it's hallucinating a bit ('likely yellow'), but you can see it's unsure; it's just not completely making things up. Look at our GPT clone: it's making up a flavor profile. So this time GPT-4o and the GPT clone hallucinated, while the o1 model got it right; it says, 'I'm not sure; I have a knowledge cutoff.' So again, Claude is keeping up, but GPT-4o is falling behind the new o1 model. Okay,
here's another good one: there are three killers in a room; someone enters the room and kills one of them; nobody leaves the room. How many killers are left in the room? Explain your reasoning step by step. Right here, our GPT-4o says the answer is that there are three killers in the room, which is correct: the two original killers plus the new one. Okay, that is right. Let's see what o1 gave us. Oh, it looks like I hit some kind of content violation here, but: there are three killers left in the room, two original and one new. Okay, so it got that one right, even though we had some kind of error; it did still conclude. Okay, with our custom GPT: there are three killers in the room, two original and the new one. Over here, what did we get out of our Claude project? We got three: 'therefore, there are three killers in the room.' Okay, looks like they all got the right answer; no clear winner here. Okay, for this one
I'm going to do one coding test, which I did in the original test: write a game of chess in Python. I want to see if I can run it on my computer. Okay, here's the first game we got; this is GPT-4o, not the o1 model, and this time it decided to give me a much simpler game than I've gotten before. Oh wow, we can't even drag and drop the pieces; we have to type in which part of the board we want to move to, and it doesn't even have markings, so I don't have the board memorized like that. Okay, so this is a total fail out of GPT-4o. Again, it's only one prompt; I'm doing this off the very first prompt to make it a fairer test, because obviously with back and forth I could refine this a lot more, and I've done that in other videos as well. Okay, here's the new game of chess; this is what I got out of o1. For these pieces right here, it told me where to download them: it gave me a link, I downloaded the PNGs from that link, and I just had to name them the way it specified so the code could pull them into the game. Let's see the logic of the game. Okay, that worked: move that here, move this here, this should take this piece, I should take this piece. Oh wow, that is working a lot better than before; this is incredible. I was not able to get this to work at all the first time I tried it, the day this came out, and it looks like everything is working exactly as it should. Okay, I'm in check now; let's see if it can move. Okay, so it does not understand check yet; it looks like that's where it falls short, because right there I technically couldn't move a different piece, I had to block. And the game's not over. So, almost there; I would say 80% there, it's just missing some in-game logic. I actually think it gave me a bit of text inside the chat saying it's missing a few things, like castling and endgame logic, so maybe with one follow-up I could get it to work. But wow, this is incredible; this is much further than I've ever gotten with any large language model. Last, I'll try the chess game inside Claude 3.5 Sonnet, just in the regular chat; I don't think the project or the custom GPTs are going to be very appropriate for this kind of thing, so I'll just give it the prompt.
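For context, here is a heavily simplified, purely illustrative sketch of the kind of console chess skeleton this prompt asks for. This is not any model's actual output; it only sets up the board and moves pieces by coordinate, and it deliberately omits move legality, check detection, castling, and endgame logic, which are the very gaps observed in the generated games.

```python
# Minimal console chess skeleton: board setup and coordinate-based moves.
# Illustrative only -- it does NOT enforce move legality, check, or castling.

def new_board():
    """Return an 8x8 board; uppercase = White, lowercase = Black, '.' = empty."""
    back = list("rnbqkbnr")
    board = [back[:]] + [["p"] * 8] + [["."] * 8 for _ in range(4)]
    board += [["P"] * 8, [c.upper() for c in back]]
    return board

def parse(square):
    """Convert algebraic notation like 'e2' to (row, col) indices."""
    col = ord(square[0]) - ord("a")
    row = 8 - int(square[1])
    return row, col

def move(board, src, dst):
    """Move whatever piece sits on src to dst (captures by overwriting)."""
    r1, c1 = parse(src)
    r2, c2 = parse(dst)
    board[r2][c2], board[r1][c1] = board[r1][c1], "."

def show(board):
    """Print the board with rank and file labels."""
    for rank, row in enumerate(board):
        print(8 - rank, " ".join(row))
    print("  a b c d e f g h")

board = new_board()
move(board, "e2", "e4")   # White pawn two squares forward
show(board)
```

A full game in the style the video tests would layer per-piece move rules, turn order, and check detection on top of this, which is roughly where the generated programs succeeded or fell short.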
Here's the game out of Claude. Now, as you can see, the pieces don't look like chess pieces, because Claude can't get those assets to me; it doesn't have web access, so it didn't give me a link, and if I were only using Claude I wouldn't have those PNGs to swap in. But let's look at the game logic. Okay, this is nice; these dots look good, this looks good. Let me just play the same pieces here, let me take this. Oh, okay, it crashed; it looks like it just crashed the game. Let me try to relaunch it and see why it crashed. Let me try: okay, it can't take that piece. So you can see that when it comes to simple coding tests, o1 does beat Claude 3.5 Sonnet in my early testing. Again, I'm just doing some fun game testing; I'm not a developer by trade, so this is what I'm getting, and I'm showing you in real time what it gave me. Okay, now if we take everything
side by side, you can see that regular GPT-4o is falling behind, and my custom GPT didn't do a much better job either. Claude is keeping up, both with projects and inside the regular chat, but o1 won this entire test across all the different questions I asked, including the coding question. OpenAI o1, in preview mode right now (and it's supposed to improve further when it comes out of preview), is the winner of this test. I also wanted to let you know that we're making updates to skill., our AI course and community platform. We have over 20 courses that you get access to with a free trial; if it's a good fit, then it's a simple monthly membership. I'm updating all those courses, adding things related to the new GPT: when you would want to use the new ChatGPT model, and when you'd still want to use the GPT-4o model, for very practical applications in entrepreneurship, marketing, and content creation. I'll link that below, and we have an active community as well where you can ask me any questions. Thanks for watching this video; I'll see you in the next one.