GPT-5 Pro vs Claude 4.1 vs Grok 4 vs Gemini 2.5 Pro — Research & Reasoning Face-Off!

Bijan Bowen

22 Aug 2025 (25:22)

Summary

TLDR: In this video, the presenter tests four AI models (Gemini 2.5 Pro, Claude Opus 4.1, GPT-5 Pro, and Grok Deep Search) on estimating auction prices for cars. The test uses three vehicles: a 1990 Buick Reatta convertible, a vintage Rolls-Royce, and a non-running Dodge Stealth. The estimates varied widely. Gemini 2.5 Pro was the most accurate for the Buick and the Stealth, while GPT-5 Pro came closest on the Rolls-Royce. The video shows both the strengths and the limits of these tools on a real-world task: valuing cars at auction from detailed listing descriptions.

Takeaways

😀 The video tests four AI models (Gemini 2.5 Pro, Claude Opus 4.1, GPT-5 Pro, and Grok Deep Search) on how well they estimate auction prices for cars.
😀 The test included three cars: a 1990 Buick Reatta convertible, a vintage Rolls-Royce, and a non-running Dodge Stealth.
😀 Gemini 2.5 Pro gave the best estimates overall, particularly for the Buick Reatta and the Dodge Stealth.
😀 GPT-5 Pro came closest to the Rolls-Royce's sale price but was less accurate on the other two cars.
😀 Grok Deep Search consistently overestimated, with estimates well above the actual hammer prices.
😀 Claude Opus 4.1 gave mid-range estimates that were sometimes close but never beat Gemini 2.5 Pro or GPT-5 Pro.
😀 The Buick Reatta hammered at $2,300, and Gemini 2.5 Pro's estimate was within $175 of that price.
😀 The vintage Rolls-Royce hammered at $9,000. GPT-5 Pro gave the closest estimate at $10,000.
😀 The non-running Dodge Stealth hammered at $4,500. Gemini 2.5 Pro estimated $4,653, only $153 off.
😀 The models were judged not only on accuracy but also on how well they interpreted the detailed condition description for each car and adjusted their valuations to match.
😀 Overall, Gemini 2.5 Pro won two of the three rounds, making it the strongest of the four at estimating auction prices in this test.
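The summary reports closeness as a dollar gap between an estimate and the hammer price but does not state the exact scoring rule. The following is a minimal Python sketch that assumes absolute error in dollars, with the error as a percentage of the hammer price shown for scale. It uses only the figures quoted above. The function name `estimate_error` is hypothetical and not taken from the video.

```python
def estimate_error(estimate: float, hammer: float) -> tuple[float, float]:
    """Return (absolute error in dollars, error as a percent of the hammer price).

    Assumes "closest" means the smallest absolute dollar gap (hypothetical rule).
    """
    abs_err = abs(estimate - hammer)
    return abs_err, 100.0 * abs_err / hammer


# Estimate / hammer pairs quoted in the takeaways above.
quoted = [
    ("Gemini 2.5 Pro, Dodge Stealth", 4653, 4500),
    ("GPT-5 Pro, Rolls-Royce", 10000, 9000),
]

for label, est, hammer in quoted:
    err, pct = estimate_error(est, hammer)
    print(f"{label}: off by ${err:,.0f} ({pct:.1f}% of hammer)")
```

The percentage column matters when comparing across cars. A $1,000 miss on a $9,000 car and a $153 miss on a $4,500 car are about 11.1% and 3.4% respectively, so dollar gaps alone can understate how differently the models performed on cheap and expensive lots.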

Q & A

What was the main purpose of the test in the video?
-The test evaluated how well four AI models (Gemini 2.5 Pro, Claude Opus 4.1, GPT-5 Pro, and Grok Deep Search) could estimate car auction prices from the listing descriptions, available market data, and each car's condition, including problems such as engine issues or high mileage.
Which AI model performed best overall in estimating car prices?
-Gemini 2.5 Pro performed best overall. Its estimates were within $175 of the hammer price for the Buick Reatta and $153 for the Dodge Stealth.
How did GPT-5 Pro perform in the test?
-GPT-5 Pro did well on one car but was inconsistent. It gave the closest estimate for the Rolls-Royce but missed badly on the Dodge Stealth, predicting a price in the low $3,000s against the actual $4,500.
How far apart were the models' estimates for the vintage Rolls-Royce?
-The estimates for the vintage Rolls-Royce varied widely. Gemini 2.5 Pro estimated just under $3,000, while GPT-5 Pro estimated $10,000. The car hammered at $9,000, so GPT-5 Pro was the closest.
What did the non-running Dodge Stealth sell for, and how did the models perform?
-The non-running Dodge Stealth hammered at $4,500. Gemini 2.5 Pro was the closest with an estimate of $4,653. The other models were well off the mark; GPT-5 Pro came in low (in the low $3,000s), and some estimates ran as high as $7,500 to $8,000.
Why did Gemini 2.5 Pro's estimate for the Rolls-Royce come in so low?
-Gemini 2.5 Pro's estimate of just under $3,000 appears to reflect concern about hidden problems. For example, it suggested that the new tires might be masking deeper mechanical issues. That pessimistic reading left its estimate far below the $9,000 hammer price.
How did the Buick Reatta's auction price compare to the models' estimates?
-The Buick Reatta hammered at $2,300. Gemini 2.5 Pro's estimate was within $175 of that price, making it the most accurate. The other models, including Claude Opus 4.1 and GPT-5 Pro, estimated much higher.
What was the general trend in how each AI model performed?
-Gemini 2.5 Pro gave the most accurate estimates, usually within a small margin. GPT-5 Pro was less consistent: close on the Rolls-Royce but far off on the Dodge Stealth. Grok Deep Search generally overestimated, and Claude Opus 4.1 stayed in the middle of the range without matching Gemini's precision.
What does this experiment suggest about the use of AI for estimating auction prices?
-This experiment suggests that AI models can be useful for estimating auction prices, but the accuracy largely depends on the model’s ability to assess conditions like car issues or repairs. Some models, like Gemini 2.5 Pro, excelled in using the provided descriptions to draw conclusions, while others were more speculative.
What is the significance of the buyer's premium in these AI price estimates?
-The buyer's premium is a 10% charge added on top of the hammer price. Some models included it in their estimates, so it was removed from those figures before they were compared with the actual hammer prices, which do not include it. This kept the comparison like-for-like.
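Because the 10% premium is added on top of the hammer price, removing it from an all-in estimate means dividing by 1.10, not subtracting 10% of the total. The following is a minimal Python sketch, assuming a flat 10% premium with no cap or minimum (real auction houses may structure it differently). The function names `add_premium` and `strip_premium` are hypothetical.

```python
BUYER_PREMIUM = 0.10  # assumed flat 10% on top of the hammer price


def add_premium(hammer: float, premium: float = BUYER_PREMIUM) -> float:
    """All-in price a buyer pays: hammer price plus the premium."""
    return hammer * (1.0 + premium)


def strip_premium(all_in: float, premium: float = BUYER_PREMIUM) -> float:
    """Recover the hammer price from an all-in estimate that includes the premium."""
    return all_in / (1.0 + premium)


# Correct: an $11,000 all-in estimate corresponds to a $10,000 hammer price.
print(f"{strip_premium(11_000):,.2f}")  # 10,000.00

# Naive subtraction of 10% of the total understates the hammer price.
print(f"{11_000 * (1 - BUYER_PREMIUM):,.2f}")  # 9,900.00
```

The gap between the two results ($100 here) is small next to the misses discussed above, but it grows with the price of the car, so dividing rather than subtracting keeps the comparison consistent.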
