Best LLM at Predicting Car Auctions - Part 2 (Results)
A surprise winner?
As a recap for the laggards who missed the previous post, I’m benchmarking five top LLMs to determine which is best at predicting car auction prices.
The results are in and Gemini crushed the field - #1 in almost all metrics.
I’ll add some commentary after the visuals.
Thoughts
#1 Google (Gemini 3.5 Flash) - This model was clearly the best. Usually I can find a combination of models (an ensemble) that works better than any single model, but not this time. Gemini on its own is king.
#2 Z.AI (GLM-5) - I was shocked that GLM-5 came in 2nd. It also had the lowest bias of any model, which makes me think the high scores aren’t a fluke. A solid contender, especially for cost-sensitive tasks.
#3 Anthropic (Opus 4.8) - Pretty average by every metric. Not a lot to write about. Fable 5 wasn’t released when I started the test, but I’ll give it a shot in the next run.
#4 OpenAI (GPT 5.5) - GPT consistently overestimated the sales price, especially for lower-value cars. It would be pretty competitive without that segment.
#5 xAI (grok-4.3) - Grok had the opposite problem: it consistently underestimated the final price. For some prediction tasks, the weirdness of Grok is additive to an ensemble. Not here though, it’s just bad.
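For anyone who wants to run the same kind of check on their own predictions, here is a minimal sketch of what "bias" and "ensemble" mean above. The metric definitions (mean signed percentage error and mean absolute percentage error), the function names, and every number below are assumptions made up for illustration; they are not the benchmark's actual metrics or data.

```python
from statistics import mean

def bias_pct(predicted, actual):
    """Mean signed percentage error. Positive = overestimating on average."""
    return mean((p - a) / a for p, a in zip(predicted, actual)) * 100

def mape(predicted, actual):
    """Mean absolute percentage error. Lower is better."""
    return mean(abs(p - a) / a for p, a in zip(predicted, actual)) * 100

def ensemble(*prediction_lists):
    """Simple ensemble: average the models' predictions car by car."""
    return [mean(preds) for preds in zip(*prediction_lists)]

# Hypothetical sale prices and predictions (NOT results from this test).
actual = [10_000, 20_000, 40_000]
model_high = [12_000, 22_000, 42_000]  # tends to overestimate
model_low = [9_000, 18_000, 38_000]    # tends to underestimate

combined = ensemble(model_high, model_low)

for name, preds in [("high", model_high), ("low", model_low), ("ensemble", combined)]:
    print(f"{name:>8}: bias {bias_pct(preds, actual):+.2f}%  MAPE {mape(preds, actual):.2f}%")
```

In this toy data the two models err in opposite directions, so averaging them cancels much of the error. That is the situation where an ensemble helps; it only pays off when each model is decent to begin with, which wasn't the case in this run.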
Conclusion
Overall, I was quite impressed with the out-of-the-box accuracy of these models.
Gemini doesn’t get the buzz of the other models because it doesn’t have a good coding harness like Claude Code or Codex, but it’s great at this sort of task.
Let me know if there are other auction types you’d like tested. Furniture auctions? Real estate auctions?









