LLM Vision Battle: GPT vs. DeepSeek vs. Gemini & More – Who Sees Best?

sandy · January 31, 2025, 9:05pm

Since my project needs some serious vision intelligence, I figured—why not throw a bunch of AI models into the ring and see who actually has eyes?

So I put GPT (o1 and 4o), DeepSeek, Gemini, Grok, Qwen, Doubao, Yiyan, and Xinghuo to the test with an image recognition challenge. The results? Well… let’s just say some AIs need glasses.

I’ve attached the test image and result screenshots for you to judge.

TL;DR:

GPT gave the most structured answer—like a top student who actually studied.
Gemini… well, its answer felt uninspired. Come on, buddy, put in some effort!
The rest? Pretty much on the same level—decent, but no standout performers.
Possible reason? I tested GPT on its higher-end models (o1 and 4o), which may have given it an edge. But still, Gemini, you can do better.

What do you think? Have you tried these models for image recognition? Let’s discuss!