Ігор Вишневський AI Eng
19 November 2025, 09:08
"The Gemini paradox is that in real life it doesn't perform very well compared to Claude and ChatGPT. And I don't understand why, given these figures." Experts on Google's new Gemini 3 model
Oleksandr Krakovetsky, an artificial intelligence expert and the author of several books and courses on AI, called Google’s new Gemini 3 model a breakthrough, at least judging by the benchmarks.
«For example, on Humanity’s Last Exam, one of the most difficult benchmarks, Gemini 3 scores 37.5%, and in the tool-assisted format an incredible 45.8%. GPT-5.1 scores "only" 26.5%, and Gemini 2.5 Pro 13.7%,» he noted in a Facebook post.
He also pointed to the improved result on MRCR, a benchmark that assesses how well a model works with long context. «It was 58%, now it is 77%, and that is a cool result,» he said.
At the same time, Krakovetsky shared his subjective opinion of the Gemini models, noting that he still sees a significant gap between benchmark results and real-world use.
«Subjectively: the Gemini paradox is that in real life Gemini doesn’t perform very well compared to Claude and ChatGPT. And I don’t really understand why, given these figures,» the expert added.
AI expert Alexey Minakov, for his part, emphasized that Gemini 3.0 Pro outperforms GPT-5.1 on almost all benchmarks.
«You can already test it for free in Google AI Studio. For example, I asked it a test question: how can Ukraine defeat Russia in a full-scale war. I would also like to note the marketing of this model: on the eve of the event, they "accidentally" leaked (posted on the website) the model's test results. In effect, that is how they announced it, to stir up interest,» he wrote on his Facebook page.
Minakov said that, if the benchmarks are to be believed, this is currently the best model available for large, complex tasks that require calculation, logic and reasoning.