Олег Онопрієнко
17 July 2025, 15:47
ChatGPT and other popular AI models would fail Ukraine's external independent assessment (ZNO), Ukrainian researchers find
Ukrainian researchers tested AI models on tasks from the external independent assessment (ZNO). No model reached 70% correct answers; the best result, 67.5%, was achieved by Gemini Pro.
A team of Ukrainian researchers has presented ZNOVision, the first multi-format benchmark that evaluates how well artificial intelligence handles the Ukrainian language, educational content, and national culture. The results showed that even the most powerful models, such as GPT-4o or Claude 3.5, would not have passed the Ukrainian ZNO.
The idea behind ZNOVision is simple: if a model can pass a test designed for applicants to Ukrainian universities, it really does “understand” something.
How it was tested: 13 subjects, thousands of questions, GPU cluster
ZNOVision consists of over 4,300 tasks divided into 13 categories, from physics and mathematics to history and literature. More than half of them contain a visual component: schematics, diagrams, maps, or drawings. Some questions require logical reasoning, while others require accurate interpretation of instructions in Ukrainian.
Six models were tested: GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro, Qwen2-VL-72B, PaliGemma-3B, and a fine-tuned version of the latter, PaliGemma-FT. To process the questions and deploy the models, the team used the De Novo cloud infrastructure, which provided access to GPU clusters in a private cloud certified under the state KSZI (comprehensive information protection system) requirements.
No model reached 70% correct answers. The best result, 67.5%, was obtained by Gemini Pro, followed by Claude 3.5 with 64.3%, Qwen2-VL with 51.2%, and GPT-4o with 47%. For comparison, random guessing would have given about 22%. Errors occurred most often in complex tasks that combine images and text: the models failed to recognize Ukrainian words in images, confused units of measurement, and ignored parts of the question wording.
On the VQAUA set of visual questions, the scores were lower still: Claude 26.7%, GPT-4o 29%, and Qwen2-VL 34.4%. This is significantly below the same models' results in English (above 60%) and points to weak support for the Ukrainian language at the level of multimodal representations.
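To make the comparison with random guessing concrete, here is a minimal Python sketch of how such a baseline and the margin above it could be computed. The function names and the 50/50 mix of four- and five-option questions are hypothetical illustrations, not the actual composition of ZNOVision or the researchers' scoring code; the model scores and the 22% baseline are the figures reported above.

```python
from fractions import Fraction


def random_baseline(option_counts):
    """Expected accuracy of uniform random guessing on single-answer questions.

    option_counts holds the number of answer options for each question.
    """
    total = sum(Fraction(1, k) for k in option_counts)
    return float(total / len(option_counts))


def accuracy(predictions, answers):
    """Share of questions where the predicted option matches the answer key."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


# Hypothetical mix (an assumption, not the real benchmark composition):
# half of the questions have five options, half have four.
hypothetical_mix = [5] * 50 + [4] * 50
print(f"baseline for the hypothetical mix: {random_baseline(hypothetical_mix):.1%}")

# Scores as reported in the article, in percent.
reported = {"Gemini Pro": 67.5, "Claude 3.5": 64.3, "Qwen2VL": 51.2, "GPT4o": 47.0}
REPORTED_BASELINE = 22.0  # approximate random-guess level cited in the article
for model, score in reported.items():
    print(f"{model}: {score - REPORTED_BASELINE:.1f} points above random guessing")
```

Read this way, even the weakest of the four reported scores is well above chance, while all of them remain below the 70% mark.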
How to use it: a product and infrastructure perspective
ZNOVision is not just a research tool. It is also a practical instrument for testing Ukrainian-language AI solutions in education, automated support, content moderation, and localization. Startups can use it as a basis for fine-tuning their own models, and EdTech platforms can use it to build adaptive tests. De Novo's cloud infrastructure was central to the project: the company's resources allowed the researchers to deploy multiple models simultaneously, run large-scale tests, and obtain representative data.
“Artificial intelligence should not be the monopoly of a few languages. Ukrainian should sound as confident in the systems of the future as English. And we at De Novo believe that we can create the technological foundation for this here, in Ukraine,” notes Maksym Ageyev, CEO of De Novo.
Recently, in an article about a Ukrainian large language model, dev.ua wrote about MamayML, an LLM from Bulgaria that was created for the Ukrainian language, for use by the state and citizens. MamayML showed the best results on the ZNO benchmark among models of similar size, while also outperforming much larger models, including Gemma2 27B, Llama 3.1 70B, and Qwen 2.5 72B.