Олег Онопрієнко AI Eng 17 July 2025, 15:47

ChatGPT and other popular AI models would not pass Ukraine's external independent assessment (ZNO), Ukrainian researchers find

Ukrainian researchers tested AI models on tasks from the external independent assessment (ZNO). None of the models reached 70% correct answers. The best result, 67.5%, was achieved by Gemini Pro.

How it was tested: 13 subjects, thousands of questions, GPU cluster

ZNOVision consists of more than 4,300 tasks in 13 categories, from physics and mathematics to history and literature. More than half of them include a visual component: schematics, diagrams, maps, or drawings. Some questions require logical reasoning; others depend on accurately interpreting instructions written in Ukrainian.

Six main models took part in the testing: GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro, Qwen2-VL-72B, PaliGemma-3B, and a fine-tuned version of the latter, PaliGemma-FT. To process the questions and deploy the models, the team used De Novo's cloud infrastructure, which provided access to GPU clusters in a private cloud certified under the state KSZI (comprehensive information protection system) requirements.

None of the models reached 70% correct answers. The best result, 67.5%, came from Gemini 1.5 Pro, followed by Claude 3.5 at 64.3%, Qwen2-VL at 51.2%, and GPT-4o at 47%. For comparison, random guessing would have yielded about 22%. Errors were most common in complex tasks that combine images and text: the models failed to recognize Ukrainian words in images, confused units of measurement, and ignored parts of the question wording.

On the VQAUA set (visual questions), the scores were lower still: Claude 26.7%, GPT-4o 29%, Qwen2-VL 34.4%. This is well below the English-language results (above 60%) and points to weak support for Ukrainian at the level of multimodal representations.
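To put the random-guessing figure in context, here is a minimal Python sketch of how such a baseline and a model's accuracy are typically computed for multiple-choice tasks. The article does not say how many answer options each ZNOVision question has, so the 40/60 mix of four- and five-option questions below is purely hypothetical; it is chosen only to show that a baseline near 22% is consistent with such a mix. The function names are illustrative and are not taken from the researchers' code.

```python
from fractions import Fraction


def random_baseline(option_counts):
    """Expected accuracy of uniform random guessing.

    option_counts holds one integer per question: the number of answer
    options for that question. A question with k options is guessed
    correctly with probability 1/k.
    """
    if not option_counts:
        raise ValueError("need at least one question")
    total = sum(Fraction(1, k) for k in option_counts)
    return float(total / len(option_counts))


def accuracy(predictions, answers):
    """Share of questions where the predicted label equals the answer key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one question")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


# Hypothetical composition (an assumption, not from the article):
# 40 questions with four options and 60 questions with five options.
option_counts = [4] * 40 + [5] * 60
print(f"random baseline: {random_baseline(option_counts):.1%}")

# Toy predictions against a toy answer key, for illustration only.
print(f"accuracy: {accuracy(['A', 'B', 'C'], ['A', 'B', 'D']):.1%}")
```

A model's score is only meaningful relative to this floor: with a baseline of roughly 22%, a result of 26.7% on visual questions is only a few points better than guessing.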

How to use it: a product and infrastructure perspective

ZNOVision is not just a research benchmark. It is also a practical tool for testing Ukrainian-language AI solutions in education, automated support, content moderation, and localization. Startups can use it as a base for fine-tuning their own models, and EdTech platforms can use it to build adaptive tests. De Novo's cloud infrastructure was a cornerstone of the project: the company's resources allowed the researchers to deploy multiple models simultaneously, run large-scale tests, and obtain representative data.

“Artificial intelligence should not be the monopoly of a few languages. Ukrainian should sound as confident in the systems of the future as English. And we at De Novo believe that we can create the technological foundation for this here, in Ukraine,” notes Maksym Ageyev, CEO of De Novo.

Recently, in an article about the Ukrainian large language model, dev.ua covered an LLM built in Bulgaria for the Ukrainian language, for use by the state and citizens. MamayML showed the best results on the ZNO benchmark among models of similar size, while also outperforming much larger models, including Gemma2 27B, Llama 3.1 70B, and Qwen 2.5 72B.
