ChatGPT and other popular AI models would not pass Ukraine's external independent assessment (ZNO), Ukrainian researchers find

Ukrainian researchers tested AI models on tasks from the ZNO, Ukraine's external independent assessment. None of the models reached 70% correct answers. The best result, 67.5%, was achieved by Gemini Pro.

A team of Ukrainian researchers has presented ZNOVision, the first multi-format benchmark that tests how well artificial intelligence handles the Ukrainian language, educational content, and national culture. The results show that even the most powerful models, such as GPT4o or Claude 3.5, would not have passed the Ukrainian ZNO.

The idea behind ZNOVision is simple: if a model can pass a test designed for applicants to Ukrainian universities, it really “understands” something.

How it was tested: 13 subjects, thousands of questions, GPU cluster

ZNOVision consists of more than 4,300 tasks divided into 13 categories, from physics and mathematics to history and literature. More than half of them contain a visual component: schematics, diagrams, maps, drawings. Some questions require logical reasoning, others require accurate interpretation of instructions in Ukrainian.

Six main models were tested: GPT4o, Claude 3.5 Sonnet, Gemini 1.5 Pro, Qwen2VL72B, Paligemma3B, and PaligemmaFT, a retrained (fine-tuned) version of Paligemma. To deploy the models and process the questions, the team used De Novo’s cloud infrastructure, which provided access to GPU clusters in a private cloud certified to the state KSZI (comprehensive information protection system) requirements.

None of the models reached 70% correct answers. The best result, 67.5%, was obtained by Gemini Pro, followed by Claude 3.5 with 64.3%, Qwen2VL with 51.2%, and GPT4o with 47%. For comparison, random guessing would have given ≈ 22%. Errors occurred most often in complex tasks combining images and text: the models failed to recognize Ukrainian words in images, confused units of measurement, and ignored part of the question wording. On the VQAUA set (visual questions), the models scored as follows: Claude 26.7%, GPT4o 29%, Qwen2VL 34.4%. This is significantly lower than the English results (> 60%) and points to weak support for the Ukrainian language at the level of multimodal representations.
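To put the ≈ 22% chance baseline in context, here is a minimal sketch of how such a figure can arise when questions have different numbers of answer options. The option counts and their mix are illustrative assumptions, not figures from the study; only the reported scores and the 70% mark come from the article.

```python
from fractions import Fraction


def random_guess_accuracy(option_counts):
    """Expected accuracy of uniform random guessing.

    option_counts: the number of answer options for each question.
    """
    counts = list(option_counts)
    return float(sum(Fraction(1, k) for k in counts) / len(counts))


# Hypothetical mix (assumption): 40 four-option and 60 five-option questions.
hypothetical_counts = [4] * 40 + [5] * 60
baseline = random_guess_accuracy(hypothetical_counts)

# Overall scores as reported in the article, in percent.
reported = {
    "Gemini Pro": 67.5,
    "Claude 3.5": 64.3,
    "Qwen2VL": 51.2,
    "GPT4o": 47.0,
}
PASS_MARK = 70.0  # threshold used in the article

for name, score in sorted(reported.items(), key=lambda kv: -kv[1]):
    gap = PASS_MARK - score          # points short of the 70% mark
    lift = score - 100 * baseline    # points above the chance baseline
    print(f"{name:<11} {score:5.1f}%  short of 70% by {gap:4.1f}, above chance by {lift:4.1f}")
```

Under this assumed mix the chance level lands between 20% (five options) and 25% (four options), so even the weakest of the four reported scores is well above guessing, while the strongest is still short of 70%.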

How to use it: a product and infrastructure perspective

ZNOVision is not just a research instrument. It is also a practical tool for testing Ukrainian-language AI solutions in education, automated support, content moderation, and localization. Startups can use it as a basis for fine-tuning their own models, and EdTech platforms can use it to build adaptive tests. De Novo’s cloud infrastructure was a cornerstone of the project: the company’s resources allowed the researchers to deploy several models simultaneously, run large-scale tests, and obtain representative data.
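As a rough illustration of testing a Ukrainian-language model against a benchmark of this kind, the sketch below scores a model on multiple-choice tasks and reports accuracy per subject. The `Task` fields, the `evaluate` helper, the toy tasks, and the `always_first` stand-in model are hypothetical names introduced for illustration; they do not describe ZNOVision's actual data format or tooling.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Task:
    """One multiple-choice item (hypothetical schema, not the real dataset format)."""
    subject: str
    question: str
    options: List[str]
    answer: int                        # index of the correct option
    image_path: Optional[str] = None   # set for tasks with a visual component


def evaluate(tasks: List[Task], predict: Callable[[Task], int]) -> Dict[str, float]:
    """Return accuracy per subject for a model exposed as predict(task) -> option index."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for task in tasks:
        total[task.subject] += 1
        if predict(task) == task.answer:
            correct[task.subject] += 1
    return {subject: correct[subject] / total[subject] for subject in total}


# Toy placeholder tasks (assumption): real items would come from the benchmark.
toy_tasks = [
    Task("mathematics", "2 + 2 = ?", ["3", "4", "5", "6"], answer=1),
    Task("mathematics", "3 * 3 = ?", ["9", "6", "12", "3"], answer=0),
    Task("history", "Placeholder question", ["A", "B", "C", "D"], answer=2),
]


def always_first(task: Task) -> int:
    """Stand-in for a real model call: always picks the first option."""
    return 0


per_subject = evaluate(toy_tasks, always_first)
for subject, accuracy in per_subject.items():
    print(f"{subject}: {accuracy:.0%}")
```

Replacing `always_first` with a function that calls an actual model would give the same per-subject breakdown, which is the view that matters when accuracy differs sharply between text-only and image-based categories.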

“Artificial intelligence should not be the monopoly of a few languages. Ukrainian should sound as confident in the systems of the future as English. And we at De Novo believe that we can create the technological foundation for this here, in Ukraine,” notes Maksym Ageyev, CEO of De Novo.

Recently, in an article about a Ukrainian large language model, dev.ua covered an LLM developed in Bulgaria for the Ukrainian language and intended for use by the state and citizens. MamayLM showed the best results on the ZNO benchmark among models of similar size, while also outperforming much larger models, including Gemma2 27B, Llama 3.1 70B and Qwen 2.5 72B.
