Олег Онопрієнко AI Eng 3 October 2025, 12:09

Ukrainian researchers present Lapa LLM — the first national AI model for reasoning

Lapa LLM is positioned as the first Ukrainian large language model specifically tuned for reasoning, aligned with national values. According to internal benchmarks, the model already demonstrates better results than Gemma 3.

A team of Ukrainian and Polish researchers from the Ukrainian Catholic University (UCU), Kyiv Polytechnic Institute, and AGH University of Krakow has announced an ambitious project: Lapa LLM. The large language model was presented by Yuriy Paniv, a UCU graduate student and data scientist at Nortel, in a talk at IT Arena 2025.

The development of Lapa LLM is motivated by a number of critical problems that are not addressed by existing open models. The key goals of the project include:

National security and privacy: The model is being developed to work with sensitive data in the defense sector and in large companies that need to process information in a closed environment, without sending it to cloud providers.
Cultural coherence: Lapa LLM is trained with a focus on Ukrainian values and context. Automatic data filtering is used to keep out Russian propaganda and disinformation and to reduce "hallucinations" about facts concerning Ukraine.
High performance: The project aims to fix the poor Ukrainian-language performance that is typical of many open LLMs.

Lapa LLM is based on Google's Gemma model with 12 billion parameters. This choice offers a good balance between size and capability and allows the model to run on affordable hardware.
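To give a sense of why parameter count matters for hardware requirements, here is a minimal sketch that estimates the memory needed just to hold the weights of a 12-billion and a 27-billion-parameter model. The storage formats and the function name `weight_memory_gb` are illustrative assumptions, not details published by the Lapa LLM team, and real deployments need extra memory for activations and the KV cache.

```python
# Rough weight-only memory estimate. The precisions listed here are common
# storage formats chosen for illustration; the article does not say which
# format Lapa LLM ships in.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: float, precision: str) -> float:
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        gb_12b = weight_memory_gb(12e9, precision)
        gb_27b = weight_memory_gb(27e9, precision)
        print(f"{precision}: 12B needs about {gb_12b:.1f} GB, 27B about {gb_27b:.1f} GB")
```

Under these assumptions, a 12-billion-parameter model in 16-bit precision needs roughly 24 GB for its weights, against roughly 54 GB for a 27-billion-parameter model, which is the gap between a single high-end consumer GPU and multi-GPU or server-class hardware.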

A key technical breakthrough was the development of an improved Ukrainian tokenizer. It cuts the number of tokens needed to represent Ukrainian text by a factor of 1.5, making Lapa LLM faster and cheaper to run. According to internal benchmarks, the model already outperforms even the larger Gemma 3 with 27 billion parameters.
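As a back-of-the-envelope illustration of what a 1.5x token reduction means in practice, the sketch below converts the factor into a token count and a relative saving. The function names are hypothetical, and the idea that cost and latency scale roughly in proportion to token count is a simplifying assumption rather than a figure reported by the team.

```python
def tokens_after_reduction(baseline_tokens: float, reduction_factor: float = 1.5) -> float:
    """Token count for the same text once the tokenizer is reduction_factor times more compact."""
    return baseline_tokens / reduction_factor


def relative_saving(reduction_factor: float = 1.5) -> float:
    """Fraction of tokens saved compared with the baseline tokenizer."""
    return 1.0 - 1.0 / reduction_factor


if __name__ == "__main__":
    baseline = 1500  # hypothetical token count for a Ukrainian text with the original tokenizer
    print(f"Tokens after reduction: {tokens_after_reduction(baseline):.0f}")
    print(f"Relative saving: {relative_saving():.1%}")
```

In other words, a 1.5x reduction leaves about two thirds of the original token count, so roughly one third fewer tokens are processed for the same text, and about 1.5 times as much Ukrainian text fits into a fixed context window.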

High-quality datasets and materials from the Harvard University library were used for training.

The release of Lapa LLM is scheduled for early October 2025. The team intends to release the model, datasets, and training scripts under the MIT license.

The project received significant support: Common AI provided computing resources (three months of access to eight H100 nodes), and Hugging Face provided a free corporate subscription.
