Олег Онопрієнко · Hot News
18 December 2025, 13:19
Researchers from UCU and KNU have significantly improved the quality of Ukrainian speech synthesis
The work targets one of the hardest problems in Ukrainian language processing: pronouncing words with the correct stress depending on the context of the sentence.
A team of scientists from the Ukrainian Catholic University and Taras Shevchenko National University of Kyiv presented a new comprehensive solution for Text-to-Speech systems.
Researchers Anastasia Senyk, Mykhailo Lukyanchuk, Valentyna Robeiko, and Yuriy Paniv have developed an approach that combines context-sensitive stress prediction with a new phonemizer. Yuriy Paniv announced the work on his blog.
In particular, the researchers produced the following:
A manually annotated benchmark for stress placement methods, along with measurements of how existing methods perform on it.
A model for recognizing stress in text for auto-tagging homographs.
A model that places stress in context, using a hybrid approach with a dictionary, which is now state of the art (SOTA).
A phonemizer based on the method from Moisienko's "Modern Ukrainian Literary Language: Lexicology. Phonetics", implemented in code by Mykhailo Lukyanchuk under the supervision of Valentyna Robeiko.
The main obstacle to a natural-sounding Ukrainian synthetic voice has always been the language's complex phonology and unpredictable stress system. Ukrainian has homographs: words that are spelled the same but differ in meaning and pronunciation depending on the stress (for example, “зАмок” (castle) and “замОк” (lock), or “дорОга” (road) and “дорогА” (dear, expensive)).
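The homograph problem can be illustrated with a minimal Python sketch. The tiny dictionary and the function names here are illustrative assumptions, not the researchers' code; stressed vowels are written in upper case, as in the examples above.

```python
from typing import Dict, List

# Toy stress dictionary (illustrative entries only): each unstressed spelling
# maps to every stressed form it can take. Upper case marks the stressed vowel.
STRESS_DICT: Dict[str, List[str]] = {
    "замок": ["зАмок", "замОк"],     # castle / lock
    "дорога": ["дорОга", "дорогА"],  # road / dear, expensive
    "мова": ["мОва"],                # language (only one stress pattern)
}


def stress_variants(word: str) -> List[str]:
    """Return all stressed forms known for a word (empty list if unknown)."""
    return STRESS_DICT.get(word.lower(), [])


def is_homograph(word: str) -> bool:
    """A word is a stress homograph here if the dictionary lists 2+ forms."""
    return len(stress_variants(word)) > 1


if __name__ == "__main__":
    for w in ["мова", "замок", "дорога"]:
        print(w, stress_variants(w), "ambiguous" if is_homograph(w) else "unique")
```

A lookup alone returns both forms for “замок” and “дорога”, so a dictionary-only system has to guess; choosing between them requires the rest of the sentence.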
Previous systems often made mistakes because they relied solely on dictionaries without understanding context, or used rules that overgeneralized pronunciation. The researchers' new approach is the first to offer a model that analyzes entire sentences to determine the correct stress and phonemes.
The technical solution is based on a hybrid architecture that uses the ByT5 neural network for context analysis and detailed linguistic rules for converting letters into sounds. In addition to the model itself, the team created and made publicly available the first benchmark for evaluating stress prediction systems, consisting of over 1,000 annotated sentences.
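One plausible way to combine a dictionary with a context model is sketched below in Python. The article does not describe the actual control flow, so everything here is an assumption: the `place_stress` function, the toy dictionary, and `toy_context_model`, which is a keyword heuristic standing in for a neural model such as ByT5.

```python
from typing import Callable, Dict, List, Sequence

# A context model receives the whole sentence, the index of the ambiguous
# token, and the candidate stressed forms, and returns one candidate.
ContextModel = Callable[[Sequence[str], int, List[str]], str]

# Toy dictionary (illustrative); upper case marks the stressed vowel.
TOY_DICT: Dict[str, List[str]] = {
    "замок": ["зАмок", "замОк"],  # castle / lock
    "мова": ["мОва"],
}


def toy_context_model(tokens: Sequence[str], index: int, variants: List[str]) -> str:
    """Stand-in for a neural model: picks the second variant when a
    lock-related word appears in the sentence, otherwise the first."""
    context = {t.lower() for t in tokens}
    return variants[1] if context & {"ключ", "двері"} else variants[0]


def place_stress(tokens: Sequence[str],
                 dictionary: Dict[str, List[str]],
                 model: ContextModel) -> List[str]:
    """Dictionary first; ask the context model only for homographs."""
    result = []
    for i, token in enumerate(tokens):
        variants = dictionary.get(token.lower(), [])
        if len(variants) == 1:
            result.append(variants[0])                # unambiguous: dictionary wins
        elif variants:
            result.append(model(tokens, i, variants))  # homograph: use context
        else:
            result.append(token)                       # unknown word: left as-is
    return result


if __name__ == "__main__":
    print(place_stress(["старий", "замок"], TOY_DICT, toy_context_model))
    print(place_stress(["ключ", "і", "замок"], TOY_DICT, toy_context_model))
```

In this sketch the dictionary handles the cheap, unambiguous cases and the model is consulted only where the dictionary lists several candidates; a real system would replace the keyword heuristic with a trained sentence-level model.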
Experiments showed that the new phonemizer achieves a word error rate (WER) of just 1.23% on the test dataset, while the combined stress prediction system outperforms existing neural counterparts with an accuracy of 92.5%.
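For readers unfamiliar with these metrics, here is a minimal Python sketch of how a word error rate and an accuracy score are commonly computed. The article does not specify the exact evaluation protocol, so the function names and the toy data are assumptions, not the authors' evaluation code.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences
    (substitutions, insertions, and deletions all cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def word_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """WER = edit distance divided by the number of reference tokens."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Share of positions where the prediction equals the gold label."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


if __name__ == "__main__":
    # Toy data: one substitution and one deletion against four reference tokens.
    print(word_error_rate("a b c d".split(), "a x c".split()))
    # Toy data: three of four stressed forms predicted correctly.
    print(accuracy(["зАмок", "замОк", "мОва", "дорОга"],
                   ["зАмок", "зАмок", "мОва", "дорОга"]))
```

Lower is better for WER and higher is better for accuracy, which is why the two reported figures (1.23% and 92.5%) point in opposite directions.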
For end users, these changes mean a significant improvement in the sound quality of Ukrainian-language virtual assistants, navigation systems, and screen readers.
Thanks to the new technology, the synthesized voice will sound more natural and "human", correctly intoning complex sentences and rare words. The authors have made all of their work, including code and data, publicly available, which will allow other developers to integrate these improvements into their products.
As a reminder, in an exclusive interview with dev.ua about Lapa LLM, a large language model for the Ukrainian language, Yuriy Paniv said that the model is one and a half times faster than Gemma 3.