Наталя Хандусенко, AI Eng
15 April 2025, 17:13
Overtraining an LLM can lead to reduced productivity, new study finds
For the past few years, it has been assumed that the more an AI model is trained, the better its results will be. But a group of researchers from several US universities is now challenging that assumption.
Artificial intelligence researchers at Carnegie Mellon University, Stanford, Harvard, and Princeton universities have found that overtraining large language models can negatively impact their performance.
They reached this conclusion by comparing two versions of the LLM OLMo-1B: one trained on 2.3 trillion tokens, the other on 3 trillion. They evaluated both on several benchmarks, such as ARC and AlpacaEval, and found that the second model performed 3% worse than the first, Tech Xplore reports.
Surprised by their findings, the researchers ran more tests and got similar results, suggesting that there is a point at which more training starts to make the models less “intelligent.” The research team calls this “catastrophic overtraining” and suggests that it is due to what they describe as “progressive sensitivity.”
They also suggest that as the number of training tokens increases, the model becomes more fragile. This means that fine-tuning, which can be thought of as adding noise to the model's parameters, begins to undo the improvements previously observed.
To test their theory, they added Gaussian noise to some of the models and found that it caused the same type of performance degradation they had observed before. They called the point of no return the "inflection point." They hypothesize that past this point, any further training reduces the stability of the model, making it harder to fine-tune for a desired set of applications.
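The intuition behind that perturbation test can be illustrated with a toy experiment. The sketch below is not the researchers' actual setup: it uses a hypothetical two-parameter linear model and measures how adding Gaussian noise of increasing strength to its weights degrades its loss, loosely mirroring the idea that a more "sensitive" model is hurt more by the same amount of noise (here, noise strength stands in for sensitivity).

```python
import random

def perturb(weights, sigma, rng):
    """Add i.i.d. Gaussian noise N(0, sigma^2) to each weight."""
    return [w + rng.gauss(0.0, sigma) for w in weights]

def mse(weights, data):
    """Mean squared error of a 1-D linear model y = w0*x + w1 on (x, y) pairs."""
    w0, w1 = weights
    return sum((w0 * x + w1 - y) ** 2 for x, y in data) / len(data)

rng = random.Random(0)
# Synthetic data generated exactly by y = 2x + 1, so the base weights fit perfectly.
data = [(x, 2.0 * x + 1.0) for x in range(-5, 6)]
base = [2.0, 1.0]  # loss is exactly 0 before any perturbation

# Average loss after perturbing the weights at increasing noise levels:
# the degradation grows with sigma, analogous to how fine-tuning (viewed
# as noise) hurts a model more once it has become more fragile.
results = {}
for sigma in (0.01, 0.1, 0.5):
    losses = [mse(perturb(base, sigma, rng), data) for _ in range(200)]
    results[sigma] = sum(losses) / len(losses)
    print(f"sigma={sigma}: average loss {results[sigma]:.4f}")
```

In this toy case the expected loss grows roughly with the square of the noise level; the study's claim is analogous but about real models, where longer pre-training shifts them toward the high-sensitivity regime.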
In conclusion, the researchers suggest that LLM developers may in the future have to weigh how much pre-training is enough, or look for other training methods, to avoid reaching the point of no return.