Наталя Хандусенко AI Eng 7 April 2025, 16:05

DeepSeek is developing a new method to improve LLM reasoning: it should help align AI models with human preferences

DeepSeek, together with researchers at Tsinghua University, has developed a technique that combines two methods: generative reward modeling (GRM) and self-principled critique tuning (SPCT). This dual approach can help LLMs deliver better and faster results for common queries.

The new method aims to help AI models better follow human preferences by rewarding more accurate and understandable answers, according to a paper published on arXiv.

Reinforcement learning has proven effective at speeding up AI tasks in narrow domains. Extending it to more general models has proven challenging, and that is exactly the problem the DeepSeek team is trying to solve with what it calls self-principled critique tuning (SPCT).

Generative reward modeling (GRM) is a process that steers LLMs toward human preferences, and it is expected to become a key component of future AI models.
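To make the idea more concrete: a generative reward model writes out a critique of a candidate answer, and a score is then read from that text, rather than the model emitting only a bare number. The Python sketch below is a toy illustration under stated assumptions, not DeepSeek's implementation. The function names (`toy_critic`, `parse_score`, `rank_answers`), the `Score: N` output format, and the scoring heuristic are all hypothetical; in a real system the critic would be an LLM call.

```python
import re
from typing import Callable, List, Tuple


def toy_critic(query: str, answer: str) -> str:
    """Hypothetical stand-in for an LLM judge.

    Returns a short written critique that ends in 'Score: N'.
    The heuristic here is invented purely for illustration.
    """
    score = 1
    notes = []
    if any(ch.isdigit() for ch in answer):
        score += 4
        notes.append("gives a concrete figure")
    if len(answer.split()) <= 12:
        score += 2
        notes.append("is concise")
    summary = ", ".join(notes) if notes else "is vague"
    return f"The answer {summary}. Score: {score}"


def parse_score(critique: str) -> int:
    """Extract the numeric score from the critique text."""
    match = re.search(r"Score:\s*(\d+)", critique)
    if match is None:
        raise ValueError("critique contains no score")
    return int(match.group(1))


def rank_answers(
    query: str,
    answers: List[str],
    critic: Callable[[str, str], str] = toy_critic,
) -> List[Tuple[int, str]]:
    """Score each answer via its generated critique, best first."""
    scored = [(parse_score(critic(query, a)), a) for a in answers]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    candidates = [
        "It depends on many things and is hard to say.",
        "Water boils at 100 degrees Celsius at sea level.",
    ]
    for score, answer in rank_answers("When does water boil?", candidates):
        print(score, answer)
```

The ranked scores could then serve as the reward signal that nudges a model toward the preferred answer.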

According to the paper, the new approach outperformed existing methods and models across various tests, delivering better performance with fewer computational resources.

According to the researchers, DeepSeek intends to open-source its GRM models, but they did not say when.

In January, Chinese startup DeepSeek introduced a new AI model, DeepSeek-R1, whose app overtook its main competitor ChatGPT in the US App Store within about a week. Its popularity triggered a drop in the share prices of technology companies, including leading graphics processor manufacturer Nvidia (the fortune of its CEO, Jensen Huang, shrank by more than $20 billion).

Later, OpenAI accused DeepSeek of using American AI models to train its chatbot, and several countries began blocking the app.
