Наталя Хандусенко AI Eng 15 January 2026, 13:41

AI models are beginning to solve complex mathematical problems, surprising even the world's leading scientists

Over the weekend, Neel Somani, a software engineer and former quant researcher, decided to test what OpenAI's new model could do in mathematics. He pasted a problem into the chat, gave the model 15 minutes to think, and came back to an unexpected result: a complete solution. Somani checked the proof with a tool from Harmonic, and it held up and passed formal verification.

“I was interested in establishing a baseline: which hard mathematical problems large language models can already handle and which they still can’t,” Somani said. The surprise was that, with the latest model, the frontier of AI capabilities had begun to move forward.

ChatGPT's "chain of thought" is even more impressive: it confidently works with mathematical results such as Legendre's formula, Bertrand's postulate, and the Star of David theorem, TechCrunch reports.

Along the way, the model came across a 2013 MathOverflow post in which Harvard professor Noam Elkies solved a similar problem. But the proof ChatGPT produced was not a simple copy: it differed substantially from Elkies's version. What's more, the AI gave a more complete solution to a version of a problem posed by the legendary Paul Erdős, whose vast collection of unsolved problems has become a testing ground for AI.

This success challenges skeptics who doubt what AI can do, and such cases are becoming more common. AI tools are now widespread in mathematics, from Harmonic's specialized Aristotle model for formal proofs to Deep Research for literature search. But it was with the release of GPT 5.2, which Somani calls significantly smarter than previous versions, that the number of problems solved with AI became too large to ignore. This raises new questions about the ability of large language models to push the boundaries of human knowledge.

Somani had been looking at the Erdős problems, a publicly available collection of more than 1,000 conjectures by the Hungarian mathematician. These problems, which vary widely in subject matter and difficulty, have become a tempting target for AI-driven mathematics. While Google's Gemini-powered AlphaEvolve model showed initial success in November, Somani and his colleagues have recently found that GPT 5.2 is remarkably adept at high-level mathematical problems.

Since Christmas, the status of 15 problems on the Erdős website has been changed from "open" to "solved" — and in 11 cases, the notes to the solutions explicitly stated that artificial intelligence models were used in the process.

Renowned mathematician Terence Tao gives a more modest count on GitHub. He highlights eight examples where AI made progress on Erdős problems on its own, and six more where it helped by finding and building on earlier research. The time when AI can do mathematics entirely without human help is still a long way off, but the role of large models in the field is becoming increasingly important.

On Mastodon, Tao suggested that the scalability of AI systems makes them "better suited for systematic application to the 'long tail' of little-known Erdős problems, many of which actually have simple solutions."

“Thus, many of these simpler Erdős problems are now more likely to be solved solely by AI-based methods than by human or hybrid means,” Tao noted.

Another driving force is the recent shift toward formalization, a labor-intensive process that makes mathematical reasoning easier to verify and extend. Formalization does not necessarily require AI or even computers, but new automated tools have made the task much easier. The open-source proof assistant Lean, developed at Microsoft Research in 2013, has become widely used in the field for formalizing proofs, and AI tools such as Harmonic’s Aristotle promise to automate much of this work.
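
To give a sense of what formalization looks like in practice, here is a minimal Lean 4 sketch. It is not taken from Somani's proof or from any Erdős problem: the theorem names are made up for illustration, the statements are deliberately trivial, and the only library fact it relies on is `Nat.add_comm` from Lean's core library.

```lean
-- Illustrative only: toy statements, not part of any proof discussed in this article.

-- A claim is written as a precise statement; the proof is a term the checker must accept.
theorem two_plus_two : 2 + 2 = 4 := rfl

-- A general claim about all natural numbers, proved by citing a core-library lemma.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

If either proof were wrong, Lean would refuse to accept the file. That mechanical check is what lets a formalized proof be verified without a human reading every step.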

For Harmonic founder Tudor Achim, the sudden surge in solved Erdős problems matters less than the fact that the world’s top mathematicians are starting to take these tools seriously. “I care more about the fact that math and computer science professors are using [AI tools],” Achim said. “These people care about their reputation, so when they say they use Aristotle or ChatGPT, that’s the real evidence.”
