Наталя Хандусенко AI Eng 23 July 2025, 11:32

Can AI chatbots overestimate their own abilities? Two-year study shows yes

Over the course of two years, researchers evaluated how well four LLMs could judge whether their own answers were right. The study found that AI is not yet adept at this kind of self-assessment.

The study involved human participants as well as the AI models. All were asked how confident they felt in their ability to answer general-knowledge questions, predict the results of NFL games or the Academy Awards, or play a Pictionary-like image recognition game, Tech Xplore reports.

Both humans and LLMs overestimated how many answers they would get right. After the results were in, however, only the humans acknowledged that they had overestimated their abilities.

"People told us they would get 18 questions right, and they ended up getting 15. Typically, people's later estimate was about 16 correct answers. So they were still a little overconfident, but not as much as the AI."

One of the strengths of the study was that data was collected over a two-year period, which meant using continuously updated versions of the LLMs, namely ChatGPT, Gemini, Sonnet, and Haiku.

If you ask an AI about the population of London, it will give you an accurate answer based on data from the internet. When asked about future events, however, such as who will win an Oscar, chatbots showed little awareness of their own thought processes, the researchers found.

Sonnet was less confident than the others. ChatGPT-4 performed similarly to humans on the Pictionary task: it correctly identified 12.5 out of 20 hand-drawn images. Gemini, on the other hand, was able to identify only 0.93 sketches on average.

Furthermore, Gemini predicted that it would identify an average of 10.03 sketches correctly. Even after it got fewer than one of the 20 right, it estimated that it had answered 14.40 correctly, demonstrating a lack of self-awareness.

"Gemini was just really bad at Pictionary. But what's worse, it didn't know it was bad at Pictionary," the researchers note.
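The gap between estimated and actual scores can be worked through with the figures quoted above. The following is only an illustrative sketch, not the researchers' method: the class and method names are hypothetical, and the human numbers come from the quoted example (18 predicted, 15 correct, 16 estimated afterward), which may not refer to the same task as Gemini's Pictionary scores.

```python
from dataclasses import dataclass


@dataclass
class ConfidenceReport:
    """Hypothetical container for one participant's figures from the article."""
    name: str
    predicted: float      # score estimated before the task
    actual: float         # score actually achieved
    retrospective: float  # score estimated after the task

    def gap_before(self) -> float:
        # Positive value means overconfidence before the task.
        return self.predicted - self.actual

    def gap_after(self) -> float:
        # Positive value means overconfidence after the task.
        return self.retrospective - self.actual

    def moved_toward_truth(self) -> bool:
        # True if the later estimate is closer to the real score.
        return abs(self.gap_after()) < abs(self.gap_before())


# Figures as reported in the article.
humans = ConfidenceReport("humans (quoted example)", 18, 15, 16)
gemini = ConfidenceReport("Gemini (Pictionary)", 10.03, 0.93, 14.40)

for report in (humans, gemini):
    print(
        f"{report.name}: gap before = {report.gap_before():.2f}, "
        f"gap after = {report.gap_after():.2f}, "
        f"moved toward truth = {report.moved_toward_truth()}"
    )
```

On these numbers, the humans' overestimate shrinks from 3 answers to 1, while Gemini's grows from roughly 9.1 sketches to roughly 13.5.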

For regular chatbot users, the main takeaway from the study is that LLMs are not inherently correct, and that it may be worth asking them how confident they are when they answer important questions.

Text: Наталя Хандусенко Tags: ai, study
