Валентин Шнайдер AI Eng 4 March 2026, 15:52

Checking chatbots for fakes: comparing ChatGPT, Gemini, and Claude on the same prompts showed which one most often makes up details in the news

Tom’s Guide tested three popular chatbots on seven identical queries about military news surrounding the strikes on Iran. The goal was to see how the models handle a high-stakes topic where some reports change by the hour and others may be hoaxes.

According to Tom’s Guide, the editorial team ran seven tests, each targeting a different risk: hallucinations, overconfidence without confirmation, confusion over numbers, and willingness to answer questions that may cross the line of responsible public information. The overall verdict: Claude won all seven rounds, Gemini came last because it invented the most details, and ChatGPT usually kept the correct framing but periodically «filled in» unverified elements.

In the first test, on «breaking news» (a summary of the 48 hours around the reports of Ali Khamenei’s death and the reaction of state media), the publication writes that ChatGPT gave a detailed answer but added speculative elements, in particular about succession mechanisms, that did not appear in the verified sources. Gemini also answered very confidently but got some specific details wrong, while Claude, in the editors’ assessment, stuck to confirmed reports and did not invent specifics.

In a military-technical query about how Iranian air defenses and radars affected the first wave of strikes, Tom’s Guide notes that ChatGPT explained the principles of air defense systems but added unconfirmed claims about which specific targets were hit, while Gemini produced a «ready-made story» with details not supported by sources. Claude scored higher for sticking to confirmed claims and not «filling in the gaps» with speculation.

In the round on geopolitics and Iran’s network of allies, the publication states outright that Gemini fabricated a critical detail by giving the wrong date for the fall of the regime in Syria. ChatGPT offered stronger analysis but in places was vague about the status of individual events, while Claude, according to the editors, was best at grounding its conclusions in sources and carefully separating fact from assumption.

The authors also singled out a prompt that could be turned into instructions for hitting targets. Here Claude declined to give a step-by-step «technical» hint and explained the limits of a safe answer. The final test checked for a fake: the «Geneva Agreement», which never existed. All the models rejected the fictional premise, but Claude, according to the publication, best explained why it was fake and most accurately reconstructed the actual course of negotiations without adding invented facts.

Overall, Claude proved the most reliable on news, ChatGPT landed in the middle, and Gemini most often added details that were not in the sources.

In its conclusion, the tech publication stresses that the most dangerous mistake chatbots make with news is not «ignorance» but confidently filling gaps with plausible fabrications. The editors also noted that they had reached out to Google for comment and plan to update the article once it responds.

Previously, dev.ua reported that journalists at Texty.org.ua analyzed 595 AI-generated videos that used the likenesses of well-known women, including news anchors.

Text: Валентин Шнайдер Photo: Macaron Source: Tom’s Guide Tags: gemini, claude, chatgpt, chatbots, fake, test, ai, artificial intelligence