Natalia Khandusenko · AI · 12 March 2025, 15:46

AI search engines fail accuracy test: study finds they get it wrong 60% of the time

The US-based Tow Center for Digital Journalism examined eight AI search engines: ChatGPT Search, Perplexity, Perplexity Pro, Gemini, DeepSeek Search, Grok-2 Search, Grok-3 Search, and Copilot. The study tested each one for accuracy and recorded how often each tool declined to answer.

How was the research conducted?

The researchers randomly selected 200 news articles from 20 news outlets (10 from each), TechSpot reports. They made sure that each article appeared among the top three Google results when an excerpt from it was searched in quotes.

They then ran the same query on each AI search engine and scored accuracy on whether the tool correctly identified (a) the article, (b) the news outlet, and (c) the URL.

The researchers also rated each answer by its degree of accuracy, on a scale from "completely correct" to "completely incorrect."

What did the results show?

As the chart below shows, the AI search engines performed poorly, with the exception of both versions of Perplexity. Overall, they gave wrong answers 60% of the time.

ChatGPT appears designed to answer every query at any cost: ChatGPT Search was the only tool that responded to all 200 queries. Yet only 28% of its answers were completely correct, and 57% were completely incorrect.

Both versions of X's Grok performed poorly, and Grok-3 Search answered 94% of queries incorrectly.

Microsoft's Copilot fared little better: it declined to answer 104 of the 200 queries. Of the remaining 96 answers, only 16 were "completely correct," 14 were "partially correct," and 66 were "completely incorrect," an inaccuracy rate of roughly 70% (66 of 96, or about 69%).

Perhaps the most striking part is that the companies behind these tools do not disclose this lack of accuracy, yet charge users anywhere from $20 to $200 per month for access to their latest AI models.

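Paying more did not help either: Perplexity Pro ($20/month) and Grok-3 Search ($40/month) answered slightly more queries correctly than their cheaper counterparts (Perplexity and Grok-2 Search), but they also had significantly higher error rates, because they were more likely to give a confident wrong answer than to decline.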

The researchers also noted that five of the eight chatbots tested (ChatGPT, Perplexity and Perplexity Pro, Copilot, and Gemini) publish the names of their crawlers, which lets publishers block them, while the crawlers used by the other three (DeepSeek, Grok-2, and Grok-3) are unknown.

You can read more in the Tow Center's report, published in the Columbia Journalism Review.

Text: Natalia Khandusenko. Photo: Built In. Tags: AI, artificial intelligence, online search