Олександр Кузьменко AI Eng 17 June 2025, 16:22

ChatGPT and its competitors have “polluted the internet” with information noise, which is already hampering the development of future AI models

The rapid growth of AI models from OpenAI and its competitors has flooded the internet with low-quality information that enters the training datasets of new AI models and causes errors.

This is reported by Futurism, citing The Register, which compares the situation to the demand for “low-background steel”: steel produced before the first nuclear bombs were detonated, starting in July 1945. Those explosions released radionuclides and other particles that have contaminated virtually all steel produced since.

This makes modern metals unsuitable for some highly sensitive scientific and medical equipment. Even today, a significant source of low-background steel is warships from World Wars I and II, including the large German naval fleet that was scuttled in 1919.

“This gave us an almost infinite supply of low-background steel. If it weren’t for that, we would be stuck. But if you collect data from before 2022, you can be pretty sure that it has minimal, if any, contamination from generative AI. Everything before that date is ‘safe, good, clean,’ everything after that is ‘dirty,’” said Maurice Chiodo, a researcher at the University of Cambridge.

In 2024, Chiodo co-authored a paper arguing that a source of “clean” data is needed, not only to prevent model collapse but also to ensure fair competition among AI developers. He argues that early entrants have an advantage because only they were able to train on cleaner data gathered before AI-generated content polluted the internet.

Scientists currently disagree on whether data pollution will actually cause AI models to collapse, but many researchers have been sounding the alarm for years.

“It’s unclear right now to what extent model collapse will be a problem, but if it is a problem, and we have polluted this data environment, cleanup will be extremely expensive, probably impossible,” says Chiodo.

There are already issues with retrieval-augmented generation (RAG), which AI models use to supplement their outdated training data with information pulled from the internet in real time. This new data is not guaranteed to be free of AI-generated content, and some studies have shown that it leads chatbots to give far more “dangerous” answers.

After OpenAI and others reported diminishing returns from their latest models in late 2024, some experts said that scaling, the practice of improving models by adding more data and computing power, had hit a ceiling. If that data becomes increasingly laden with “information noise,” the hurdle will be even harder to overcome.

Chiodo suggests that stricter rules, such as mandatory labeling of AI content, could help filter out some of this “noise,” but they will be difficult to enforce, especially given companies’ resistance to government regulation of the AI space.

Text: Олександр Кузьменко Photo: Fierce Network Source: Futurism Tags: ai, chatgpt
