Natalia Khandusenko
27 March 2025, 18:11
Meta may have trained its AI models on unpublished books: how did this happen?
Last week, the Atlantic reported that Meta had used millions of pirated books to train Llama 3. The article also included a handy search bar where authors could enter their names to see if Meta had used their work to train its AI. Author Maris Kreizman used this tool to find her book, which is due out this summer.
The Atlantic article said that Meta had used LibGen, a Russian pirate file-sharing site that purportedly aimed to make academic works more accessible worldwide. The case concerns 7.5 million books.
When Maris used the tool, she found her previous book, published in 2015, as well as a new book due out on July 1.
«When I searched, I found my previous book, and in the grand scheme of things, it was shrug-worthy. It was published in 2015, sold about 100 copies, and is now out of print. But my upcoming collection of essays isn't due out until July 1, and yet Meta has somehow already gained access to it to train its AI. Digital galleys are mostly only legally available on NetGalley and Edelweiss, and both of those services have strict terms about what users can do with unpublished work (not much!). How the hell did LibGen, and by extension Meta (and possibly OpenAI as well), get their hands on unpublished work?» the author noted.
She also added, «I haven’t even gotten any pre-publication reviews yet, but my work already belongs to Meta.»