Олександр Кузьменко, Hot News
14 January 2025, 13:48
Meta faces lawsuit accusing it of training AI models on books from Russian pirate library LibGen. The company tried to keep the revelations to a minimum.
The court rejected Meta’s request to suppress information that the company used Library Genesis (LibGen), a Russian pirate book library, to train its generative language artificial intelligence models.
The case, «Kadrey et al. v. Meta Platforms,» was one of the first copyright infringement lawsuits filed against a technology company over its AI training practices. Its outcome, along with the outcome of dozens of similar cases pending in U.S. courts, will determine whether tech companies can legally use creative works to train AI, WIRED reports.
Last week, a judge in the U.S. District Court for the Northern District of California ordered the parties to submit the documents in full, calling Meta’s approach to redacting them «absurd.» He noted that «there is nothing in these materials that should be sealed.» The judge said Meta insisted on the redactions not to protect its business interests, but to «avoid negative publicity.»
«If information appears in the media that we used a dataset that we know is pirated, such as LibGen, it could undermine our position in negotiations with regulators on these issues,» the judge quoted a Meta employee as saying.
Writers Richard Kadrey and Christopher Golden, as well as comedian Sarah Silverman, first filed a class-action lawsuit against Meta in July 2023, alleging that it trained its language models on their copyrighted works without permission.
Meta argued that using publicly available materials to train AI tools was covered by the «fair use» doctrine, which says that using copyrighted works without permission is legal in certain cases, one of which, according to the company, is «using text for statistical language modeling and generating original utterances.»
Meta previously revealed in a research paper that it trained its Llama large language model on snippets from Books3, a dataset of about 196,000 books scraped from the internet. However, it had not previously stated publicly that it was downloading data directly from LibGen.
The documents revealed that Meta employees were hesitant to access the LibGen data because «a torrent from a corporate laptop doesn’t seem right.» They also claim that internal discussions about using the LibGen data were relayed to Meta CEO Mark Zuckerberg, and that Meta’s AI team received «permission to use» the pirated material.
LibGen, an archive of books uploaded to the Internet that emerged in Russia around 2008, is one of the largest and most controversial «shadow libraries» in the world. In 2015, a New York judge granted a preliminary injunction against the site, which was theoretically intended to temporarily shut down the archive, but its anonymous administrators simply changed its domain. In September 2024, another New York judge ordered LibGen to pay copyright owners $30 million for infringing their copyrights, even though it was not known who actually ran the pirate hub.
Recall that in 2023, several writers, including John Grisham («The Firm») and George R. R. Martin («Game of Thrones»), sued OpenAI over its chatbot ChatGPT. They allege that the company used their works to train its AI without their permission.