Meta Platforms, the parent company of Facebook and Instagram, is facing a consolidated copyright infringement lawsuit from prominent authors, including Sarah Silverman and Michael Chabon. The core accusation is that Meta used thousands of copyrighted books without authorization to train its artificial intelligence language model, Llama.
Although Meta’s legal team issued stern warnings about the legal risks of using pirated books for AI training, the company reportedly proceeded with the contentious dataset. The case grew more complicated when chat logs surfaced as evidence, showing Meta-affiliated researcher Tim Dettmers discussing the dataset’s procurement in a Discord server.
According to the chat logs, Dettmers raised concerns with Meta’s legal department about the legality of using book files as training data. The legal team advised against immediate use, citing issues related to “books with active copyrights.” The chat participants debated whether training on such data could be justified under the fair use doctrine, a U.S. legal principle that permits certain unlicensed uses of copyrighted works.
The lawsuit, first filed over the summer, now consolidates two separate legal actions against Meta. Last month, a California judge dismissed part of the Silverman lawsuit, prompting the authors to seek to amend their claims.
The implications of this legal battle extend beyond Meta, potentially affecting the broader AI industry. If the authors prevail, the cost of developing data-intensive AI models could rise, and companies could face increased scrutiny and compensation demands from content creators. Additionally, new regulations in Europe may compel AI companies, including Meta, to disclose the data used to train their models, exposing them to additional legal risks.
At the heart of the controversy are Meta’s Llama models, especially the latest version, Llama 2, released in the summer. While the first version used the “Books3 section of ThePile” for training, Meta has not disclosed details about the training data for Llama 2, a potential disruptor in the generative AI software market.