Five Publishers and Scott Turow Just Sued Meta for Stealing 267 Terabytes of Books to Train Llama
Mark Zuckerberg personally told his team to stop licensing negotiations and take the content instead. The lawsuit was filed in May. Meta says it is fair use.

The lawsuit arrived on May 5, 2026. Six copyright holders filed suit against Meta Platforms and Mark Zuckerberg personally in the Southern District of New York. The plaintiffs include five of the largest publishers in academic and trade publishing alongside Scott Turow, one of the most recognized names in legal fiction. The allegation: Meta torrented 267 terabytes of books and other written works from pirate repositories to train its Llama AI models.
This is not a generalized argument about whether AI training constitutes fair use. It is a lawsuit about piracy, about who gave the order, and about what a language model can produce once it has been trained on stolen content.
AT A GLANCE
• Filed: May 5, 2026, Southern District of New York
• Plaintiffs: Hachette, Macmillan, McGraw Hill, Elsevier, Cengage, Scott Turow
• Defendants: Meta Platforms, Inc. and Mark Zuckerberg personally
• Alleged: 267 terabytes of books pirated from LibGen and other repositories
• Zuckerberg: allegedly instructed his team to stop licensing talks and use pirated content instead (April 2023)
• Llama output: "verbatim and near-verbatim copies, replacement chapters, summaries and alternative versions"
• Relief sought: Monetary damages, injunction, full accounting of all materials used
• Meta's position: Fair use; will "fight aggressively"
• Legal context: In June 2025, a court found fair use in a separate authors' suit against AI companies
What the Complaint Alleges
The lawsuit is specific about method. Meta identified pirate repositories including LibGen, built torrent streams of the content, and downloaded hundreds of terabytes of books, academic texts, and other written works. That material was then used to train Llama, Meta's open-source large language model.
The suit claims that Llama, as a direct result, can now produce "verbatim and near-verbatim copies" of books, generate replacement chapters of academic textbooks, and output summaries and alternative versions of published works. The plaintiffs argue this amounts to systematic copyright infringement at scale, and that Meta knows it.
Why Zuckerberg Is Named Personally
The most significant element of this complaint is not the 267-terabyte figure. It is the chain of command. The lawsuit alleges that in April 2023, Meta was in active licensing negotiations with publishers over access to their content. Those negotiations were abandoned, the suit says, because Zuckerberg personally instructed his team to stop negotiating and use the pirated repositories instead.
That directive removes the most common defense available in AI copyright cases. A company can argue that its engineers made decisions the executive team did not fully understand. That argument is not available here. If the lawsuit's version of events holds, this was a knowing, executive-level decision, which is why Zuckerberg appears alongside Meta Platforms as a named defendant.
The Plaintiffs
Hachette Book Group, Macmillan Publishers, McGraw Hill, Elsevier, and Cengage are not boutique literary houses. They represent much of the academic, scientific, professional, and trade publishing market in the United States. Elsevier alone publishes thousands of peer-reviewed journals and is considered the dominant force in scientific publishing globally.
The sixth plaintiff is Scott Turow. He wrote Presumed Innocent, the legal thriller that established much of the modern template for the genre. He also served two terms as president of the Authors Guild and has spent decades working on copyright and author rights. His presence in this lawsuit is a signal, not just a credential. He is the most prominent individual author to join a publisher-led AI copyright action.
Meta's Position and the Legal Background
Meta has said that AI training constitutes fair use and that it will fight the suit. The company has invoked transformative use arguments that have circulated through AI copyright litigation since 2023. Those arguments gained some judicial support in June 2025, when a federal court found fair use in a similar lawsuit brought by authors against AI companies.
But this complaint was drafted specifically to distinguish itself from that ruling. A training-data fair use argument depends in part on whether the data was lawfully obtained. If the source is a pirate repository, lawful acquisition is in question before the transformative use analysis even begins. The plaintiffs are betting that piracy changes the equation. Meta's legal team will argue otherwise. A verdict is unlikely to come quickly.
WHAT WRITERS CAN TAKE FROM THIS
• This case focuses on academic and professional publishing, but the legal precedent will extend to fiction and trade.
• Zuckerberg's personal authorization matters because it forecloses the "we did not know" defense that companies typically use.
• The piracy angle is legally distinct from a standard training-data fair use argument. LibGen is an unauthorized source.
• The Authors Guild recommends adding AI opt-out language to all new publishing contracts.
• The June 2025 fair use ruling in a prior authors' suit is not the final word. This case was built to challenge it.
The resolution of this case will take years. What it does not answer is the immediate question of which tools you trust with your unpublished work right now. A platform that keeps smart snapshots of every draft, lets you restore any version, and has no AI pipeline attached to your content gives you something no lawsuit can. WriteO's Version History is built exactly for that.
Sources: Publishers Weekly, The Authors Guild (authorsguild.org), Variety


