Meta AI Model Reproduces Almost Half of Harry Potter Book

Introduction to the Legal Battle

The Google Books precedent probably can’t protect Meta against this second legal theory because Google never made its books database available for users to download—Google almost certainly would have lost the case if it had done that. In principle, Meta could still convince a judge that copying 42 percent of Harry Potter was allowed under the flexible, judge-made doctrine of fair use. But it would be an uphill battle.

The Fair Use Analysis

“The fair use analysis you’ve gotta do is not just ‘is the training set fair use,’ but ‘is the incorporation in the model fair use?’" Lemley said. "That complicates the defendants’ story.” This means that the court will have to consider not only whether the training data used to create the model was fair use, but also whether the model’s output is fair use.

The Danger of Open-Weight Models

Grimmelmann also said there’s a danger that this research could put open-weight models in greater legal jeopardy than closed-weight ones. The Cornell and Stanford researchers could only do their work because the authors had access to the underlying model—and hence to the token probability values that allowed efficient calculation of probabilities for sequences of tokens. Most leading labs, including OpenAI, Anthropic, and Google, have increasingly restricted access to these so-called logits, making it more difficult to study these models.

The Impact of Closed-Weight Models

Moreover, if a company keeps model weights on its own servers, it can use filters to try to prevent infringing output from reaching the outside world. So even if the underlying OpenAI, Anthropic, and Google models have memorized copyrighted works in the same way as Llama 3.1 70B, it might be difficult for anyone outside the company to prove it. Moreover, this kind of filtering makes it easier for companies with closed-weight models to invoke the Google Books precedent.

The Perverse Outcome

“It’s kind of perverse,” Mark Lemley told me. “I don’t like that outcome.” This is because copyright law might create a strong disincentive for companies to release open-weight models. On the other hand, judges might conclude that it would be bad to effectively punish companies for publishing open-weight models. “There’s a degree to which being open and sharing weights is a kind of public service,” Grimmelmann told me. “I could honestly see judges being less skeptical of Meta and others who provide open-weight models.”

Conclusion

The legal battle surrounding Meta’s use of copyrighted material in its AI model is complex and ongoing. While the Google Books precedent may not provide a clear defense, the company may still be able to argue that its use of the material is fair use. However, the fact that the model is open-weight may put it at greater risk of legal jeopardy. Ultimately, the outcome of this case will have significant implications for the development of AI and the use of copyrighted material in machine learning models.

FAQs

What is the Google Books precedent and how does it relate to Meta’s case?
The Google Books precedent is a legal decision that allowed Google to digitize and make available snippets of copyrighted books without obtaining permission from the copyright holders. However, this precedent may not apply to Meta’s case because Google did not make its books database available for users to download.
What is the difference between open-weight and closed-weight models?
Open-weight models make the underlying model and its weights available to the public, while closed-weight models keep this information proprietary. This can affect the ability of researchers to study and understand the models, as well as the company’s ability to invoke the Google Books precedent.
What is the potential outcome of this case and how will it impact the development of AI?
The outcome of this case will have significant implications for the development of AI and the use of copyrighted material in machine learning models. If the court rules in favor of the copyright holders, it could create a strong disincentive for companies to release open-weight models, which could stifle innovation and progress in the field.