Reddit sues Anthropic over AI data scraping

Introduction to the Lawsuit

Reddit is accusing Anthropic of building its Claude AI models on the back of Reddit’s users, without permission and without payment. According to Reddit, anyone who uses the site, even a web-crawling bot, agrees to its user agreement, and that agreement is clear: you cannot take content from the site and use it in your own commercial products without a written deal. Reddit claims Anthropic’s bots have been doing exactly that for years, scraping massive amounts of conversations and posts to train and improve Claude.

The Allegations Against Anthropic

What makes this lawsuit particularly spicy is the way it goes after Anthropic’s reputation. Anthropic has worked hard to brand itself as the ethical, trustworthy AI company, the “white knight” of the industry. The lawsuit, however, calls Anthropic’s ethical claims nothing more than “empty marketing gimmicks”. For instance, Reddit points to a statement from July 2024 in which Anthropic claimed it had stopped its bots from crawling Reddit. The lawsuit says this was “false”, alleging that Reddit’s logs caught Anthropic’s bots trying to access the site more than one hundred thousand times in the following months.

User Privacy Concerns

But this isn’t just a corporate squabble; it directly involves user privacy. When you delete a post or a comment on Reddit, you expect it to be gone. Reddit has official licensing deals with other big AI players like Google and OpenAI, and these deals include technical measures to ensure that when a user deletes content, the AI company deletes it too. According to Reddit’s lawsuit, Anthropic has no such deal and has refused to enter one. This means that if Claude was trained on a post you later deleted, that content could still be baked into the model, and your choice to remove it would effectively be ignored.

Evidence Against Anthropic

The lawsuit even includes a screenshot in which Claude itself admits it has no real way of knowing whether the Reddit data it was trained on was later deleted by a user. This raises serious concerns about how Anthropic handles user data and whether it respects users’ decisions to delete their content.

What Reddit Wants

So, what does Reddit want? It’s not just about money, although the company is asking for damages for things like increased server costs and lost licensing fees. It is also asking the court for an injunction to force Anthropic to stop using any Reddit data immediately. Furthermore, Reddit wants to prohibit Anthropic from selling or licensing any product that was built using that data. In effect, Reddit is asking a judge to take Claude off the market.

The Broader Implications

This case forces a tough question: does being “publicly available” on the internet mean content is free for any corporation to take and monetise? Reddit is arguing a firm “no”, and the outcome could change the rules for how AI is developed and how companies use user-generated content from here on out.

Conclusion

In conclusion, Reddit’s lawsuit against Anthropic raises important questions about user privacy, data ownership, and the ethics of AI development, and its outcome could have far-reaching implications for the tech industry. As AI continues to evolve and become more integrated into our lives, it’s essential to establish clear guidelines and regulations around data use and ownership.

FAQs

Q: What is the lawsuit between Reddit and Anthropic about?
A: The lawsuit is about Anthropic allegedly using Reddit data without permission to train its Claude AI models.
Q: What does Reddit want from the lawsuit?
A: Reddit wants Anthropic to stop using any Reddit data, pay damages, and be prohibited from selling or licensing any product built using Reddit data.
Q: Why is this lawsuit important?
A: The lawsuit raises important questions about user privacy, data ownership, and the ethics of AI development, with potential implications for the future of AI and how companies use user-generated content.