Reddit Sues Perplexity for Data Theft for AI Training

According to Reddit, Perplexity has no license to copy its data or use it for its AI models.

Reddit has filed a lawsuit in the United States against Perplexity, claiming the company illegally copied data from the platform to train its AI search engine. The case was filed Wednesday in a federal court in New York.

Allegations of Large-Scale Data Scraping

In the lawsuit, Reddit alleges that Perplexity and three other companies (Oxylabs, AWMProxy and SerpApi) circumvented the platform’s security measures to gain access to billions of posts on Reddit. This data was allegedly used to train Perplexity’s engine.

According to Reddit, AI companies' demand for high-quality, human-written content is leading them to “launder data”. Reddit says it has granted licenses to companies such as Google and OpenAI, but that Perplexity never had permission to use its data.

Response from Involved Parties

Perplexity calls its approach “principled and responsible” and says it will defend itself in court. SerpApi states that it “strongly disagrees” with the allegations, while Oxylabs said it was “shocked and disappointed” that Reddit never reached out to discuss the matter.

Reddit claims that after it sent Perplexity a letter in 2024, the number of Reddit references in Perplexity's answers increased fortyfold. The platform is demanding a ban on further use of its data, as well as financial compensation.

Last year, Reddit announced that search engines, too, may no longer display its content in their results for free. Reddit accordingly signed a deal with Google worth sixty million dollars per year. The deal gives Google permission to display Reddit posts in search results and to train its Gemini models on those posts.
