December 22, 2025
Tech

Perplexity Responds to Reddit Lawsuit Over Data Access

  • October 24, 2025

AI search startup Perplexity has formally responded to a lawsuit filed by Reddit in New York federal court, denying allegations that it illegally accessed Reddit’s content and bypassed technical protections to obtain large volumes of data.

The case raises major questions about how AI assistants source, summarize, and cite online content — and could set a precedent for how generative AI companies interact with community-driven platforms.


The Lawsuit: Reddit vs. Perplexity and Data Partners

Reddit’s lawsuit names Perplexity AI alongside three data-scraping firms — Oxylabs UAB, AWMProxy, and SerpApi — accusing them of circumventing access controls to gather Reddit content at scale.

The complaint alleges that:

  • Perplexity purchased or used services from SerpApi and others to extract data,
  • The scraping allegedly bypassed Reddit’s protections, and
  • Some data was obtained indirectly via Google search results.

Reddit claims this activity violated its terms of service and circumvented its technical barriers, describing it as a “systematic effort to evade access restrictions.”


Perplexity’s Response: “We Summarize With Citations”

In a public statement, Perplexity defended its practices, saying it merely summarizes Reddit discussions with proper citations and does not train its AI models on Reddit content.

“We summarize Reddit discussions, and we cite Reddit threads in answers — just like people share links to posts all the time,”
the company wrote in its response.

Perplexity’s defense hinges on a key distinction: it claims to generate summaries using available data, not to copy or reproduce Reddit’s content for training or commercial reuse.

However, Reddit’s filing alleges that the AI company’s behavior went beyond simple summarization, involving direct data access through unauthorized means.


Reddit’s Allegations: Evidence From a Hidden Test Post

Reddit’s legal filing includes a technical experiment intended to demonstrate unauthorized scraping.

According to the complaint:

  • Reddit created a hidden test post that was visible only to Google’s crawler and not accessible publicly.
  • Within hours, the content from that hidden post appeared in Perplexity’s results, suggesting data access via scraping or API circumvention.
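Reddit’s experiment follows the logic of a canary (or honeypot) test: place a unique marker in content reachable through only one channel, then watch for that marker to surface somewhere else. If it does, the content most likely travelled through that channel. Below is a minimal Python sketch of the idea. The function names, marker format, and sample answers are hypothetical, and this is not a reconstruction of Reddit’s actual test, whose technical details are not described here.

```python
import secrets


def make_canary(prefix: str = "canary") -> str:
    """Return a unique, hard-to-guess marker to embed in a test post (hypothetical format)."""
    # 8 random bytes -> 16 hex characters, so an accidental match in real text is very unlikely.
    return f"{prefix}-{secrets.token_hex(8)}"


def leaked(canary: str, observed_text: str) -> bool:
    """Return True if text observed elsewhere (e.g. an AI answer) reproduces the marker."""
    return canary in observed_text


# Hypothetical usage: plant the marker in content exposed through a single channel...
marker = make_canary()
test_post = f"Test post, please ignore. Reference: {marker}"

# ...then check outputs gathered later from other services.
answer_a = "Here is a summary of popular threads about gardening."
answer_b = f"One thread mentions the reference {marker}."
print(leaked(marker, answer_a), leaked(marker, answer_b))
```

The check itself is trivial; the evidential weight comes from the setup. A real test would also have to rule out other routes to the marker, which is why, as the complaint describes it, the post was exposed only to Google’s crawler.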

The lawsuit also claims that after Reddit issued a cease-and-desist letter, Perplexity’s citations to Reddit increased forty-fold, implying continued data access despite prior warnings.


Pattern of Allegations from Publishers

Reddit’s lawsuit follows similar accusations made against Perplexity since 2024:

  • Forbes accused Perplexity of republishing an exclusive story and threatened legal action.
  • Wired reported that Perplexity allegedly used undisclosed IPs and spoofed user-agent strings to bypass robots.txt restrictions, based on internal technical logs.
  • Cloudflare, a web infrastructure provider, said its tests indicated that Perplexity used undeclared crawlers that ignored “no-crawl” directives.
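Robots.txt, referenced in the Wired and Cloudflare reports, is advisory. A site lists which paths each named crawler may visit, and a cooperating crawler checks those rules, under the user-agent name it declares, before fetching. The file enforces nothing on its own: a crawler that skips the check, or identifies itself under a different user-agent string, is not technically stopped. Below is a minimal Python sketch of the compliant side of that check, using the standard library’s `urllib.robotparser`. The robots.txt rules, the URL, and the `ExampleBot/1.0` agent are invented for illustration and are not Reddit’s actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is allowed everywhere, all others are blocked.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if a crawler declaring `user_agent` may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse rules from text instead of fetching them
    return parser.can_fetch(user_agent, url)


url = "https://www.example.com/r/some_subreddit/comments/abc123/"
for agent in ("Googlebot", "ExampleBot/1.0"):
    print(agent, is_allowed(agent, url))
```

Running it prints `Googlebot True` and `ExampleBot/1.0 False`. The same URL is allowed or refused based solely on the declared user-agent name, which is a value the crawler itself chooses to send.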

These incidents collectively highlight an industry-wide gray area over how AI assistants collect and process online data.


Perplexity’s Broader Position: Transparency and Negotiation

In previous disputes, Perplexity attributed such issues to “rough edges” on early products, emphasizing that it has since improved attribution clarity and added citations in responses.

The company has also pushed back against media licensing demands, arguing that some organizations are attempting to “control publicly reported facts.”

In its latest statement regarding Reddit, Perplexity suggested the lawsuit was part of broader negotiations around AI training data, framing Reddit’s legal action as a pressure tactic.

“We summarize Reddit discussions… We won’t be extorted, and we won’t help Reddit extort Google,”
Perplexity said.

This marks a notably defiant tone compared to its previous conciliatory messaging, signaling growing tension between AI startups and content platforms seeking compensation for data usage.


Why This Lawsuit Matters

The implications of Reddit’s case against Perplexity extend beyond these two companies. It strikes at the core of how AI assistants operate, particularly when their value lies in summarizing web-based conversations and articles.

Key legal questions the courts will likely consider:

  • Did Perplexity or its partners bypass technical access controls or directives (e.g., API restrictions, robots.txt rules, or authentication barriers)?
  • Does summarizing protected content constitute infringement, even with citations?
  • Can AI companies be held responsible for actions taken by third-party data providers (like SerpApi or Oxylabs)?
  • Does AI summarization of publicly viewable forum content qualify as fair use?

Depending on the outcome, this case could reshape how AI systems are trained, how they cite or link to user-generated content, and what licensing frameworks become necessary in the future.


What We Know — And What We Don’t

Known Facts:

  • Reddit’s lawsuit was filed in a New York federal court.
  • It names Perplexity AI, Oxylabs UAB, AWMProxy, and SerpApi.
  • The complaint alleges that technical barriers were bypassed and that content was scraped without authorization.
  • Perplexity maintains that it summarizes, not scrapes, Reddit data.
  • The company has denied training its AI models on Reddit posts.

Unknowns:

  • Which specific vendors provided Reddit data to Perplexity.
  • Whether Perplexity directly orchestrated scraping or relied entirely on partners.
  • How much of Reddit’s content was actually processed or stored by Perplexity’s systems.
  • Whether the case will lead to injunctions, settlements, or long-term data-sharing agreements.

Industry Context: Reddit’s Broader Licensing Push

This lawsuit follows Reddit’s growing effort to monetize access to its data, especially as large AI companies seek massive datasets for model training.

In early 2024, Reddit signed a data licensing deal with Google, reported to be worth about $60 million a year, giving Google sanctioned access to Reddit posts for AI model development.

Perplexity’s alleged actions — if proven true — could be seen as a way of accessing Reddit data outside such agreements, explaining the platform’s strong legal response.


The Bottom Line

The Reddit vs. Perplexity lawsuit is shaping up to be a pivotal case in the ongoing debate over AI data rights and fair use.

If the court sides with Reddit, AI companies may face stricter boundaries around web content access and increased pressure to license training data.
If the court favors Perplexity’s position, it could validate summarization-based AI systems that rely on publicly viewable discussions with attribution.

Either way, this legal battle underscores the growing friction between content platforms and AI developers — both vying to define the future of information on the internet.


Key Takeaways:

  • Reddit alleges Perplexity bypassed technical restrictions to access content.
  • Perplexity insists it summarizes with citations and doesn’t train on Reddit data.
  • The case could set precedent for how AI assistants handle web-based sources.
  • Broader implications extend to licensing, fair use, and content ownership in the AI era.
