🚨 Cloudflare: Perplexity bots are scraping websites despite owner restrictions

cryptoenthusiast

Cloudflare has reported that web crawlers from the AI search startup Perplexity are bypassing website restrictions — even when explicitly blocked by site owners.

Here’s what’s happening:

Since July 1, 2025, Cloudflare has been automatically blocking AI crawlers on customer websites. Even so, many site admins noticed that Perplexity bots were still getting through, despite being denied access via robots.txt and Web Application Firewall (WAF) settings.

Upon investigation, Cloudflare discovered that:

🔍 Perplexity disguises its bots as real users — by spoofing browser headers like Chrome on macOS.

📶 They rotate IP addresses and ASN identifiers, allowing them to operate outside known ranges.

🐢 When disguised, the bots slow down crawling speed — from 20–25 million requests per day to 3–6 million — making them harder to detect.

🧩 If blocked completely, Perplexity tries to reconstruct page data from third-party sources, even if those sources are outdated or inaccurate.
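
To see why disguising the user agent matters, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt content, the `example.com` URL, and the browser-style user-agent string are illustrative assumptions, not taken from Cloudflare's report; `PerplexityBot` stands in for the crawler name a site owner would target. The idea is that robots.txt rules key on the name a client declares, so a crawler presenting itself as a browser no longer matches the rule written for it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

# Illustrative browser-style user-agent string (an assumption, not from the report).
SPOOFED_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def is_allowed(user_agent: str, url: str) -> bool:
    """Return what robots.txt says for the user agent the client declares."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"
    print("declared crawler:", is_allowed("PerplexityBot", url))  # False
    print("browser-like UA: ", is_allowed(SPOOFED_UA, url))       # True
```

In other words, robots.txt only works against crawlers that identify themselves honestly. Once the declared name is spoofed, enforcement has to rely on other signals.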

The good news:

Cloudflare has rolled out new protections against stealth crawlers — even on free-tier plans. Users just need to enable the feature in their dashboard.

Also worth noting: Cloudflare confirmed that OpenAI’s ChatGPT bots respect website rules and do not violate crawling policies.

Cloudflare reminds AI crawler operators:

Stay transparent, ethical, and responsible.
🔒 Don’t overload websites
🙅‍♂️ Don’t harvest personal data
🏷️ Always identify your bot clearly.
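
As a rough illustration of those principles, a well-behaved crawler might identify itself with a descriptive user agent, check robots.txt before each fetch, and honor any `Crawl-delay`. The bot name `ExampleBot`, its contact URL, the robots.txt content, and the `plan_fetch` helper below are all hypothetical; this is a sketch under those assumptions, not how any particular crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical bot identity: a clear name plus a contact URL.
BOT_UA = "ExampleBot/1.0 (+https://example.com/bot-info)"

# Hypothetical robots.txt a site might publish.
ROBOTS_TXT = """\
User-agent: ExampleBot
Crawl-delay: 10
Disallow: /private/
"""


def plan_fetch(url: str, default_delay: float = 1.0) -> tuple[bool, float]:
    """Return (allowed, seconds_to_wait_between_requests) for a polite crawler."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    allowed = parser.can_fetch(BOT_UA, url)
    delay = parser.crawl_delay(BOT_UA)  # None when no Crawl-delay applies
    return allowed, float(delay) if delay is not None else default_delay


if __name__ == "__main__":
    for url in ("https://example.com/blog/post", "https://example.com/private/x"):
        allowed, wait = plan_fetch(url)
        print(url, "->", "fetch" if allowed else "skip", f"(wait {wait}s)")
```

The crawler skips anything the site disallows and spaces out its requests, which covers both "identify your bot clearly" and "don't overload websites".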

Bottom line: If you're running a site and want to protect your content from unauthorized AI scraping, now’s the time to double-check your Cloudflare settings. The AI web crawler war is heating up — and staying one step ahead is key.

Rimon Khan

This is a huge red flag for web transparency and control. If site owners are explicitly blocking crawlers via robots.txt or other means, and those instructions are still being bypassed, that’s not just a tech glitch, it’s a trust issue. Cloudflare’s involvement makes it even more complex, because they’re often seen as the protectors of web infrastructure. If platforms like Perplexity are getting through, intentionally or not, it raises serious questions about consent, enforcement, and the future of content ownership in the age of AI. 🧠

Nahid Hossen

Scraping the web is nothing new, but ignoring explicit opt-outs is where it crosses a line. If bots are bypassing standard blocks, that’s not “indexing”; that’s digital trespassing. The irony is that these AI models depend on the open web, yet risk poisoning that same ecosystem by overreaching. Platforms need to be held accountable before this becomes the norm, not the exception. Respect to the post for calling this out: these are the conversations we need to have now, not later.
