iscrawlable

Public AI crawler-readiness check — initial results in 5 seconds, no signup.

© 2026 iscrawlable. All Rights Reserved.

What we can and can't verify

iscrawlable runs a public, unauthenticated crawler-readiness check across robots.txt, HTTP responses, indexability headers, sitemap, llms.txt, and WAF/CDN signals. Some answers a public scan can give with high confidence. Others need a connected scan with read-only access to your CDN. This page lays out which is which.
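As a rough illustration of the robots.txt part of such a check, the sketch below evaluates a robots.txt body against a few crawler user agent tokens using Python's standard `urllib.robotparser`. The robots.txt content, the URL, and the list of agents are assumptions made for the example; this is not a description of how iscrawlable itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; a real check would fetch it from the site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# Crawler tokens listed for illustration only.
AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot"]


def robots_verdicts(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return, per agent, whether robots.txt permits fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


verdicts = robots_verdicts(ROBOTS_TXT, "https://example.com/docs/", AGENTS)
```

Here `GPTBot` matches its own group and is disallowed, while the other two agents fall through to the `*` group and are allowed. A robots.txt result covers only one layer; HTTP responses, headers, and WAF behavior still need their own checks.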

Public scan vs connected scan

A public scan probes your site the way an external crawler would. It cannot see configuration that lives behind your CDN dashboard — for example, the Block AI Bots toggle in Cloudflare's AI Crawl Control panel, or custom WAF rules that match on attributes a public probe cannot reproduce.

A connected scan asks for a read-only API token from your CDN provider so we can read those settings directly. We use the token only to read configuration, never to change it. The connected scan is a Pro feature.

User-agent simulation, not source IP

Our public scan sends requests with the published user agent strings of major AI crawlers. We do not originate from the IP ranges those crawlers actually use, and we do not impersonate verified-bot identities. Sites that gate access by source IP or by verified-bot signature may treat our probe differently from a real crawler. That is a known limit of any public crawler-readiness check.
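A minimal sketch of what user-agent simulation looks like in practice, using only the Python standard library. The user agent string below is a placeholder, not any vendor's real string, and the function names are hypothetical; a real probe would substitute the strings each vendor publishes.

```python
import urllib.error
import urllib.request

# Placeholder string for illustration; NOT a real crawler's user agent.
EXAMPLE_CRAWLER_UA = "Mozilla/5.0 (compatible; ExampleAIBot/1.0; +https://example.com/bot)"


def build_probe(url: str, user_agent: str) -> urllib.request.Request:
    """Build a GET request that presents the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent}, method="GET")


def probe_status(url: str, user_agent: str, timeout: float = 5.0) -> int:
    """Send the probe and return the HTTP status code, including 4xx/5xx."""
    request = build_probe(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on error statuses; the code is still the answer we want.
        return err.code
```

Only the header changes. The request still leaves from the probe's own network address, which is why a site that checks source IP or a verified-bot signature can answer this probe differently from the real crawler.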

Verified bot IP limits

OpenAI, Anthropic, Perplexity, and Google publish IP ranges for some of their crawlers. We compare the public response a site returns to our probe against the documented behavior of those crawlers, but we cannot fully replicate IP-based allow-lists from outside the perimeter. If your access policy depends on IP attestation, a connected scan is the only way to verify the rule end-to-end.
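To make the limitation concrete, here is a sketch of the kind of IP allow-list test a site might apply, written with Python's `ipaddress` module. The CIDR blocks are reserved documentation ranges standing in for a vendor's published list, and the crawler name is hypothetical.

```python
import ipaddress

# Documentation-only ranges used as stand-ins for a vendor-published list.
PUBLISHED_RANGES = {
    "example-crawler": ["192.0.2.0/24", "198.51.100.0/25"],
}


def ip_in_published_ranges(ip: str, cidrs: list[str]) -> bool:
    """Return True if the address falls inside any of the CIDR blocks."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidrs)


ranges = PUBLISHED_RANGES["example-crawler"]
inside = ip_in_published_ranges("192.0.2.17", ranges)
outside = ip_in_published_ranges("203.0.113.5", ranges)
```

A probe sent from outside those ranges gets the `outside` answer regardless of the user agent it presents, so a public scan can observe that a rule blocked it but cannot confirm that the real crawler would be let through.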

Cloudflare connected scan

If you connect Cloudflare with a read-only API token, we can additionally inspect:

  • Block AI Bots toggle state
  • AI Crawl Control rule set
  • Managed robots.txt overrides
  • Bot Fight Mode setting
  • Custom WAF rules that match AI crawler user agents

What we still cannot see, even with a Cloudflare token: rules at the origin server level (nginx / Apache / application code), and policies at any other layer in front of Cloudflare. We also do not modify any settings — this scan is read-only by contract.

Perplexity declared-crawler caveat

We only check declared Perplexity user agents and public access signals. We cannot verify stealth, third-party, or undeclared crawlers from a public scan. Perplexity's user-triggered agent (Perplexity-User) is shown as auxiliary context only and never counts against the main pass/fail.
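One way to picture the "auxiliary context only" rule is the sketch below, which separates auxiliary agents from the agents that decide the verdict. The function name and the shape of the results dictionary are assumptions for illustration; only the treatment of `Perplexity-User` comes from the text above.

```python
# Agents reported for context but excluded from the main pass/fail.
AUXILIARY_AGENTS = {"Perplexity-User"}


def main_verdict(results: dict[str, bool]) -> tuple[bool, dict[str, bool]]:
    """Return (passed, auxiliary) where auxiliary agents never affect passed."""
    main = {agent: ok for agent, ok in results.items() if agent not in AUXILIARY_AGENTS}
    auxiliary = {agent: ok for agent, ok in results.items() if agent in AUXILIARY_AGENTS}
    return all(main.values()), auxiliary


passed, auxiliary = main_verdict({"PerplexityBot": True, "Perplexity-User": False})
```

In this example the declared crawler is allowed, so the verdict passes even though the user-triggered agent is blocked; the blocked agent is still returned so it can be displayed alongside the result.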

What each result status means

  • Pass: Allowed by the public checks we can verify.
  • Fail: Blocked or disallowed by at least one public signal.
  • Warning: Mixed signals, where one layer looks open and another looks restricted or ambiguous.
  • Unknown: We could not verify this from public checks.
  • Needs connected scan: Public checks are not enough. Connect Cloudflare with read-only access to inspect WAF and AI Crawl Control settings.

Pass means crawlers appear allowed by public checks. It does not guarantee citation in ChatGPT, Claude, Perplexity, or Google AI results.
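The status definitions can be read as a small decision procedure over per-layer signals. The sketch below is one possible reading, not the product's actual logic: the `Signal` values, the `cdn_gated` flag, and the precedence order (a block wins, then the CDN gate, then unknown, pass, warning) are all assumptions.

```python
from enum import Enum


class Signal(Enum):
    OPEN = "open"
    BLOCKED = "blocked"
    AMBIGUOUS = "ambiguous"
    UNVERIFIABLE = "unverifiable"


class Status(Enum):
    PASS = "Pass"
    FAIL = "Fail"
    WARNING = "Warning"
    UNKNOWN = "Unknown"
    NEEDS_CONNECTED_SCAN = "Needs connected scan"


def overall_status(signals: dict[str, Signal], cdn_gated: bool = False) -> Status:
    """Combine per-layer signals into one status (assumed precedence)."""
    values = set(signals.values())
    if Signal.BLOCKED in values:
        return Status.FAIL  # at least one public signal blocks access
    if cdn_gated:
        return Status.NEEDS_CONNECTED_SCAN  # decisive settings sit behind the CDN
    if not values or values == {Signal.UNVERIFIABLE}:
        return Status.UNKNOWN  # nothing could be verified publicly
    if values == {Signal.OPEN}:
        return Status.PASS  # every checked layer looks open
    return Status.WARNING  # open mixed with ambiguous or unverifiable layers


status = overall_status({"robots.txt": Signal.OPEN, "waf": Signal.AMBIGUOUS})
```

With one open layer and one ambiguous layer, this reading yields a Warning. Other orderings are defensible, for example reporting Needs connected scan before Fail when the block itself may come from a CDN rule.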