FAQ
If AI platforms are not citing your content, the cause is usually one of several access or indexing issues. Cloudflare bot blocking, robots.txt rules, JavaScript rendering, login walls, and noindex tags are the most common culprits.
There is an important distinction between AI crawlers not being able to reach your site and AI crawlers reaching your site but choosing not to cite it. The issues below address access problems. If crawlers can reach your content but you are still not appearing in AI responses, the issue is more likely content quality, entity authority, or how AI platforms evaluate your brand overall.
Cloudflare Bot Fight Mode and WAF rules
Cloudflare’s Bot Fight Mode and Super Bot Fight Mode classify many AI crawlers as automated bots and respond with a JavaScript challenge or a 403 error instead of your page content. AI crawlers that cannot execute JavaScript fail the challenge silently and move on. If you have Super Bot Fight Mode enabled, check whether it is set to challenge only “definitely automated” traffic or a broader category of bots. You can create a WAF custom rule in Cloudflare that exempts specific user agents like OAI-SearchBot, PerplexityBot, and ClaudeBot while still challenging others. Cloudflare’s documentation states that the free-plan Bot Fight Mode cannot be bypassed with custom rules, so on that plan the practical option is to disable it.
Even without Bot Fight Mode, Cloudflare WAF rules, rate limiting, and IP reputation filtering can block AI crawlers. Check the Security Events log (formerly Firewall Events) in the Cloudflare dashboard and filter by the relevant user agent strings to see whether requests from AI crawlers are being blocked, and by which rule.
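To see how your site responds to a given crawler user agent, a quick probe can help. The following is a minimal Python sketch, not a definitive diagnostic. The function names probe and classify_response are hypothetical, the user agent values are shortened tokens rather than the full strings real crawlers send, and the sketch assumes Cloudflare marks challenge responses with a cf-mitigated: challenge header. Because Cloudflare also weighs IP reputation, a request from your own machine may be treated differently from genuine crawler traffic, so treat the result as a hint and confirm it in the dashboard logs.

```python
import urllib.error
import urllib.request

# Shortened user agent tokens (assumption: real crawlers send longer strings).
AI_USER_AGENTS = ["OAI-SearchBot", "PerplexityBot", "ClaudeBot"]


def classify_response(status, headers):
    """Label a response as 'challenged', 'rate_limited', 'blocked', 'ok', or 'other'."""
    normalized = {key.lower(): value.lower() for key, value in headers.items()}
    # Assumption: Cloudflare challenge pages carry "cf-mitigated: challenge".
    if normalized.get("cf-mitigated") == "challenge":
        return "challenged"
    if status == 429:
        return "rate_limited"
    if status == 403:
        return "blocked"
    if 200 <= status < 300:
        return "ok"
    return "other"


def probe(url, user_agent, timeout=10):
    """Fetch url with the given User-Agent and classify the response."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_response(response.status, dict(response.headers.items()))
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses arrive as exceptions but still carry headers.
        return classify_response(error.code, dict(error.headers.items()))
```

For example, calling probe("https://yourdomain.com/", "PerplexityBot") for each entry in AI_USER_AGENTS and comparing the labels against a normal browser user agent shows whether the response changes with the user agent alone.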
robots.txt blocking AI user agents
Check your robots.txt file at yourdomain.com/robots.txt for Disallow rules targeting GPTBot, CCBot, ClaudeBot, PerplexityBot, or OAI-SearchBot, and for a wildcard User-agent: * group that disallows everything. A blanket User-agent: * with Disallow: / blocks every crawler that does not have its own more specific group, including search engines and AI bots.
There is also a meaningful distinction between training crawlers and retrieval crawlers. Training crawlers like GPTBot, CCBot, and ClaudeBot collect content for model training. Retrieval crawlers like OAI-SearchBot, PerplexityBot, and Claude-SearchBot index content to serve in AI-generated answers. Blocking retrieval crawlers in robots.txt removes your content from those platforms’ answers, which may not be the intended result. See How to Structure a Page for Both SEO and AEO for a robots.txt template that distinguishes between the two types.
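One way to verify what a robots.txt file actually permits is to evaluate it with Python’s standard urllib.robotparser module. The sketch below is illustrative only. EXAMPLE_ROBOTS_TXT is a hypothetical policy (not the template referenced above) that blocks two training crawlers while allowing two retrieval crawlers, and crawler_access is a helper name made up for this example. Python’s parser may not match every crawler’s own interpretation of the rules, so treat the output as a sanity check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block training crawlers, allow retrieval crawlers.
EXAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Disallow: /private/
"""


def crawler_access(robots_txt, user_agents, url):
    """Return {user_agent: True/False} for whether each agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in user_agents}


if __name__ == "__main__":
    agents = ["GPTBot", "CCBot", "OAI-SearchBot", "PerplexityBot"]
    for agent, allowed in crawler_access(
        EXAMPLE_ROBOTS_TXT, agents, "https://example.com/guide"
    ).items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

To check a live site, read the contents of yourdomain.com/robots.txt into a string and pass it in place of EXAMPLE_ROBOTS_TXT.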
JavaScript rendering and single-page applications
Most AI crawlers do not execute JavaScript when fetching pages. If your site is a single-page application built with React, Vue, Angular, or a similar framework, crawlers may receive an empty HTML shell with little or no visible text. The crawler sees nothing worth indexing and moves on.
The fix is server-side rendering (SSR) or static site generation (SSG), which delivers fully rendered HTML in the initial response without requiring JavaScript execution. Alternatively, a pre-rendering service can serve a rendered HTML snapshot to non-browser user agents while serving the normal JavaScript bundle to browsers.
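To estimate what a crawler that does not run JavaScript would see, you can count the visible words in the raw HTML response. The following is a rough Python sketch using the standard html.parser module. The names VisibleTextExtractor, visible_word_count, and looks_like_empty_shell are hypothetical, and the 50-word threshold is an arbitrary assumption rather than a figure any AI platform publishes.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text outside of tags whose content is not rendered as page text."""

    SKIPPED_TAGS = {"script", "style", "template", "title"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_word_count(html):
    """Count whitespace-separated words in the text a non-JS client receives."""
    extractor = VisibleTextExtractor()
    extractor.feed(html)
    extractor.close()
    return len(" ".join(extractor.chunks).split())


def looks_like_empty_shell(html, min_words=50):
    """Heuristic: very little visible text suggests a client-rendered shell."""
    return visible_word_count(html) < min_words
```

Run it against the HTML returned by a plain HTTP fetch of the page, not the DOM you see in browser developer tools, since the latter already reflects JavaScript execution.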
Login walls, paywalls, and cookie gates
AI crawlers cannot authenticate. Any content behind a login page, a paywall, or a cookie consent gate that hides content until the user clicks Accept is invisible to all crawlers. If your most valuable content is gated, it will not appear in AI-generated responses.
The most direct ways to make gated content visible to AI systems are to expose an ungated summary or abstract, or to remove the gate on the specific pages you want indexed. Alternatively, some publishers use a “first click free” approach where crawlers receive the full content but returning visitors hit the gate.
noindex meta tags
A page with <meta name="robots" content="noindex"> is excluded from Google’s index and therefore cannot appear in Google AI Overviews, which are built on the same underlying index as traditional search. For ChatGPT, Perplexity, and Claude, the effect depends on whether those platforms rely on Google’s index or maintain their own crawl. Some AI platforms honor noindex as a publisher intent signal even if they have their own crawler.
Check whether noindex tags are present on pages you want AI systems to read. A common mistake is applying noindex site-wide during development and not removing it after launch, or including it in a template that gets applied to pages that should be indexed.
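A check like this can be automated across a list of pages. The following Python sketch scans a page’s HTML for a robots meta tag containing noindex. RobotsMetaParser and has_noindex are hypothetical names, and the sketch only covers the generic name="robots" tag shown above; it does not look at crawler-specific meta names or HTTP response headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when written without "=value".
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").strip().lower() != "robots":
            return
        for part in attributes.get("content", "").split(","):
            directive = part.strip().lower()
            if directive:
                self.directives.append(directive)


def has_noindex(html):
    """Return True if the page carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives
```

Running has_noindex over the pages in your sitemap is a quick way to catch a development-time tag that survived launch or a template that applies it too broadly.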
Site too new or not yet indexed
AI platforms that rely on Google’s index cannot see pages Google has not yet crawled. New domains and new pages typically take days to weeks to appear, and some low-authority pages take longer. Submit URLs through Google Search Console to accelerate discovery. For AI platforms that maintain their own index, re-crawl cycles may be less frequent than Googlebot’s, meaning new content can take longer to appear in AI responses than in traditional search results.
IP-based rate limiting or geo-blocking
Some sites apply aggressive rate limiting that triggers on the high-frequency crawling patterns AI bots use, or apply geo-restrictions that block IP ranges used by AI crawler infrastructure. Server-level rate limiting (not just Cloudflare) and hosting provider firewall rules can also interfere. If you see an unusual number of 429 or 403 responses in your server logs from unfamiliar user agent strings, this is likely the cause.
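A short script can tally these responses from an access log. The sketch below is a minimal Python example that assumes the common “combined” log format used by Apache and nginx, where the user agent is the last quoted field. LOG_PATTERN, AI_CRAWLER_TOKENS, and count_blocked_requests are hypothetical names, and the token list is illustrative rather than exhaustive.

```python
import re
from collections import Counter

# Assumes combined log format: "request" status bytes "referer" "user agent".
LOG_PATTERN = re.compile(
    r'"[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<user_agent>[^"]*)"'
)

# Illustrative, non-exhaustive list of AI crawler user agent tokens.
AI_CRAWLER_TOKENS = ("GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot", "CCBot")


def match_crawler(user_agent, tokens=AI_CRAWLER_TOKENS):
    """Return the first token found in the user agent string, or None."""
    lowered = user_agent.lower()
    for token in tokens:
        if token.lower() in lowered:
            return token
    return None


def count_blocked_requests(log_lines, blocked_statuses=(403, 429)):
    """Count (crawler, status) pairs for AI crawler requests that were refused."""
    counts = Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # Skip lines that do not fit the assumed format.
        status = int(match.group("status"))
        crawler = match_crawler(match.group("user_agent"))
        if crawler is not None and status in blocked_statuses:
            counts[(crawler, status)] += 1
    return counts
```

Passing an open log file to count_blocked_requests works because file objects iterate line by line. A high count for one crawler and status code points to the rule or limit worth investigating.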
Crawlers can reach the site but content is not being cited
This is a different problem from access blocking. If AI crawlers can reach your content but your brand is not appearing in AI-generated responses, the issue is usually content quality, entity authority, or topical coverage rather than a technical access problem. AI platforms cite sources they consider credible, relevant, and specific to the query. Thin content, generic information available everywhere, and pages with no clear entity signals are less likely to be cited even when crawlers can read them.
A free AI brand visibility audit will show you exactly how ChatGPT, Claude, Gemini, and Perplexity currently describe your business and whether the barrier is technical access or content authority.
Key sources: RFC 9309, “Robots Exclusion Protocol” (IETF, September 2022); Cloudflare, Bot Fight Mode and WAF documentation; Cloudflare Radar AI crawler traffic analysis (May 2026); Google Search Central, robots.txt specification.