How to Check if ChatGPT Can See Your Website (and Fix It if It Can't)
Many sites are accidentally invisible to AI answer engines. Here's how to test whether ChatGPT can actually fetch your pages, and the three-line fixes when it can't.
The problem nobody told you about
You spent months on SEO. Then ChatGPT started answering the questions your customers used to Google — and you found out you weren't in the answer. Worse: you might be blocking the very crawler that decides whether you get cited.
ChatGPT reaches your content through three OpenAI agents: GPTBot (the crawler that gathers training data), OAI-SearchBot (the crawler behind ChatGPT search results), and ChatGPT-User (the live fetch when a user's prompt needs your page right now). Block one and you can drop out of the answers it feeds.
Step 1: Check your robots.txt
Open https://yourdomain.com/robots.txt and look for these agents:
# This BLOCKS ChatGPT — bad if you want citations
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
Some Next.js and WordPress templates and plugins ship a blanket Disallow: / for "AI bots" by default, so check even if you never wrote the rule yourself. To explicitly allow the ones you care about:
User-agent: GPTBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: PerplexityBot
Allow: /
Note the three different OpenAI agents: GPTBot builds the model's knowledge, OAI-SearchBot powers ChatGPT search results, and ChatGPT-User is the on-demand fetch when a live prompt needs your page. You generally want all three. Blocking one but not the others is a common, confusing half-fix.
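To check the rules without reading them by eye, here is a minimal sketch using Python's standard-library robots.txt parser. The agent names come from this article; the sample rules and the /pricing path are made up for illustration, and in practice you would pass in the text of your own robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Agent names from the article; extend the list as needed.
AGENTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot"]


def allowed_agents(robots_txt: str, url: str, agents=AGENTS) -> dict:
    """Return {agent: bool} for whether robots_txt lets each agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Hypothetical robots.txt: blocks GPTBot, allows ChatGPT-User,
# and says nothing about the other agents.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /
"""

result = allowed_agents(SAMPLE, "https://yourdomain.com/pricing")
for agent, ok in result.items():
    print(f"{agent:15} {'allowed' if ok else 'BLOCKED'}")
```

With the sample rules, GPTBot is reported as blocked and the other three as allowed, because an agent with no matching group falls through to "allowed". Python's parser applies the first matching rule in a group, so results for files that mix Allow and Disallow on overlapping paths may differ from how a given crawler resolves them.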
Step 2: Simulate the fetch
robots.txt is advisory; the real test is whether the page returns crawlable HTML. Curl as the bot:
curl -A "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)" \
-s -o /dev/null -w "%{http_code}\n" https://yourdomain.com
A 200 is good. A 403 means your CDN or WAF (Cloudflare's "Block AI Bots" toggle is a common culprit) is rejecting the agent before it ever reaches your app.
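The same check can be repeated for several agents in one go. The sketch below is an assumption-laden helper, not an official tool: the user-agent strings are simplified stand-ins like the one in the curl command, the function names are hypothetical, and a server that verifies crawler IP addresses may treat your machine differently from the real bot.

```python
import urllib.error
import urllib.request

# Simplified stand-in user-agent strings (assumption); real crawlers send
# longer strings and connect from their operators' IP ranges.
USER_AGENTS = {
    "GPTBot": "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)",
    "ChatGPT-User": "Mozilla/5.0 (compatible; ChatGPT-User/1.0; +https://openai.com/bot)",
}


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends for this user agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        # Note: urlopen follows redirects, unlike curl without -L.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx/5xx responses are raised as exceptions


def interpret(status: int) -> str:
    """Map a status code to the rough diagnosis used in this article."""
    if 200 <= status < 300:
        return "ok"
    if status in (401, 403):
        return "blocked (check CDN/WAF bot rules)"
    if status == 429:
        return "rate limited"
    return "unexpected"


def check_agents(url: str) -> dict:
    """Fetch url once per agent and return {agent: diagnosis}."""
    return {name: interpret(fetch_status(url, ua)) for name, ua in USER_AGENTS.items()}


# Example (makes real network requests):
# print(check_agents("https://yourdomain.com"))
```

If one agent comes back "ok" and another "blocked" for the same URL, the difference is almost certainly a user-agent rule in front of your app rather than anything in the app itself.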
Step 3: Make sure there's HTML to read
The biggest silent killer is client-side rendering. GPTBot does not run JavaScript reliably. If your content only appears after hydration, the bot sees an empty shell. Check what's actually in the raw document:
curl -s https://yourdomain.com | grep -iE "<h1|<article|<main"
If your headline isn't in there, no answer engine can quote it.
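Matching tags with grep can miss the headline text itself. As a rough alternative, the sketch below parses the raw HTML with Python's standard library and reports which of the three elements appear and what the h1 says. The class name, function name, and both sample documents are made up for illustration.

```python
from html.parser import HTMLParser


class LandmarkFinder(HTMLParser):
    """Record whether <h1>, <article>, <main> appear, and collect <h1> text."""

    WATCHED = {"h1", "article", "main"}

    def __init__(self):
        super().__init__()
        self.found = set()
        self.h1_text = []
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag in self.WATCHED:
            self.found.add(tag)
        if tag == "h1":
            self._in_h1 = True
            self.h1_text.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.h1_text[-1] += data


def find_landmarks(html: str):
    """Return (set of watched tags present, list of non-empty <h1> texts)."""
    finder = LandmarkFinder()
    finder.feed(html)
    finder.close()
    headlines = [text.strip() for text in finder.h1_text if text.strip()]
    return finder.found, headlines


# Hypothetical responses: one server-rendered page, one client-side shell.
SERVER_RENDERED = (
    "<html><body><main><article><h1>Pricing guide</h1>"
    "<p>Plans and limits.</p></article></main></body></html>"
)
EMPTY_SHELL = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'

print(find_landmarks(SERVER_RENDERED))
print(find_landmarks(EMPTY_SHELL))
```

For the shell document the function returns an empty set and an empty list, which is the signature of content that only exists after JavaScript runs.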
Fixes for the empty-shell problem
- Use server-side rendering or static generation for content pages. In Next.js App Router, that's the default for Server Components, so don't push your content behind a "use client" boundary that only loads it in the browser.
- Render the meaningful text in the initial HTML response, not after a useEffect fetch.
- Avoid gating content behind a cookie wall, infinite scroll, or "click to load."
A fast way to confirm what a non-JS client sees is to strip your own response down to text:
curl -s https://yourdomain.com | sed 's/<[^>]*>//g' | tr -s ' \n' | head -40
If your actual article copy shows up there, the answer engine can read it. If you get a near-empty result or just Loading..., your content is trapped behind JavaScript and you need server rendering.
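The sed one-liner leaves inline script and style contents in the output, which can make an empty shell look fuller than it is. A minimal sketch of a stricter version follows; the 50-word threshold is an arbitrary assumption to tune for your pages, and the names are hypothetical.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the text a non-JavaScript client would see, whitespace-normalized."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def looks_like_empty_shell(html: str, min_words: int = 50) -> bool:
    """Heuristic: very few words in the raw HTML suggests client-side rendering."""
    return len(visible_text(html).split()) < min_words


# Hypothetical client-rendered response.
SHELL = (
    "<html><head><style>body{margin:0}</style></head><body>"
    '<div id="root">Loading...</div>'
    '<script>window.__DATA__ = {"a": 1}</script></body></html>'
)

print(visible_text(SHELL))
print(looks_like_empty_shell(SHELL))
```

Feed it the body of the response you fetched with the bot's user agent. A short page can trip the heuristic even when it is fully server-rendered, so read the extracted text before acting on the boolean.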
Step 4: Give the answer engine something quotable
AI engines extract passages, not pages. Help them:
- Put a direct, one-sentence answer in the first paragraph under each ## heading.
- Use real semantic HTML (<article>, <h1>–<h3>, <ul>), not a soup of <div>s.
- Add Article or FAQPage JSON-LD structured data so the parser knows what's what.
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "Can ChatGPT see my website?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "Yes, if GPTBot is allowed in robots.txt and the page serves crawlable HTML."
    }
  }]
}
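Hand-writing this JSON gets error-prone once a page has more than a couple of questions. The sketch below generates the same FAQPage shape from question and answer pairs and wraps it in the script element that JSON-LD is conventionally embedded in. The function names are hypothetical, and the sample pair is the one from the snippet above.

```python
import json


def faq_jsonld(pairs) -> str:
    """Build a schema.org FAQPage document from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


def script_tag(jsonld: str) -> str:
    """Wrap JSON-LD in a <script type="application/ld+json"> element."""
    # Escape "</" so answer text cannot close the script element early;
    # "\/" is a valid JSON escape for "/".
    safe = jsonld.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + safe + "\n</script>"


PAIRS = [
    (
        "Can ChatGPT see my website?",
        "Yes, if GPTBot is allowed in robots.txt and the page serves crawlable HTML.",
    ),
]

print(script_tag(faq_jsonld(PAIRS)))
```

Emit the resulting tag in the server-rendered HTML, for the same reason as the article copy: a client that does not run JavaScript only sees what is in the initial response.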
Quick checklist
- robots.txt allows GPTBot, OAI-SearchBot, and ChatGPT-User
- CDN/WAF isn't returning 403 to those agents
- Core content is in the server-rendered HTML
- Headings and structured data are present
Scan it with Troja
Troja's AEO scanner runs the per-engine access check for you — GPTBot, ChatGPT-User, PerplexityBot, ClaudeBot and Google-Extended — and tells you exactly which assistants can read your site and which are getting a 403. Point it at a URL and find out in 30 seconds.