AEO · Jun 5, 2026 · 5 min read

How to Check if ChatGPT Can See Your Website (and Fix It if It Can't)

Most sites are accidentally invisible to AI answer engines. Here's how to test whether ChatGPT can actually fetch your pages — and the three-line fixes when it can't.

By The Troja Team

The problem nobody told you about

You spent months on SEO. Then ChatGPT started answering the questions your customers used to Google — and you found out you weren't in the answer. Worse: you might be blocking the very crawler that decides whether you get cited.

ChatGPT reaches your content through three OpenAI agents: GPTBot (the crawler that collects training data), OAI-SearchBot (the crawler behind ChatGPT search results), and ChatGPT-User (the live fetch when a user's prompt needs your page right now). Block any one of them and you lose that route into the answer.

Step 1: Check your robots.txt

Open https://yourdomain.com/robots.txt and look for these agents:

# This BLOCKS ChatGPT — bad if you want citations
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

Some Next.js and WordPress templates and plugins ship a blanket Disallow: / for "AI bots" by default, so check even if you never added one yourself. To explicitly allow the ones you care about:

User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

Note the three different OpenAI agents: GPTBot builds the model's knowledge, OAI-SearchBot powers ChatGPT search results, and ChatGPT-User is the on-demand fetch when a live prompt needs your page. You generally want all three. Blocking one but not the others is a common, confusing half-fix.
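To check all of these agents in one pass, the rules above can be evaluated with Python's standard-library robots.txt parser. This is a minimal sketch: allowed_agents and the AGENTS list are illustrative names, not part of any tool mentioned here, and the stdlib parser applies the first matching rule, so its verdict may differ from a given crawler's on files that mix overlapping Allow and Disallow lines.

```python
from urllib.robotparser import RobotFileParser

# The three OpenAI agents discussed above, plus PerplexityBot from the example.
AGENTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot"]


def allowed_agents(robots_txt, url, agents=AGENTS):
    """Return {agent: bool} saying whether each agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # Hypothetical robots.txt: blocks GPTBot, allows everyone else.
    sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
    for agent, ok in allowed_agents(sample, "https://yourdomain.com/").items():
        print(f"{agent:15} {'allowed' if ok else 'BLOCKED'}")
```

Paste the contents of your own robots.txt in place of the sample string to see which agents fall through to the wildcard group and which are blocked by name.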

Step 2: Simulate the fetch

robots.txt is advisory, and it says nothing about what your server actually returns. The real test is whether a request carrying the bot's User-Agent gets through. Run curl with that User-Agent:

curl -A "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)" \
  -s -o /dev/null -w "%{http_code}\n" https://yourdomain.com

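A 200 is good. A 403 means your CDN or WAF (Cloudflare's "Block AI Bots" toggle is a common culprit) is rejecting the agent before it ever reaches your app. A 301 or 302 means the URL redirects; add -L to the command so curl follows the redirect and reports the final status.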
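The same check can be scripted so it is easy to repeat across several URLs. The sketch below is an assumption-laden equivalent of the curl command: the User-Agent string is the simplified one from the example above rather than the exact string the crawler sends, and build_request, fetch_status, and explain are hypothetical helper names. Because the request comes from your own IP address, it only exercises rules that match on User-Agent.

```python
import urllib.error
import urllib.request

# Simplified User-Agent copied from the curl example; the real crawler's
# string may be longer, so treat this value as an assumption.
GPTBOT_UA = "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)"


def build_request(url, user_agent=GPTBOT_UA):
    """Create a GET request that identifies itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent=GPTBOT_UA, timeout=10.0):
    """Return the final HTTP status code for url.

    Unlike curl without -L, urlopen follows redirects, so this reports the
    status of the last response. DNS or connection failures raise URLError.
    """
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions; the code is what we want.
        return err.code


def explain(status):
    """Translate a status code into the reading given in the text."""
    if status == 200:
        return "ok: the server answered this User-Agent"
    if status == 403:
        return "blocked: a CDN or WAF rule is probably rejecting this User-Agent"
    return f"inconclusive: got HTTP {status}, investigate manually"
```

Calling explain(fetch_status("https://yourdomain.com")) gives a one-line verdict; pass a different user_agent string to test another agent.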

Step 3: Make sure there's HTML to read

The biggest silent killer is client-side rendering. GPTBot does not run JavaScript reliably. If your content only appears after hydration, the bot sees an empty shell. Check what's actually in the raw document:

curl -s https://yourdomain.com | grep -iE "<h1|<article|<main"

If your headline isn't in there, no answer engine can quote it.

Fixes for the empty-shell problem

Use server-side rendering or static generation for content pages. In Next.js App Router, that's the default for Server Components — don't push your content behind a "use client" boundary.
Render the meaningful text in the initial HTML response, not after a useEffect fetch.
Avoid gating content behind a cookie wall, infinite scroll, or "click to load."

A fast way to confirm what a non-JS client sees is to strip your own response down to text:

curl -s https://yourdomain.com | sed 's/<[^>]*>//g' | tr -s ' \n' | head -40

If your actual article copy shows up there, the answer engine can read it. If you get a near-empty result or just Loading..., your content is trapped behind JavaScript and you need server rendering.
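The sed one-liner is deliberately crude: it leaves inline script and style contents in the output and misses tags that span several lines. A slightly more careful version of the same idea, sketched here with Python's built-in HTML parser, skips those elements and also reports the h1 text. The names VisibleText and visible_text are illustrative, and this approximates what a non-JavaScript client could read rather than modeling any specific crawler.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text present in raw HTML, ignoring script and style bodies."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside <script> or <style>
        self._in_h1 = False
        self.chunks = []       # all visible text fragments, in order
        self.h1 = []           # fragments found inside <h1>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "h1":
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if not text or self._skip_depth:
            return
        self.chunks.append(text)
        if self._in_h1:
            self.h1.append(text)


def visible_text(html):
    """Return (all_text, h1_text) extracted from an HTML string."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks), " ".join(parser.h1)
```

Feed it the body of a plain HTTP response for your page. An empty h1 string, or text that amounts to little more than Loading..., points to the same empty-shell problem described above.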

Step 4: Give the answer engine something quotable

AI engines extract passages, not pages. Help them:

Put a direct, one-sentence answer in the first paragraph under each heading.
Use real semantic HTML — <article>, <h1>–<h3>, <ul> — not a soup of <div>s.
Add Article or FAQPage JSON-LD structured data so the parser knows what's what.

{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "Can ChatGPT see my website?",
    "acceptedAnswer": { "@type": "Answer", "text": "Yes, if GPTBot is allowed in robots.txt and the page serves crawlable HTML." }
  }]
}
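If the questions and answers already live in your CMS or codebase, the JSON-LD can be generated rather than hand-written. The following is a small sketch of that idea: faq_jsonld and script_tag are hypothetical helpers, and the output follows the same FAQPage shape as the example above.

```python
import json


def faq_jsonld(pairs):
    """Build a schema.org FAQPage dict from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def script_tag(data):
    """Serialize data into a JSON-LD script element for the page head."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so answer text containing "</script>" cannot end the
    # element early; "\/" is a valid JSON escape for "/".
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    faq = faq_jsonld([
        ("Can ChatGPT see my website?",
         "Yes, if GPTBot is allowed in robots.txt and the page serves crawlable HTML."),
    ])
    print(script_tag(faq))
```

Emit the resulting element in the server-rendered HTML, not from client-side code, so it is present in the same raw response the earlier steps inspect.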

Quick checklist

robots.txt allows GPTBot, OAI-SearchBot, and ChatGPT-User
CDN/WAF isn't returning 403 to those agents
Core content is in the server-rendered HTML
Headings and structured data are present

Scan it with Troja

Troja's AEO scanner runs the per-engine access check for you — GPTBot, ChatGPT-User, PerplexityBot, ClaudeBot and Google-Extended — and tells you exactly which assistants can read your site and which are getting a 403. Point it at a URL and find out in 30 seconds.
