Technical Layer: Get Found by ChatGPT

Start with the uncomfortable truth

You can’t “optimize your way into” ChatGPT if you’re blocking access.

When ChatGPT uses search, it’s still retrieving pages from the web. If your content isn’t fetchable, it can’t be quoted, cited, or linked.

Know the bots: OAI-SearchBot vs GPTBot vs ChatGPT-User


OpenAI documents three different user agents with different purposes. [1]

  1. OAI-SearchBot: used to surface websites in ChatGPT’s search features. [1]
  2. GPTBot: used to crawl content that may be used for training foundation models. [1]
  3. ChatGPT-User: used for certain user-initiated actions; robots.txt rules may not apply because actions are initiated by a user. [1]

Don’t treat these as one switch. They’re different levers.

See Also: How to Show Up in ChatGPT: Entity Signals, Content Structure & Citations

Robots.txt: the fastest way to accidentally disappear

OpenAI states that to be included in ChatGPT Search results, it’s important to allow OAI-SearchBot to crawl your site. [2]

And the Publishers and Developers FAQ is even more direct: if you want your content included in summaries and snippets, don’t block OAI-SearchBot. [3]

A simple starting point for many sites:

robots.txt example (allow search, disallow training):

User-agent: OAI-SearchBot
Disallow:

User-agent: GPTBot
Disallow: /

Note: your needs may differ. But the point stands: don’t block the bot you want to be discovered by.
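If you want to confirm how a crawler would interpret a policy like this before deploying it, Python's standard-library `urllib.robotparser` can evaluate the rules offline. A quick sketch (the URL and path are placeholders):

```python
from urllib.robotparser import RobotFileParser

# The same policy as above: allow OAI-SearchBot everywhere, block GPTBot.
rules = """\
User-agent: OAI-SearchBot
Disallow:

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Placeholder URL -- substitute a real page on your site.
print(parser.can_fetch("OAI-SearchBot", "https://example.com/blog/post"))  # True
print(parser.can_fetch("GPTBot", "https://example.com/blog/post"))         # False
```

This only checks your own rules, of course; it tells you nothing about whether the bot actually visits.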

Indexing controls: robots.txt is not noindex

Here’s a subtle but critical difference:

  • robots.txt controls crawling (access).
  • noindex controls indexing (whether a page should appear in results).

OpenAI’s publisher guidance notes that if you don’t want a page surfaced, you should use the noindex meta tag – but the crawler must be allowed to crawl the page to read that meta tag. [3]

Google’s documentation makes the same point: a noindex directive is implemented with a meta tag or header and requires a crawler to see it. [4][5]

In plain English: if you block crawling, you may lose control over how (or whether) indexing directives are applied by systems that need to read them.
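As a minimal illustration, the two common ways to express noindex look like this (either way, the crawler must be allowed to fetch the page or response to see the directive):

```html
<!-- Option 1: a meta tag in the page's <head> -->
<meta name="robots" content="noindex">

<!-- Option 2: the same directive as an HTTP response header
     (useful for non-HTML files such as PDFs):
     X-Robots-Tag: noindex -->
```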

See Also: Authority Layer: PR, Co-Citations, Reviews & Third-Party Validation

Schema: not a magic wand – an identity handshake


Schema markup doesn’t force a recommendation.

But it does reduce confusion. And confusion is the enemy of accurate answers.

Start with Organization schema on the homepage. Google explicitly frames Organization structured data as a way to disambiguate your organization. [6]

Then use sameAs links to point to authoritative profiles. Schema.org defines sameAs as a URL that unambiguously indicates identity. [7]
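As a sketch of what that handshake looks like in JSON-LD (every name and URL below is a placeholder, not a recommendation), a homepage Organization block with sameAs might be:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "@id": "https://example.com/#organization",
  "name": "Example Co",
  "url": "https://example.com/",
  "logo": "https://example.com/logo.png",
  "sameAs": [
    "https://www.linkedin.com/company/example-co",
    "https://en.wikipedia.org/wiki/Example_Co"
  ]
}
```

The `@id` gives other schema blocks on your site a stable node to reference, and `sameAs` points machines at profiles that confirm you are who you say you are.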

See Also: AI-Readable SEO: Schema and Technical Signals

Minimum viable schema stack

  1. Organization (homepage): legal name, logo, URL, contact point, sameAs, address, @id. [6][7]
  2. Person (leadership + authors): consistent bios and stable titles.
  3. WebSite (site-level identity): especially if you have search and site name preferences.
  4. Article (blog posts): author, datePublished, dateModified, mainEntityOfPage.
  5. FAQPage (when you actually have Q&A content).
  6. BreadcrumbList (to clarify hierarchy and reinforce internal linking).
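To show how the pieces of the stack connect, here is a hedged Article sketch (placeholder URLs, names, and dates) that ties a blog post back to the homepage Organization via its `@id`:

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "mainEntityOfPage": "https://example.com/blog/post",
  "headline": "Example post title",
  "author": { "@type": "Person", "name": "Jane Doe" },
  "publisher": { "@id": "https://example.com/#organization" },
  "datePublished": "2025-05-01",
  "dateModified": "2025-05-10"
}
```

Reusing the same `@id` across pages is what turns isolated markup into a connected identity graph.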

Structured signals that aren’t schema (but still matter)

Machines don’t just read JSON-LD. They read your structure.

  • Clean information architecture: category pages, not a graveyard of orphan posts.
  • Consistent headings: one H1, logical H2s, short sections.
  • Internal links that reinforce entity relationships (products -> use cases -> comparisons).
  • Fast, stable pages: if your site times out, a bot can’t retrieve it.

This is why “AI SEO” often looks suspiciously like “good SEO.” The fundamentals still pay rent.

Test like a grown-up

  • Check server logs for OAI-SearchBot and GPTBot hits (and verify with IP ranges where possible). [1]
  • Validate your schema with structured data testing tools (and fix warnings that indicate ambiguity).
  • Spot-check your most important pages with “view source” – is the core content present without executing 3MB of JavaScript?
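For the log check, a rough way to tally hits per AI user agent from combined-format access logs is a simple substring scan. A sketch (the sample log lines below are invented for illustration):

```python
from collections import Counter

AI_BOTS = ("OAI-SearchBot", "GPTBot", "ChatGPT-User")

def count_bot_hits(log_lines):
    """Tally requests per known AI user agent by substring match on each line."""
    hits = Counter()
    for line in log_lines:
        for bot in AI_BOTS:
            if bot in line:
                hits[bot] += 1
    return hits

# Invented sample lines in Apache/Nginx combined log format.
sample = [
    '203.0.113.5 - - [01/May/2025:12:00:00 +0000] "GET /blog HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0; compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot"',
    '203.0.113.6 - - [01/May/2025:12:01:00 +0000] "GET / HTTP/1.1" 200 1024 "-" "GPTBot/1.1"',
]

print(count_bot_hits(sample))
```

Because user-agent strings can be spoofed, pair this with the IP-range verification mentioned above before trusting the counts.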
