Technical Layer: Get Found by ChatGPT

Start with the uncomfortable truth

You can’t “optimize your way into” ChatGPT if you’re blocking access.

When ChatGPT uses search, it’s still retrieving pages from the web. If your content isn’t fetchable, it can’t be quoted, cited, or linked.

Know the bots: OAI-SearchBot vs GPTBot vs ChatGPT-User


OpenAI documents three different user agents with different purposes. [1]

  1. OAI-SearchBot: used to surface websites in ChatGPT’s search features. [1]
  2. GPTBot: used to crawl content that may be used for training foundation models. [1]
  3. ChatGPT-User: used for certain user-initiated actions; robots.txt rules may not apply because actions are initiated by a user. [1]

Don’t treat these as one switch. They’re different levers.

See Also: How to Show Up in ChatGPT: Entity Signals, Content Structure & Citations

Robots.txt: the fastest way to accidentally disappear

OpenAI states that to be included in ChatGPT Search results, it’s important to allow OAI-SearchBot to crawl your site. [2]

And the Publishers and Developers FAQ is even more direct: if you want your content included in summaries and snippets, don’t block OAI-SearchBot. [3]

A simple starting point for many sites:

robots.txt example (allow search, disallow training):

User-agent: OAI-SearchBot
Disallow:

User-agent: GPTBot
Disallow: /

Note: your needs may differ. But the point stands: don’t block the bot you want to be discovered by.
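If you want to confirm how a crawler would interpret a policy like this before deploying it, Python's standard-library `urllib.robotparser` can evaluate the rules offline. A quick sketch (the URL and path are placeholders):

```python
from urllib.robotparser import RobotFileParser

# The same policy as above: allow OAI-SearchBot everywhere, block GPTBot.
rules = """\
User-agent: OAI-SearchBot
Disallow:

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Placeholder URL -- substitute a real page on your site.
print(parser.can_fetch("OAI-SearchBot", "https://example.com/blog/post"))  # True
print(parser.can_fetch("GPTBot", "https://example.com/blog/post"))         # False
```

This only checks your own rules, of course; it tells you nothing about whether the bot actually visits.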

Indexing controls: robots.txt is not noindex

Here’s a subtle but critical difference:

  • robots.txt controls crawling (access).
  • noindex controls indexing (whether a page should appear in results).

OpenAI’s publisher guidance notes that if you don’t want a page surfaced, you should use the noindex meta tag – but the crawler must be allowed to crawl the page to read that meta tag. [3]

Google’s documentation makes the same point: a noindex directive is implemented with a meta tag or header and requires a crawler to see it. [4][5]

In plain English: if you block crawling, you may lose control over how (or whether) indexing directives are applied by systems that need to read them.
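As a minimal illustration, the two common ways to express noindex look like this (either way, the crawler must be allowed to fetch the page or response to see the directive):

```html
<!-- Option 1: a meta tag in the page's <head> -->
<meta name="robots" content="noindex">

<!-- Option 2: the same directive as an HTTP response header
     (useful for non-HTML files such as PDFs):
     X-Robots-Tag: noindex -->
```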

See Also: Authority Layer: PR, Co-Citations, Reviews & Third-Party Validation

Schema: not a magic wand – an identity handshake


Schema markup doesn’t force a recommendation.

But it does reduce confusion. And confusion is the enemy of accurate answers.

Start with Organization schema on the homepage. Google explicitly frames Organization structured data as a way to disambiguate your organization. [6]

Then use sameAs links to point to authoritative profiles. Schema.org defines sameAs as a URL that unambiguously indicates identity. [7]
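As a sketch of what that handshake looks like in JSON-LD (every name and URL below is a placeholder, not a recommendation), a homepage Organization block with sameAs might be:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "@id": "https://example.com/#organization",
  "name": "Example Co",
  "url": "https://example.com/",
  "logo": "https://example.com/logo.png",
  "sameAs": [
    "https://www.linkedin.com/company/example-co",
    "https://en.wikipedia.org/wiki/Example_Co"
  ]
}
```

The `@id` gives other schema blocks on your site a stable node to reference, and `sameAs` points machines at profiles that confirm you are who you say you are.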

See Also: AI-Readable SEO: Schema and Technical Signals

Minimum viable schema stack

  1. Organization (homepage): legal name, logo, URL, contact point, sameAs, address, @id. [6][7]
  2. Person (leadership + authors): consistent bios and stable titles.
  3. WebSite (site-level identity): especially if you have search and site name preferences.
  4. Article (blog posts): author, datePublished, dateModified, mainEntityOfPage.
  5. FAQPage (when you actually have Q&A content).
  6. BreadcrumbList (to clarify hierarchy and reinforce internal linking).
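To show how the pieces of the stack connect, here is a hedged Article sketch (placeholder URLs, names, and dates) that ties a blog post back to the homepage Organization via its `@id`:

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "mainEntityOfPage": "https://example.com/blog/post",
  "headline": "Example post title",
  "author": { "@type": "Person", "name": "Jane Doe" },
  "publisher": { "@id": "https://example.com/#organization" },
  "datePublished": "2025-05-01",
  "dateModified": "2025-05-10"
}
```

Reusing the same `@id` across pages is what turns isolated markup into a connected identity graph.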

Structured signals that aren’t schema (but still matter)

Machines don’t just read JSON-LD. They read your structure.

  • Clean information architecture: category pages, not a graveyard of orphan posts.
  • Consistent headings: one H1, logical H2s, short sections.
  • Internal links that reinforce entity relationships (products -> use cases -> comparisons).
  • Fast, stable pages: if your site times out, a bot can’t retrieve it.

This is why “AI SEO” often looks suspiciously like “good SEO.” The fundamentals still pay rent.

Test like a grown-up

  • Check server logs for OAI-SearchBot and GPTBot hits (and verify with IP ranges where possible). [1]
  • Validate your schema with structured data testing tools (and fix warnings that indicate ambiguity).
  • Spot-check your most important pages with “view source” – is the core content present without executing 3MB of JavaScript?
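For the log check, a rough way to tally hits per AI user agent from combined-format access logs is a simple substring scan. A sketch (the sample log lines below are invented for illustration):

```python
from collections import Counter

AI_BOTS = ("OAI-SearchBot", "GPTBot", "ChatGPT-User")

def count_bot_hits(log_lines):
    """Tally requests per known AI user agent by substring match on each line."""
    hits = Counter()
    for line in log_lines:
        for bot in AI_BOTS:
            if bot in line:
                hits[bot] += 1
    return hits

# Invented sample lines in Apache/Nginx combined log format.
sample = [
    '203.0.113.5 - - [01/May/2025:12:00:00 +0000] "GET /blog HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0; compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot"',
    '203.0.113.6 - - [01/May/2025:12:01:00 +0000] "GET / HTTP/1.1" 200 1024 "-" "GPTBot/1.1"',
]

print(count_bot_hits(sample))
```

Because user-agent strings can be spoofed, pair this with the IP-range verification mentioned above before trusting the counts.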
