Making your site ready for AI agents
A practical guide to the files, markup, and habits that help AI agents read, understand, and safely act on your site — and the same checks AgentLitmus runs on every scan.
Last updated .
What is agent readiness and why it matters
Agent readiness describes how easily an AI agent — a search crawler, a chatbot answering questions about your business, or an autonomous tool booking, buying, or researching on a user's behalf — can fetch your pages, extract meaningful content, and discover the guides you've published for it.
Three things make the biggest difference:
- Readable without JavaScript. Most agents fetch raw HTML and never run your client-side scripts.
- Discoverable guides. Files like llms.txt and AGENTS.md tell an agent where to look instead of making it guess.
- Explicit access policy. A robots.txt that names AI crawlers removes the guesswork about whether they're welcome.
Run a scan from the AgentLitmus homepage to see how your site measures up against all of this, and read the full methodology for exactly how each signal is scored.
How to write a good llms.txt
llms.txt is a plain-text file at your site root that points AI agents to the pages that matter most. The convention is simple: a title heading, a short summary, and a list of markdown links.
A minimal example
Place this at /llms.txt:
# Acme Tools
> Acme Tools sells hand tools and accessories for woodworking and home repair.
## Key pages
- [Shop](https://acme.example/shop): Browse our full catalog.
- [Returns policy](https://acme.example/returns): How returns and refunds work.
- [Contact](https://acme.example/contact): Support hours and contact form.
What earns full marks
- The file exists and returns a real body, not your app's HTML shell.
- It starts with a # Title heading and includes at least one markdown link.
- Every link resolves to your own domain; agents trust llms.txt more when it doesn't send them off-site.
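The three checks above are simple enough to script yourself. Below is a minimal Python sketch that assumes you have already fetched the file's text and know your site's hostname. The function name check_llms_txt and the "HTML shell" heuristic (the body starts with a doctype or an <html> tag) are illustrative assumptions. This is not AgentLitmus's own scanner code.

```python
import re
from urllib.parse import urlparse

# Matches markdown links of the form [label](url). This is a simplification
# of full markdown link syntax (no link titles, no nested brackets).
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def check_llms_txt(body: str, site_host: str) -> dict[str, bool]:
    """Hypothetical checker for the three llms.txt criteria listed above."""
    text = body.strip()
    # Assumption: an app's HTML shell begins with a doctype or an <html> tag.
    is_html_shell = text.lower().startswith(("<!doctype html", "<html"))
    first_line = text.splitlines()[0] if text else ""
    links = [url for _label, url in LINK_RE.findall(text)]
    # Relative links have no hostname and count as same-site.
    on_own_domain = all(urlparse(url).hostname in (None, site_host) for url in links)
    return {
        "real_body": bool(text) and not is_html_shell,
        "title_heading": first_line.startswith("# "),
        "has_link": bool(links),
        "links_on_own_domain": on_own_domain,
    }


sample = """# Acme Tools
> Acme Tools sells hand tools and accessories for woodworking and home repair.
## Key pages
- [Shop](https://acme.example/shop): Browse our full catalog.
- [Returns policy](https://acme.example/returns): How returns and refunds work.
"""
print(check_llms_txt(sample, "acme.example"))
```

Run against the minimal example, all four values should come back True. In this sketch a subdomain such as www.acme.example counts as off-site, so pass whichever hostname your links actually use.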
Once you've scanned your site, the report page can generate an llms.txt draft tailored to what it found — start from the homepage scan form.
robots.txt for AI agents
robots.txt is the first file most crawlers, including AI agents, check before fetching anything else. A missing file gives AI crawlers no explicit signal about whether they're welcome, and a blanket rule that disallows everyone shuts them out along with every other crawler.
Known AI agent user-agents
Some of the crawlers worth addressing explicitly:
| User-agent | Operator and purpose |
|---|---|
| GPTBot | OpenAI — training data crawler |
| ChatGPT-User | OpenAI — fetches pages a user asks ChatGPT about |
| ClaudeBot | Anthropic — training data crawler |
| Google-Extended | Google, a robots.txt token (not a separate crawler) that controls use in Gemini |
| CCBot | Common Crawl — widely used as AI training data |
| PerplexityBot | Perplexity — search and answer crawler |
Example: allow AI agents, block one
User-agent: *
Allow: /
User-agent: GPTBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: CCBot
Disallow: /
Sitemap: https://acme.example/sitemap.xml
The key rule: avoid Disallow: / under User-agent: * unless you genuinely want to block every crawler. Then list any AI agents you want to treat differently by name.
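To sanity-check how a crawler would read the example above, you can parse it locally with Python's standard-library urllib.robotparser. The check assumes each crawler identifies itself with the token shown in the table; SomeOtherBot is a made-up name standing in for any unlisted crawler. This parser applies rules in file order rather than by RFC 9309's longest-match precedence, so treat the result as a rough check rather than a guarantee of how a particular crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from above, with blank lines between groups.
ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: CCBot
Disallow: /

Sitemap: https://acme.example/sitemap.xml
"""

parser = RobotFileParser()
# parse() accepts an iterable of lines, so no network fetch is needed.
parser.parse(ROBOTS_TXT.splitlines())

# "SomeOtherBot" is a hypothetical crawler with no group of its own,
# so it falls back to the User-agent: * group.
for agent in ("GPTBot", "ClaudeBot", "CCBot", "SomeOtherBot"):
    allowed = parser.can_fetch(agent, "https://acme.example/shop")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")

print("Sitemaps:", parser.site_maps())
```

CCBot should come back blocked and the others allowed. This shows that a crawler follows its own named group instead of the wildcard group.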
A scan of your site can generate a robots.txt draft with these AI agent rules pre-filled — start from the homepage scan form.
AGENTS.md
AGENTS.md is a markdown file at your site root aimed at AI coding agents working on your codebase; think of it as a README written for a tool instead of a person. It describes how to set up the project, how to run tests, and which conventions to follow.
A minimal structure
# AGENTS.md
## Setup
Run `npm install` then `npm run dev`.
## Testing
Run `npm test` before committing. All tests must pass.
## Conventions
- TypeScript strict mode, no implicit any.
- Prefer editing existing files over creating new ones.
The scanner looks for the file to exist and to contain at least two markdown headings, so even a short file organized into a couple of sections (like ## Setup and ## Testing above) earns full marks. See this site's own AGENTS.md for a real example.
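A rough version of that heading check fits in a few lines of Python. This sketch counts ATX-style headings (lines starting with one to six # characters followed by a space). It skips lines inside fenced code blocks, so a shell comment such as "# install deps" is not miscounted. The two-heading threshold comes from the description above. The function names, and the decision to ignore fenced content, are assumptions for illustration.

```python
import re

# ATX heading: 1-6 '#' characters, then whitespace, then some text.
HEADING_RE = re.compile(r"^#{1,6}\s+\S")


def count_markdown_headings(markdown: str) -> int:
    """Count ATX headings, ignoring lines inside ``` or ~~~ fenced code blocks."""
    count = 0
    in_fence = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith(("```", "~~~")):
            in_fence = not in_fence  # toggle on each opening/closing fence
            continue
        if not in_fence and HEADING_RE.match(stripped):
            count += 1
    return count


def agents_md_passes(markdown: str, min_headings: int = 2) -> bool:
    """Hypothetical pass/fail mirroring the 'at least two headings' rule."""
    return count_markdown_headings(markdown) >= min_headings
```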
Structured data & semantic HTML basics for agents
Structured data and semantic markup let an agent understand the role of each part of your page — what's a heading, what's a navigation menu, and what kind of entity the page describes — without guessing from visual layout.
JSON-LD structured data
Add a <script type="application/ld+json"> block declaring a recognized schema.org type, for example:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Organization",
"name": "Acme Tools",
"url": "https://acme.example"
}
</script>
Common recognized types include Organization, Product, Article, and WebSite.
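Because JSON-LD sits in the raw HTML, an agent can read it without running any scripts. The minimal sketch below uses Python's standard-library html.parser to collect each application/ld+json block and report the @type values it declares. It assumes each block holds a single JSON object or a list of objects and does not descend into @graph. The class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def jsonld_types(html: str) -> list[str]:
    """Return the @type of each top-level JSON-LD object found in the HTML."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    types: list[str] = []
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing the whole page
        for item in data if isinstance(data, list) else [data]:
            if isinstance(item, dict) and "@type" in item:
                declared = item["@type"]
                # @type may be a single string or a list of strings.
                types.extend(declared if isinstance(declared, list) else [declared])
    return types
```

Run against a page containing the Organization example above, jsonld_types should return a one-item list containing "Organization".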
Semantic HTML checklist
- Exactly one <h1> per page.
- Heading levels in order: don't jump from an <h2> straight to an <h4>.
- Landmark elements: <main>, <nav>, <header>, and <footer>.
- Real lists and tables (<ul>, <ol>, <table>) instead of stacks of unstyled <div> elements.
- Descriptive link text. A link that reads "our pricing page" tells an agent what's on the other end; a link that just reads "click here" doesn't.
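Several of these checks can be automated from the raw HTML alone. The sketch below uses only html.parser to do three things: count <h1> elements, flag heading-level jumps such as <h2> to <h4>, and list which landmark elements never appear. The names and the report format are illustrative assumptions. Link-text quality is left out because it is hard to judge mechanically.

```python
from html.parser import HTMLParser

LANDMARKS = ("main", "nav", "header", "footer")


class OutlineParser(HTMLParser):
    """Records heading levels and landmark tags in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.heading_levels: list[int] = []
        self.landmarks_seen: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so "H2" arrives as "h2".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.heading_levels.append(int(tag[1]))
        elif tag in LANDMARKS:
            self.landmarks_seen.add(tag)


def semantic_report(html: str) -> dict:
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    levels = parser.heading_levels
    # A "jump" is any heading more than one level deeper than the one before it.
    jumps = [(prev, cur) for prev, cur in zip(levels, levels[1:]) if cur > prev + 1]
    return {
        "h1_count": levels.count(1),
        "heading_jumps": jumps,
        "missing_landmarks": [t for t in LANDMARKS if t not in parser.landmarks_seen],
    }
```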
Avoiding the "SPA shell" problem
Many modern frameworks ship a near-empty HTML document and build the page entirely with client-side JavaScript — something like <div id="root"></div>. A browser fills that in instantly, so a human visitor never notices. Most AI agents fetch the raw HTML and stop there — to them, that page is blank.
What agents look for
- A meaningful amount of visible text already present in the HTML response — not just a shell div.
- A healthy ratio of visible text to total HTML size, so the page isn't mostly script tags and empty markup.
- Signs of server-side rendering, such as a substantial server-rendered body or framework markers that indicate the page was rendered before it was sent.
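The first two signals can be approximated by measuring the visible text in the raw response. This Python sketch ignores the contents of <script>, <style>, and <template>, collapses whitespace, and compares the remaining text length to the total HTML size in characters. The thresholds (200 characters and a 10% ratio) are hypothetical placeholders. This guide does not state the scanner's actual cut-offs.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style", "template"}


class VisibleTextParser(HTMLParser):
    """Accumulates text that sits outside <script>, <style>, and <template>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def text_stats(html: str, min_chars: int = 200, min_ratio: float = 0.10) -> dict:
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    # Join chunks with spaces, then collapse whitespace so indentation isn't counted.
    text = " ".join(" ".join(parser.chunks).split())
    ratio = len(text) / len(html) if html else 0.0
    return {
        "text_chars": len(text),
        "ratio": round(ratio, 3),
        "looks_like_shell": len(text) < min_chars or ratio < min_ratio,
    }
```

A typical SPA shell, such as a page whose body is only <div id="root"></div> plus a script tag, yields zero visible characters and is flagged.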
How to fix it
Render your pages on the server — with server-side rendering or static generation — so the initial HTML already contains your headings, body text, and links. This page is itself server-rendered for exactly that reason: everything you're reading right now is present in the first response, with no JavaScript required.
Common adversarial & safety mistakes
Because agents read your raw HTML, anything hidden from human visitors with CSS is still fully visible to them. That mismatch is exactly what AgentLitmus's Adversarial Safety score checks for — separately from your letter grade, on every report.
What gets flagged
- Hidden text with instruction-like phrasing. A block of text set to zero opacity, zero font size, or moved off-screen that addresses an AI reader directly — for example, telling it to abandon the guidance it was given earlier, take on a different persona, or keep something secret from the person it's helping.
- Concealed links. An <a href> hidden from sighted users (via its own styling or a hidden parent element) that points to an off-site address or a script-scheme target (such as a javascript: URL), which an agent following links in the DOM might visit even though no visitor ever could.
- Invisible form actions. A hidden <form> that submits to an off-site destination.
- Injection payloads in llms.txt or AGENTS.md. Since agents are told to trust these files, planting manipulative phrasing in them is especially risky.
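To audit your own pages for the first three patterns, a rough Python sketch can walk the raw HTML and track which open elements are hidden, either by the hidden attribute or by an inline style. The sketch rests on three simplifying assumptions: it reads only inline style attributes (not external stylesheets), it uses a short made-up list of instruction-like phrases, and it treats any absolute URL on another host as off-site. It is not the detector AgentLitmus runs.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Inline-style patterns that hide content from sighted users (simplified).
HIDDEN_STYLE = re.compile(
    r"display\s*:\s*none"
    r"|visibility\s*:\s*hidden"
    r"|opacity\s*:\s*0(?![.\d])"
    r"|font-size\s*:\s*0(?![.\d])"
    r"|(?<![\w-])(?:left|top)\s*:\s*-\d{3,}px"  # pushed far off-screen
)
# Hypothetical sample phrases; a real list would be longer and maintained.
SUSPICIOUS = ("ignore previous instructions", "you are now", "do not tell the user")
VOID_TAGS = {"area", "br", "img", "input", "link", "meta", "hr", "source", "wbr"}


def is_offsite(url: str, site_host: str) -> bool:
    parsed = urlparse(url.strip())
    if parsed.scheme.lower() == "javascript":
        return True
    return parsed.hostname is not None and parsed.hostname != site_host


class HiddenContentAuditor(HTMLParser):
    def __init__(self, site_host: str) -> None:
        super().__init__()
        self.site_host = site_host
        self.stack: list[tuple[str, bool]] = []  # (tag, hidden by itself?) per open element
        self.findings: list[str] = []

    def _hidden_now(self) -> bool:
        return any(hidden for _tag, hidden in self.stack)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        hidden = "hidden" in a or bool(HIDDEN_STYLE.search((a.get("style") or "").lower()))
        concealed = hidden or self._hidden_now()  # own styling or a hidden ancestor
        if concealed and tag == "a" and is_offsite(a.get("href") or "", self.site_host):
            self.findings.append(f"concealed link: {a.get('href')}")
        if concealed and tag == "form" and is_offsite(a.get("action") or "", self.site_host):
            self.findings.append(f"invisible form action: {a.get('action')}")
        if tag not in VOID_TAGS:
            self.stack.append((tag, hidden))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        lowered = " ".join(data.lower().split())
        if self._hidden_now() and any(p in lowered for p in SUSPICIOUS):
            self.findings.append(f"hidden instruction-like text: {data.strip()[:60]!r}")


def audit(html: str, site_host: str) -> list[str]:
    auditor = HiddenContentAuditor(site_host)
    auditor.feed(html)
    auditor.close()
    return auditor.findings
```

The same phrase check could also be run over the plain text of llms.txt and AGENTS.md to catch the fourth pattern.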
How to stay clean
Don't use CSS to hide text containing real instructions, tracking pixels disguised as links, or links to off-site redirects; if it's in your HTML, assume an agent will read it. If you need to visually hide something, make sure it's genuinely inert (no addressed-to-an-agent phrasing, no live links) or remove it from the DOM entirely. This page itself produces zero Adversarial Safety findings: there's nothing in this HTML that isn't also visible to you right now.
Scan your own site from the homepage to see your Adversarial Safety score, or read the methodology page for how every score is calculated.