Q: What structured data and semantic HTML basics matter most for agents?

Add a JSON-LD block declaring a recognized schema.org type like Organization, Product, Article, or WebSite. Pair it with semantic HTML: exactly one , headings that don't skip levels, landmark elements like , , , and , and real , , or elements instead of div-only layouts.

Question 1

What is agent readiness and why does it matter?

Accepted Answer

Agent readiness means an AI agent — a crawler, a chatbot answering questions about your site, or an autonomous tool acting on a user's behalf — can fetch your pages, understand your content from the raw HTML, and find structured guides describing what your site offers. Agents typically don't run JavaScript and don't click around the way a person would, so if your content only exists after the page loads in a browser, most agents never see it.

Question 2

How do I write a good llms.txt file?

Accepted Answer

Publish a plain-text file at /llms.txt that starts with a '# Site Name' heading and a short summary, followed by markdown links to your most important pages. Keep the links pointing to your own domain so agents can follow them with confidence.

Question 3

How should robots.txt handle AI agents like GPTBot and ClaudeBot?

Accepted Answer

Don't use a blanket 'Disallow: /' for User-agent: *, since that blocks every crawler including AI agents. Instead, add explicit rules for the AI crawlers you want to allow or block — for example GPTBot, ChatGPT-User, ClaudeBot, Google-Extended, CCBot, and PerplexityBot — so your policy is unambiguous.

Question 4

What is AGENTS.md and how should I structure it?

Accepted Answer

AGENTS.md is a markdown file at your site root that tells AI coding agents how to work with your project — setup steps, how to run tests, and code conventions. Structure it with at least two markdown headings, such as '## Setup' and '## Testing', so an agent can scan the file and jump to the relevant section.

Question 5

What structured data and semantic HTML basics matter most for agents?

Accepted Answer

Add a JSON-LD

Question 6

How do I avoid the SPA shell problem?

Accepted Answer

Server-render or statically generate your pages so the initial HTML response already contains your real content — headings, paragraphs, and links — rather than an empty

that JavaScript fills in later. Most agents read the raw HTML and never execute your client-side bundle, so an empty shell looks like a blank page to them.

Question 7

What are the most common adversarial and safety mistakes?

Accepted Answer

The biggest mistake is leaving text, links, or form targets in your HTML that are concealed from human visitors with CSS — zero-opacity layers, off-screen positioning, or display:none — but still readable by an agent parsing the raw DOM. If that concealed content contains instruction-like phrasing aimed at an AI reader, or a link/form pointing somewhere a visitor never sees, it can hijack an agent's behavior. AgentLitmus's Adversarial Safety score checks every scan for exactly this.

Making your site ready for AI agents

What is agent readiness and why it matters

How to write a good llms.txt

A minimal example

What earns full marks

robots.txt for AI agents

Known AI agent user-agents

Example: allow AI agents, block one

AGENTS.md

A minimal structure

Structured data & semantic HTML basics for agents

JSON-LD structured data

Semantic HTML checklist

Avoiding the "SPA shell" problem

What agents look for

How to fix it

Common adversarial & safety mistakes

What gets flagged

How to stay clean

User-agent	Operator
GPTBot	OpenAI — training data crawler
ChatGPT-User	OpenAI — fetches pages a user asks ChatGPT about
ClaudeBot	Anthropic — training data crawler
Google-Extended	Google — controls use in Gemini / AI features
CCBot	Common Crawl — widely used as AI training data
PerplexityBot	Perplexity — search and answer crawler