LLMs.txt & AI Crawler Checker
Validate your llms.txt and ai.txt. Check crawl rules, training policy, privacy guidance, and get a readiness score.
What is llms.txt?
llms.txt is a plain-text file placed at the root of a website (e.g. https://example.com/llms.txt) that gives AI crawlers, large language models, and retrieval-augmented generation (RAG) systems structured guidance about a site's content, policies, and preferences. Proposed in 2024, it uses a simple key-value format — similar to robots.txt — to declare fields such as Site, Contact, License, Training/FTU Policy, and crawl directives for AI agents. Together with ai.txt, it forms the emerging standard for communicating AI consent and content context on the web.
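A minimal illustrative llms.txt in the key-value style described above might look like this (field names and values are examples only; the format is not yet formally standardised):

```
# llms.txt — illustrative example
Site: https://example.com
Contact: mailto:webmaster@example.com
License: CC-BY-4.0 (commercial reuse requires permission)
Training: not permitted
Last-Updated: 2025-06-01
```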
llms.txt vs ai.txt — what's the difference?
There's no single universal standard yet. llms.txt originated as a way to surface structured, LLM-friendly documentation — think of it as a guided index for AI agents and RAG systems. ai.txt (promoted by Spawning.ai) follows a key-value style similar to robots.txt and focuses primarily on training-data opt-out. Having both maximises coverage across different pipelines.
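As a rough sketch of the contrast, an ai.txt leans on robots.txt-style opt-out directives rather than descriptive fields (the directives below are illustrative; consult Spawning's generator for the canonical format):

```
# ai.txt — illustrative training opt-out sketch
User-Agent: *
Disallow: /
```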
What actually works — best practices in the wild
- robots.txt is still the most honoured crawl control. GPTBot, ClaudeBot, PerplexityBot, and Google-Extended all respect `User-agent` blocks in robots.txt. Use llms.txt as supplementary guidance, not a replacement.
- Explicit beats implicit. "NOT permitted" outperforms ambiguous language. LLMs and automated pipelines interpret "transient use" and "fair use" differently — state intent directly.
- Example prompts improve GEO. Sites with `Example-Summary-Prompt` or `Example-Structured-Extraction` fields see more accurate, structured outputs from LLMs in RAG settings.
- Keep it fresh. A stale `Last-Updated` date signals an abandoned file. Most AI crawlers de-prioritise outdated directives. Aim to update at least every 6 months.
- License clarity protects you. Without a `License` field, content is treated as unconstrained by many training pipelines. Specify whether commercial reuse requires permission.
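For the first point above, a minimal robots.txt that opts the site out of the major AI training crawlers while leaving ordinary search indexing alone might look like this (crawler names are the ones mentioned in this document; verify current user-agent strings with each vendor):

```
# robots.txt — block common AI training crawlers, allow everything else
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
```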
FAQ
- Do LLMs actually read llms.txt?
- Not directly during inference. The file is consumed during crawling and indexing; agentic tools such as Cursor, Claude Projects, and Perplexity fetch it to shape their understanding of a site. Adoption is accelerating in 2026, but compliance still varies by crawler.
- Which AI crawlers support llms.txt?
- Named crawlers that process llms.txt or ai.txt signals include GPTBot, OAI-SearchBot, ClaudeBot, anthropic-ai, PerplexityBot, CCBot, Google-Extended, Amazonbot, Diffbot, and Bytespider. Compliance level varies — robots.txt remains the most universally enforced mechanism.
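Because robots.txt remains the most enforced mechanism, you can check what a given crawler is permitted to fetch with Python's standard-library parser (the rules and URL below are illustrative, not from a real site):

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt rules — block GPTBot, allow everyone else
rules = """User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("GPTBot", "https://example.com/post"))     # False
print(parser.can_fetch("Googlebot", "https://example.com/post"))  # True
```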
- Is llms.txt legally enforceable?
- Not on its own — it is an advisory signal, not a contract. For enforceability, combine it with explicit terms of service and robots.txt blocks. Documented intent does matter in licensing and regulatory disputes.
- Which file should I prioritise?
- Deploy both if possible. llms.txt is gaining traction as a GEO and RAG hint layer. ai.txt has broader recognition in training-data curation tools. They serve slightly different audiences and deploying both takes minutes.
- How do I improve my readiness score?
- High-impact improvements: add a `License` field, state training policy explicitly ("allowed" or "not permitted"), include a valid `Last-Updated` date, add `PII-Policy`, and deploy both llms.txt and ai.txt for the dual-file bonus.
- Does llms.txt affect Google rankings?
- Not directly — Google does not use llms.txt as a ranking signal for traditional organic search. However, it does influence how Google-Extended (Google's AI training crawler) and AI Overviews interpret and cite your content, which increasingly affects GEO (Generative Engine Optimization) visibility.
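The readiness checks above can be sketched as a simple key-value scorer. The field names follow this document's recommendations, but the weights and penalty are illustrative assumptions, not the checker's actual rubric:

```python
import re

# Illustrative scoring weights — assumptions, not the checker's real rubric
CHECKS = {
    "License": 25,
    "Training": 25,
    "Last-Updated": 20,
    "PII-Policy": 15,
    "Contact": 15,
}

def parse_llms_txt(text: str) -> dict:
    """Parse simple 'Key: value' lines, ignoring comments and blanks."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def readiness_score(text: str) -> int:
    """Sum the weight of each recommended field that is present and non-empty."""
    fields = parse_llms_txt(text)
    score = sum(w for k, w in CHECKS.items() if fields.get(k))
    # Penalise a Last-Updated date that is not ISO-formatted (YYYY-MM-DD)
    last_updated = fields.get("Last-Updated", "")
    if last_updated and not re.fullmatch(r"\d{4}-\d{2}-\d{2}", last_updated):
        score -= 10
    return score

sample = """Site: https://example.com
License: CC-BY-4.0
Training: not permitted
Last-Updated: 2025-06-01
"""
print(readiness_score(sample))  # License 25 + Training 25 + Last-Updated 20 = 70
```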