Validate your llms.txt and ai.txt. Check crawl rules, training policy, privacy guidance, and get a readiness score.
llms.txt is a plain-text file placed at the root of a website (e.g. https://example.com/llms.txt) that gives AI crawlers, large language models, and retrieval-augmented generation (RAG) systems structured guidance about a site's content, policies, and preferences. Proposed in 2024, it uses a simple key-value format — similar to robots.txt — to declare fields such as Site, Contact, License, Training/FTU Policy, and crawl directives for AI agents. Together with ai.txt, it is part of an emerging set of conventions for communicating AI consent and content context on the web.
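To make the key-value format concrete, here is an illustrative llms.txt using the fields named above. Because no single standard exists yet, the exact field names and values below (especially the Training and PII-Policy values) are examples, not a fixed specification:

```text
# llms.txt — illustrative example, not a formal spec
Site: https://example.com
Contact: webmaster@example.com
License: CC-BY-4.0; commercial reuse requires permission
Training: not permitted
Last-Updated: 2025-01-15
PII-Policy: no personal data is published on this site
```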
There's no single universal standard yet. llms.txt originated as a way to surface structured, LLM-friendly documentation — think of it as a guided index for AI agents and RAG systems. ai.txt (promoted by Spawning.ai) follows a key-value style similar to robots.txt and focuses primarily on training-data opt-out. Having both maximises coverage across different pipelines.
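A sketch of what a robots.txt-style ai.txt opt-out might look like; the directive names mirror robots.txt, but exact syntax varies between implementations, so treat this as illustrative:

```text
# ai.txt — robots.txt-style training opt-out (illustrative)
User-Agent: *
Disallow: /          # opt all content out of AI training by default
Allow: /blog/        # opt a specific path back in
```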
Some AI crawlers only honour User-agent blocks in robots.txt, so use llms.txt as supplementary guidance, not a replacement. Sites that include Example-Summary-Prompt or Example-Structured-Extraction fields tend to see more accurate, structured outputs from LLMs in RAG settings. A missing or stale Last-Updated date signals an abandoned file, and most AI crawlers de-prioritise outdated directives — aim to update at least every 6 months. Without a License field, content is treated as unconstrained by many training pipelines, so specify whether commercial reuse requires permission. To raise your readiness score: add a License field, state training policy explicitly ("allowed" or "not permitted"), include a valid Last-Updated date, add a PII-Policy, and deploy both llms.txt and ai.txt for the dual-file bonus.
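The readiness checks described above can be sketched as a small validator. This is a hypothetical implementation: the field names follow this page, but the point weights, the 6-month freshness window, and the `readiness_score` function itself are invented for illustration, not the scoring used by any real tool:

```python
import re
from datetime import date, timedelta

def parse_kv(text: str) -> dict:
    """Parse simple 'Key: value' lines, skipping blanks and # comments."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def readiness_score(llms_txt: str, has_ai_txt: bool = False) -> int:
    """Score 0-100 against the checks above (weights are illustrative)."""
    fields = parse_kv(llms_txt)
    score = 0
    if fields.get("License"):
        score += 25  # license declared
    if fields.get("Training", "").lower() in ("allowed", "not permitted"):
        score += 25  # training policy stated explicitly
    m = re.match(r"(\d{4})-(\d{2})-(\d{2})$", fields.get("Last-Updated", ""))
    if m:
        when = date(int(m[1]), int(m[2]), int(m[3]))
        if date.today() - when <= timedelta(days=183):  # ~6 months
            score += 20  # file is fresh, not abandoned
    if fields.get("PII-Policy"):
        score += 15  # privacy guidance present
    if has_ai_txt:
        score += 15  # dual-file bonus
    return score
```

A file declaring all four fields with a current date, deployed alongside ai.txt, would score 100 under this sketch; an empty file scores 0.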