llms.txt / ai.txt

Plain-text files at the site root that publish how large language models and AI crawlers may use the content they fetch.

Sitecheck Team

llms.txt and ai.txt are emerging, not yet standardized conventions for publishing site-level policies aimed at large language models and AI crawlers. They sit alongside robots.txt at the root of a domain and declare crawling rules, training and fine-tuning permissions, privacy guidance, and preferred attribution in plain text.
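Because neither convention defines a single agreed syntax, the following is only a minimal sketch. It assumes a robots.txt-style format with one `Name: value` field per line and `#` comments. The field names come from the checklist in "How to use"; the `Disallow` and `Sitemap` fields are borrowed from robots.txt as an assumption, and every value shown is hypothetical.

```python
from typing import List, Tuple

# Hypothetical example values; neither llms.txt nor ai.txt fixes a vocabulary.
EXAMPLE_FIELDS: List[Tuple[str, str]] = [
    ("Site", "https://example.com"),
    ("Contact", "mailto:ai-policy@example.com"),
    ("Training/FTU Policy", "disallow"),
    ("PII-Policy", "do-not-store"),
    ("Preferred-Attribution", "Example Inc. (https://example.com)"),
    ("Disallow", "/private/"),  # assumed field, borrowed from robots.txt
    ("Sitemap", "https://example.com/sitemap.xml"),  # assumed field
]


def render_policy(fields: List[Tuple[str, str]], comment: str = "") -> str:
    """Serialize (name, value) pairs as 'Name: value' lines.

    A list of tuples (rather than a dict) preserves field order and allows
    repeated fields, such as several Disallow lines.
    """
    lines = [f"# {comment}"] if comment else []
    for name, value in fields:
        # Reject input that would break the one-field-per-line format.
        if not name or ":" in name or "\n" in name or "\n" in value:
            raise ValueError(f"invalid field: {name!r}")
        lines.append(f"{name}: {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_policy(EXAMPLE_FIELDS, comment="AI usage policy for example.com"), end="")
```

The same string could then be written to the web root as both llms.txt and ai.txt, so the two files never drift apart.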

Why it matters

AI crawlers increasingly fetch content for training, retrieval-augmented generation, and answer engines. A robots.txt file can tell a crawler which paths to skip, but it says nothing about how fetched content may be used. Without a published policy, model operators have no machine-readable signal about your training, attribution, or privacy expectations, and you have no public record of your terms if you later object to how your content was used. Clear declarations also help compliant crawlers skip content you have excluded, which can reduce wasted crawl budget and bandwidth on pages you do not want processed by AI systems.

How to use

  • Serve the files at https://example.com/llms.txt and https://example.com/ai.txt with an HTTP 200 status and a text/plain content type, such as text/plain; charset=utf-8.
  • Include clear fields such as Site, Contact, Training/FTU Policy, PII-Policy, and Preferred-Attribution to reduce ambiguity for model operators.
  • Keep paths consistent with rules in robots.txt so signals do not contradict each other.
  • Reference your sitemap.xml so well-behaved AI crawlers can find canonical URL lists.
  • Combine with structured data on individual pages for richer machine context.
  • Review the policy whenever your terms of service or licensing change.
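Several items on this checklist can be verified automatically. The sketch below uses only the Python standard library and is not a definitive validator. It assumes a robots.txt-style `Name: value` format, treats the listed fields as required, and treats a hypothetical `Disallow` field as the policy's list of excluded paths. It then uses `urllib.robotparser` to confirm that robots.txt also blocks those paths for one crawler token; `GPTBot` (OpenAI's crawler) is only an example default.

```python
import urllib.error
import urllib.request
from typing import Dict, List, Tuple
from urllib.robotparser import RobotFileParser

# Field names follow the checklist; treating them as required is an assumption.
REQUIRED_FIELDS = ["Site", "Contact", "Training/FTU Policy", "PII-Policy", "Preferred-Attribution"]


def fetch(url: str) -> Tuple[int, str, str]:
    """Return (status, Content-Type, body) for url, including error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status, resp.headers.get("Content-Type", ""), body
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Content-Type", ""), ""


def parse_fields(text: str) -> Dict[str, List[str]]:
    """Collect 'Name: value' lines, skipping blanks and '#' comments."""
    fields: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so URL values stay intact.
        name, sep, value = line.partition(":")
        if sep:
            fields.setdefault(name.strip(), []).append(value.strip())
    return fields


def check_policy_response(status: int, content_type: str, body: str,
                          robots_txt: str, agent: str = "GPTBot") -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems: List[str] = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    # Ignore parameters such as '; charset=utf-8' when comparing media types.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "text/plain":
        problems.append(f"expected text/plain, got {content_type!r}")
    fields = parse_fields(body)
    for name in REQUIRED_FIELDS:
        if name not in fields:
            problems.append(f"missing field: {name}")
    if "Sitemap" not in fields:
        problems.append("no Sitemap reference")
    # Each path the policy excludes should also be blocked for this agent in robots.txt.
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    for path in fields.get("Disallow", []):
        if robots.can_fetch(agent, path):
            problems.append(f"{path} is excluded in the policy but allowed for {agent} in robots.txt")
    return problems
```

Against a live site, one might call `check_policy_response(*fetch("https://example.com/llms.txt"), fetch("https://example.com/robots.txt")[2])` and print each returned problem. The consistency check runs in one direction only. It flags paths the policy excludes but robots.txt allows, not the reverse.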

See also