llms.txt and ai.txt are emerging conventions for publishing site-level policies aimed at large language models and AI crawlers. They sit alongside robots.txt at the root of a domain and declare crawling rules, training and fine-tuning permissions, privacy guidance, and preferred attribution in plain text.
Why it matters
AI crawlers increasingly fetch content for training, retrieval-augmented generation, and answer engines. Without a published policy, model operators have no machine-readable signal about what is allowed, and you have no documented record if you later object to use of your content. Clear declarations also help honest crawlers route requests efficiently, which can reduce wasted crawl budget and bandwidth on pages you do not want indexed by AI systems.
How to use
- Serve the files at https://example.com/llms.txt and https://example.com/ai.txt with a 200 response and text/plain content type.
- Include clear fields such as Site, Contact, Training/FTU Policy, PII-Policy, and Preferred-Attribution to reduce ambiguity for model operators.
- Keep paths consistent with rules in robots.txt so signals do not contradict each other.
- Reference your sitemap.xml so well-behaved AI crawlers can find canonical URL lists.
- Combine with structured data on individual pages for richer machine context.
- Review the policy whenever your terms of service or licensing change.
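
The checklist above can be sketched as a small self-check. This is a minimal illustration, not a reference implementation: the one-field-per-line `Name: value` layout, the field values, the `Sitemap` line, and the helper names (`render_policy`, `parse_policy`, `check_response`) are all assumptions made for the example, since these conventions are still emerging and no fixed syntax is implied here. The check takes the status, content type, and body as plain arguments, so it can be run against whatever your HTTP client returns.

```python
# Field names taken from the list above; the "Name: value" layout is an assumption.
REQUIRED_FIELDS = [
    "Site",
    "Contact",
    "Training/FTU Policy",
    "PII-Policy",
    "Preferred-Attribution",
]


def render_policy(fields: dict[str, str]) -> str:
    """Render fields as one 'Name: value' line each (assumed layout)."""
    return "".join(f"{name}: {value}\n" for name, value in fields.items())


def parse_policy(text: str) -> dict[str, str]:
    """Parse 'Name: value' lines, skipping blank lines and '#' comments."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def check_response(status: int, content_type: str, body: str) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if status != 200:
        problems.append(f"expected status 200, got {status}")
    # Ignore parameters such as '; charset=utf-8'.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "text/plain":
        problems.append(f"expected text/plain, got {media_type or 'no content type'}")
    parsed = parse_policy(body)
    for name in REQUIRED_FIELDS:
        if not parsed.get(name):
            problems.append(f"missing field: {name}")
    return problems


# Hypothetical values for illustration only.
example = render_policy({
    "Site": "https://example.com",
    "Contact": "ai-policy@example.com",
    "Training/FTU Policy": "disallow",
    "PII-Policy": "do-not-collect",
    "Preferred-Attribution": "Example Inc., https://example.com",
    "Sitemap": "https://example.com/sitemap.xml",
})

if __name__ == "__main__":
    print(example, end="")
    print(check_response(200, "text/plain; charset=utf-8", example))
```

Returning a list of problems rather than raising on the first failure makes it easier to report every issue in one pass, for example from a deployment check that runs after each release.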