robots.txt is a plain-text file served from the root of a host (for example https://example.com/robots.txt) that tells web crawlers which URL paths they are allowed to fetch. It uses simple directives — User-agent, Allow, Disallow, and Sitemap — and is defined by RFC 9309. The file is advisory: it relies on cooperative bots and is not an access-control mechanism.
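
As a rough illustration of how a cooperative crawler reads these directives, the sketch below parses a small file with Python's standard-library urllib.robotparser. The file contents, the ExampleBot user agent, and the example.com URLs are assumptions made up for this example; real crawlers may resolve rules differently (Python's parser applies the first matching rule, while RFC 9309 specifies the most specific match).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(SAMPLE)

# "ExampleBot" is a made-up user agent; it falls under the "*" group.
print(parser.can_fetch("ExampleBot", "https://example.com/blog/post"))    # True
print(parser.can_fetch("ExampleBot", "https://example.com/admin/users"))  # False
print(parser.site_maps())  # ['https://example.com/sitemap.xml'] (Python 3.8+)
```

Nothing here enforces the rules: a bot that never runs a check like can_fetch can still request /admin/users, which is why the file is advisory only.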

Why it matters

A well-formed robots.txt keeps crawlers focused on URLs that should be indexed, which helps with crawl budget on large sites and keeps crawlers out of staging or admin paths. A broken or overly aggressive file can quickly push an entire site out of search results: a stray Disallow: / is one of the most common SEO regressions during a launch.

How to check

Fetch /robots.txt directly and confirm it returns 200 with Content-Type: text/plain.
Validate the file in Google Search Console's robots.txt report or Bing Webmaster Tools.
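Use Disallow: to block crawl paths, but use noindex headers or meta tags to keep pages out of the index. robots.txt does not remove URLs that are already indexed, and a crawler can only see a noindex signal on a page it is allowed to fetch, so do not Disallow a page you want dropped from the index.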
Add a Sitemap: line that points to your sitemap.xml.
Never rely on robots.txt to hide secrets or auth-only paths; the file is publicly readable, so protect those paths with authentication.
If you publish AI guidance, link to your llms.txt policy.
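
Some of the checks above can be automated. The following is a minimal sketch, not a full validator: audit_robots is a hypothetical helper that takes a response you have already fetched (status code, Content-Type header, body text) and reports problems. Treating any Disallow: / as a problem is a deliberate simplification, since a blanket block can be intentional for a specific user agent.

```python
def audit_robots(status: int, content_type: str, body: str) -> list[str]:
    """Return a list of human-readable problems found in a robots.txt response."""
    problems = []

    if status != 200:
        problems.append(f"expected status 200, got {status}")

    # Accept parameters such as "; charset=utf-8" after the media type.
    if not content_type.lower().startswith("text/plain"):
        problems.append(f"expected Content-Type text/plain, got {content_type!r}")

    # Collect (directive, value) pairs, ignoring comments and blank lines.
    directives = []
    for raw in body.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        name, value = line.split(":", 1)
        directives.append((name.strip().lower(), value.strip()))

    # Heuristic: flag a blanket block regardless of which user agent it targets.
    if ("disallow", "/") in directives:
        problems.append("found 'Disallow: /', which blocks the whole site")

    if not any(name == "sitemap" for name, _ in directives):
        problems.append("no Sitemap: line found")

    return problems


# Hypothetical response values, for illustration only.
good = "User-agent: *\nDisallow: /admin/\nSitemap: https://example.com/sitemap.xml\n"
bad = "User-agent: *\nDisallow: /  # left over from staging\n"

print(audit_robots(200, "text/plain; charset=utf-8", good))  # []
for problem in audit_robots(404, "text/html", bad):
    print(problem)
```

The function is kept free of network calls so it can run against a saved response; the three inputs could come from any HTTP client. It does not replace validating the file in Google Search Console or Bing Webmaster Tools.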
