robots.txt
Sitecheck Team
Small text file that tells crawlers which parts of a site to crawl or avoid.
robots.txt is a plain-text file placed at the root of a site (e.g., https://example.com/robots.txt) that instructs web crawlers about allowed and disallowed paths. It uses simple directives like User-agent, Allow, and Disallow. Remember that robots.txt is advisory — it relies on cooperative crawlers and is not an access-control mechanism.
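For example, a minimal robots.txt might look like this (the domain and paths are illustrative):

```
User-agent: *
Disallow: /admin/
Allow: /admin/help.html
Sitemap: https://example.com/sitemap.xml
```

Here every crawler (User-agent: *) is asked to skip /admin/ except the single help page, and the Sitemap line tells crawlers where to find the full URL list.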
Best practices:
- Keep it short and canonical.
- Point to your sitemap with a Sitemap: directive referencing sitemap.xml.
- Avoid relying on it for sensitive content (use authentication instead).
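To see how a cooperative crawler interprets these directives, you can use Python's standard-library urllib.robotparser. The rules and URLs below are a hypothetical sketch, not from the original text:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
robots_txt = """\
User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
"""

rp = RobotFileParser()
# parse() accepts the file's lines, so no network fetch is needed here;
# in practice you would call rp.set_url(...) and rp.read() instead.
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("*", "https://example.com/private/page.html"))  # False
print(rp.can_fetch("*", "https://example.com/about.html"))         # True
```

Note that can_fetch() only reports what the file asks for; nothing stops a non-cooperative client from fetching the disallowed URL anyway, which is why the file is advisory rather than access control.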