Robots.txt Generator

Create robots.txt files to control search engine and AI crawler access to your website

About the Robots.txt Generator

Create properly formatted robots.txt files with our free online generator. Control how search engine crawlers and AI bots access your website, block sensitive directories, and specify your sitemaps — all with an easy-to-use visual interface.

What is robots.txt?

The robots.txt file is a standard used by websites to communicate with web crawlers and search engine bots. It's placed in the root directory of your website (e.g., https://example.com/robots.txt) and tells crawlers which pages or directories they should or shouldn't access.
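A minimal robots.txt illustrating this (the paths and sitemap URL here are examples, not recommendations):

```
User-agent: *
Disallow: /admin/
Disallow: /tmp/

Sitemap: https://example.com/sitemap.xml
```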

Key Directives

Directive     Description                                          Example
User-agent    Specifies which crawler the rules apply to           User-agent: Googlebot
Disallow      Blocks access to a path                              Disallow: /admin/
Allow         Explicitly allows access (overrides Disallow)        Allow: /admin/public/
Sitemap       Points to your XML sitemap                           Sitemap: https://example.com/sitemap.xml
Crawl-delay   Seconds between requests (not supported by Google)   Crawl-delay: 10
Host          Preferred domain (Yandex)                            Host: https://example.com
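For instance, Allow can carve an exception out of a broader Disallow rule (illustrative paths):

```
User-agent: *
Disallow: /admin/
Allow: /admin/public/
```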

Wildcard Patterns

  • * — Matches any sequence of characters (e.g., /api/* blocks everything under /api/)
  • $ — Matches the end of a URL (e.g., /*.php$ blocks all .php files)
  • /folder/ — Trailing slash matches directories
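The wildcard semantics above can be sketched by translating a robots.txt pattern into a regular expression. This is a simplified illustration of how matching works, not the exact algorithm any particular crawler uses:

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any sequence of characters; a trailing '$'
    anchors the pattern to the end of the URL path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then restore '*' as '.*'
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(regex + ("$" if anchored else ""))

def path_matches(pattern: str, path: str) -> bool:
    # robots.txt rules match from the start of the path (prefix match)
    return robots_pattern_to_regex(pattern).match(path) is not None

print(path_matches("/api/*", "/api/v1/users"))    # True
print(path_matches("/*.php$", "/index.php"))      # True
print(path_matches("/*.php$", "/index.php?x=1"))  # False
```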

Common User-Agents

Bot            User-Agent                Purpose
Google         Googlebot                 Google Search indexing
Bing           Bingbot                   Bing Search indexing
OpenAI         GPTBot                    ChatGPT training data
Google AI      Google-Extended           Gemini (formerly Bard) training
Anthropic      anthropic-ai, ClaudeBot   Claude AI training
Common Crawl   CCBot                     Open web archive / AI datasets

AI Crawler Blocking

Many website owners now choose to block AI training crawlers while still allowing search engine indexing. This generator makes it easy to do both — allowing Google and Bing to index your content for search, while preventing AI companies from using your content to train their models.
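Such a policy might look like the following: the named AI bots are disallowed site-wide, while every other crawler (including Googlebot and Bingbot) remains unrestricted:

```
# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

# All other crawlers: no restrictions
User-agent: *
Disallow:
```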

Best Practices

  • Don't block CSS/JS files — Search engines need these to render your pages correctly
  • Use robots.txt for crawl efficiency, not security — It's a guideline, not a security measure. Sensitive content should be password-protected.
  • Include your sitemap — Helps crawlers discover all your pages
  • Test before deploying — Use Google Search Console to test your robots.txt
  • Keep it simple — Complex rules can have unintended consequences
  • Monitor crawl stats — Check that important pages are being indexed
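The "test before deploying" advice can also be scripted locally. Python's standard library includes urllib.robotparser, which checks URLs against a rule set; note that it implements the original prefix-matching standard (rules are applied in file order, and * / $ wildcards are not supported):

```python
from urllib.robotparser import RobotFileParser

# Allow is listed before Disallow because this parser
# applies the first rule that matches, in file order.
rules = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/
""".splitlines()

rfp = RobotFileParser()
rfp.parse(rules)

print(rfp.can_fetch("Googlebot", "https://example.com/"))               # True
print(rfp.can_fetch("Googlebot", "https://example.com/admin/users"))    # False
print(rfp.can_fetch("Googlebot", "https://example.com/admin/public/"))  # True
```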

Important Notes

  • robots.txt is publicly accessible — don't include sensitive paths you're trying to hide
  • Changes take effect after crawlers re-fetch the file (can take days)
  • Not all bots respect robots.txt — malicious crawlers may ignore it
  • To remove already-indexed pages, use noindex meta tags or Google Search Console
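For reference, a noindex directive goes in the page's HTML head (it can also be sent as an X-Robots-Tag HTTP header):

```
<meta name="robots" content="noindex">
```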

How to Install

  1. Generate your robots.txt using the tool above
  2. Download the file or copy the contents
  3. Upload to your website's root directory (e.g., /public_html/robots.txt)
  4. Verify by visiting https://yoursite.com/robots.txt
  5. Check it in Google Search Console's robots.txt report (the standalone robots.txt Tester tool has been retired)