Robots.txt Generator

Create robots.txt files to control search engine and AI crawler access to your website

About the Robots.txt Generator

Create properly formatted robots.txt files with our free online generator. Control how search engine crawlers and AI bots access your website, block sensitive directories, and specify your sitemaps — all with an easy-to-use visual interface.

What is robots.txt?

The robots.txt file is a standard used by websites to communicate with web crawlers and search engine bots. It's placed in the root directory of your website (e.g., https://example.com/robots.txt) and tells crawlers which pages or directories they should or shouldn't access.
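A minimal robots.txt illustrating this (the paths and sitemap URL here are examples, not recommendations):

```
User-agent: *
Disallow: /admin/
Disallow: /tmp/

Sitemap: https://example.com/sitemap.xml
```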

Key Directives

Directive     Description                                          Example
User-agent    Specifies which crawler the rules apply to           User-agent: Googlebot
Disallow      Blocks access to a path                              Disallow: /admin/
Allow         Explicitly allows access (overrides Disallow)        Allow: /admin/public/
Sitemap       Points to your XML sitemap                           Sitemap: https://example.com/sitemap.xml
Crawl-delay   Seconds between requests (not supported by Google)   Crawl-delay: 10
Host          Preferred domain (Yandex)                            Host: https://example.com
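For instance, Allow can carve an exception out of a broader Disallow rule (illustrative paths):

```
User-agent: *
Disallow: /admin/
Allow: /admin/public/
```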

Wildcard Patterns

  • * — Matches any sequence of characters (e.g., /api/* blocks everything under /api/)
  • $ — Matches the end of a URL (e.g., /*.php$ blocks all .php files)
  • /folder/ — Trailing slash matches directories
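The wildcard semantics above can be sketched by translating a robots.txt pattern into a regular expression. This is a simplified illustration of how matching works, not the exact algorithm any particular crawler uses:

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any sequence of characters; a trailing '$'
    anchors the pattern to the end of the URL path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then restore '*' as '.*'
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(regex + ("$" if anchored else ""))

def path_matches(pattern: str, path: str) -> bool:
    # robots.txt rules match from the start of the path (prefix match)
    return robots_pattern_to_regex(pattern).match(path) is not None

print(path_matches("/api/*", "/api/v1/users"))    # True
print(path_matches("/*.php$", "/index.php"))      # True
print(path_matches("/*.php$", "/index.php?x=1"))  # False
```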

Common User-Agents

Bot            User-Agent                Purpose
Google         Googlebot                 Google Search indexing
Bing           Bingbot                   Bing Search indexing
OpenAI         GPTBot                    ChatGPT training data
Google AI      Google-Extended           Gemini (formerly Bard) training
Anthropic      anthropic-ai, ClaudeBot   Claude AI training
Common Crawl   CCBot                     Open web archive / AI datasets

AI Crawler Blocking

Many website owners now choose to block AI training crawlers while still allowing search engine indexing. This generator makes it easy to do both — allowing Google and Bing to index your content for search, while preventing AI companies from using your content to train their models.
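Such a policy might look like the following: the named AI bots are disallowed site-wide, while every other crawler (including Googlebot and Bingbot) remains unrestricted:

```
# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

# All other crawlers: no restrictions
User-agent: *
Disallow:
```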

Best Practices

  • Don't block CSS/JS files — Search engines need these to render your pages correctly
  • Use robots.txt for crawl efficiency, not security — It's a guideline, not a security measure. Sensitive content should be password-protected.
  • Include your sitemap — Helps crawlers discover all your pages
  • Test before deploying — Use Google Search Console to test your robots.txt
  • Keep it simple — Complex rules can have unintended consequences
  • Monitor crawl stats — Check that important pages are being indexed
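The "test before deploying" advice can also be scripted locally. Python's standard library includes urllib.robotparser, which checks URLs against a rule set; note that it implements the original prefix-matching standard (rules are applied in file order, and * / $ wildcards are not supported):

```python
from urllib.robotparser import RobotFileParser

# Allow is listed before Disallow because this parser
# applies the first rule that matches, in file order.
rules = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/
""".splitlines()

rfp = RobotFileParser()
rfp.parse(rules)

print(rfp.can_fetch("Googlebot", "https://example.com/"))               # True
print(rfp.can_fetch("Googlebot", "https://example.com/admin/users"))    # False
print(rfp.can_fetch("Googlebot", "https://example.com/admin/public/"))  # True
```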

Important Notes

  • robots.txt is publicly accessible — don't include sensitive paths you're trying to hide
  • Changes take effect after crawlers re-fetch the file (can take days)
  • Not all bots respect robots.txt — malicious crawlers may ignore it
  • To remove already-indexed pages, use noindex meta tags or Google Search Console
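For reference, a noindex directive goes in the page's HTML head (it can also be sent as an X-Robots-Tag HTTP header):

```
<meta name="robots" content="noindex">
```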

How to Install

  1. Generate your robots.txt using the tool above
  2. Download the file or copy the contents
  3. Upload to your website's root directory (e.g., /public_html/robots.txt)
  4. Verify by visiting https://yoursite.com/robots.txt
  5. Check it in Google Search Console's robots.txt report (the standalone robots.txt Tester tool has been retired)