The robots.txt file (SEO)

This article will explain what a robots.txt file is and how to create one.

Robots.txt Implementation Guide

What is robots.txt?

The robots.txt file is a plain-text file placed in your website’s root directory that tells web crawlers (bots) which pages or sections of your site they can or cannot access. It’s a standard, known as the Robots Exclusion Protocol, that responsible bots consult to learn your preferences.

Important: robots.txt is a voluntary standard. Well-behaved bots will respect it, but malicious bots may ignore it completely.
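To make the mechanics concrete, here is a minimal sketch of how a compliant crawler asks for permission before fetching a page, using Python’s standard-library urllib.robotparser (the domain and path are placeholders):

from urllib.robotparser import RobotFileParser

# Fetch and parse the site's robots.txt (placeholder domain)
rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# A well-behaved crawler checks each URL before requesting it;
# the answer depends on what the live robots.txt actually contains
print(rp.can_fetch("Googlebot", "https://example.com/some-page"))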

Where to Place Your robots.txt File

The file must be located at the root of your domain:

  • https://example.com/robots.txt

  • NOT in a subdirectory like /content/robots.txt

Base Template

Here’s a starting template that blocks known bad bots while allowing legitimate search engines:

# Allow major search engines
User-agent: Googlebot
User-agent: Bingbot
User-agent: Slurp
User-agent: DuckDuckBot
User-agent: Baiduspider
User-agent: YandexBot
User-agent: Facebot
User-agent: ia_archiver
User-agent: Applebot
Allow: /

# Block known bad bots (SEO crawlers and scrapers)
User-agent: AhrefsBot
User-agent: SemrushBot
User-agent: MJ12bot
User-agent: DotBot
User-agent: BLEXBot
User-agent: PetalBot
User-agent: DataForSeoBot
User-agent: Serpstatbot
User-agent: SEOkicks
User-agent: AspiegelBot
User-agent: CCBot
User-agent: GPTBot
Disallow: /

# Default rule for all other bots
# (Crawl-delay is not part of the core standard and Googlebot ignores it)
User-agent: *
Crawl-delay: 10
Disallow: /

# Sitemap location (update with your actual sitemap URL)
Sitemap: https://example.com/sitemap.xml
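Before uploading a draft like this, you can sanity-check it offline. The sketch below feeds a condensed version of the template to Python’s standard-library urllib.robotparser and confirms the intended allow/block behavior (this parser is a simplified model of how real crawlers interpret the file):

from urllib.robotparser import RobotFileParser

# A condensed version of the template above
template = """\
User-agent: Googlebot
Allow: /

User-agent: AhrefsBot
Disallow: /

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(template.splitlines())

print(rp.can_fetch("Googlebot", "https://example.com/page"))     # True: explicitly allowed
print(rp.can_fetch("AhrefsBot", "https://example.com/page"))     # False: explicitly blocked
print(rp.can_fetch("SomeOtherBot", "https://example.com/page"))  # False: falls through to *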

Customizing for Your Site’s Functionality

Adding Required Service Bots

Many websites use third-party services that require bot access to function properly. You’ll need to explicitly allow these bots.

Site Search Services

If you use AddSearch, Algolia, Swiftype, or similar:

# Allow site search bot
User-agent: AddSearchBot
Allow: /

Social Media Preview Bots

For proper link previews on social platforms:

# Social media preview bots
User-agent: Twitterbot
User-agent: LinkedInBot
User-agent: Slackbot
User-agent: facebookexternalhit
Allow: /

Monitoring and Analytics Services

If you use uptime monitoring or analytics:

# Monitoring services
User-agent: Pingdom
User-agent: UptimeRobot
Allow: /

AI Training Bots (Optional)

If you want to allow or block AI training bots:

# Block AI training bots
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: CCBot
User-agent: anthropic-ai
User-agent: Claude-Web
User-agent: Google-Extended
Disallow: /

# OR allow them
User-agent: GPTBot
Allow: /

Where to Add Your Custom Bots

Add the service bots you need BEFORE the “User-agent: *” line.

Here’s the correct placement:

# Allow major search engines
User-agent: Googlebot
Allow: /

# YOUR CUSTOM SERVICE BOTS GO HERE
User-agent: AddSearchBot
Allow: /

User-agent: Twitterbot
Allow: /

# Block known bad bots
User-agent: AhrefsBot
Disallow: /

# Keep the catch-all rule last
User-agent: *
Disallow: /

Note: compliant crawlers pick the group with the most specific matching User-agent regardless of order, but keeping the wildcard group last makes the file much easier to read and audit.

Step-by-Step Customization Process

Step 1: Identify Required Bots

Make a list of all third-party services your website uses:

  • Site search (AddSearch, Algolia, etc.)

  • Social media platforms

  • Monitoring services

  • CDN services

  • Analytics tools

  • Marketing automation

  • Any other external services that crawl your site

Step 2: Find Bot User-Agent Names

For each service, find the bot’s User-agent name:

  • Check the service’s documentation

  • Search for “[Service Name] bot user-agent”

  • Check your server logs for the bot’s identifier

Common format: ServiceNameBot or ServiceName-Bot
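If you want to mine your own logs, here is a rough sketch that tallies bot user-agents from an access log in the common “combined” format (the log path is a placeholder; adjust the parsing to whatever format your server writes):

import re
from collections import Counter

# Placeholder path; point this at your web server's access log
LOG_PATH = "/var/log/nginx/access.log"

# In the combined log format, the User-Agent is the last quoted field
ua_pattern = re.compile(r'"([^"]*)"\s*$')

counts = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = ua_pattern.search(line)
        if match and "bot" in match.group(1).lower():
            counts[match.group(1)] += 1

# Print the 20 most frequent bot user-agents
for agent, hits in counts.most_common(20):
    print(f"{hits:6d}  {agent}")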

Step 3: Add Bots to Your robots.txt

Add each required bot between the search engines section and the “User-agent: *” line:

User-agent: [BotName]
Allow: /

Step 4: Test Your Configuration

  1. Save your robots.txt file

  2. Upload it to the root folder of your site’s file cache

  3. Test it at: https://example.com/robots.txt

  4. Check it in Google Search Console’s robots.txt report (the successor to the robots.txt Tester), or script a check as shown after this list

  5. Monitor your site to ensure functionality isn’t broken
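For the scripted check, the sketch below fetches your deployed file with Python’s standard-library urllib.robotparser and reports what a few representative bots are allowed to do (the domain, bots, and paths are placeholders to replace with your own):

from urllib.robotparser import RobotFileParser

# Placeholder domain; use your own site here
rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetches and parses the live file

# Placeholder bots and paths; substitute the ones that matter to you
checks = [
    ("Googlebot", "https://example.com/"),
    ("AddSearchBot", "https://example.com/"),
    ("AhrefsBot", "https://example.com/"),
]
for agent, url in checks:
    verdict = "allowed" if rp.can_fetch(agent, url) else "blocked"
    print(f"{agent}: {verdict} for {url}")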

Advanced Customization

Blocking Specific Directories

To allow bots but block certain directories:

User-agent: Googlebot
Disallow: /admin/
Disallow: /private/
Disallow: /temp/
Allow: /

Allowing Only Specific Files

To expose only certain paths and block everything else, list the specific Allow rules before a blanket Disallow:

User-agent: Googlebot
Allow: /public-content/
Disallow: /
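You can confirm the effect with the same offline parser used earlier. One caveat: Python’s urllib.robotparser applies rules in their order of appearance (first match wins), while Google applies the most specific (longest) matching path; the two interpretations happen to agree for this example:

from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Allow: /public-content/
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "https://example.com/public-content/page.html"))  # True
print(rp.can_fetch("Googlebot", "https://example.com/anything-else"))             # False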

Uploading a robots.txt file

The robots.txt file needs to be accessible at the root of your website, e.g. https://www.example.com/robots.txt. Once you've created the file, upload it to the root of your site's file cache.

Search engines and other crawlers will then find it automatically at that standard location.

Sitemap

You can also add the location of your sitemap to the robots.txt file, which allows some search engine crawlers to pick it up automatically. Upload your sitemap to the root of the file cache (the sitemap.xml file shown above), then add a line to your robots.txt file pointing to it:

User-agent: *
Disallow:
Sitemap: https://www.example.com/sitemap.xml
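To verify that crawlers can discover the sitemap, Python’s urllib.robotparser (3.8 and later) exposes any Sitemap lines it finds in the file; a quick sketch with a placeholder domain:

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")  # placeholder domain
rp.read()
print(rp.site_maps())  # e.g. ['https://www.example.com/sitemap.xml'], or None if absent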
