How to Create a Robots.txt File

Your robots.txt file tells search engine crawlers which pages they're allowed to visit. Get it right and you direct crawl budget to your best content. Get it wrong — one bad line — and you can accidentally block Google from crawling your entire site. It happens more often than you'd think.

What Is a Robots.txt File?

Robots.txt is a plain text file placed at the root of your website (e.g., https://yourdomain.com/robots.txt). It uses the Robots Exclusion Protocol to communicate crawling instructions to search engine bots.

A basic robots.txt looks like this:

User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /
Sitemap: https://yourdomain.com/sitemap.xml

Every major search engine checks robots.txt before crawling a site. The file is public — anyone can view it by visiting yourdomain.com/robots.txt. It is a directive, not a security measure: it tells crawlers what to skip, but it doesn't prevent access by bad actors.

Key directives:

  • User-agent: — specifies which bot the rules apply to (* means all bots)
  • Disallow: — blocks the specified path from being crawled
  • Allow: — explicitly permits a path (used to override a broader Disallow)
  • Sitemap: — tells bots where to find your XML sitemap
Why It Matters for SEO

  • Crawl budget: Google allocates each site a crawl budget, roughly the number of URLs Googlebot will crawl in a given period. Blocking low-value pages (login pages, admin areas, duplicate content) frees budget for pages you want ranked.
  • Preventing duplicate content: Pages like ?sort=price, ?ref=partner, or print versions of pages can be blocked in robots.txt so they don't dilute your content quality signals.
  • Protecting sensitive areas: Admin dashboards, staging subdirectories, and user account pages shouldn't be crawled. Robots.txt keeps bots out of them; pair it with noindex or server authentication if they must stay out of search results entirely.
  • Sitemap discovery: The Sitemap: directive in robots.txt is one of the ways Google discovers your sitemap, even if you haven't submitted it manually in Search Console.
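As an illustrative sketch, rules like these steer crawl budget away from parameter and utility URLs (the paths are placeholders; the * wildcard in paths is supported by Google and Bing, though it isn't part of the original protocol):

```txt
User-agent: *
# Block sorted/tracking duplicates of real pages
Disallow: /*?sort=
Disallow: /*?ref=
# Block utility pages that waste crawl budget
Disallow: /login/
Disallow: /cart/
Sitemap: https://yourdomain.com/sitemap.xml
```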
How to Check Your Robots.txt

    Clarity SEO's free Robots Generator creates a valid, well-structured robots.txt for your site and checks your existing file for common errors.

    → Generate your robots.txt with Clarity SEO

    The full Report Card also checks for a missing or misconfigured robots.txt as part of its 29-point audit.

    → Get your free SEO Report Card

    You can also view your current robots.txt by visiting https://yourdomain.com/robots.txt in your browser.

    How to Fix It

    For HTML/Generic

    Step 1: Create the file.

    Create a plain text file named robots.txt (lowercase, no extension). It must live at the root of your domain — not in a subfolder.

    Step 2: Write the rules.

    Here's a solid starting template for most websites:

# Allow all crawlers to access the entire site
User-agent: *
Disallow: /admin/
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /login/
Disallow: /private/
Disallow: /staging/
Allow: /wp-admin/admin-ajax.php

# Block specific bots you don't want
User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

# Point to your sitemap
Sitemap: https://yourdomain.com/sitemap.xml

    Step 3: Upload the file.

    Upload robots.txt to your site's root directory via FTP, SFTP, or your hosting file manager. Verify it's live by visiting https://yourdomain.com/robots.txt.

    Step 4: Test it.

Use the robots.txt report in Google Search Console (under Settings) to verify that Google can fetch and parse your file. (Google retired the standalone robots.txt Tester in 2023; the report is its replacement.)
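You can also sanity-check rules offline with Python's standard library. A minimal sketch (the rules and URLs are illustrative, not your real file):

```python
from urllib.robotparser import RobotFileParser

# Rules as they would appear in robots.txt.
# Note: urllib.robotparser applies rules in order, so the narrow
# Allow is listed before the broader Disallow.
rules = [
    "User-agent: *",
    "Allow: /admin/public/",
    "Disallow: /admin/",
]

rp = RobotFileParser()
rp.parse(rules)

# Ask whether a given user agent may fetch a given URL
print(rp.can_fetch("*", "https://example.com/admin/settings"))     # False: blocked
print(rp.can_fetch("*", "https://example.com/admin/public/page"))  # True: allowed
print(rp.can_fetch("*", "https://example.com/blog/post"))          # True: allowed
```

Be aware that Python's parser resolves Allow/Disallow conflicts by rule order, while Googlebot uses the most specific (longest) matching rule, so keep narrow Allow rules above broad Disallow rules if you test this way.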

    For WordPress

WordPress automatically serves a virtual robots.txt if no physical file exists in the root directory. The default allows all crawlers except /wp-admin/ (with an exception for admin-ajax.php).

    Method 1: Plugin (recommended)

    Using Yoast SEO:

  • Go to Yoast SEO → Tools → File editor.
  • Edit the robots.txt file directly in the browser.
  • Save — Yoast writes the physical file.
    Using Rank Math:

  • Go to Rank Math → General Settings → Edit robots.txt.
  • Edit and save.

    Method 2: Manual upload

  • Create your robots.txt file locally.
  • Upload it to your WordPress root directory (same folder as wp-config.php) via FTP or your hosting control panel.
  • A physical file takes precedence over WordPress's virtual one.
    ⚠️ Critical warning: In WordPress → Settings → Reading, there is a checkbox: "Discourage search engines from indexing this site". If this is checked, WordPress tells search engines to stay away (recent versions inject a noindex robots meta tag; older versions added Disallow: / to the virtual robots.txt), effectively removing your site from search. Check this setting first if your site isn't being indexed.

    For Shopify

    Shopify automatically generates a robots.txt file for your store. You can customise it using a robots.txt.liquid template:

  • Go to Online Store → Themes → Edit code.
  • Under Templates, look for robots.txt.liquid.
  • If it doesn't exist, click Add a new template → robots.txt.
  • Add custom rules using Liquid:
{% comment %} Reproduce Shopify's default rules, then append custom ones {% endcomment %}
{% for group in robots.default_groups %}
  {{- group.user_agent }}
  {% for rule in group.rules %}
    {{ rule.directive }}: {{ rule.value }}
  {% endfor %}
  {%- if group.user_agent.value == '*' %}
    # Custom additions
    Disallow: /collections/*?sort_by=
    Disallow: /search?
  {%- endif %}
  {%- if group.sitemap != blank %}
    {{ group.sitemap }}
  {%- endif %}
{% endfor %}

    For Wix / Squarespace / Webflow

    Wix: Wix manages robots.txt automatically. Custom robots.txt editing requires a Business or higher plan. Go to Settings → SEO → Robots.txt to customise.

Squarespace: Squarespace generates its own robots.txt and does not allow custom editing on standard plans. To monitor how your site is crawled, connect it to Google Search Console and use per-page SEO settings to control indexing.

    Webflow: Webflow generates robots.txt automatically. Custom robots.txt is only available on paid hosting plans. Edit via Project Settings → SEO → Robots.txt.

    Common Mistakes to Avoid

  • `Disallow: /` with no Allow rules: This blocks all crawlers from your entire site, one of the most catastrophic SEO mistakes possible. Always verify your file doesn't contain this line unless it's intentional.
  • Forgetting the Sitemap directive: Without a Sitemap: line, Google must find your sitemap through Search Console alone, a wasted opportunity.
  • Using robots.txt as a security tool: Robots.txt is publicly visible. It tells crawlers to skip a path, but it doesn't prevent access. Use server authentication for sensitive files.
  • Blocking CSS and JavaScript: Blocking /wp-content/ or similar directories prevents Google from rendering your pages properly, which hurts mobile usability assessments and Core Web Vitals scoring.
  • Case sensitivity errors: Robots.txt path matching is case-sensitive. Disallow: /Admin/ does not block /admin/; they are treated as different paths.
  • Not testing after changes: Always validate in Google Search Console's robots.txt report after any changes.
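The case-sensitivity pitfall is easy to demonstrate with Python's built-in parser (the paths here are illustrative):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /Admin/"])

# Matching is case-sensitive: /Admin/ is blocked, /admin/ is not
print(rp.can_fetch("*", "https://example.com/Admin/panel"))  # False
print(rp.can_fetch("*", "https://example.com/admin/panel"))  # True
```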
FAQ

    What is a robots.txt file?

A robots.txt file is a plain text file at the root of a website that tells search engine crawlers which pages or sections they are allowed to crawl. It follows the Robots Exclusion Protocol standard.

    Does robots.txt prevent pages from appearing in Google?

    Robots.txt prevents crawling, but not always indexing. If other sites link to a blocked page, Google can still index it (showing a URL with no description). To prevent indexing, use a noindex meta tag — and don't block those pages in robots.txt, because Googlebot won't be able to read the noindex tag on a blocked page.
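For reference, the tag goes in the page's <head>; a minimal example:

```html
<meta name="robots" content="noindex">
```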

    Where should the robots.txt file be located?

    The robots.txt file must be in the root directory of your domain — https://yourdomain.com/robots.txt. A robots.txt at https://yourdomain.com/blog/robots.txt is not valid and will be ignored by crawlers.

    Can I have different rules for different search engines?

    Yes. Use separate User-agent: blocks for each bot. For example, User-agent: Googlebot applies rules only to Google's crawler. User-agent: * applies to all crawlers not covered by a specific rule.
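A sketch of per-bot blocks (the paths are illustrative). Note that a crawler obeys only the most specific group that matches it, so Googlebot below follows its own group and ignores the * group entirely:

```txt
# Applies only to Google's crawler
User-agent: Googlebot
Disallow: /drafts/

# Applies to every other crawler
User-agent: *
Disallow: /admin/
```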

    What happens if I don't have a robots.txt file?

    Crawlers will crawl your entire site by default. Most crawlers check for robots.txt first — if none exists, they proceed without restrictions. This isn't inherently bad, but it means you're not directing crawl budget and potentially wasting it on admin, login, and cart pages.

    Summary

    A well-structured robots.txt file takes less than ten minutes to create and helps Google spend its crawl budget on the pages that actually matter for your rankings. Block admin areas, duplicate parameter pages, and staging content — and always point to your sitemap.

    Generate a clean, validated robots.txt file now with the free Clarity SEO tool.

    → Get your free SEO Report Card

    Related Tools