How to Stop Search Engines from Crawling your Website

In this tutorial, I will show you how to stop search engines from crawling your website.

As a website owner, you are probably familiar with search engine bots: they crawl your website so that your pages can be indexed and appear in search engines such as Google and Bing.

However, if your website is under construction or you are making changes to it, you may not want these bots to crawl it. You can control this with a robots.txt file. When you are done working on your website, remember to update the robots.txt file so that search engines can crawl the site again.

Let’s get started.

Stop Search Engines from Crawling your Website

Control search engine crawler access via robots.txt file

You can control how search engine bots crawl your website using a robots.txt file placed in the root directory of your site. The two basic directives are:

User-agent: specifies which crawler the rules that follow apply to, with * acting as a wildcard that matches any crawler.

Disallow: sets the files or folders that are not allowed to be crawled.
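
To see how these two directives interact, here is a minimal sketch using Python's standard urllib.robotparser module. The rules, the bot name ExampleBot, and the example.com URLs are made up for illustration, and the helper is_allowed is hypothetical. Real search engines use their own parsers, so treat the result as an approximation of their behavior rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: every crawler is kept out of /private/ only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The wildcard group applies to any bot name, including this made-up one.
print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/index.html"))
print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/private/a.html"))
```

The first call should report that the page is allowed and the second that it is blocked, because only paths starting with /private/ match the Disallow rule.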

Setting a Crawl Delay for All Search Engines

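If you had 1,000 pages on your website, a search engine could crawl your entire site in a few minutes. However, this could lead to high system resource usage because many pages are loaded in a short time. The Crawl-delay directive asks a crawler to wait a given number of seconds between requests. It is a non-standard directive: some crawlers, such as Bingbot, honor it, while Googlebot ignores it.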

  • Setting a Crawl-delay of 30 seconds would allow crawlers to index your entire 1,000-page website in about 8.3 hours.
User-agent: *
Crawl-delay: 30

  • Setting a Crawl-delay of 500 seconds would allow search engine crawlers to index your entire 1,000-page website in about 5.8 days.
User-agent: *
Crawl-delay: 500
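
The estimates above are simply the number of pages multiplied by the delay. The sketch below reproduces that arithmetic and reads the delay back from the rules with Python's standard urllib.robotparser module. The helper names read_crawl_delay and estimated_crawl_time are hypothetical, and the model assumes one request per delay interval with no time spent on the request itself.

```python
from datetime import timedelta
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 30
"""


def read_crawl_delay(robots_txt: str, user_agent: str):
    """Return the Crawl-delay (in seconds) that applies to user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


def estimated_crawl_time(pages: int, delay_seconds: int) -> timedelta:
    """Simplified model: one page fetched per delay interval."""
    return timedelta(seconds=pages * delay_seconds)


delay = read_crawl_delay(ROBOTS_TXT, "ExampleBot")  # ExampleBot is a made-up name
hours = estimated_crawl_time(1000, delay).total_seconds() / 3600
days = estimated_crawl_time(1000, 500).total_seconds() / 86400
print(f"{delay}s delay: about {hours:.1f} hours")
print(f"500s delay: about {days:.1f} days")
```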

Examples of robots.txt Configurations

  • Allow all search engines to crawl your website:
User-agent: *
Disallow:
  • Disallow all search engines from crawling your website:
User-agent: *
Disallow: /
  • Disallow one particular search engine (e.g., Bingbot) from crawling your website:
User-agent: bingbot
Disallow: /
  • Disallow all search engines from specific folders (e.g., `/cgi-bin/`, `/private/`, `/tmp/`):
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
Disallow: /tmp/
  • Disallow all search engines except one (e.g., allowing only Googlebot to access the `/private/` directory):
User-agent: *
Disallow: /private/
User-agent: Googlebot
Allow: /private/
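
The per-crawler examples above can be checked locally before you upload the file. This is a rough sketch using Python's standard urllib.robotparser module; the helper can_crawl and the page paths are hypothetical, and a blank line is added between the two groups for readability. The parser picks the group that names the crawler and falls back to the * group otherwise, which is why Googlebot is not affected by the general Disallow rule here.

```python
from urllib.robotparser import RobotFileParser

# Only Googlebot may enter /private/; everyone else is kept out.
ONLY_GOOGLEBOT_IN_PRIVATE = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Allow: /private/
"""

# Only Bingbot is blocked; no rule applies to other crawlers.
BLOCK_BINGBOT = """\
User-agent: bingbot
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


for agent in ("Googlebot", "bingbot"):
    print(agent, "in /private/:",
          can_crawl(ONLY_GOOGLEBOT_IN_PRIVATE, agent, "/private/page.html"))
    print(agent, "when Bingbot is blocked:",
          can_crawl(BLOCK_BINGBOT, agent, "/index.html"))
```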

Tips for Using robots.txt

  • Test Your robots.txt File: Use tools provided by search engines (e.g., Google Search Console) to test your robots.txt file and ensure it behaves as expected.
  • Keep it Simple: Complex robots.txt rules can lead to unintended consequences. Stick to clear, simple rules whenever possible.
  • Regularly Review: Update and review your robots.txt file periodically to adapt to changes in your website structure or SEO strategy.
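
As a complement to the search engines' own testing tools, a small script can act as a regular review: list what you expect each crawler to be able to reach and report anything that disagrees. The sketch below is one possible approach using Python's standard urllib.robotparser module. The function find_mismatches, the expectations, and the page paths are all hypothetical, and the second rule set simulates a "block everything" file left over from when the site was under construction.

```python
from urllib.robotparser import RobotFileParser

LIVE_RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
Disallow: /tmp/
"""

# Leftover rules from the under-construction phase.
UNDER_CONSTRUCTION_RULES = """\
User-agent: *
Disallow: /
"""

# (crawler, path, should the crawler be allowed?)
EXPECTATIONS = [
    ("Googlebot", "/", True),
    ("Googlebot", "/tmp/cache.html", False),
    ("bingbot", "/private/notes.html", False),
]


def find_mismatches(robots_txt, expectations):
    """Return (agent, path, expected, actual) for every failed expectation."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    mismatches = []
    for agent, path, expected in expectations:
        actual = parser.can_fetch(agent, path)
        if actual != expected:
            mismatches.append((agent, path, expected, actual))
    return mismatches


print(find_mismatches(LIVE_RULES, EXPECTATIONS))
print(find_mismatches(UNDER_CONSTRUCTION_RULES, EXPECTATIONS))
```

An empty list means the file matches your expectations. With the leftover rules, the home page shows up as a mismatch, which is the signal to update robots.txt before relaunching the site.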
