The robots.txt file lives in the root directory of your website. For example, on the www.shoutmecrunch.com website, the robots.txt address is https://www.shoutmecrunch.com/robots.txt, and the file should be available at that address. It is a plain text file that follows the Robots Exclusion Standard and contains one or more rules, each of which forbids or allows a particular web robot to access a given path on the website.
Ideal Encoding Format
The file offers recommendations to web robots about which pages and files to scan. It must be UTF-8 encoded; if it contains characters in another encoding, web robots may handle them incorrectly. The rules in a robots.txt file apply only to the host, protocol, and port number on which the file is served.
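Because the rules are scoped to one protocol, host, and port, each such combination has its own robots.txt. Here is a minimal Python sketch of that mapping. The helper name robots_url is hypothetical, and www.example.com is only a placeholder host:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address that governs page_url.

    Hypothetical helper for illustration: it keeps the scheme, host,
    and port of the page and replaces the path with /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# The same host on a different protocol or port has a different file.
print(robots_url("https://www.example.com/blog/post?x=1"))
print(robots_url("http://www.example.com:8080/blog/post"))
```

The query string and the page path are dropped on purpose, since only the scheme, host, and port decide which file applies.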
Index or No Index
Robots.txt lets you limit the number of requests your web server receives and so reduce its load. It is not meant to keep pages out of Google results. If you do not want some of your website's material to appear in Google, use noindex tags or directives. You can also protect sections of the website with a password.
On the Internet, you can find many publications on how to create a better (or even the best) robots.txt file for WordPress. At the same time, in some of these popular articles many rules are not explained and, it seems to me, are hardly understood by the authors themselves. The only review I found that deserves attention is the wp-kama blog article.
However, even there I found recommendations that are not entirely correct. Clearly, each site will have its own nuances when compiling a robots.txt file. Still, there are many points common to very different websites that can serve as a basis. The robots.txt published in this article can be copied and pasted directly into a new site and then refined according to its nuances.
I wrote in more detail about compiling robots.txt and the meaning of all its directives here. Below I will not elaborate on the purpose of each rule; I will only comment briefly on what each one is for.
The reason robots.txt exists
For instance, sometimes robots should not visit:
- Pages with personal information of a user;
- Pages with various submission forms;
- Scraper sites;
- Pages with search results.
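As a rough illustration, the sketch below builds a robots.txt that closes a few such sections and checks it with Python's standard urllib.robotparser module. The paths /account/, /forms/, and /search/ and the helper build_robots_txt are made-up placeholders, not paths taken from any real site:

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(disallowed_paths):
    """Build a one-group robots.txt that closes the given paths for all robots."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    return "\n".join(lines) + "\n"


# Hypothetical sections: personal data, submission forms, search results.
robots_txt = build_robots_txt(["/account/", "/forms/", "/search/"])
print(robots_txt)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Closed sections are refused; everything else stays open.
print(parser.can_fetch("*", "https://www.example.com/search/wordpress"))
print(parser.can_fetch("*", "https://www.example.com/blog/hello-world/"))
```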
Correct Robots.txt for WordPress
The best robots.txt I have seen so far is the one suggested in the wp-kama blog. I will take some directives and comments from that sample and make my own adjustments. The adjustments affect several rules, and I explain why below. In addition, we will write separate rules for all robots and for Google.
There are short and extended versions. The short version does not include a separate block for Google. The extended version is already less relevant because there are now no fundamental differences between the two major search engines: both need to index script and image files, and neither supports the Host directive. However, in case something changes again, or you still need to manage Google's indexing of the website separately, I will keep the second option in this article as well.
Once again, I note that this is a base robots.txt file. In each case, you need to look at the real site and make adjustments if necessary. Entrust this work to experienced specialists!
Mistakes other bloggers make in robots.txt for WordPress
- Use rules only for User-agent: *
Many search engines do not need to index JS and CSS to improve rankings. In addition, for less significant robots you can set a higher Crawl-Delay value and so reduce the load they put on your site.
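A small sketch of a separate group for a less significant robot, read back with Python's urllib.robotparser. The crawler name SlowBot and the delay of 10 seconds are invented for illustration, and whether a given robot honors Crawl-delay is up to that robot:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: a general group plus a stricter group for one
# less important crawler ("SlowBot" is a made-up name).
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/

User-agent: SlowBot
Disallow: /wp-admin/
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Only the named robot gets a delay; the general group defines none.
slow_delay = parser.crawl_delay("SlowBot")
general_delay = parser.crawl_delay("Googlebot")
print(slow_delay, general_delay)
```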
- Sitemap assignment after each User-agent
It is not necessary. The sitemap needs to be specified only once, anywhere in the robots.txt file.
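To illustrate, Python's urllib.robotparser (site_maps() requires Python 3.8 or later) reads the Sitemap line independently of the User-agent groups. The file contents and the sitemap URL below are placeholders:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: two groups and a single Sitemap line at the end.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/

User-agent: Googlebot
Disallow: /wp-admin/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The sitemap is listed once and does not belong to either group.
sitemaps = parser.site_maps()
print(sitemaps)
```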
- Close wp-content, wp-includes, cache, plugins, themes folders
These are outdated requirements. However, I found such advice even in an article grandly titled "The correct robots for WordPress 2018"! For Google, it would be better not to close them at all. Alternatively, close them "smartly," as described above.
- Close tag and category pages
If your site is structured so that the content on these pages is duplicated and they have no particular value, it is better to close them. However, a resource is often promoted through its category and tag pages as well. In that case, closing them can cost you some traffic.
- Close pagination pages (/page/) from indexing
It is not necessary. The rel="canonical" tag is configured for such pages, so robots still visit them and take into account the products/articles placed there, as well as the internal links.
- Register Crawl-Delay
A fashionable rule. However, it should be set only when there is a real need to restrict robots' visits to your site. If the website is small, limiting visits "just to have it" is not the most sensible undertaking.
Some rules I can only put in the category of "the blogger did not think." For example, Disallow: /20. This rule closes not only all the archives but also all articles about 20 ways or 200 tips for making the world a better place.
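The effect is easy to reproduce with Python's urllib.robotparser, which matches a Disallow path as a prefix of the URL path. This is only a sketch: the helper is_blocked and both URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(rule_path: str, url: str) -> bool:
    """Return True if a single 'Disallow: rule_path' rule closes url for all robots."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {rule_path}"])
    return not parser.can_fetch("*", url)


# Hypothetical URLs: a date archive and an article whose slug starts with "20".
archive = "https://www.example.com/2018/05/"
article = "https://www.example.com/20-ways-to-improve-your-blog/"

print(is_blocked("/20", archive))     # the archive is closed, as intended
print(is_blocked("/20", article))     # the article is closed too
print(is_blocked("/2018/", article))  # a narrower prefix leaves the article open
```

A more specific prefix, such as the year directory used in the last call, closes only that archive and does not touch articles whose addresses merely begin with the same digits.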
With the help of robots.txt, you can give instructions to web robots, promote yourself and your brand, and look for specialists. There is much room for experiments. Just remember to fill in the file correctly and keep the common mistakes in mind.
Make smart use of this file, and your website will always be in search results.
About the author
With her writing, Melisa Marzett shows that she knows what is interesting for an enthusiastic reader, being one herself. She currently works for www.findwritingservice.com/ and writes beautiful articles, which are a pure delight to read.