Creating a flawless robots.txt file is important for SEO. The difficulty lies in writing the directives correctly, so that they state what may be crawled, what may not, and by whom (which robots can crawl us). A mistake in this file can cause a major loss of traffic, since it may block access for large search engines such as Google. Below are some of the most common mistakes:
- One of the most common mistakes is not updating the robots.txt file after a site launches. Sites are often built on a preliminary version whose robots.txt tells robots not to crawl it, and that restriction frequently reaches production without being removed.
- Leaving it empty, or hosting it somewhere other than where it belongs, which is the root of the domain (www.examplerepublica.com/robots.txt).
- Blocking access to CSS files or images, which Google needs in order to render pages the way users see them. Here you can see a video by Matt Cutts explaining why you should not do it.
- Incorrect use of the directives that block (Disallow) and allow (Allow) different URLs at the same time. Here you have a list of the most common ones, so you can start using them correctly or check the ones you already have.
- Not creating the file at all. Google indicates that the file is only necessary if there are parts of your website that you do not want indexed. However, I think it is crucial to always create one, because at the very least we should have a list of the robots we want to block. These are bots that may be harmful, such as those that slow down our website or copy our content automatically.
- Using Disallow to make a URL disappear from the search results. Disallow only tells robots not to crawl the URL, and a blocked URL can still appear in the results if other pages link to it. To prevent indexing, we have to let the bot access the page and add the noindex tag (the bot must be able to crawl the page to see it), redirect the URL, or choose another option depending on the situation.
- Another common mistake is contradicting the sitemap. For example, blocking access to a folder while the sitemap lists a URL inside that folder. Remember that the sitemap and robots.txt must always be aligned!
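A quick local check can catch two launch-day mistakes from the list above: a staging `Disallow: /` left in place, and blocked CSS or image files. The following is a minimal sketch using Python's standard `urllib.robotparser`.

The `audit` helper, the site URL and the asset paths are hypothetical placeholders. This parser uses plain prefix matching and ignores Google's `*` and `$` wildcards. Treat it as a sanity check, not as a reproduction of Googlebot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt carried over from a staging site.
STAGING_ROBOTS = """\
User-agent: *
Disallow: /
"""

# Hypothetical asset paths; replace them with real files from your site.
ASSETS = ("/css/style.css", "/images/logo.png")


def audit(robots_txt: str, site: str, agent: str = "Googlebot") -> list[str]:
    """Hypothetical helper: report a blocked home page and blocked assets."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    if not parser.can_fetch(agent, site + "/"):
        problems.append("home page blocked")
    for asset in ASSETS:
        if not parser.can_fetch(agent, site + asset):
            problems.append(f"{asset} blocked")
    return problems


print(audit(STAGING_ROBOTS, "https://www.example.com"))
```

Running it against the staging file reports the home page and both assets as blocked. Running it against the file you actually deploy should report nothing.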
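Crawlers only look for robots.txt at the root of each host, and the file applies only to that host, protocol and port. A copy stored in a subfolder is never read.

The sketch below uses only Python's standard library. Both helper names are our own, hypothetical choices:

- `robots_url` derives the one valid location for any page.
- `has_directives` flags a file that contains nothing but comments or blank lines.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Hypothetical helper: the only URL where crawlers look for robots.txt."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); replace path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def has_directives(robots_txt: str) -> bool:
    """Hypothetical helper: True if any line survives comment and whitespace removal."""
    return any(line.split("#", 1)[0].strip() for line in robots_txt.splitlines())


print(robots_url("https://www.examplerepublica.com/blog/post?id=1"))
print(has_directives("# TODO: add rules\n"))
```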
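When Allow and Disallow rules overlap, Google documents two precedence rules:

- The most specific rule, meaning the longest matching path, wins.
- On a tie in length, the less restrictive rule (Allow) wins.

The order of the lines does not matter to Google. The sketch below is our own simplified model of that precedence for a single user-agent group, including the `*` and `$` wildcards. The function names and the `(directive, path)` rule format are hypothetical, and this is not Google's parser.

```python
import re


def rule_to_regex(rule_path: str) -> re.Pattern:
    """Hypothetical helper: turn a robots.txt path with * and $ into a regex."""
    ends_at_dollar = rule_path.endswith("$")
    if ends_at_dollar:
        rule_path = rule_path[:-1]
    body = ".*".join(re.escape(chunk) for chunk in rule_path.split("*"))
    return re.compile(body + ("$" if ends_at_dollar else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Hypothetical helper: longest matching rule wins; on a tie, Allow wins.

    `rules` holds (directive, path) pairs for one user-agent group,
    e.g. ("disallow", "/shop/").
    """
    best_length = -1
    allowed = True  # no matching rule: the URL may be crawled
    for directive, rule_path in rules:
        if not rule_path:
            continue  # an empty "Disallow:" blocks nothing
        if rule_to_regex(rule_path).match(path):  # match() anchors at the start
            is_allow = directive.lower() == "allow"
            longer = len(rule_path) > best_length
            tie_won = len(rule_path) == best_length and is_allow
            if longer or tie_won:
                best_length = len(rule_path)
                allowed = is_allow
    return allowed


rules = [("disallow", "/shop/"), ("allow", "/shop/offers/")]
print(is_allowed(rules, "/shop/cart"))           # only /shop/ matches: blocked
print(is_allowed(rules, "/shop/offers/summer"))  # the longer Allow rule wins
```

With `Disallow: /shop/` and `Allow: /shop/offers/`, `/shop/offers/summer` stays crawlable because the Allow rule is longer. Python's own `urllib.robotparser` behaves differently: it applies the first matching line in file order. Other tools can therefore disagree with Google about the same overlapping rules.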
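Blocking a specific robot means giving it its own `User-agent` group ahead of the general rules. In this sketch, `BadBot` is a hypothetical name standing in for whichever scraper you want to keep out. The result is checked with Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# "BadBot" is a hypothetical user-agent name used only for illustration.
ROBOTS_TXT = """\
# Hypothetical scraper that we want to keep out entirely
User-agent: BadBot
Disallow: /

# Every other robot may crawl everything except the admin area
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("BadBot", "Googlebot"):
    print(agent, parser.can_fetch(agent, "https://www.example.com/blog/"))
```

Keep in mind that robots.txt is a request, not access control. Well-behaved crawlers obey it, but an abusive scraper can simply ignore it.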
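The Disallow/noindex trap can be checked mechanically, because a noindex tag only works if the crawler is allowed to fetch the page and read it. The sketch below combines Python's `urllib.robotparser` with `html.parser`.

It only inspects `<meta name="robots">` tags, not the `X-Robots-Tag` HTTP header. The class name, function name and URL are hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Hypothetical helper: collect the content of <meta name="robots"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            self.directives.append((attributes.get("content") or "").lower())


def removal_status(robots_txt: str, url: str, html: str, agent: str = "Googlebot") -> str:
    """Hypothetical helper: explain whether a noindex tag can take effect."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    has_noindex = any("noindex" in d for d in finder.directives)

    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    crawlable = parser.can_fetch(agent, url)

    if not has_noindex:
        return "no noindex tag: the page can still be indexed"
    if not crawlable:
        return "noindex is hidden: robots.txt blocks the crawler from reading it"
    return "ok: crawlable and marked noindex"


PAGE = '<html><head><meta name="robots" content="noindex"></head></html>'
URL = "https://www.example.com/old-page.html"
print(removal_status("User-agent: *\nDisallow: /old-page.html\n", URL, PAGE))
print(removal_status("User-agent: *\nDisallow:\n", URL, PAGE))
```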
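Checking that the sitemap and robots.txt agree is easy to automate: read every `<loc>` in the sitemap and ask a robots.txt parser whether that URL may be crawled. This is a minimal sketch using only Python's standard library.

The function name and the inline files are hypothetical. Because `urllib.robotparser` uses prefix matching without wildcards, results for `*` or `$` rules may differ from Google's.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical files: the sitemap lists a URL inside a blocked folder.
ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/blog/</loc></url>
  <url><loc>https://www.example.com/private/offer.html</loc></url>
</urlset>"""


def blocked_sitemap_urls(robots_txt: str, sitemap_xml: str, agent: str = "Googlebot") -> list[str]:
    """Hypothetical helper: sitemap URLs that robots.txt forbids the agent to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    urls = [(loc.text or "").strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]
    return [url for url in urls if not parser.can_fetch(agent, url)]


print(blocked_sitemap_urls(ROBOTS_TXT, SITEMAP_XML))
```

Any URL this prints is a contradiction. Either remove it from the sitemap or stop blocking its folder.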
Do not forget that you can validate your robots.txt file in Google Search Console (which we will keep calling Webmaster Tools forever). You can do this from the side menu: crawling / robots.txt.
Have you ever thought about including a hidden message in your file? Here is a selection of original and funny robots.txt files.