Is Your Robots.txt File Doing Its Job?
Categories: Website Design & Development, WordPress

How to Tell if Your Robots.txt File is Working

First off, do you even have a robots.txt file? If not, chances are good you need one. The robots.txt file’s job is to tell search engines which files and directories they should and should not crawl. Creating it is simple and can be done with any plain-text editor such as Notepad. If you use a word processor like Microsoft Word, save the file as plain text, because crawlers cannot read a formatted document. This handy little file is essential whether you are using a CMS like WordPress or hand-coding websites that contain secure files or folders with design information.

While most search engines won’t crawl or index the content of pages blocked by robots.txt, they may still index the URLs of those pages if they find them on other pages on the Web. As a result, the URL of a blocked page, along with other details such as the anchor text of links pointing to it, can appear in search results. That makes it all the more critical that sensitive pages are also protected by a login requiring a user name and password. There is also the chance that a less-than-reputable spider will ignore the robots.txt file altogether.

Robots.txt for WordPress

Here is an example of the contents of a robots.txt file for a WordPress site. If the WordPress install is in a subdirectory, prefix each path with that subdirectory. See the WordPress Codex for more information: https://wordpress.org/support/article/search-engine-optimization/.

User-agent: *
Allow: /
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content
Disallow: /e/
Disallow: /show-error-*
Disallow: /xmlrpc.php
Disallow: /trackback/
Disallow: /comment-page-
Allow: /wp-content/uploads/

User-agent: Mediapartners-Google
Allow: /

User-agent: Adsbot-Google
Allow: /

User-agent: AdsBot-Google-Mobile-Apps
Allow: /

User-agent: Googlebot
Allow: /

User-agent: Googlebot-Image
Allow: /

User-agent: Googlebot-Mobile
Allow: /

User-agent: Googlebot-News
Allow: /

User-agent: Googlebot-Video
Allow: /

Sitemap: https://YOUR_SITEMAP_URL

Google provides a free testing tool to make sure your robots.txt file is correctly formatted. You can access it in Google Webmaster Tools (since renamed Google Search Console) under Site configuration/Crawler access, or learn more about it at Google Webmaster Tools Help.
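
For a quick local sanity check before turning to Google’s tool, the sketch below evaluates a few paths against a trimmed copy of the `User-agent: *` rules. It assumes the longest-match precedence that Google documents and RFC 9309 describes (the most specific matching rule wins, and Allow wins a tie). Other crawlers may resolve conflicts differently. The helper names `parse_group` and `is_allowed` are hypothetical, and the sketch deliberately ignores wildcards (`*`, `$`) and multiple user-agent groups.

```python
def parse_group(text):
    """Collect (directive, path prefix) pairs from one user-agent group."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() in ("allow", "disallow"):
            rules.append((field.lower(), value))
    return rules


def is_allowed(rules, path):
    """Longest matching prefix wins; on equal length, Allow wins."""
    best_len = -1
    allowed = True  # a path matched by no rule may be crawled
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            is_allow = directive == "allow"
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len = len(prefix)
                allowed = is_allow
    return allowed


# Trimmed copy of the wildcard group (assumption: plain ASCII hyphens).
WILDCARD_GROUP = """\
User-agent: *
Allow: /
Disallow: /wp-admin
Disallow: /wp-content
Allow: /wp-content/uploads/
"""

rules = parse_group(WILDCARD_GROUP)
for path in (
    "/",
    "/wp-admin/options.php",
    "/wp-content/themes/style.css",
    "/wp-content/uploads/logo.png",
):
    print(path, "->", "allowed" if is_allowed(rules, path) else "blocked")
```

Under these assumptions the uploads folder stays crawlable even though the rest of /wp-content is blocked, because the longer Allow rule is the more specific match.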

Note: The robots.txt file belongs in the root folder of your site. If you don’t have access to that, you can use the robots meta tag to provide this information to the spider.
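
To make “root folder” concrete: crawlers look for the file at the top level of the host, whatever page they start from. The following minimal sketch derives that location from any page URL. The function name `robots_txt_url` and the example.com address are placeholders chosen for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt location for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://example.com/blog/2020/post.html?x=1"))
```

A robots.txt file saved inside a subfolder such as /blog/ would not sit at this address, which is why the meta tag is the fallback when the root is out of reach.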

Robots Meta Tag

You can use a special HTML tag to tell robots not to index the content of a page and/or not to scan it for links to follow, in a manner similar to the robots.txt file.

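The default for the robots meta tag is INDEX,FOLLOW, so you do not need to add a tag for that. Pages only need a tag if you want to give the spider other directions, such as NOINDEX,FOLLOW, INDEX,NOFOLLOW or NOINDEX,NOFOLLOW. For example, <meta name="robots" content="noindex,nofollow"> placed in the head section of a page asks robots neither to index that page nor to follow its links.

Directory-level rules such as the following belong in the robots.txt file rather than in a meta tag:

Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes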
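
To confirm which robots directives a given page actually declares, a short script can read the meta tag from its HTML. This is a minimal sketch using Python’s standard `html.parser` module. The class name `RobotsMetaParser`, the helper `robots_directives` and the sample page are made up for illustration, and the index/follow fallback simply encodes the default described in this section.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            content = attr_map.get("content", "")
            self.directives.extend(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the page's robots directives, or the default if none are set."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives or ["index", "follow"]


# Hypothetical page used only to exercise the parser.
SAMPLE_PAGE = """<html><head>
<title>Private page</title>
<meta name="robots" content="NOINDEX, NOFOLLOW">
</head><body>Hello</body></html>"""

print(robots_directives(SAMPLE_PAGE))
```

Running the same function over the saved HTML of your own pages is a quick way to spot a template that sets a stricter tag than you intended.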
