Robots.txt WordPress Plugin
This is another one of those handy plugins designed for people like myself, who just want to be able to set something up and then not worry about it again.
What the plugin does
You probably know that when a search engine spider visits your site, one of the first things it does is look for a file called robots.txt which tells it which files and folders it can go and look at. By default, WordPress lets every robot go everywhere. That might be ok for some people, but I prefer to exercise a bit more control over things.
For example, if the robot identifies itself as a bad bot – yes, some of them do – then I don't want it to go anywhere. All it's probably going to do is trawl for email addresses to add to a spam list somewhere. And I don't really want any robots poking their noses into such places as the WordPress admin folder. Control freak? Me? Don't know what you mean…
The solution is to add the names of bad bots to your robots.txt file and disallow them from going anywhere, and add the names of common search engine spiders and specify which locations or files they are allowed to visit.
This plugin will do that in a completely hands free way by setting up a virtual robots.txt file for your blog as soon as it's activated. Whenever a request for a robots.txt file comes in, WordPress will display the contents of your virtual robots.txt file. No physical file is created on your site but one is shown to the search engine bot.
By default, your virtual robots.txt file will have Google's Mediabot allowed, a bunch of spam-bots disallowed, and a few of the standard WordPress folders and files disallowed. The default collection of bad bots is borrowed from http://www.clickability.co.uk/robotstxt.html.
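To make that concrete, here is a minimal sketch of how a well-behaved crawler would read rules of this kind, using Python's standard urllib.robotparser module. The rules are an illustrative assumption, not the plugin's actual default output: "ExampleSpamBot" and "SomeSearchBot" are made-up names, and "Mediapartners-Google" is assumed here to be the user-agent token for Google's Mediabot.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only (an assumption, not the plugin's real defaults).
SAMPLE_RULES = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: ExampleSpamBot
Disallow: /

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-login.php
"""


def build_parser(rules: str) -> RobotFileParser:
    """Parse robots.txt text held in a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


parser = build_parser(SAMPLE_RULES)

# The hypothetical bad bot is shut out of everything.
print(parser.can_fetch("ExampleSpamBot", "/any-post/"))
# Any other bot falls under "*": posts are fine, the admin folder is not.
print(parser.can_fetch("SomeSearchBot", "/any-post/"))
print(parser.can_fetch("SomeSearchBot", "/wp-admin/"))
```

Bear in mind that robots.txt is only a request. A bot that ignores the file is not stopped by it.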
Ok, even though it's completely automated and hands free I admit there are times when I want to tweak what's contained in the virtual robots.txt file. There's now a handy options page which lets you edit the contents.
Oh yeah, and if you mess up your robots.txt file you can just deactivate and reactivate the plugin and it will revert to the default list of rules.
Also, if the plugin detects an existing sitemap.xml file (or if you are using my XML Sitemap plugin) it will add a reference to your sitemap.xml to the end of the robots.txt file. I'm told this helps with the discovery of your sitemap.xml and indexing of your pages. That's got to be a good idea.
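As a rough illustration of what that sitemap reference looks like to a crawler, the sketch below parses a robots.txt that ends with a Sitemap line and reads the URL back out. The rules and the www.example.com address are placeholders I made up for the example, and site_maps() needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Placeholder rules with a sitemap reference at the end (hypothetical URL).
RULES_WITH_SITEMAP = """\
User-agent: *
Disallow: /wp-admin/

Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(RULES_WITH_SITEMAP.splitlines())

# site_maps() returns a list of sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```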
How to use the plugin
With the plugin now being hosted on WordPress, the easiest way to install this baby is to visit your blog admin pages, click the Plugins menu, and then click the Add New menu. In the search box type something like "robots.txt" and with a bit of luck you should see PC Robots.txt in the list that appears. To the right of it you'll see a link to Install the plugin. Click that.
If you happen to be using the version that was hosted on this site, please delete it and install a new version using the instructions above. That way you'll always have the latest version and you'll get notified of updates and such by WordPress.
The official download page is at http://wordpress.org/extend/plugins/pc-robotstxt/
And please do give me a shout to let me know if it works for you or not :-)
Hi there. My name is Peter Coughlin. I am a freelance web developer living in the UK, and at the moment I am specialising in WordPress customisation.
60 Responses so far ↓
Sep 5, 2008 at 12:15 pm
This is a truly elegant solution to a total PITA.
Top job!
Sep 5, 2008 at 12:18 pm
Does a robots.txt file help a blog get re-indexed in Google, if you’ve been kicked out?
My site was delisted from Google, I suspect for syndicating headlines of everyone else’s content with a link back to the full text article. Do you know how I can fix this?
Thanks kindly,
Shawn
May 27, 2009 at 4:00 pm
Can this be used in conjunction with the google xml sitemaps plugin?
Jun 1, 2009 at 5:01 am
Hi Mark. I can’t think of any reason why the robots.txt plugin shouldn’t be used with the google xml sitemaps plugin.
Jun 12, 2009 at 10:51 pm
Hey Peter,
GREAT PLUGIN! It’s the only robots.txt that can properly be automated in wordpress. I’m impressed. Thanks very much. Works as advertised.
I’ll be reviewing your plugin on my blog, if you like, http://www.usingwp.com.
Jun 12, 2009 at 11:14 pm
[...] a cruise over to and check out his plugin, Robots.txt Wordpress plugin. Now, this is a simple but slick plugin. And does it work [...]
Jun 21, 2009 at 12:21 pm
I’m getting a 404 File Not Found error when I preview my robots.txt file. It’s pointing to http://www.pat-phillips-homes.com/robots.txt. Is it because robots.txt is a virtual file and doesn’t actually exist in the root directory, and so I’ll get the error?
I also am using the XML Sitemap Generator for WordPress 3.1.3 plugin and have UNCHECKED the setting: “Add sitemap URL to the virtual robots.txt file.
The virtual robots.txt generated by WordPress is used. A real robots.txt file must NOT exist in the blog directory!” Should it be checked? I thought it might conflict with your plugin.
Jun 21, 2009 at 1:07 pm
@Pat – It looks like there’s something odd going on somewhere – your 404 message has a strange path for the robots.txt file it couldn’t find. I’m not sure what would cause that, but maybe you could try deactivating your sitemap generator plugin and previewing the robots.txt file again?
Jun 21, 2009 at 3:19 pm
Well, I tried deactivating Google XML Sitemap and changed my permalinks setting back to default instead of structured, and neither action made any difference. Still getting the 404 error: /static/pat-phillips-homes.com//robots.txt’ was not found on this server. Should I go ahead and create a robots.txt file, copy the code that your plugin generated, paste it into the robots.txt file and upload it to my server? Then I could delete the two plugins, since I would have a physical robots.txt file that I can point Google Webmaster Tools to.
Jun 21, 2009 at 7:15 pm
@Pat – Yeah, I guess creating a physical robots.txt might be the easiest thing in your case.
Jul 3, 2009 at 2:01 pm
Hey, what does “Disallow: /” mean? It’s on a bunch of entries. Do I need to edit the file?
User-agent: Telesoft
Disallow: /
User-agent: The Intraformant
Disallow: /
..etc.
Jul 3, 2009 at 2:22 pm
I tried a validator and it says I have a lot of errors and warnings. I noticed some names, like webextractor, are in there twice. Any idea what else I need to change, or can I leave it?
Jul 3, 2009 at 3:26 pm
Hi Amamda. Yes, there were a couple of duplicate entries in there – don’t know how I missed those.. I have removed them and updated the plugin files on wordpress.org so if you re-install the plugin you should be good to go. I also ran it through a validator and it comes up clean. Thanks for letting me know about the errors.
The “Disallow: /” line means disallow access to the complete domain for whichever “User-agent” is listed above it. Unless you want to make specific exceptions you shouldn’t need to edit anything.
Jul 10, 2009 at 10:49 am
I have the same problem as Pat. When I check the page http://www.pawel-trzepiota.pl/robots.txt there is a 404 message. But if I check http://www.pawel-trzepiota.pl/index.php/robots.txt the plugin generates the file.
Now I wonder if I should add a rewrite clause for robots.txt… but the way you describe it, it should work without it.
Jul 10, 2009 at 11:08 am
@mac13 – that’s interesting, I notice all your links begin with /index.php/ – do you have that as part of your permalink structure?
Jul 10, 2009 at 11:22 am
Self response – I’ve added ErrorDocument for index.php and it works….
Jul 11, 2009 at 1:41 am
Well it was. Now I’ve changed it to do it without index.php and with the ErrorDoc 404 pointed to index.php it works.
Jul 18, 2009 at 12:08 pm
Hello, I just want to be sure of something before I use this plugin. Since SEO is very important for my site and I know very little about WP folder/file structure can you just reassure me that your list in PC Robots.txt won’t block any pages/posts that I create?
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-login.php
Disallow: /wp-register.php
Jul 18, 2009 at 12:16 pm
@Marko – good for you for asking! Everything in that list can safely be blocked without it affecting your posts and pages.
Jul 18, 2009 at 6:37 pm
Thank you Peter :)
I suppose since my post and pages are dynamic those can’t be blocked using robots.txt? The only way would be to use robots meta element on them?
And thank you for the plugin very much!
Jul 18, 2009 at 7:10 pm
@Marko – You’re welcome! You can block your posts and pages by adding the post or page URL to the robots.txt file with a Disallow instruction, but as you mention I prefer to add a meta tag..
Jul 19, 2009 at 6:48 am
Hello Peter, me again :)
I just saw on WordPress Version 2.8 Release Notes http://codex.wordpress.org/Version_2.8 that Login and Registration pages are made noindex followed by default. Maybe now you would like to remove those locations from your PC Robots.txt. I know it can’t hurt to leave them there, but I just wanted to give you a heads up.
Cheers!
Jul 19, 2009 at 6:52 am
@Marko – thanks for that, but I will leave those lines in there for the people not yet using 2.8..
Aug 25, 2009 at 12:21 pm
One question: not particularly related to your plugin but here it is. I have 2 sites: one in root server and other one in subfolder:
1. /
2. /subfolder/
Both have their own URLs which can be accessed through HTTP. I know that when a robot comes to index the site in the root it would index the other site in the subfolder too, as a part of it. But I want to prevent indexing of the site in the subfolder as a part of the first site because they are not related at all.
My idea:
I want to put robots.txt in root to forbid that subfolder and thus prevent indexing the second site with the first. But I also want to put robots.txt in subfolder and allow indexing. I think this is possible since the second site has its own HTTP URL.
Will this work? Thank you and sorry for boring you. :)
Aug 27, 2009 at 2:24 pm
Thank you, thank you, thank you… This is a plugin that everyone who cares about SEO on their WP blogs should install as a default.
In fact, why doesn’t the privacy page of WP already offer this vitally-important functionality? Do WP admins/coders not care about SEO at all?
Thanks again,
Saul
Aug 27, 2009 at 3:15 pm
@marko – I’m just guessing here, but I think you might be able to do it. Robots only look for robots.txt files in the root of a domain, so if you disallow site 2 (/subfolder/) in the robots.txt file for site 1, it shouldn’t be indexed as part of site 1. If you then allow site 2 to be indexed using its own robots.txt file (in /subfolder/) it should be indexed as a separate domain. Well that’s the theory at least.. you can run a quick check in Google webmaster tools to see if it works..
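A small sketch of why this can work: a crawler builds the robots.txt address from the scheme and host of whatever page it is visiting, and ignores the rest of the path. The function name and the two example domains below are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL a crawler would request for page_url."""
    parts = urlsplit(page_url)
    # Keep only scheme and host; robots.txt always sits at the root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Site 2 reached through site 1's domain: site 1's robots.txt applies.
print(robots_url_for("http://site1.example/subfolder/some-post/"))
# Site 2 reached through its own domain: its own robots.txt applies,
# which on disk would be the file stored in /subfolder/.
print(robots_url_for("http://site2.example/some-post/"))
```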
Aug 27, 2009 at 3:17 pm
@Saul — thank you for such a great comment – very much appreciated!
Regards,
Peter.
Aug 27, 2009 at 5:49 pm
Thanks Peter for your opinion very much, it does work!
Sep 11, 2009 at 10:24 am
I noticed you automatically included this in your plugin’s virtual robots.txt:
User-agent: Googlebot
Disallow:
Wouldn’t that prevent Google from indexing one’s site… and hence, result in decrease of search engine traffic?
Sep 11, 2009 at 11:02 am
@Fruity – Whatever comes after the Disallow statement is disallowed.. so if there’s nothing after it, then nothing is disallowed, or in other words everything is allowed. I know it sounds backwards but it is in fact the proper way of allowing full access…
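For anyone who wants to check this for themselves, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name allowed() is my own, and the two rule sets are cut down to a single entry for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: str, agent: str, path: str) -> bool:
    """Report whether agent may fetch path under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


# Nothing after "Disallow:" means nothing is disallowed.
ALLOW_ALL = "User-agent: Googlebot\nDisallow:\n"
# A single slash after "Disallow:" blocks the whole site.
BLOCK_ALL = "User-agent: Googlebot\nDisallow: /\n"

print(allowed(ALLOW_ALL, "Googlebot", "/any-post/"))  # True
print(allowed(BLOCK_ALL, "Googlebot", "/any-post/"))  # False
```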
Sep 17, 2009 at 11:48 am
I just installed this plugin and it works great. I use the “Google XML Sitemaps” by Arne Brachhold and this creates a sitemap.xml and sitemap.xml.gz automatically.
So is it possible to extend your check to cover both files and add the zipped version to the virtual robots.txt, if such a file exists?
Sep 17, 2009 at 2:26 pm
@Henning — Glad the plugin is working great. Yes, good idea about the two types of sitemap. Look for an update shortly…
Sep 17, 2009 at 7:11 pm
Wow … fast reaction! Thanks for the update.
Sep 21, 2009 at 8:06 pm
Peter when I installed this I looked at the default values and saw that Google bots were under Disallow. You might want to check the default values in the version over at WordPress.org. I corrected the values to Allow.
Is Allow necessary if the default is to allow all agents with disallow used to prevent bots you don’t want?
Sep 22, 2009 at 3:55 am
@Doug – take a look at http://www.robotstxt.org/orig.html#format for the official format. You can use an Allow statement, but it isn’t universally supported.. and you’re right that it wouldn’t be necessary if all pages are allowed by default.
Oct 4, 2009 at 5:28 pm
Great plugin – superbly done. Thank you.
Not included in your list – just noted this one mentioned on an info site … not sure as to whether I ought to add it to the list …
User-agent: ia_archiver-web.archive.org
Disallow: /
Oct 19, 2009 at 5:25 pm
Hello Peter. Thanks for this great plugin. What would novices like me do without gurus like yourself?!
My blog is in a subdirectory like Mark’s above. Does it mean I need to make a subdomain like http://subdomain.domain.tld to allow bots to access my WP blog separately from the rest of the site?
I don’t see how the method you recommended above can work otherwise.
Oct 26, 2009 at 11:22 pm
[...] Robots.txt Tell search engines where they can and cannot crawl. This is an important file that many people forget about, but this plugin will help you through that. [...]
Oct 28, 2009 at 4:18 pm
is there a changelog? if not, there ought to be. some people actually like to know what changed ;)
Nov 8, 2009 at 9:34 pm
Hi, once I install the Robots.txt plugin, do I need to make any changes to the settings? I’m afraid I don’t quite understand what to do once it’s installed!
Nov 8, 2009 at 9:43 pm
Wow! Thanks for the tip. I have been looking for any easy way to do robots.txt all day and this is perfect for someone like me who is not so tech savvy.
Nov 11, 2009 at 9:41 pm
The plugin is not updating my file. When I preview the robots.txt file it still says
User-agent: *
Disallow: /
Do I need to reactivate the plugin or will it change after a period of time?
Nov 12, 2009 at 4:29 am
@Ciuly – when I update the plugin using the wordpress SVN it asks for a note to go with the update, which I always provide. I confess I thought those notes appeared on wordpress.. ah well, I will put one on this page too.. thanks for letting me know..
Nov 12, 2009 at 4:33 am
@Amy – if you don’t understand what the plugin does then you probably don’t need it.. wordpress will work perfectly well without it. If you still want to use it anyway, there’s no need to change any of the settings, just activate it and you’re good to go.
Nov 12, 2009 at 4:34 am
@chris – it looks like you might have your blog privacy settings set to block the search engines. The plugin will only change your robots.txt if your blog is public..
Nov 12, 2009 at 9:12 am
Peter – you were right. It was my privacy settings blocking the search engines. It’s all updated. Thank you.
Nov 25, 2009 at 5:51 am
How do I change my virtual robots.txt on my wordpress blog to allow googlebot to crawl my site? I realized under privacy settings that my blog was not visible to search engines (why is that set to default?) so I checked it to be visible, but now my virtual robots.txt reads as:
User-agent: *
Disallow:
How do I make it allow? I know creating a new robots.txt is not allowed?
Nov 25, 2009 at 6:04 am
@Pet Society – Your robots.txt file is fine. Notice there is nothing after the Disallow: bit, that means nothing is disallowed, or in other words, everything is allowed..
Nov 25, 2009 at 10:14 pm
Oh ok, I was wondering if that made a difference, glad it does. Thank you for your help, great blog. I’ll be sure to keep coming back for my WordPress help and info.
Nov 30, 2009 at 7:00 pm
Peter, thanks for the easy to use and follow plug-in! Quick question, since Bing, Yahoo, Ask and other “safe” robots aren’t listed does that mean they automatically have access to crawl the site? I assume it does, just want to make sure. Thanks!
Dec 1, 2009 at 4:31 am
@Greg – Yes, you’ve got it right.. and thank you for the kind words..
Dec 1, 2009 at 11:45 am
I installed your plugin and I am seeing the following among other code when I look at the robots file
User-agent: Googlebot
Disallow:
Does this mean to disallow Google? If so, what do I do to allow Google, MSN, Yahoo and small meta search engines? Thanks.
Dec 1, 2009 at 12:23 pm
@Peter – please read the previous comments on this page – that question has been answered a few times already :-)
Dec 9, 2009 at 4:12 pm
Hi
I am running the latest WP with Google sitemap. As in the post from last year, I get “webpage could not be found” when I click on preview my file.
I am using the default permalinks. Does that cause a problem? I deactivated the plugin, then reactivated it.
Any tips would be most appreciated.
Cheers
Jan 11, 2010 at 1:58 pm
Hi Peter, I have installed your plugin, but it doesn’t work, I only get 404 Not Found.
Oh, I use WP 2.8…:(
Jan 13, 2010 at 2:52 pm
I get the 404 error that a couple of others have mentioned, but I can’t seem to resolve the issue. Regardless of whether I manually put a robots.txt or remove robots.txt in my root directory (pagsa.missouri.edu/robots.txt), this plugin does not save changes to the file. I am using the “Google XML Sitemaps” plugin, and have tried using it with and without checking “Add sitemap URL to the virtual robots.txt file”.
Jan 24, 2010 at 10:16 am
Installed easily, but when I click on -
“You can preview your robots.txt file”
(under “PC Robots.txt Settings”)
I get -
“Page Cannot be Found, Please try Search or Return to HomePage”.
Any advice greatly appreciated. (I’m concerned that an improper install might hurt with the SEs…)
Thanks for the plugin! – Ken
Feb 5, 2010 at 10:37 am
Peter, Google’s info on robots.txt says:
If you want search engines to index everything in your site, you don’t need a robots.txt file (not even an empty one).
So, if you’re activating the plug-in without making any amendments /blocking page/s will it harm your site in any way from Google’s point of view?
Feb 8, 2010 at 10:00 am
You need custom permalinks for the plugin to work. When you have custom permalinks enabled, WordPress will process requests for non-existent files (including robots.txt), but if you don’t have custom permalinks enabled WordPress will never even see the request.. you will just get a 404 not found error..
Feb 8, 2010 at 10:03 am
@Mark — if you don’t have a specific reason to use this plugin, i.e. to block spambots, then you don’t need to use it at all. But saying that, it won’t do any harm if you do use it..