Many websites use robots.txt to block Google from indexing their pages, expecting that this will also keep those pages out of the search results. But robots.txt doesn’t actually do the latter, even though it does prevent your pages from being indexed. Before explaining why Google behaves this way, let’s look at some basic terms:
Indexed / Indexing – The process of downloading a site’s or page’s content to the search engine’s servers, thereby adding it to the search engine’s “index”.
Ranking / Listing – Showing a site in the search result pages (aka SERPs).
Moving from indexing to listing: a page does not have to be indexed to get listed. If a link points to a page from anywhere, that link will be followed. Even if you block the page in robots.txt, it can still be listed in the search results (typically with just its URL, since Google never downloaded its content). Here is Matt Cutts explaining why a page that is disallowed in robots.txt may still appear in Google’s search results.
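As a rough sketch of how the robots.txt mechanism works, Python’s standard `urllib.robotparser` module can check whether a given rule set blocks a URL. The rules and the example.com URLs below are made-up for illustration:

```python
from urllib import robotparser

# A made-up robots.txt that blocks all crawlers from /private/
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# A compliant crawler (like Googlebot) may not fetch the blocked page,
# so its content is never downloaded -- i.e. never "indexed"...
print(rp.can_fetch("Googlebot", "http://example.com/private/page.html"))  # False
# ...but nothing in robots.txt stops the bare URL from being listed
# if other sites link to it. Unblocked pages can be fetched normally:
print(rp.can_fetch("Googlebot", "http://example.com/public/page.html"))   # True
```

This is exactly why robots.txt controls crawling, not listing: the search engine honours the `Disallow` rule and never reads the page, so it also never sees any `noindex` instruction you put on it.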
So if you want to effectively hide pages from the search results, you actually need to let them be indexed. Once the search engine indexes those pages, you can tell it not to list them. The tag below does that for you.
<meta name="robots" content="noindex,nofollow"/>
You need to add this tag to every page you don’t want listed by search engines. In WordPress, there is a Robots Meta option on the Edit Post page, in the right-hand column (under Categories); to keep a post out of the search results, just select noindex, nofollow there.
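For pages you edit by hand, the tag belongs inside the page’s `<head>`. A minimal sketch (the title and body are placeholders):

```html
<!DOCTYPE html>
<html>
<head>
  <title>A page you want kept out of search results</title>
  <!-- Tells compliant search engines: you may crawl and index this page,
       but don't list it in results (noindex) or follow its links (nofollow) -->
  <meta name="robots" content="noindex,nofollow"/>
</head>
<body>
  ...
</body>
</html>
```

Note that for this to work, the page must not be blocked in robots.txt; otherwise the crawler never fetches the page and never sees the tag.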
If you have something to ask or share, do add your comment.