Noindex, Nofollow and Disallow: What are They?

Nov 19, 2018

Do you own a site or manage one for a company? Have you ever searched for your site to check which of its pages are indexed, only to realise there was a page you did not want search engines to display?

‘How did that get there?’ you might have exclaimed!

Should you ever find yourself in this situation, you can write instructions for web crawlers (also known as web bots or spiders) that stop them from crawling or indexing your web pages, or from following your links.

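Noindex, nofollow and disallow are three instructions used to tell web crawlers how to handle web pages. They serve related purposes, but each works in a different way and is written in a different place. Disallow belongs to the Robots Exclusion Standard (the robots.txt convention) established in 1994; noindex and nofollow came later, as values of the robots meta tag and, for nofollow, of a link's rel attribute.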

Now, I will explain what each of these three means, one after the other, and why it is useful.

What is noindex?

Noindex is a value of the robots meta tag, placed within the head section of a page's HTML markup, that instructs search engine bots not to index that page. With this instruction in place, bots can still crawl the page, but they will not add it to the index that search results are drawn from.

Why use noindex?

Using noindex gives site owners control over which pages on their sites appear in search engine result pages.

Some of the reasons to use noindex are: it helps keep duplicate content out of search results; it keeps private pages and client or customer pages out of the index; and it keeps pages such as the ‘thank you’ page, privacy page or terms & conditions page out of search results.

How to set the noindex instruction

<head>
  <title>Submitting A Sitemap to Google Search Console</title>
  <meta name="robots" content="noindex" />
</head>

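When this tag is placed in the head section of a page's source code, search engines that honor it leave the page out of their index. The change takes effect the next time a crawler visits the page, so a page that is already indexed drops out of results only after it has been recrawled.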
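To check which robots instructions a page actually carries, you can read its meta tags programmatically. The following is a minimal Python sketch using only the standard library's html.parser module; the names RobotsMetaParser and robots_directives are my own hypothetical names, and the sketch only inspects robots meta tags in the markup.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            # content may hold several comma-separated values, e.g. "noindex, nofollow"
            for value in attributes.get("content", "").split(","):
                if value.strip():
                    self.directives.add(value.strip().lower())


def robots_directives(html):
    """Returns the set of robots directives declared in the page's markup."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = """<head>
  <title>Submitting A Sitemap to Google Search Console</title>
  <meta name="robots" content="noindex" />
</head>"""

print("noindex" in robots_directives(page))  # True
```

A self-closing tag such as <meta ... /> is still reported to handle_starttag by html.parser, so both the HTML and XHTML spellings of the tag are picked up.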

What is nofollow?

Nofollow is also an instruction to search engine bots, and it serves two purposes. The first is to withhold your endorsement from certain links within a web page, so that they pass no ranking credit to the pages they point to. The second is to instruct bots not to follow particular links within a web page.

For instance, if your web page contains several links pointing to other sites, bots treat those links as endorsements of credible sources and pass ranking credit to them. Setting the nofollow attribute is one way to avoid vouching for a link.

Why use nofollow?

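This attribute is useful against spam links posted in the comment section of your web page. It is also the right choice for paid links, and for links to pages you would rather bots did not crawl through your page, such as client / customer pages, staff pages or sign in / register pages. Keep in mind that nofollow does not stop the linked page itself from being indexed if bots reach it some other way; use noindex on that page for that.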

How to set the nofollow instruction

<head>
  <title>Submitting A Sitemap to Google Search Console</title>
  <meta name="robots" content="nofollow" />
</head>

Alternatively, you can apply nofollow to an individual link, as in this example.

<a href="http://www.example.com" rel="nofollow">Link text here</a>

The first example above tells bots to treat every link on the page as nofollow. The second example applies nofollow only to that one link, leaving the other links on the page unaffected.
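The difference between the page-level and link-level forms can be sketched in Python. The hypothetical audit_links helper below (not part of any library) lists every link on a page and marks it as nofollow if either the robots meta tag says nofollow or the link's own rel attribute contains nofollow. It illustrates the two rules above under those assumptions; it is not a description of how any particular search engine processes links.

```python
from html.parser import HTMLParser


class LinkAuditor(HTMLParser):
    """Records each <a href="..."> and whether it carries rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.page_nofollow = False  # set by <meta name="robots" content="...nofollow...">
        self.links = []             # (href, link-level nofollow) pairs, in page order

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attributes.get("name", "").lower() == "robots":
            values = [v.strip().lower() for v in attributes.get("content", "").split(",")]
            if "nofollow" in values:
                self.page_nofollow = True
        elif tag == "a" and "href" in attributes:
            # rel may hold several space-separated tokens, e.g. rel="nofollow noopener"
            rel_tokens = attributes.get("rel", "").lower().split()
            self.links.append((attributes["href"], "nofollow" in rel_tokens))


def audit_links(html):
    """Returns (href, treated_as_nofollow) for every link on the page."""
    auditor = LinkAuditor()
    auditor.feed(html)
    auditor.close()
    # A page-level nofollow applies to every link, whatever its own rel says.
    return [(href, auditor.page_nofollow or own_nofollow)
            for href, own_nofollow in auditor.links]


page = """<head><title>Example</title></head>
<p>
  <a href="http://www.example.com" rel="nofollow">Link text here</a>
  <a href="http://www.example.org">Another link</a>
</p>"""

print(audit_links(page))
# [('http://www.example.com', True), ('http://www.example.org', False)]
```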

Get more information from Google about where and how to use this attribute.

What is disallow?

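Disallow is a directive used in a site's robots.txt file to instruct search engine bots not to crawl certain folders or files within the site. Well-behaved bots that see a matching disallow rule do not fetch that folder or file. Note that this blocks crawling, not indexing: a disallowed URL can still appear in search results, usually without a description, if other pages link to it.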

Why use disallow?

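The reasons are similar to those for noindex: disallow can help prevent duplicate content and keep bots away from private pages or client and customer pages. The difference is that disallow works at the crawling stage and can cover individual files, whole folders or an entire site from a single file. One caution: do not disallow a page that carries a noindex tag, because a bot that cannot crawl the page never sees the tag.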
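Because disallow works at the crawling stage, a bot that obeys robots.txt never downloads a disallowed page, and therefore never reads any noindex tag on it. The minimal Python sketch below illustrates this with the standard library's urllib.robotparser module; crawler_sees_page is a hypothetical helper name, and the rules are a made-up example.

```python
from urllib.robotparser import RobotFileParser


def crawler_sees_page(robots_txt, url, user_agent="*"):
    """True if a crawler that obeys robots_txt is allowed to download url.

    Only a downloaded page can have its <meta name="robots" content="noindex">
    tag read, so False here means any noindex tag on that page goes unseen.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


robots_txt = """User-agent: *
Disallow: /pages/staff.html
"""

# The staff page may carry a noindex tag, but a compliant bot never fetches it.
print(crawler_sees_page(robots_txt, "/pages/staff.html"))  # False
print(crawler_sees_page(robots_txt, "/pages/about.html"))  # True
```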

How to set the disallow instruction

To give any instruction with disallow, you need a plain text file named robots.txt; the instructions are written inside that file.

To create the file, open a text editor and type in the appropriate directives. When finished, save the file as robots.txt. The name must be exactly robots.txt, in lowercase, because that is the file name web crawlers look for.
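If you prefer to generate the file rather than type it by hand, a short script can do it. This is a minimal Python sketch under my own assumptions: build_robots_txt and save_robots_txt are hypothetical helper names, and the output uses only User-agent and Disallow lines, the same two directives used in the examples that follow.

```python
from pathlib import Path


def build_robots_txt(disallowed_paths, user_agent="*"):
    """Builds robots.txt text with one Disallow line per path."""
    lines = [f"User-agent: {user_agent}"]
    if disallowed_paths:
        lines += [f"Disallow: {path}" for path in disallowed_paths]
    else:
        lines.append("Disallow:")  # an empty Disallow value blocks nothing
    return "\n".join(lines) + "\n"


def save_robots_txt(text, folder="."):
    """Saves the text under the exact, lowercase name robots.txt in folder."""
    path = Path(folder) / "robots.txt"
    path.write_text(text, encoding="utf-8")
    return path


print(build_robots_txt(["/tmp/", "/pages/"]))
```

For example, save_robots_txt(build_robots_txt(["/tmp/", "/pages/"])) writes a robots.txt with one User-agent line and two Disallow lines to the current folder, ready to upload.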

Next, log in to your cPanel account and upload the file to the root directory of your site; crawlers only look for robots.txt at the root. Below are examples of how instructions can be passed.

Allows every bot to crawl all the pages on your site (an empty Disallow value blocks nothing).

User-agent: *
Disallow:

Instructs bots not to crawl any page on the entire site.

User-agent: *
Disallow: /

Denies bots permission to crawl all pages in the specified folder(s).

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /pages/

Denies bots permission to crawl a particular page within a specific folder.

User-agent: *
Disallow: /pages/staff.html
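Before uploading a file like the ones above, you can check how a rule-following crawler would interpret it. The sketch below feeds the four example files into Python's standard urllib.robotparser module and asks whether three sample paths may be crawled. The paths are hypothetical, and real search engines may support extra syntax that this parser does not.

```python
from urllib.robotparser import RobotFileParser

# The four example files from above, as strings.
EXAMPLES = {
    "allow everything": "User-agent: *\nDisallow:\n",
    "block entire site": "User-agent: *\nDisallow: /\n",
    "block folders": "User-agent: *\nDisallow: /cgi-bin/\nDisallow: /tmp/\nDisallow: /pages/\n",
    "block one page": "User-agent: *\nDisallow: /pages/staff.html\n",
}

# Hypothetical paths to test against each file.
PATHS = ["/index.html", "/pages/about.html", "/pages/staff.html"]


def allowed(robots_txt, path, user_agent="*"):
    """True if robots_txt lets user_agent crawl path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


for name, rules in EXAMPLES.items():
    print(f"{name}: {[allowed(rules, path) for path in PATHS]}")
# allow everything: [True, True, True]
# block entire site: [False, False, False]
# block folders: [True, False, False]
# block one page: [True, True, False]
```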

To view a site's robots.txt file, add /robots.txt to the site's root address, as in this example.

http://www.calistus.net/robots.txt
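Because robots.txt always sits at the root of the host, its address can be derived from any page URL on the site. A minimal Python sketch, with robots_txt_url and load_live_rules as hypothetical helper names; only load_live_rules touches the network, through urllib.robotparser's read method.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_txt_url(page_url):
    """Returns the robots.txt address for the site that serves page_url."""
    # An absolute path ("/robots.txt") replaces the page's own path and query.
    return urljoin(page_url, "/robots.txt")


def load_live_rules(page_url):
    """Downloads and parses the site's live robots.txt (needs network access)."""
    parser = RobotFileParser(robots_txt_url(page_url))
    parser.read()
    return parser


print(robots_txt_url("http://www.calistus.net/pages/staff.html"))
```

The returned parser from load_live_rules can then answer can_fetch("*", some_url) against the rules the site is actually serving.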

NOTE: be absolutely careful when using this file. Make sure you understand the instructions you are passing; otherwise, you may accidentally block your whole site from showing up on search engines.
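One cheap safeguard against accidentally blocking the whole site is to check, before uploading, whether the rules would stop a crawler from fetching your home page. A minimal Python sketch using urllib.robotparser; blocks_home_page is a hypothetical helper name, and checking only the home page is my simplifying assumption, not a complete audit.

```python
from urllib.robotparser import RobotFileParser


def blocks_home_page(robots_txt, user_agent="*"):
    """True if robots_txt stops user_agent from crawling the site's home page."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


print(blocks_home_page("User-agent: *\nDisallow: /\n"))      # True: whole site blocked
print(blocks_home_page("User-agent: *\nDisallow: /tmp/\n"))  # False: only /tmp/ blocked
```

Passing a specific crawler name, such as Google's Googlebot, checks the rules that apply to that crawler rather than the catch-all * group.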

For more information about web robots and how they work, visit the Robotstxt site for full details. If you have a Google Search Console account, you can also test the instructions you have written from within Search Console.

Conclusion

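When all three are used properly, they help site owners keep pages they consider private out of search results. In addition, site owners avoid lending credibility to untrustworthy web pages. Bear in mind that none of them is an access control: a noindexed or disallowed page can still be opened by anyone who has its URL.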

In a nutshell, noindex instructs bots not to index a specific web page; nofollow tells them not to follow, or pass credit through, one or more links; and disallow instructs them not to crawl specific pages, folders or an entire site.

For SEO purposes, it is worth blocking pages or links that offer no value to your target audience, because low-value pages in search results can count against the overall ranking of your site.

Finally, if you are unsure how these tags and the robots.txt file work, seek the help of an expert. Feel free to contact me if you have concerns; I will be glad to assist in any way necessary.