Universal Robots tag

The current standards for robot exclusion, robots.txt and the robots META tag, are effective only for page-level exclusion directives.  In light of the Google versus Blog controversy, I think it makes sense to introduce finer-grained robot exclusion tags.  Here is a rough proposal:

<robots> Tag

The <robots> tag advises web robots whether the data and links following the tag are indexable and/or followable.  It may be used in HTML or integrated into application-specific XML schemas.  It has only one attribute, 'content', which takes the same case-insensitive values as the content attribute of the robots META tag: ALL, NONE, INDEX, FOLLOW, NOINDEX, and NOFOLLOW.  The 'content' attribute is required.  <robots> is an empty tag, meaning it has neither child elements nor text content.
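To make the attribute values concrete, here is a minimal sketch of how a robot might turn a 'content' value into a pair of index/follow flags.  The function name parse_robots_content is hypothetical.  The sketch assumes that multiple values are comma-separated as in the robots META tag, that a flag not mentioned keeps its current state, and that unknown values are ignored; the proposal itself does not settle any of these points.

```python
def parse_robots_content(content, index=True, follow=True):
    """Apply a content value such as "noindex,follow" to the current flags.

    Returns the updated (index, follow) pair. Matching is case-insensitive.
    """
    for token in content.split(","):  # assumption: comma-separated values
        token = token.strip().lower()
        if token == "all":
            index, follow = True, True
        elif token == "none":
            index, follow = False, False
        elif token == "index":
            index = True
        elif token == "noindex":
            index = False
        elif token == "follow":
            follow = True
        elif token == "nofollow":
            follow = False
        # Assumption: unknown or empty tokens are silently ignored.
    return index, follow
```

For example, parse_robots_content("NOINDEX") leaves links followable but marks the data that follows as not indexable.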

The <robots> tag's namespace URI is "http://www.robotstxt.org/xmlns".  A namespace declaration is not required in HTML documents.

HTML Example:

  <robots content="noindex" />
  <table id="blogroll">....</table>
  <robots content='index' />
 
...

XML Example:

  <channel>
    <robots content='none'
      xmlns="http://www.robotstxt.org/xmlns"/>

    <title>Don Park's Blog</title>
    ...
    <item>
      <robots content='all'
        xmlns="http://www.robotstxt.org/xmlns"/>
      <description>...</description>
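For XML, a namespace-aware parser can pick out the tag by its namespace URI.  The sketch below uses Python's xml.etree.ElementTree and walks the elements in document order, labeling each one with the most recent 'content' value seen.  The function name label_elements is hypothetical, and two things are assumptions rather than part of the proposal: that a tag applies to everything after it in document order (not just to its sibling or parent scope), and that the state before the first tag is "all".

```python
import xml.etree.ElementTree as ET

# ElementTree spells a namespaced tag as "{namespace-uri}localname".
ROBOTS = "{http://www.robotstxt.org/xmlns}robots"


def label_elements(xml_text):
    """Return (tag, content_value) for every non-robots element, in document order."""
    root = ET.fromstring(xml_text)
    current = "all"  # assumption: everything is allowed until a tag says otherwise
    labeled = []
    for elem in root.iter():  # iter() visits elements in document order
        if elem.tag == ROBOTS:
            # 'content' is required by the proposal; fall back to the
            # current state if a document omits it anyway.
            current = elem.get("content", current).strip().lower()
        else:
            labeled.append((elem.tag, current))
    return labeled
```

Run on a complete feed shaped like the fragment above, this would label the title as "none" and the description inside the item as "all".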

Processing Guidelines

  • With HTML documents, robots may locate <robots> tags by searching for the string pattern "<robots" to find the next indexable and/or followable area of the document.
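
The string-search guideline above can be sketched as follows.  This is only an illustration: the function name split_by_robots is hypothetical, the tag name is matched case-insensitively (an assumption), and the text before the first tag is treated as "all" (also an assumption).  The regular expression looks for "<robots" followed by a quoted 'content' attribute and splits the document into segments, each paired with the value in effect for it.

```python
import re

# Matches <robots ... content="value" ...> with single or double quotes.
ROBOTS_TAG = re.compile(
    r"<robots\b[^>]*?\bcontent\s*=\s*(['\"])(.*?)\1[^>]*>",
    re.IGNORECASE | re.DOTALL,
)


def split_by_robots(html):
    """Split html at each <robots> tag.

    Returns a list of (content_value, segment) pairs, where content_value
    is the lowercased value of the most recent <robots> tag.
    """
    segments = []
    current = "all"  # assumption: no restriction before the first tag
    pos = 0
    for match in ROBOTS_TAG.finditer(html):
        segments.append((current, html[pos:match.start()]))
        current = match.group(2).strip().lower()
        pos = match.end()
    segments.append((current, html[pos:]))  # trailing text after the last tag
    return segments
```

Applied to the HTML example earlier, the blogroll table would land in a segment labeled "noindex", and whatever follows the second tag in a segment labeled "index".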