Universal Robots tag

The current standards for robot exclusion, robots.txt and the robots META tag, only support page-level exclusion directives. In light of the Google versus Blog controversy, I think it makes sense to introduce finer-grained robot exclusion tags. Here is a sketchy proposal:

<robots> Tag

The <robots> tag advises web robots whether the data and links following the tag are indexable and/or followable. It may be used in HTML or integrated into application-specific XML schemas. It has only one attribute, 'content', which is required and takes the same case-insensitive values as the content attribute of the robots META tag: ALL, NONE, INDEX, FOLLOW, NOINDEX, and NOFOLLOW. The <robots> tag is an empty tag, meaning it has neither child elements nor text content.
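To make the attribute values concrete, here is a minimal Python sketch of how a robot might turn a 'content' value into a pair of index/follow flags. The function name parse_robots_content is hypothetical. Two behaviors are assumptions the proposal does not spell out: multiple values may be comma-separated as in the robots META tag, and a directive that is not mentioned keeps its previous state.

```python
def parse_robots_content(content, index=True, follow=True):
    """Return (index, follow) after applying the directives in `content`.

    `index` and `follow` are the states in effect before this tag.
    Assumption: directives not mentioned leave the previous state alone.
    """
    # Assumption: values may be separated by commas and/or whitespace,
    # as is common for the robots META tag.
    for token in content.replace(",", " ").split():
        token = token.upper()  # values are case-insensitive
        if token == "ALL":
            index, follow = True, True
        elif token == "NONE":
            index, follow = False, False
        elif token == "INDEX":
            index = True
        elif token == "NOINDEX":
            index = False
        elif token == "FOLLOW":
            follow = True
        elif token == "NOFOLLOW":
            follow = False
        # Unknown tokens are ignored (an assumption, not part of the proposal).
    return index, follow
```

For example, parse_robots_content("noindex") turns indexing off while leaving link following as it was.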

The <robots> tag's namespace URI is "http://www.robotstxt.org/xmlns". A namespace declaration is not required in HTML documents.

HTML Example:

  <robots content="noindex" />
  <table id="blogroll">....</table>
  <robots content='index' />
  ...

XML Example:

 <channel>
   <robots content='none'
     xmlns="http://www.robotstxt.org/xmlns"/>
   <title>Don Park's Blog</title>
   ...
   <item>
     <robots content='all'
       xmlns="http://www.robotstxt.org/xmlns"/>
     <description>...</description>
     ...
   </item>
   ...
 </channel>
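In XML documents a robot can rely on the namespace instead of string matching. The following is a rough Python sketch using the standard library's xml.etree.ElementTree. The SAMPLE document is a closed-up version of the fragment above with the elided parts left out, and the function name robots_directives is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URI from the proposal.
ROBOTS_NS = "http://www.robotstxt.org/xmlns"

# Illustrative document: the example fragment, closed so that it parses.
SAMPLE = """<channel>
  <robots content='none' xmlns="http://www.robotstxt.org/xmlns"/>
  <title>Don Park's Blog</title>
  <item>
    <robots content='all' xmlns="http://www.robotstxt.org/xmlns"/>
    <description>...</description>
  </item>
</channel>"""


def robots_directives(xml_text):
    """Return the content values of all <robots> tags, in document order."""
    root = ET.fromstring(xml_text)
    # ElementTree spells namespaced tag names as "{uri}localname".
    tag = "{%s}robots" % ROBOTS_NS
    return [el.get("content", "").lower() for el in root.iter(tag)]
```

Calling robots_directives(SAMPLE) yields the channel-level directive first and the item-level directive second.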

Processing Guidelines

In HTML documents, robots may locate <robots> tags by searching for the string pattern "<robots", and use each match to find the next indexable and/or followable area of the document.
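The guideline above can be sketched in Python roughly as follows. This is only an illustration, not a reference implementation: the function name indexable_regions is hypothetical, it tracks indexing only (not link following), and it assumes the document is indexable until the first <robots> tag says otherwise.

```python
import re

# Finds a <robots ...> tag and captures the value of its content attribute.
# Works with single or double quotes and ignores case.
ROBOTS_TAG = re.compile(
    r"<robots\b[^>]*?\bcontent\s*=\s*(['\"])([^'\">]*)\1[^>]*>",
    re.IGNORECASE,
)


def indexable_regions(html):
    """Return the chunks of `html` that lie in indexable areas."""
    regions = []
    indexable = True  # assumed default before the first <robots> tag
    pos = 0
    for match in ROBOTS_TAG.finditer(html):
        # Keep the text before this tag if we were in an indexable area.
        if indexable and match.start() > pos:
            regions.append(html[pos:match.start()])
        # Update the state from the tag's content attribute.
        for token in match.group(2).lower().replace(",", " ").split():
            if token in ("index", "all"):
                indexable = True
            elif token in ("noindex", "none"):
                indexable = False
        pos = match.end()
    # Keep whatever follows the last tag, if still indexable.
    if indexable and pos < len(html):
        regions.append(html[pos:])
    return regions
```

Applied to the HTML example from earlier, the blogroll table between the "noindex" and "index" tags would be left out of the returned regions.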

Don Park's Weekly Habit

Well, sorta weekly.
