The current standards for robot exclusion, robots.txt and the robots META tag, are effective only for page-level robot exclusion directives. In light of the Google versus Blog controversy, I think it makes sense to introduce finer-grained robot exclusion tags. Here is a sketchy proposal:
<robots> Tag
The <robots> tag advises web robots whether the data and links following the tag are indexable and/or followable. The <robots> tag may be used in HTML or integrated into application-specific XML schemas. It has only one attribute, 'content', which takes the same case-insensitive values as the content attribute of the robots META tag, specifically ALL, NONE, INDEX, FOLLOW, NOINDEX, and NOFOLLOW. The 'content' attribute is required. The <robots> tag is an empty tag, meaning it has neither child elements nor text content.
The <robots> tag's namespace URI is "http://www.robotstxt.org/xmlns". A namespace declaration is not required in HTML documents.
HTML Example:
<robots content="noindex" />
<table id="blogroll">....</table>
<robots content='index' />
...
XML Example:
<channel>
<robots content='none'
xmlns="http://www.robotstxt.org/xmlns"/>
<title>Don Park's Blog</title>
...
<item>
<robots content='all'
xmlns="http://www.robotstxt.org/xmlns"/>
<description>...</description>
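For XML, a namespace-aware robot could pick out the <robots> elements by their qualified name. The following is a minimal Python sketch, not part of the proposal itself. The function name `robots_directives` and the `SAMPLE` document are hypothetical, and the sample adds the closing tags that the fragment above elides so that it parses.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the proposal text.
ROBOTS_NS = "http://www.robotstxt.org/xmlns"
ROBOTS_QNAME = f"{{{ROBOTS_NS}}}robots"

def robots_directives(xml_text):
    """Return (parent tag, content value) pairs for each namespaced <robots> element."""
    root = ET.fromstring(xml_text)
    found = []
    # ElementTree has no parent pointers, so walk every element and
    # inspect its direct children instead.
    for parent in root.iter():
        for child in parent:
            if child.tag == ROBOTS_QNAME:
                # Values are case-insensitive, so normalize to lowercase.
                found.append((parent.tag, child.get("content", "").lower()))
    return found

# Hypothetical, completed version of the XML example above.
SAMPLE = """<channel>
  <robots content='none' xmlns="http://www.robotstxt.org/xmlns"/>
  <title>Don Park's Blog</title>
  <item>
    <robots content='all' xmlns="http://www.robotstxt.org/xmlns"/>
    <description>...</description>
  </item>
</channel>"""

directives = robots_directives(SAMPLE)
```

Matching on the qualified name rather than the bare string "robots" keeps the robot from confusing this tag with an unrelated element of the same local name in another schema.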
Processing Guidelines
- With HTML documents, robots may locate <robots> tags by searching for the string pattern "<robots" to find the next indexable and/or followable area of the document.
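
The string-scan guideline above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the names `parse_directives` and `split_regions` are hypothetical, the document is assumed to start out indexable and followable, each tag is assumed to reset to that default before applying its own values (as the robots META tag does), and comma-separated values such as "noindex, follow" are assumed to be allowed.

```python
import re

# Finds a <robots ...> tag and captures the value of its content attribute.
ROBOTS_TAG = re.compile(
    r"<robots\b[^>]*?\bcontent\s*=\s*([\"'])(.*?)\1[^>]*>",
    re.IGNORECASE | re.DOTALL,
)

def parse_directives(content):
    """Turn a content value such as 'noindex, follow' into (index, follow)."""
    index, follow = True, True  # assumed default, as with the robots META tag
    for token in content.lower().split(","):
        token = token.strip()
        if token == "all":
            index, follow = True, True
        elif token == "none":
            index, follow = False, False
        elif token == "index":
            index = True
        elif token == "noindex":
            index = False
        elif token == "follow":
            follow = True
        elif token == "nofollow":
            follow = False
    return index, follow

def split_regions(html):
    """Return (index, follow, text) regions in document order."""
    regions = []
    index, follow = True, True  # assumed state before the first <robots> tag
    pos = 0
    for match in ROBOTS_TAG.finditer(html):
        if match.start() > pos:
            regions.append((index, follow, html[pos:match.start()]))
        # The directive applies to everything after this tag.
        index, follow = parse_directives(match.group(2))
        pos = match.end()
    if pos < len(html):
        regions.append((index, follow, html[pos:]))
    return regions

page = (
    '<p>intro</p><robots content="noindex" />'
    '<table id="blogroll">....</table>'
    "<robots content='index' /><p>post</p>"
)
regions = split_regions(page)
```

A robot would then index only the regions whose first flag is true and follow links only in regions whose second flag is true.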