Current standards for robot exclusion, robots.txt and the robots META tag, are effective only for page-level exclusion directives. In light of the Google-versus-blog controversy, I think it makes sense to introduce finer-grained robot exclusion tags. Here is a rough proposal:
<robots> Tag
The <robots> tag advises web robots whether the data and links that follow it are indexable and/or followable. It may be used in HTML or integrated into application-specific XML schemas. It has a single required attribute, 'content', which takes the same case-insensitive values as the content attribute of the robots META tag: ALL, NONE, INDEX, FOLLOW, NOINDEX, NOFOLLOW. The <robots> tag is an empty tag, meaning it has neither child elements nor textual content.
The <robots> tag's namespace URI is "http://www.robotstxt.org/xmlns". A namespace declaration is not required in HTML documents.
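To make the attribute semantics concrete, here is a minimal sketch of how a consumer might map 'content' values onto index/follow flags. It assumes META-style comma-separated tokens, and the helper name is mine, not part of the proposal:

```python
def parse_robots_content(content):
    """Map a 'content' attribute value to (indexable, followable) flags.

    Hypothetical helper: assumes comma-separated, case-insensitive
    tokens, as with the robots META tag. Pages default to indexable
    and followable.
    """
    index, follow = True, True
    for token in content.lower().split(","):
        token = token.strip()
        if token == "all":
            index, follow = True, True
        elif token == "none":
            index, follow = False, False
        elif token == "index":
            index = True
        elif token == "noindex":
            index = False
        elif token == "follow":
            follow = True
        elif token == "nofollow":
            follow = False
    return index, follow
```

For example, parse_robots_content("NOINDEX") yields (False, True): the following content should not be indexed, but its links may still be followed.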
HTML Example:
<robots content="noindex" />
<table id="blogroll">....</table>
<robots content="index" />
...
XML Example:
<channel>
  <robots content="none"
          xmlns="http://www.robotstxt.org/xmlns"/>
  <title>Don Park's Blog</title>
  ...
  <item>
    <robots content="all"
            xmlns="http://www.robotstxt.org/xmlns"/>
    <description>...</description>
Processing Guidelines
- In HTML documents, robots may locate <robots> tags by scanning for the string pattern "<robots", using each occurrence to find the next indexable and/or followable area of the document.
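The scanning guideline above can be sketched as a simple state machine that walks the raw HTML and toggles the current directive at each <robots> tag. The regex, function name, and region model are illustrative assumptions, not part of the proposal:

```python
import re

# Matches the proposed empty <robots> tag and captures its 'content'
# value; quoting style and whitespace handling are assumptions.
ROBOTS_TAG = re.compile(
    r'<robots\s+content\s*=\s*["\']([^"\']+)["\']\s*/?>',
    re.IGNORECASE,
)

def robots_regions(html):
    """Split html into (start, end, directive) spans.

    Each span carries the 'content' value of the most recent
    <robots> tag; text before any tag defaults to "all".
    """
    regions = []
    pos = 0
    state = "all"
    for m in ROBOTS_TAG.finditer(html):
        if m.start() > pos:
            regions.append((pos, m.start(), state))
        state = m.group(1).lower()
        pos = m.end()
    if pos < len(html):
        regions.append((pos, len(html), state))
    return regions
```

Run against the HTML example earlier in this proposal, this would report the blogroll table as a "noindex" region while the surrounding content stays "all"/"index".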