Emergent Markup Languages

Believe it or not, we live in a technologically backward civilization.  Every day, hundreds of millions of people turn on their computers and type in long sequences of characters, which computers store, networks transport, and search engines index without understanding what they mean.

Markup languages such as XML can add meaning to those sequences of characters so computers can process them more intelligently, like differentiating prices from order numbers or Don Park from Donner Park.  But XML has, so far, failed to deliver on its promises.

Two great pitfalls of XML are:

  1. need for centralized control over creation and evolution of schemas
  2. high cost of development, standardization, and education in time and resources

For some piece of data to be marked up, someone has to first define, document, publish, and publicize the schema.  After that the fun part starts: dealing with competing schemas and the standardization process.  After standardization, millions of users have to learn how to use the new schema.  That is a whole lot of work and waiting, for all parties involved, just to catch a glimpse of XML Heaven.

What if we could skip all that?  What if people marked up content using their own names and structures, not those dictated by a central committee?  Would the resulting chaos be insurmountable?  I believe not.  I believe that the constraints and mechanisms inherent in human languages and social structures lead to what I call Emergent Markup Languages: common tags and structures that emerge from the natural behavior of individuals who follow simple rules like "call it what it is" and interact with their immediate surroundings and neighbors.
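One way to picture the emergence is to tally the tag names that independent authors happen to choose.  The sketch below is only an illustration: the three snippets, their tag names, and the tag_counts helper are all made up for this example, and it assumes each author's markup is at least well-formed XML.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical snippets from three authors who share no schema.
snippets = [
    "<note>Call <name>Don Park</name> at <phone>555-0100</phone>.</note>",
    "<post>Reach me at <phone>555-0101</phone> or <tel>555-0102</tel>.</post>",
    "<memo><name>Ann Lee</name> moved; new <phone>555-0103</phone>.</memo>",
]

def tag_counts(docs):
    """Count how often each inline tag name appears across all documents."""
    counts = Counter()
    for doc in docs:
        root = ET.fromstring(doc)
        # Skip each author's wrapper element; count only the inline tags.
        counts.update(el.tag for el in root.iter() if el is not root)
    return counts

counts = tag_counts(snippets)
print(counts.most_common())
```

In this toy data "phone" comes out ahead of "tel" simply because more authors reached for it, which is the kind of convergence the "call it what it is" rule is meant to produce.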

Initially, Emergent Markup Languages will be most useful for marking up salient parts of free-form textual content: phone numbers, addresses, numbers, names, etc.  Doing so will lead to an abundance of fine-grained data we can harvest and process.  It's not quite the Semantic Web yet, but we'll be much further along than before.
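As a rough sketch of what harvesting could look like, the code below pulls tagged values out of a piece of free-form text and groups them by tag name.  The tag names (name, place, order, price), the sample sentence, and the harvest function are assumptions made for illustration; no schema is consulted, and the markup only needs to be well-formed.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# Hypothetical free-form text with a few salient parts marked up.
text = (
    "<entry>Lunch with <name>Don Park</name> near <place>Donner Park</place>; "
    "order <order>10482</order> came to <price>12.50</price>.</entry>"
)

def harvest(markup):
    """Group the text of every inline tag by the tag's name."""
    found = defaultdict(list)
    root = ET.fromstring(markup)
    for el in root.iter():
        # Ignore the wrapper element and any empty tags.
        if el is not root and el.text:
            found[el.tag].append(el.text.strip())
    return dict(found)

data = harvest(text)
for tag, values in data.items():
    print(tag, values)
```

With the tags in place, a program can tell the price from the order number and Don Park from Donner Park without guessing from the characters alone.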