Human Nature and Metadata

In Metadata, Semiotics, and the Tower of Babel, Tim Oren rants against some of the assumptions behind the metadata bubblet which Joi Ito hinted at in his If I were Microsoft post.  It is kind of ironic that Tim used the Semiotic wand to blast the Shangri-la illusion off the Tower of Metadata Babel because people had many different interpretations of Joi's post.

Symbols are just containers for semantics.  If you put cookies in one, people know it as the jar with cookies and identify it by its shape and location.  As soon as you have one jar, you will have many, so people invented ways to refer to each jar by name (cookie jar) and distinguish it by sight (it has a picture of cookies carved on it).

Semiotics is all about problems of jar selection.  While the size and shape of a jar limit what can be stored inside it, there is enough leeway for people to use jars for a wide variety of purposes.

In Korea, there is a specially shaped jar for urine so one doesn't have to go outside in the middle of the night to take a leak.  It's a beautifully shaped jar.  Now imagine what would happen if an American came upon the jar.  Chances are pretty good that he would see a cookie jar, a jarring example of semiotics.

So Joi was musing about Microsoft monopolizing the emerging jar market to gain an upper hand on Google, which cornered the market on baskets.  Tim disagrees because people can put anything in jars regardless of the picture or label on the jar, which will cause confusion like the Ecstasy test result mixup.

I am somewhere in between.  I believe two common forms of human nature, conformity and mimicry, allow a sufficiently large and reasonably coherent set of metadata to be created, if not by design, then by popularity (see Emergent Markup Languages).

Web Server Performance Myths

Here is a recent semi-public paper on web server performance mentioned in a message to the Tomcat developer mailing list.  Download article.zip and read the PDF file inside the zip file.  It has some interesting discussion about web server performance myths.  Here is a choice excerpt:

[…] yahoo gets 1.5 billion pageviews a day. […]

Yahoo uses 4,500 server to serve up 1.5 billion pageviews each day. If we divide that by the number of seconds in a day, we get 17,361 pageviews per second. Assuming the load is distributed evenly across the servers, each server handles 3-4 pageviews per second per system.
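The excerpt's arithmetic is easy to check.  Here is a quick sketch in Python that uses only the figures quoted above; the variable and function names are mine, not the paper's, and it makes the same even-distribution assumption the excerpt does.

```python
# Figures quoted in the excerpt; names are illustrative, not from the paper.
PAGEVIEWS_PER_DAY = 1_500_000_000
SERVERS = 4_500
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def pageviews_per_second(pageviews_per_day: int) -> float:
    """Average pageviews per second over a whole day."""
    return pageviews_per_day / SECONDS_PER_DAY


def per_server_rate(pageviews_per_day: int, servers: int) -> float:
    """Average pageviews per second per server, assuming the load
    is spread perfectly evenly (the excerpt's assumption)."""
    return pageviews_per_second(pageviews_per_day) / servers


total = pageviews_per_second(PAGEVIEWS_PER_DAY)
each = per_server_rate(PAGEVIEWS_PER_DAY, SERVERS)
print(f"{total:,.0f} pageviews/s in total, {each:.2f} per server")
```

Both numbers are daily averages, so whatever the peak load looks like, it is not captured here.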

One of the key points the paper stresses is the performance/value offered by hardware XML accelerators for XML-happy web applications.  There are other choice bits in the paper, so check it out before the authors take it offline.

Getting Better

ServInt came back up and is behaving much better than before.  I sure hope they solved the problem permanently.  Anyway, it was stable enough for me to go ahead and set up five accounts on it.  I had enough of playing admin for the day, so I created an amusing Under Construction page and closed shop for the day.

I am not looking forward to setting up Tomcat though.  The last version of Tomcat I used was unstable, slow, and a resource leech.  cPanel has an installation package for Tomcat but there is some conflicting advice about whether to use it or not.  Maybe I'll just go for the latest Tomcat 5 beta after checking for stability issues on the Tomcat mailing lists.  It's supposed to be much faster than Tomcat 4.  I did review the Tomcat 4 JSP engine and it was pretty sloppy with resources, allocating objects left and right for trivial stuff.  Meanwhile, two domains in transit are still in transit.  Domain registrars seem to be pretty slow at letting domain names go.

I did find some good dedicated server deals at ServerMatrix.  $79/month for a dedicated server is yummy even if it's a 1.4 GHz Celeron box.  40 GB of storage is not that exciting but 750 GB per month of bandwidth sure is.  I like the SuperServer package more, but the 2.4 GHz Pentium 4 it comes with is not the version with HyperThreading, so I think I'll wait until that becomes available.  At $150 per box, I can set up two web servers, two application servers, and two database servers for $900/month.  Woohoo!

Update #1:

It turns out the cPanel Tomcat package is unsupported and hasn't been updated for a year.  There is also some confusion about mod_perl support.  mod_perl is listed as being loaded but it didn't work.  So I tried installing it via RPM and got errors like Apache is not installed.  Huh?  When I went over to the Update Apache section of cPanel to update Apache with mod_perl, there was this short note next to the mod_perl checkbox:

not required to run .cgi scripts/not compatible with php

The first part is pretty funny.  I'll bet some sales person put that up.  So I have to disable PHP to use mod_perl?  Pecking around various webhosting forums and cPanel forums, I found even more confusing messages.  It seems cPanel itself and mod_perl don't get along too well.  What kind of craziness is that?  There were other weird problems too.  Perl is installed but PPM is not.  To install a Perl module, I have to use cPanel to do it.

cPanel is looking more and more like Kathy Bates in Misery.

ServInt and UI Design Rant

So far, I am not very happy with my ServInt VPS account.  It's freaky slow for some reason and the machine seems to be down half the time I try to access it.  When it's up, both SSH and cPanel are dog slow.  If things don't improve within a week, I am going to get a dedicated server elsewhere.

As I mentioned before, cPanel is pretty popular in the webhosting business.  What I didn't know was its price tag: $1500!  Just look at this screenshot and tell me if this is software worth $1500.  Notice the crappy layout.  The font size on the default cPanel theme is fixed too.  The list of commands is ordered like the contents of a grocery bag.  I also looked at other players in this market to see if they are any better.  Ensim and Plesk have cleaner looking UIs, but are about the same as cPanel: difficult to use.

Most engineers think replacing a command-line UI with a form-based UI makes it easy to use.  All that does is make the parameters visible, which is nowhere near explaining what the command and parameters are, when they are useful, what the expected values are, and what the side effects and other issues are.  Even simple terms like IP and name server should be linked to full explanations so the users can learn as they use the product.

I think a client-side, platform-independent, super easy to use, extensible Linux management console selling for $100 per user will do really well.  It should also have an integrated, extensible P2P help system for sharing knowledge and helplets.  To use it, copy a directory over, run it on your desktop, and point it to a server machine.  The program then connects via SSH, scans the machine to see what and where things are, and configures it the way the user wants, downloading and installing necessary packages after checking an online issue database and validating the packages.  One could potentially sell one for every Linux box out there.

Alternate Definition of Blogged

blogged (blôgd)

adj.

  1. Hard to find in Google because of blog noise.
  2. Saturated with blog posts.

tr.v.

  1. To make hard to find in Google by blogging.
  2. To push off the first page of Google results.
  3. To raise the price of keywords by increasing noise.

If you have a popular blog and you don't like certain companies or products, just blog about them without linking to the company or the products.

Pass it on.

PS: I retract the solution suggested in Neutralizing Blogger Effect on Google and the Robots tag idea.  I now see that they will not work because there are no tangible incentives for people to implement them.

Growing Pains

I spent most of today trying to configure my new ServInt account, moving domain names, and changing name server settings.  It's quite a learning experience/torture.  The ServInt account is a VPS account, which is just a step below a dedicated server.  Such an account is used by resellers to divide up the bandwidth and storage for sale to those who just want modest bandwidth and storage.  So it was configured bareass.  Gee, where is my copy of Linux for Dummies?

What I can't figure out is how in the world software like cPanel became popular.  What is the point of putting up user-friendly forms for Linux server administration if you are not going to explain anything about what the forms do, what input is being requested, and what the effects and possible problems will be?  Sure, cPanel has documentation but it's just as skimpy.  It goes something like this:

To assign name server IP, enter IP address for the name server in that box and press Submit.

When you scratch your head and enter an IP address, it will say something like X is assigned to Y.  Huh?  Where did Y come from?  With all that space left empty on the form, why do you need separate documentation at all?  Why not just include the documentation on the form and turn the form into a Wiki so I can make notes on it and share them with others?

While it's easy to blame the Unix mentality, Windows software isn't much better.  When was the last time you saw a dialog or a wizard that explained itself to your satisfaction?  Help buttons on those dialogs are common, but what is the use when they lead to idiotic text that rarely says anything beyond the self-evident?

Excuses I often hear are:

  • It's obvious.
  • I am not a GUI engineer.
  • What do you expect from open source?
  • File a bug report.

Most geeks are just more obsessed with adding nifty new features than with polishing what they already have to make it more accessible.