Dealing with the Good, the Bad, and the Ugly Feeds

It's amusing to me that so many bright engineers are fighting inside a paper bag over the issue of what to do with bad feeds.  For some reason, probably human nature, they have limited themselves to finding a solution at the spec and parser implementation level, where there is no solution, and the discussion has devolved into exchanging “is so” and “is not”.

The Biased Liberal post offers a solution at the UI level without irritating the user.  The problem can also be solved at the marketing and legal levels, although I favor the UI-level solution.

One rather amusing (?) thought I occasionally use to pull myself out of a hole is that the ultimate solution to every problem is world destruction.  All right, it's not funny, but it does shake me out of the box.  As to the implementation, it's easy if you believe a tree falling without an observer makes no sound.

Update:

Looks like XML-DEV got thrown into the paper bag as well.  There is now a long ongoing thread on Postel's Law, exceptions, and what to do with bad feeds.  Elliotte Rusty Harold did mention my Biased Liberal solution, but XML-DEVers seem to prefer talking about parsers, specs, what the world wants and needs, theories, history, scenarios, etc.  Well, they seem to be enjoying themselves so let's not bother them.

Sifry vs MySQL

While I was reading Jeremy Zawodny's reminder post on the Geek Dinner, I saw a link to David Sifry's post Zen and the Art of Bugfixing from March of last year.  Tempted by the title, I followed the link and found an enjoyable post on David's bout with MySQL.  I use Technorati at least once a day so I am a beneficiary of his bugfix.  Cool.

IMHO, databases should be able to detect such problems and suggest changes.  With performance profile expectations specified by the administrator, databases can easily isolate slow SQL commands and pull a list of suggestions.

If queries are expected to take no more than 1/2 second to execute (which should be plenty for most use cases), it doesn't take a rocket scientist to figure out that there is something wrong with queries that take more than 10 seconds.  Flagging those would give administrators a better start at solving the problem than waiting for vague complaints from customers.
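Here is a minimal Python sketch of the flagging idea, not anything MySQL actually ships.  The `QueryTiming` record, the `flag_slow_queries` function, the `alarm_factor` parameter, and the sample queries and table names are all hypothetical, made up for illustration.  It assumes you already have a log of queries with their execution times.

```python
from dataclasses import dataclass


@dataclass
class QueryTiming:
    """One executed query and how long it took (hypothetical log record)."""
    sql: str
    seconds: float


def flag_slow_queries(timings, expected_seconds=0.5, alarm_factor=20):
    """Return queries slower than expected_seconds * alarm_factor, slowest first.

    With the defaults, the alarm limit is 0.5 s * 20 = 10 s, matching the
    numbers used in the post.
    """
    limit = expected_seconds * alarm_factor
    slow = [t for t in timings if t.seconds > limit]
    return sorted(slow, key=lambda t: t.seconds, reverse=True)


# Made-up sample data; the table names are illustrative only.
sample_log = [
    QueryTiming("SELECT * FROM posts WHERE id = 42", 0.03),
    QueryTiming("SELECT * FROM links WHERE url LIKE '%blog%'", 14.2),
    QueryTiming("SELECT COUNT(*) FROM cosmos", 31.0),
]

for t in flag_slow_queries(sample_log):
    print(f"{t.seconds:6.1f}s  {t.sql}")
```

A real tool would go further and attach suggestions (a missing index, say) to each flagged query, but even this much gives the administrator a list to start from.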

Heck, this would make a nice product addon for MySQL and I would be surprised if there isn't one out there already.

Geek Dinner Tomorrow

Tomorrow, I will be attending the Geek Dinner Jeremy Zawodny is organizing for Tim Bray.  See Jeremy's original post for time, location, and directions.

Update:

Well, it's tomorrow now and I am afraid I won't be able to attend the dinner on account of being buried under a mountain of unstable code.  I am trying to dig out as fast as I can but I have a feeling I will be seeing sunrise instead of sunset by the time I get out.  Argh!

Biased Liberal

Robert Scoble asks whether news aggregators should deal with feeds that aren't done properly.  Yes, it's the Postel's Law controversy again.  This is what I commented in reply:

I believe the spec should be strict and implementations should be liberal yet *visibly distinguish* good feeds from bad ones. Badly formed feeds should be displayed with 'broken' icons and posts should be displayed with a header or footer message clearly indicating that the feed data is bad. This should reduce or at least limit proliferation of bad feeds.

I call it the biased liberal approach.
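A minimal Python sketch of what I mean, assuming a plain RSS-like feed without namespaces.  The function names `load_feed` and `render`, the banner text, and the regex fallback are my own illustration, not how any particular aggregator works.  The strict path uses the standard library's XML parser; the liberal path is deliberately crude.

```python
import re
import xml.etree.ElementTree as ET

BROKEN_BANNER = "[BROKEN FEED] This feed is not well-formed; content may be incomplete."


def load_feed(xml_text):
    """Return (titles, is_well_formed) for the given feed text."""
    try:
        # Strict path: a real XML parser that rejects malformed input.
        root = ET.fromstring(xml_text)
        return [el.text or "" for el in root.iter("title")], True
    except ET.ParseError:
        # Liberal path: salvage whatever titles a crude regex can find.
        titles = re.findall(r"<title>(.*?)</title>", xml_text, flags=re.DOTALL)
        return titles, False


def render(xml_text):
    """Show the feed either way, but visibly mark the bad ones."""
    titles, well_formed = load_feed(xml_text)
    lines = [] if well_formed else [BROKEN_BANNER]
    lines.extend(titles)
    return "\n".join(lines)


good = "<rss><channel><item><title>Hello</title></item></channel></rss>"
bad = "<rss><channel><item><title>Hello</title></item></channel>"  # <rss> never closed

print(render(good))
print(render(bad))
```

The user still gets to read the post in both cases.  The only difference is that the bad feed carries a visible mark, which is the "biased" part.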

FYI, I am neither liberal nor conservative.  I am also not confused.  What I am is confusing.  ;-p

MP3 Player 2 Player

I haven't been keeping track of what is going on in the MP3 player market, but I haven't seen advertisements for MP3 devices that can communicate with each other.  Why bother downloading music from the Internet when you can just get it from friends you meet or passing strangers (via Bluetooth)?  Are there such MP3 devices?

Every one of the MP3 devices has some means to upload music into it, so all you need is a device that either downloads or resamples from one MP3 player and uploads into another.  Eventually, you won't even need the bridging device when MP3 songs can be beamed across devices with a single button.

Hmm.  That turns each MP3 device into an inbox of sorts.  Receiving messages from secret admirers and ads from street vendors this way could be a little surprising though.

Benchmarking Languages

Interesting results from a benchmark comparing Visual C++ vs. gcc, Java against .NET languages, and Python:

Language      int math  long math  double math  trig   I/O   TOTAL
Visual C++         9.6       18.8          6.4   3.5  10.5    48.8
Visual C#          9.7       23.9         17.7   4.1   9.9    65.3
gcc C              9.8       28.8          9.5  14.9  10.0    73.0
Visual Basic       9.8       23.7         17.7   4.1  30.7    85.9
Visual J#          9.6       23.9         17.5   4.2  35.1    90.4
Java 1.3.1        14.5       29.6         19.0  22.1  12.3    97.6
Java 1.4.2         9.3       20.2          6.5  57.1  10.1   103.1
Python/Psyco      29.7      615.4        100.4  13.1  10.5   769.1
Python           322.4      891.9        405.7  47.1  11.9  1679.0

.NET languages performed surprisingly well although I have my doubts about the benchmark.  For example, why would C# I/O perform better than VB I/O?

BTW, I heard that there will be a benchmark contest between CPython and Parrot at this year's O'Reilly Open Source Convention.  I think Parrot will win by a mile.  Why?  Because CPython's performance is well known whereas Parrot's is not.  Why would the Parrot team enter a benchmark contest they know they will lose?

Update:

I updated the link to the benchmark to point to the paginated version.  Apparently, the printable version is not meant to be linked to directly.

Stressing ASP.NET

I ran Microsoft Web Application Stress Tool on this blog, which uses a variation of dasBlog, an ASP.NET webapp, running on a 2GHz P4 with 1GB of memory, and the initial result isn't too bad even though ASP.NET caching is off.  Throughput was about 90 pageviews per minute (17 hits/s at 200K/s) over an hour with only a handful of connection failures.  It's not mindblowing performance, but it doesn't stink either considering my server is a low-end box that cost about $800 to put together.

Going through my logs using Urchin, I see that about 7000 pageviews per day were served on average for the first 12 days of January.  So this blog is using about 5% of the server's capacity.  January is a slow month so I think 7%-10% seems more reasonable.  Hmm.
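For the record, the 5% figure is just back-of-the-envelope arithmetic, sketched below in Python.  It assumes the server could sustain the stress-test throughput around the clock and that traffic is spread evenly over the day, neither of which is really true.  The function name `utilization` is mine.

```python
def utilization(daily_pageviews, pageviews_per_minute):
    """Fraction of daily capacity used, assuming constant sustained throughput."""
    capacity_per_day = pageviews_per_minute * 60 * 24  # minutes in a day = 1440
    return daily_pageviews / capacity_per_day


# 90 pageviews/min from the stress test, 7000 pageviews/day from the logs.
used = utilization(7000, 90)
print(f"{used:.1%}")  # 7000 / 129600, roughly 5.4%
```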

Postel and Smart Engine

I agree with Tim Bray that there are indeed exceptions to Postel's Law.

As to Mark's post on this issue and others, I want to know what he is smoking.  Right after dismissing a suggestion because not everyone can set HTTP headers, he claims all HTTP headers can be replicated at the document level using the HTML meta tag because the HTTP spec said so.  Even if it did (it doesn't), anyone with at least one foot on the ground should know better than to expect reality to support that sort of broad requirement.  Yeah, I want some of what he is smoking.

An out-of-the-blue comment about intelligence:

Being highly intelligent just means there is a powerful engine under the hood.  What one does with the car is an entirely different matter.  And, of course, you need gas to power it.  What powers your engine?