Sifry vs MySQL

While I was reading Jeremy Zawodny's reminder post on the Geek Dinner, I saw a link to David Sifry's post Zen and the Art of Bugfixing from March of last year.  Tempted by the title, I followed the link and found an enjoyable post on David's bout with MySQL.  I use Technorati at least once a day so I am a beneficiary of his bugfix.  Cool.

IMHO, databases should be able to detect such problems and suggest changes.  Given performance expectations specified by the administrator, a database could easily isolate slow SQL commands and produce a list of suggestions.

If queries are expected to take no more than half a second (which should be plenty for most use cases) to execute, it doesn't take a rocket scientist to figure out that something is wrong with queries that take more than 10 seconds.  Flagging those would give administrators a better head start at solving the problem than waiting for vague complaints from customers.
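The idea is simple enough to sketch.  This is not MySQL's actual slow-query-log format or API; the threshold, log shape, and function name are all invented for illustration:

```python
# Sketch: flag queries that exceed an admin-set latency budget.
# The log format and the 0.5s threshold are illustrative only.

THRESHOLD_SECONDS = 0.5  # admin-specified performance expectation

def flag_slow_queries(query_log):
    """query_log: iterable of (sql, seconds) pairs; returns the offenders."""
    return [(sql, secs) for sql, secs in query_log if secs > THRESHOLD_SECONDS]

log = [
    ("SELECT * FROM posts WHERE id = 1", 0.01),
    ("SELECT * FROM links ORDER BY RAND()", 12.4),
]
for sql, secs in flag_slow_queries(log):
    print(f"SLOW ({secs:.1f}s): {sql}")
```

A real implementation would also attach suggestions (missing index, full table scan, etc.) to each flagged query.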

Heck, this would make a nice product addon for MySQL and I would be surprised if there isn't one out there already.

Biased Liberal

Robert Scoble asks whether news aggregators should deal with feeds that aren't done properly.  Yes, it's the Postel's Law controversy again.  This is what I commented in reply:

I believe the spec should be strict and implementations should be liberal yet *visibly distinguish* good feeds from bad ones. Badly formed feeds should be displayed with 'broken' icons and posts should be displayed with a header or footer message clearly indicating that the feed data is bad. This should reduce or at least limit proliferation of bad feeds.

I call it the biased liberal approach.
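In code, the biased liberal approach amounts to one extra bit of state.  Here is a minimal Python sketch; a strict XML parse stands in for full feed validation, and the function and field names are my own invention:

```python
# Sketch of the "biased liberal" approach: accept a bad feed (liberal),
# but tag it so the UI can render a 'broken' icon (biased).
import xml.etree.ElementTree as ET

def classify_feed(feed_text):
    try:
        ET.fromstring(feed_text)           # strict, spec-level check
        return {"content": feed_text, "broken": False}
    except ET.ParseError:
        # Keep the content anyway, but flag it for the UI.
        return {"content": feed_text, "broken": True}

good = classify_feed("<rss><channel><title>ok</title></channel></rss>")
bad = classify_feed("<rss><channel><title>oops</channel></rss>")
```

The aggregator still displays both feeds; it just renders the `broken` ones with the warning icon and header/footer message.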

FYI, I am neither liberal nor conservative.  I am also not confused.  What I am is confusing.  ;-p

Benchmarking Languages

Interesting results from a benchmark comparing Visual C++ with gcc, Java with the .NET languages, and Python:

                 int math  long math  double math   trig    I/O    TOTAL
Visual C++           9.6      18.8        6.4        3.5    10.5    48.8
Visual C#            9.7      23.9       17.7        4.1     9.9    65.3
gcc C                9.8      28.8        9.5       14.9    10.0    73.0
Visual Basic         9.8      23.7       17.7        4.1    30.7    85.9
Visual J#            9.6      23.9       17.5        4.2    35.1    90.4
Java 1.3.1          14.5      29.6       19.0       22.1    12.3    97.6
Java 1.4.2           9.3      20.2        6.5       57.1    10.1   103.1
Python/Psyco        29.7     615.4      100.4       13.1    10.5   769.1
Python             322.4     891.9      405.7       47.1    11.9  1679.0

.NET languages performed surprisingly well although I have my doubts about the benchmark.  For example, why would C# I/O perform better than VB I/O?

BTW, I heard that there will be a benchmark contest between CPython and Parrot at this year's O'Reilly Open Source Convention.  I think Parrot will win by a mile.  Why?  Because CPython's performance is well known, whereas Parrot's is not.  Why would the Parrot team enter a benchmark contest they know they will lose?

Update:

I updated the link to the benchmark to point to the paginated version.  Apparently, the printable version is not meant to be linked to directly.

Stressing ASP.NET

I ran the Microsoft Web Application Stress Tool against this blog, which runs a variation of dasBlog, an ASP.NET webapp, on a 2GHz P4 with 1GB of memory.  The initial result isn't too bad even though ASP.NET caching is off: throughput was about 90 pageviews per minute (17 hits/s at 200K/s) over an hour with only a handful of connection failures.  It's not mindblowing performance, but it doesn't stink either, considering my server is a low-end box that cost about $800 to put together.

Going through my logs with Urchin, I see that about 7000 pageviews per day were served on average for the first 12 days of January.  So this blog is using about 5% of the server's capacity.  January is a slow month, so 7%-10% seems more reasonable.  Hmm.
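The back-of-the-envelope arithmetic behind that 5% figure works out like this:

```python
# Capacity check using the stress-test and log numbers above.
capacity_per_minute = 90                            # pageviews/min from the stress test
capacity_per_day = capacity_per_minute * 60 * 24    # 129,600 pageviews/day
actual_per_day = 7000                               # average from Urchin logs

utilization = actual_per_day / capacity_per_day
print(f"{utilization:.1%}")                         # about 5.4%
```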

Postel and Smart Engine

I agree with Tim Bray that there are indeed exceptions to Postel's Law.

As to Mark's post on this issue and others, I want to know what he is smoking.  Right after dismissing a suggestion because not everyone can set HTTP headers, he claims all HTTP headers can be replicated at the document level using the HTML meta tag because the HTTP spec says so.  Even if it did (it doesn't), anyone with at least one foot on the ground should know better than to expect reality to support that sort of broad requirement.  Yeah, I want some of what he is smoking.

An out-of-the-blue comment about intelligence:

Being highly intelligent just means there is a powerful engine under the hood.  What one does with the car is an entirely different matter.  And, of course, you need gas to power it.  What powers your engine?

FlexWiki and nGallery

I have been busy with work the last few days, but now I have some time to tinker some more.  Besides dasBlog, which is driving this blog, I am playing with FlexWiki and nGallery.  The end result will likely be a sum of all three along with some of the ideas I have been playing with.  For example, self-organizing wiki pages and entries are interesting.  The idea is: if anyone can change a wiki, why not the wiki itself?  People make mistakes when they use a wiki, so why can't the wiki be just as sloppy?

A wiki can be programmed to place higher value on newer entries than on older ones and use this information to change the size or color of the font used to render an entry, change its position within a page, or inject links into other relevant pages.  Likewise, popular pages and entries should be placed or rendered more prominently, because not all information is equal in value.
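One way to sketch that value function: combine freshness decay with a diminishing-returns popularity term, then map the score to font size or page position.  The half-life constant and the log-based popularity term here are invented for the sketch, not taken from any wiki engine:

```python
# Illustrative scoring for self-organizing wiki entries: newer and more
# popular entries get larger render weight.
import math

def render_weight(age_days, views, half_life_days=30.0):
    freshness = 0.5 ** (age_days / half_life_days)  # halves every 30 days
    popularity = math.log1p(views)                  # diminishing returns on views
    return freshness * popularity

# A fresh entry outranks a stale one with the same view count.
fresh = render_weight(age_days=1, views=100)
stale = render_weight(age_days=90, views=100)
```

The rendering layer would then bucket these weights into font sizes or reorder entries within a page.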

In a way, it's like mixing wiki with the game of Life.  Information struggling to survive.  Hmm.  I like that.

Imageless RSS Feed Icon for Blogroll

The orange icon typically used for RSS feeds is nice, but I had two problems with it:

First, it was too distracting when there are many of them together (e.g. in a blogroll).  Second, IE 6 sometimes fails to load such small images when they are used many times in a single page.

The following is a modified version of a CSS-based solution suggested by Richard Soderberg, which you can find in the comment section of my IE6 SP1 Small Image Loading Bug post.  This version has the following changes:

  • Size was reduced.
  • Colors were toned down and inverted to reduce distraction.
  • The original look is restored when the mouse hovers over it.

CSS fragment follows:

.feedIconStyle {
 font-family: arial, helvetica;
 font-size: 8px;
 font-weight: bold;
 text-decoration: none;
 border: 1px solid;
 padding: 0px 2px 0px 2px;
 margin: 0px;
 vertical-align: middle;
}

.feedIconStyle, .feedIconStyle:link, .feedIconStyle:visited, .feedIconStyle:active {
 color: #FF9966;
 background-color: white;
 border-color: #FF9966;
}

.feedIconStyle:hover {
 color: white;
 background-color: #FF6600;
 border-color: #FFC8A4 #7D3302 #3F1A01 #FF9A57;
}

Usage Examples:

<a href="blah" class="feedIconStyle">XML</a>

<a href="blah" class="feedIconStyle">RSS 2.0</a>

Visit my blog to see how they look.  I use the icon in my blogroll.

Blog cleanup continues

I finished adding feed URLs to my blogroll and cleaned up the design somewhat.  Some bugs were fixed too.  On the optimization front, regular expressions are no longer instantiated on the fly; they are compiled and shared when the webapp launches.  Page templates are still being built on every request, but that will get fixed too.  Performance has probably improved unbelievably from the original code, but I think I can push it much closer to the bare metal before I start pushing uphill.
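The compile-once-share-everywhere idea looks like this in Python terms (the actual change was in the C# webapp; the pattern and function below are made up for illustration):

```python
# Compile patterns once at module load (webapp launch) instead of on
# every request, then share the compiled object across handlers.
import re

# Compiled once, shared by every request handler.
ENTRY_ID_RE = re.compile(r"guid=([0-9a-f-]{36})")

def extract_entry_id(url):
    m = ENTRY_ID_RE.search(url)
    return m.group(1) if m else None
```

Beyond skipping recompilation, sharing one compiled object also avoids per-request allocations, which matters at a few hits per second.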

The biggest chore ahead is removing unnecessary use of ASP.NET controls and ViewState abuse.  ViewState is on by default, so I have to check each page to see whether it can be turned off.  For pages that do need ViewState, I'll have to set EnableViewState=false on all form controls except the ones that actually need it.  ViewState is one of those features that is both a blessing and a curse.

BTW, there is no permalink image (#) any more because I use post title as the permalink.  For ease of use, I prefixed all the titles with '#' to indicate that title links are permalinks.

More DasBlog Hacking

Although my recent optimization changes improved DasBlog performance drastically, enough for me to defer switching from file-based storage to MySQL or Berkeley DB XML, I had to do something about RSS feeds being generated dynamically.

These feeds don't change unless the content changes, so it makes more sense to generate them only when the content changes.  Besides, I needed to preserve the URLs to my feeds, so I needed a way to detect content changes automatically and force regeneration of those feeds.

To do this, I used System.IO.FileSystemWatcher to watch DasBlog's content directory and fire ContentChanged events which RSS builders and secondary data structures like CategoryCache and EntryIdCache instances listen to and react accordingly.  Now I can make changes directly to files in the content folder and those changes are reflected in both the web pages and the feeds.  Nice.

Classes like FileSystemWatcher have to be used carefully because many events firing over a short span of time will force unnecessary updates.  For protection, I used a delayed update as a cheap event folding mechanism.  This technique doesn't protect against recursion, which happens when a ContentChanged event listener makes changes to the content directory.
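The event folding idea can be sketched as follows.  The real code is C# built on System.IO.FileSystemWatcher; in this Python sketch the timer that calls `flush()` after a quiet period is left out, and the class and method names are invented:

```python
# Sketch of delayed-update event folding: a burst of change events
# collapses into a single rebuild.

class FoldingWatcher:
    """Collapses many change events into one rebuild."""

    def __init__(self, rebuild):
        self._rebuild = rebuild
        self._dirty = False

    def on_change(self, path):
        # Just mark the content dirty; don't rebuild once per event.
        self._dirty = True

    def flush(self):
        # In the real thing, a timer calls this after events stop arriving.
        if self._dirty:
            self._rebuild()
            self._dirty = False

rebuilds = []
watcher = FoldingWatcher(lambda: rebuilds.append("rebuilt"))
for i in range(10):                      # a burst of ten change events...
    watcher.on_change(f"entry{i}.xml")
watcher.flush()                          # ...folds into one rebuild
```

Note the recursion hazard mentioned above: if the rebuild itself writes to the watched directory, it re-dirties the flag and the cycle repeats.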

Swimming in a pool of XmlSerializer

While I was looking at the data storage portion of DasBlog for improvements, a low hanging fruit hit me on the forehead: XmlSerializer.  XmlSerializer is a major feature of .NET Framework that makes it really easy to read and write data structures from and to XML.

To use XmlSerializer, all you have to do is mark some members of your data structures with .NET attributes (which are just source code annotations for programs instead of people) that provide hints like "this member should be an element and the element name should be such and such."

Ease of use encourages ease of abuse.

XmlSerializer is a rather complex beast, so the class takes a while to instantiate.  What hit me on the forehead was that DasBlog was using it everywhere and XmlSerializer was being instantiated a substantial number of times per request: a juicy low hanging fruit indeed.

So I implemented XmlSerializerFactory, which maintains a pool (as in car pool) of XmlSerializer instances, and rewired DasBlog to use it whenever it needs an XmlSerializer.  There are actually multiple pools because XmlSerializer instances are specific to their target data structures.

All that took no more than 30 minutes, but the result was very rewarding.  DasBlog with XmlSerializer pooling was noticeably faster.  My guesstimate is that it's 20%-30% faster depending on the page type being accessed.  Woohoo!
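The shape of the factory is the classic cache-per-key pattern.  Here is a Python analog (the real XmlSerializerFactory is C#; `ExpensiveSerializer` is a stand-in for XmlSerializer's slow construction, and all names here are invented):

```python
# Python analog of the XmlSerializerFactory idea: construct one expensive
# serializer per target type and reuse it, instead of constructing a new
# one on every use.

class ExpensiveSerializer:
    instances_created = 0  # tracks construction cost for the demo

    def __init__(self, target_type):
        ExpensiveSerializer.instances_created += 1  # stands in for slow setup
        self.target_type = target_type

class SerializerFactory:
    def __init__(self):
        self._pool = {}  # one cached serializer per target type

    def get(self, target_type):
        if target_type not in self._pool:
            self._pool[target_type] = ExpensiveSerializer(target_type)
        return self._pool[target_type]

factory = SerializerFactory()
for _ in range(100):
    factory.get(dict)  # 100 requests, but only one construction
```

The "multiple pools" aspect falls out of keying the cache by target type: each type gets its own cached instance.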

Update:

It turns out that the .NET 1.1 version of XmlSerializer already uses an internal cache, although it depends on which constructor is used.  So the speed improvement I saw was either a hallucination or due to some other changes I made.