Swimming in a pool of XmlSerializer

While I was looking at the data storage portion of DasBlog for improvements, a piece of low-hanging fruit hit me on the forehead: XmlSerializer.  XmlSerializer is a major feature of the .NET Framework that makes it really easy to read and write data structures from and to XML.

To use XmlSerializer, all you have to do is mark members of your data structures with .NET attributes (think of them as source code comments written for programs instead of people) that provide hints such as "this member should be an element" and "the element name should be such and such."

Ease of use encourages ease of abuse.

XmlSerializer is a rather complex beast, so it takes a while to instantiate because constructing one can involve generating and compiling serialization code for the target type.  What hit me on the forehead was that DasBlog was using it everywhere, creating a substantial number of XmlSerializer instances per request: juicy low-hanging fruit indeed.

So I implemented XmlSerializerFactory, which maintains a pool (as in car pool) of XmlSerializer instances, and rewired DasBlog to use it whenever it needs an XmlSerializer.  There are actually multiple pools because each XmlSerializer instance is specific to one target data structure.
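Here is a minimal sketch of the per-type idea, written in Python rather than C# to keep it short.  It keeps one serializer per target type instead of a true check-out/check-in pool, assuming a serializer is safe to share across threads once built (the .NET documentation lists XmlSerializer as thread-safe).  SerializerCache, FakeSerializer, Entry, and Comment are hypothetical names; FakeSerializer just counts how often the "expensive" constructor runs.

```python
import threading
from typing import Callable, Dict


class SerializerCache:
    """Hypothetical stand-in for XmlSerializerFactory: one serializer per target type.

    `build` represents the expensive constructor being avoided (e.g. new XmlSerializer(type)).
    """

    def __init__(self, build: Callable[[type], object]) -> None:
        self._build = build
        self._cache: Dict[type, object] = {}
        self._lock = threading.Lock()  # guards the dict when requests run concurrently

    def get(self, target: type) -> object:
        with self._lock:
            serializer = self._cache.get(target)
            if serializer is None:
                # The expensive step runs at most once per target type.
                serializer = self._build(target)
                self._cache[target] = serializer
            return serializer


# Usage with a fake "expensive" serializer that counts its constructions.
build_count = {"n": 0}


class FakeSerializer:
    def __init__(self, target: type) -> None:
        build_count["n"] += 1
        self.target = target


class Entry: ...
class Comment: ...


cache = SerializerCache(FakeSerializer)
a = cache.get(Entry)
b = cache.get(Entry)    # same instance as `a`, no new construction
c = cache.get(Comment)  # different type, so a separate instance
```

Holding the lock while building keeps the logic simple, but it means two threads asking for different types wait on each other during a first build; with a handful of types that is usually acceptable.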

All that took no more than 30 minutes, but the result was very rewarding.  DasBlog with XmlSerializer pooling was noticeably faster.  My guesstimate is that it's 20%-30% faster depending on the type of page being accessed.  Woohoo!

Update:

It turns out that the .NET 1.1 version of XmlSerializer already uses an internal cache, although whether it kicks in depends on which constructor is used.  So the speed improvement I saw was either a hallucination or due to some other changes I made.

Preserving Permalinks

An irony of the blogosphere is that permalinks are not permanent.  Whenever a blog changes service or software, its permalinks break.  While broken permalinks are not worth crying over, they are pretty annoying because internal links break as well.

Unless you are prepared to get your hands dirty, changing blog service or software means you are better off leaving your old posts where they are.

Having your own domain name doesn't protect you either if you decide to switch blogging software.  This is why bloggers are leaving a trail of blogs behind them like breadcrumbs.  Nice, huh?  This is the situation I am in, and I did get my hands dirty by writing an ASP.NET HttpModule to redirect date-based Radio URLs to DasBlog's URL format.

It's working pretty well except for the anchor part of URLs, which Radio uses to pinpoint a post within a page containing multiple posts.  Since browsers never send the anchor (the #fragment) to the server, I can't map a Radio URL to a page dedicated to a single post.  Oh, well.  At least my permalinks are permalinks.
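The core of such a redirect is just a path rewrite.  Below is a sketch of the mapping logic in Python rather than as an actual HttpModule.  The URL shapes are assumptions for illustration only: /2003/12/24.html for a Radio day page and /default.aspx?date=2003-12-24 for the DasBlog equivalent.  map_radio_url is a hypothetical name; a real module would answer with an HTTP 301 pointing at the returned URL.

```python
import re
from typing import Optional

# Assumed URL shapes (illustrative, not taken from Radio or DasBlog docs):
#   old Radio day archive:  /2003/12/24.html
#   new DasBlog day page:   /default.aspx?date=2003-12-24
RADIO_DAY = re.compile(r"^/(\d{4})/(\d{2})/(\d{2})\.html$")


def map_radio_url(path: str) -> Optional[str]:
    """Return the DasBlog URL to redirect to, or None to let the request pass through."""
    match = RADIO_DAY.match(path)
    if match is None:
        return None
    year, month, day = match.groups()
    return f"/default.aspx?date={year}-{month}-{day}"
```

Note that the function only ever sees the path: for a link like /2003/12/24.html#a123, the browser keeps the #a123 part to itself, which is why the best the redirect can do is land on the whole day.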

My next task in the blog transition is building a flexible RSS feed service framework while preserving old feed URLs.

Upcoming Changes

I am in the process of moving my blog to a personal mutation of DasBlog.  I have already moved all the posts and comments over, but there are lots of changes to make in DasBlog.  I chose DasBlog because it gives me a good excuse to play with ASP.NET and it offers relatively easy migration from Radio.  It's pretty slow and the code itself is not exactly pretty, but I can tear into it easily enough.  At this point, I am gearing up to replace its file-based storage with either Berkeley DB XML or MySQL.

Visual design-wise, I am going to replace the calendar with tabs that display the blog in daily, weekly, or monthly views.  The daily view will show full posts spanning several days, the weekly view will show post titles spanning several weeks, and the monthly view will show post titles spanning a year.  Search is already there, but I'll have to check out its performance before deciding whether to replace it.  And then there is the legacy URL-preserving code so old links don't break.  Lots of enjoyable headaches ahead.
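Grouping posts for the weekly and monthly tabs is mostly date arithmetic.  A minimal sketch, assuming a post is just a (date, title) pair and that weeks follow ISO 8601 numbering; by_week, by_month, and the sample posts are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

Post = Tuple[date, str]  # (publication date, title): hypothetical minimal post record


def by_week(posts: List[Post]) -> Dict[Tuple[int, int], List[str]]:
    """Group titles by ISO (year, week number) for a weekly view."""
    groups: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for day, title in sorted(posts):
        iso_year, iso_week, _ = day.isocalendar()
        groups[(iso_year, iso_week)].append(title)
    return dict(groups)


def by_month(posts: List[Post], year: int) -> Dict[int, List[str]]:
    """Group one calendar year's titles by month for a monthly view."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for day, title in sorted(posts):
        if day.year == year:
            groups[day.month].append(title)
    return dict(groups)


# Made-up sample data.
posts = [
    (date(2003, 12, 22), "first"),
    (date(2003, 12, 24), "second"),
    (date(2003, 12, 29), "third"),
]
weekly = by_week(posts)
monthly = by_month(posts, 2003)
```

ISO weeks can straddle a year boundary: in the sample data, December 29, 2003 falls in ISO week 1 of 2004, so keying the weekly view by (ISO year, week) keeps that week together instead of splitting it across two years.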

I am switching ISPs as well, so don't be surprised if the blog becomes unavailable for a few days.

eSignHere.com

I found a wonderful gift for myself today, a nice domain name for a pet project I have been working on: eSignHere.com.  It's a great name and it cost me only $7.95 per year.  What is the project about?  It's about making e-signing as easy as picking one's nose.  Try selling that to your board.  Ho Ho Ho!

.NET Pervert

There is something pathetic about getting enthusiastic over a geeky article on Christmas Eve but, if you are a .NET pervert, you should read Aleksandr Mikunov's Rewrite MSIL Code on the Fly with the .NET Framework Profiling API.  The article doesn't say whether the technique still works under .NET 1.1, but it looks promising.

While mutating and hooking managed code is not exactly behavior to encourage, sometimes you have to do it, and you can't argue with Gotta.

Learning English Virtually

Learning English is a big deal outside America.  For Koreans, whether or not you speak English affects your career.  English is taught in school, but studying in America is considered essential to learning it properly.  So kids of all ages are sent to America.

So I started thinking about a cheap solution.  I thought about a variation of a Rent-A-Sub idea I had a long time ago that lets anyone connected to the Internet control a little remote-controlled submarine.  In this variation, you get a little mobile robot with a video camera and speakers that Internet users can control.  Imagine little robots running around town trying to engage townsfolk in conversation.  There will be lots of problems, but lots of fun too.

A more realistic solution is to build a virtual world designed for non-English speakers to learn English by having real-world conversations.  Wanna experience McDonald's?  Drive there and order a burger.  NPCs are played by part-timers who are asked to type in what they are saying to help students understand what is being mumbled at them.  Wanna learn what to say in a car accident?  Smash your car into another car and get into a screaming match in English.  Quickest way to learn foul language?  Go hang out in the 'hood, where tough NPCs will rough you up so you can learn English.

For this kind of service, a $200-a-month subscription is not expensive considering how much the other options for English students cost.

Real Speech Recognition

Speech recognition continues to get better and labor costs keep rising, but, as an engineer in the habit of jumping 'out of the box' like Jack does, I like to think about alternative solutions.  Here is one that is amusing.

I walk into a little corner restaurant in Paris to have lunch.  As the waitress comes over, I flip open my cellphone and press a button programmed to connect to a translation service that came with my vacation package.  When the waitress opens her mouth, I point the phone at her.

Francine is sitting in front of her computer, writing in her blog.  She lives in Chicago and works part-time as an on-demand translator.  When her computer beeps and pops open a window, she is looking at the waitress opening her mouth to say something.  She takes a quick glance at the side of the popup and sees basic info on the client.  His name is Don and he is in Paris.  Based on his GPS location, he is in a restaurant.  Francine proceeds to help Don order lunch.

I picked this scenario because, the last time I was in that situation, I ended up ordering a lunch of nothing but side dishes.  It involves more than speech recognition, but the core idea is that speech recognition does not require a machine to do the recognizing.

FTP

Scott Watermasysk asks what the hands-down best FTP tool is.  I have been using FileZilla for the past year and have been very happy with it.  It's fast, free, and trouble-free.  Don't let the 'zilla part bother you if you are a softie.

Speaking of FTP tools, here are some features I want in my FTP tool:

  1. Persistent synchronization – intelligently monitor and update local directories to match remote directories or remote directories to match local directories.
  2. Delta archives – automatically record and archive changes.
  3. Integrity checker – detect illegal modifications or additions to remote directories.
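Of the three, the integrity checker is the easiest to sketch: keep a manifest of content hashes taken when you last published, then compare it against hashes of what is currently on the server.  This Python sketch assumes the tool can already fetch each remote file's contents (the base FTP protocol has no hash command); manifest, check_integrity, and the sample files are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


def manifest(files: Dict[str, bytes]) -> Dict[str, str]:
    """Map each path to the SHA-256 hex digest of its contents."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def check_integrity(
    expected: Dict[str, str], actual: Dict[str, str]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare a trusted manifest with the current one: (modified, added, missing)."""
    modified = sorted(p for p in expected.keys() & actual.keys() if expected[p] != actual[p])
    added = sorted(actual.keys() - expected.keys())
    missing = sorted(expected.keys() - actual.keys())
    return modified, added, missing


# Made-up example: one file tampered with, one file planted.
baseline = manifest({"index.html": b"hello", "blog.css": b"body{}"})
remote = manifest({"index.html": b"hacked", "blog.css": b"body{}", "shell.php": b"<?php"})
report = check_integrity(baseline, remote)
```

The same modified/added/missing lists could also drive the synchronization feature, since they are exactly the files that need copying or deleting in one direction or the other.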

Bad Taste of XSLT

Every time I use XSLT, it leaves a really bad taste in my mouth.  I just spent 3 hours writing an XSLT stylesheet for a new XML-based signature verification result format I recently created for my client.

The format itself is designed to capture data associated with signature verification so that it can be used as legal proof of verification at some later date.  This means capturing the data hash, the signature, the certificates, and the OCSP request/response pairs for each cert in the chain; basically bagging every scrap of data on the table.  The end result should be routed automatically to a backend repository, but some customers will opt to store the files on local drives, which means they need to view them locally.

That's where XSLT comes in.  By associating an XSLT stylesheet with the XML file, users can view the file with just a browser (well, IE).  It's a nice solution, except that writing XSLT can be a real pain in the ass.  Take one little step outside the simple stuff and you are in a jungle, and it doesn't get better over time unless you use it every day.  Since what I had to do involved fairly advanced XSLT, I was not in a good mood by the time I finished.
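The association itself is a single xml-stylesheet processing instruction placed before the root element.  A minimal Python sketch using the standard library's xml.dom.minidom; attach_stylesheet, the VerificationResult root element, and the verify.xsl file name are made up for illustration.

```python
from xml.dom import minidom


def attach_stylesheet(xml_text: str, xsl_href: str) -> str:
    """Return xml_text with an xml-stylesheet processing instruction ahead of the root element."""
    doc = minidom.parseString(xml_text)
    pi = doc.createProcessingInstruction(
        "xml-stylesheet", f'type="text/xsl" href="{xsl_href}"'
    )
    # The instruction belongs in the prolog, i.e. before the document element.
    doc.insertBefore(pi, doc.documentElement)
    return doc.toxml()


# Hypothetical verification result and stylesheet name.
result = "<VerificationResult><Status>valid</Status></VerificationResult>"
out = attach_stylesheet(result, "verify.xsl")
print(out)
```

The browser resolves the href relative to the XML file's own location, so shipping the stylesheet alongside each saved result keeps local viewing working.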

If you have a choice, avoid XSLT like the flu.  I didn't.  If you really have to use it, make sure you have an XSLT debugger.  The idea of XSLT as a declarative language is a joke.  It might look declarative, but if you do any serious work with it, you will start thinking procedurally in order to make sense of it.  Like I said, it's a joke.