Google Desktop Search: Killer-App?

I installed Google Desktop Search (GDS) yesterday morning.  The first thing that impressed me was its size: 400K.  As a developer, I know how difficult it is to pack that much functionality into just 400K these days.  Of course, if this were 20 years ago, I would have howled about the 'huge' footprint.

I uninstalled GDS 8 hours later when my laptop revved up like a car with a stuck accelerator.  I was using Eclipse at the time, so this version of GDS must have a faulty idle-detection algorithm.  Since I don't enjoy writing code inside a jet engine (my laptop's two fans make a lot of noise when they are going at full speed), I'll wait for a better version.

The problem with desktop search is that, while the file system, email archives, and browser cache offer extra metadata, there are no hyperlinks among desktop documents.  Without hyperlinks, you can't do the page ranking Google is famous for.

The only advantage Google has over other desktop search tools is tight integration with its website.  While some people seem to be impressed with seeing the word 'Desktop' added to the Google homepage, I think the tight integration and blurring of the line between the Web and the desktop will result in confusion and concerns with little gain for Google.

The core problem here is that search engines like Google throw everything into one pot.  For web search, all the web pages on the Net get thrown into that pot.  Thankfully, hyperlink-based page ranking pulls the good stuff to the surface with minimal hassle.  With desktop search, all of your documents get thrown into the pot without an equivalent of page ranking to measure relevance.  IMHO, there isn't enough metadata on the desktop to achieve the same level of utility Google web search offers.
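To make the point concrete, here is a minimal sketch of link-based ranking by power iteration.  It is a textbook-style simplification, not Google's actual algorithm, and the page names, the damping value of 0.85, and the iteration count are all assumptions made up for illustration.  On a tiny "web" with links, one page clearly floats to the top; on a "desktop" with no links, every document ends up with exactly the same score, so the ranking tells you nothing.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank over a dict {page: [pages it links to]}.

    Assumes every link target also appears as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            for target in links[p]:
                new_rank[target] += damping * rank[p] / len(links[p])
        rank = new_rank
    return rank


# Hypothetical data: a three-page web with links, and three unlinked desktop files.
web = {"a.html": ["hub.html"], "b.html": ["hub.html"], "hub.html": ["a.html"]}
desktop = {"notes.txt": [], "budget.xls": [], "draft.doc": []}

web_rank = pagerank(web)
desktop_rank = pagerank(desktop)

print(max(web_rank, key=web_rank.get))          # the most linked-to page wins
print(sorted(set(desktop_rank.values())))       # a single value: all documents tie
```

A real desktop ranker would have to lean on other signals (file dates, folders, how often a document is opened), which is exactly the metadata I doubt is rich enough.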

Also, there is a problem that will surface in the future as desktop search over the browser cache becomes ubiquitous: desktop spam.  Websites will begin loading up their webpages so that links to their sites appear in desktop searches and, without page ranking, they will find it easier to catch a desktop searcher's eye than a web searcher's.

Whether desktop search is a killer-app for the user or not, I have doubts whether it is a killer-app for Google.  If they start showing ads on desktop search result pages, many users won't like it.  If those ads are context-sensitive, meaning they are based on words in YOUR documents, even more users will howl.

I have other issues and possible solutions but I want to think them through before sharing them.

IIS 6 Compression

I just wasted a couple of hours fiddling with IIS 6's HTTP compression to fix my feed.  The trouble originally started when I noticed static files being served compressed by IIS 6 once the file size grew beyond 30K or so, although I hadn't turned on compression.  Since some news aggregators can't read compressed feeds, I looked for ways to disable it.

Googling led me to a set of tags inside the metabase.xml file, so I set all compression-related parameters to FALSE and restarted IIS.  Initially, this seemed to fix the problem, but after a few minutes the background compression service kicked in and I was back where I started.

The annoying part is that IIS serves up gzip or deflate encoded content even when the HTTP header Accept-Encoding is missing or empty.  I'll have to pore over the HTTP specs to see what the behavior is supposed to be, but this doesn't make sense.  My guess is that HTTP.SYS's memory cache code is not bothering to check the header.  Another crazy symptom is that the involuntarily compressed feed sometimes appears as a blank page in IE.
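For what it's worth, the check I have in mind can be sketched as a small function: given the Accept-Encoding value a client sent and the Content-Encoding a response came back with, did the client actually offer that encoding?  The function names are my own invention, and this is deliberately stricter than whatever the HTTP spec ends up allowing for a missing header; it only flags responses that were compressed without being asked.

```python
def acceptable_codings(accept_encoding):
    """Parse an Accept-Encoding value into {coding: q}.

    A missing (None) or empty header yields an empty dict.
    """
    codings = {}
    if not accept_encoding:
        return codings
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        if not name:
            continue
        q = 1.0
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat an unparsable weight as "not acceptable"
        codings[name] = q
    return codings


def was_requested(accept_encoding, content_encoding):
    """True if the response's Content-Encoding was explicitly offered by the client."""
    if not content_encoding or content_encoding.lower() == "identity":
        return True  # uncompressed body, nothing to negotiate
    offered = acceptable_codings(accept_encoding)
    coding = content_encoding.lower()
    if coding in offered:
        return offered[coding] > 0
    return offered.get("*", 0.0) > 0


print(was_requested(None, "gzip"))             # False: header missing, body gzipped
print(was_requested("gzip, deflate", "gzip"))  # True: client offered gzip
```

Feeding it the headers from a captured request and response is enough to tell whether a given blank page in IE came from an unrequested gzip body.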

I am just going to let this problem drop for now because I have work to do.

Turtle Ship and Korean Food

Just before returning from my recent trip to Seattle, I picked up a copy of Age of Empires at the Microsoft Store for my son Sean.  It didn't take long for him to start playing the Koreans.  He was particularly impressed with the Turtle Ship (the world's first ironclad ship, designed by Admiral Yi Sun-sin).

Just yesterday, my wife told me that Sean asked if he could go to the Korean language school next summer.  When my wife asked why, Sean said he wants to marry a Korean woman when he grows up, so he thought he should start learning Korean properly.  When my wife informed him that there are many English-speaking Korean-Americans, my son said 'Yes, but there are many more in Korea to choose from'.  When my wife asked him why he wanted to marry a Korean woman, he said 'Korean food, of course'.

It's scary how practical a 10-year-old boy can be.  What does that have to do with the Turtle Ship?  I suspect that, until now, he resented being Korean.  The Turtle Ship made him proud enough to think that being Korean is not bad after all.

Addicted to High DPI

While desktop LCD monitors are excellent for playing movies and games, I am not too happy with them for regular use.

First, they are just too bright.  Most applications I work with have a mostly white background and, instead of typing on a paper-like ambient white, I feel like I am typing on the surface of a fluorescent lamp, because an LCD monitor radiates all that white straight at me.  To get around the problem, I turned the brightness all the way down, but it was not enough.  So I changed the window background color to a light shade of gray, and I can now stare at the monitor without seeing ghosts every time I look away.

Second, pixel density is too low for me.  I got used to my laptop's 15-inch 1600×1480 LCD monitor, which supported 130 DPI.  ClearType simply rocks at 130 DPI!  The maximum resolution of my desktop's 17-inch LCD monitor is 1280×1024, so 96 DPI is the highest I can get out of it.  At 96 DPI, fonts are ugly even with ClearType on.  Larger LCD monitors don't improve the DPI much either.
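The 96 figure for the desktop monitor is easy to check with a little arithmetic: divide the diagonal length in pixels by the diagonal size in inches.  Here is a small sketch; the function name is made up, and it assumes the advertised 17 inches is the viewable diagonal and that pixels are square.

```python
import math


def pixels_per_inch(width_px, height_px, diagonal_inches):
    """Physical pixel density from resolution and diagonal size."""
    diagonal_px = math.hypot(width_px, height_px)
    return diagonal_px / diagonal_inches


# 17-inch desktop LCD at its native 1280x1024 resolution
print(round(pixels_per_inch(1280, 1024, 17)))  # 96
```

Since density only grows when resolution rises faster than the diagonal, a bigger panel with a proportionally bigger resolution lands in about the same place.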

Whew.  I feel much better after this rant.  Hopefully, LCD manufacturers will make monitors for business use that are less blinding and denser.  Until then, I'll just have to suffer.

Digital Typhoon Hits Korea

South Korea is undergoing amazing changes brought on by endless waves of new technologies and trends.  This New Zealand News article provides a good glimpse of what is going on in Korea:

The country has become a hot-bed of free music downloading as fans take advantage of MP3 file-sharing services, including Soribada, South Korea's version of Napster.

In a country of 48 million people, Soribada ("sea of sound") has drawn more than six million users since it launched in 2000.
…Since the launch of these sites, domestic CD sales have nose-dived nearly 50 per cent.
…There were 8000 CD stores in South Korea five years ago, but now we have only 400 left.
…Although the advent of free MP3 files has also devastated music publishers and other retailers, the future of music retailers looks particularly bleak since they also face cut-throat competition from online shopping.
…Sales of music for cellphones alone have outpaced traditional CD sales since 2002.
…"It seems like brick-and-wood music stores like us are nearly doomed, unless the Government comes up with some financial measures to help us stay alive. It may soon be the end of an era for us."

Unfortunately, I don't see a workable solution emerging yet.  If Lawrence Lessig's so-called Free Culture folks have some ideas, I would like to hear them.  Note that the people running these businesses in Korea are neither idealists nor technologists but people buried neck-deep in the new reality, trying to stay afloat.

Some are adapting fairly well to these changes though.  For example, book publishers hit hard by rampant book sharing online are publishing books written by amateur online serial writers.  As I mentioned before, decent amateur writers receive publishing offers even before their serials reach the halfway point.  This is because the serial itself is the primary marketing vehicle for these types of books.

StAX Update

If you have used the StAX API at all, you have some scars.  The StAX API itself is good, IMHO, but the reference implementation was buggy and its sparse documentation failed to explain how the methods worked together.  The result was that each of us wasted a lot of time experimenting and debugging.

Thankfully, the situation is improving.  Version 1.0 of Woodstox, an implementation of StAX, was released recently.  Its author, Tatu Saloranta, is also working on StaxMate (StAX utility library), StaxTest (StAX conformance test suite), and StaxMisc (integration bits).

If you would rather stick with the RI, there is an updated release at the stax.codehaus.org site.

In related news, James Strachan is working on ActiveSoap, a lightweight SOAP stack using StAX.  Excellent.  If you are sick of Axis's slobbish performance but can't wait for ActiveSoap, you might want to take a look at JibxSoap.

Also, here are some useful StAX Tips written by Berthold Daum:

  1. Using StAX
  2. Parsing XML documents partially with StAX
  3. Screen XML documents efficiently with StAX
  4. Write XML documents with StAX
  5. Merge XML documents with StAX

Jonah

I had an amusing thought today.  I thought, if America were a ship, shipmates would look upon Bush as a Jonah.  While he may explain away every accusation leveled at him, even he would have to agree that he presided over a lot of bad luck.

Unreasonable?  Yes.  Unfair?  Absolutely.  But what would you do if your ship were half wrecked, stuck in the middle of the ocean without a trace of wind or passing ships, and running out of food?  Superstitious sailors would toss the Jonah overboard and keep their fingers crossed.

I have met people who had extraordinary strings of bad luck.  Everything that could go wrong would go wrong.  Car accidents, illness, failing businesses, ruined reputations and relationships.  Koreans believe bad luck is brought on by bad karma.  If you or your ancestors did bad things, bad luck will visit you and your offspring.

One way to ward off bad luck and ensure good luck is to find good gravesites for your ancestors and good locations for your home or business.  If your family or business has been suffering a string of bad luck, bad feng-shui (poong-soo in Korean) is usually the suspect.  If it's not feng-shui, then it must be the spirits, so a moo-dang (a voodoo doctor or channeler of sorts) is called to shoo the bad spirits away.

I doubt many people in Korea still believe this stuff, but the prevailing attitudes are 'what else can we do' and 'why take unnecessary chances?'

Maybe Bush should call a moo-dang into the White House to see what pissed our founding fathers off.  Heh.

Feeds out of service

FYI, my RSS feeds are out of service temporarily.

It's related to compression and HTTP headers.  For some reason, gzipped feed content is showing up as garbage for some people.  On my desktop, IE can read the feed just fine, but not on my laptop.  Turning off compression on the server side (by massaging metabase.xml) doesn't quite work either because the compressor slips back in somehow after a while.

Update:

Feeds now seem to be working although IE on my laptop still thinks the feeds are bad.  I think that IE's configuration is screwed up somehow.  Unfortunately, I don't want to reinstall it because that will cause too many annoyances.  Oh, well.

Another problem is that a gzip-encoded response is sometimes received although gzip is not in the Accept-Encoding header.  I have yet to figure out whether this is because a proxy is serving up a cached copy without honoring the Accept-Encoding header or simply IIS behaving erroneously.
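One cheap way to start telling the two cases apart is to look at the response headers for signs that something sat between the client and IIS.  The sketch below is only a heuristic under my own assumptions: Via and Age are standard HTTP headers that proxies and caches are supposed to add, X-Cache is a non-standard header some proxies emit, and a proxy that strips or omits all of them would go unnoticed.  The function name and sample values are made up.

```python
def cache_hints(response_headers):
    """Return a list of reasons to suspect a proxy or cache handled the response."""
    headers = {name.lower(): value for name, value in response_headers.items()}
    hints = []
    if "via" in headers:
        hints.append("Via header present: " + headers["via"])
    if "age" in headers:
        hints.append("Age header present: response came out of a cache")
    if "x-cache" in headers:  # non-standard, emitted by some proxies
        hints.append("X-Cache header present: " + headers["x-cache"])
    return hints


# Hypothetical headers from a bad response
suspect = {"Content-Encoding": "gzip", "Via": "1.1 someproxy", "Age": "120"}
for hint in cache_hints(suspect):
    print(hint)
```

An empty list doesn't clear the proxy, but a non-empty one on a bad response would point away from IIS.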

Ponzi and Maryam: My New Drinking Buddies

After spending an entire day in geek mode, I wasn't in the mood for more at one of the two dinners Microsoft hosted.  Thankfully, Ponzi and Maryam were there to rescue me.  As Ponzi wrote, we had wacky conversations while enjoying wine and cigars.  They are lovely gals with hearts of living gold.  Chris and Robert are lucky to have them as soulmates.