Vary: ETag

These days, I am not tracking the Atom mailing list too closely due to the traffic volume (currently 7 times XML-DEV traffic) and lack of time, but Tim's FooCamp2004 post prompted me to read Sam's Vary: ETag post and comments.

While I like the cleverness of the solution, I have misgivings about how practical it really is.  Aside from requiring Vary: ETag-aware clients to keep track of ETags, the solution requires a lot of server-side work for doubtful gain.

  • Everyone seems to agree that low traffic blogs won't see any noticeable gain.
     
  • I don't think the large blog services like TypePad and Blogger will gain much either because such services must support tens of thousands of feeds, each of which must be sliced and diced at the expense of CPU load to reduce bandwidth.  Parsing every feed for every request to figure out which subset to send back is not cheap.  Even if a cache is used, frequent editing of recent posts will increase the CPU load noticeably.
     
  • That leaves only feeds like the MSDN aggregated feed which will see noticeable bandwidth reduction at the expense of writing a custom Vary: ETag handler.

A key problem is that XML is not an efficient format if you are doing a lot of search and extraction.  Regular expressions can be used, but not reliably or fast enough, unless the feed is preprocessed into a more palatable form (a canonical or proprietary regex-friendly format).

A similar but more practical solution might be to serve feeds as multipart MIME resources with sequenced parts.  Each feed item becomes a MIME part and feed metadata is also a MIME part.  An extra benefit is that binaries can be embedded and other content formats (e.g. RSS) can be supported as well.
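To make the multipart idea a little more concrete, here is a minimal sketch using Python's standard email package.  It assumes each part carries a sequence number in a made-up X-Feed-Sequence header; that header name, the Content-ID value, and the function names are my own placeholders, not part of any spec.  The point is only that the server can pick out the newer parts by reading headers, without parsing any XML.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_feed(metadata_xml, entries):
    """Build a multipart feed: one part for metadata, one part per item.

    entries is a list of (sequence_number, entry_xml) pairs, oldest first.
    """
    feed = MIMEMultipart("mixed")
    meta = MIMEText(metadata_xml, "xml", "utf-8")
    meta["Content-ID"] = "<feed-metadata>"  # hypothetical identifier
    feed.attach(meta)
    for seq, entry_xml in entries:
        part = MIMEText(entry_xml, "xml", "utf-8")
        part["X-Feed-Sequence"] = str(seq)  # hypothetical header
        feed.attach(part)
    return feed


def parts_after(feed, last_seen):
    """Return a multipart with the metadata part plus items newer than last_seen."""
    subset = MIMEMultipart("mixed")
    for part in feed.get_payload():
        seq = part["X-Feed-Sequence"]
        # The metadata part has no sequence header and is always included.
        if seq is None or int(seq) > last_seen:
            subset.attach(part)
    return subset


feed = build_feed(
    "<feed><title>Example</title></feed>",
    [(1, "<entry>a</entry>"), (2, "<entry>b</entry>"), (3, "<entry>c</entry>")],
)
subset = parts_after(feed, last_seen=2)
```

A client that remembers the highest sequence number it has seen would get back only the metadata part and the items after it.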

MUD Scraps

These are just MUD/IRC/IM related scraps of thought trails accumulated over the past couple of weeks.  Actually, most of them are resurfaced scraps from long past, but I turned them over like one would a flapjack, so I thought I should park them on my blog so I can look them up next time they resurface.

  • IRC channels as rooms in a MUD – an IRC channel is, in essence, a bunch of people talking in a room.  Everyone can hear and see each other (unless they are whispering or lurking invisibly).  It's just like being at the same location in a MUD.
  • IM session inside a MUD – IM is like talking inside a private room in a MUD.  Overlaying the room metaphor gives persistence to sessions and storage for shared objects.
  • Wiki page as a room, an object, or a thread of conversation inside a MUD.
  • Topic or thread of conversation as an object – threads of conversation are constantly created and updated, and 'leads' to the threads can be found by where, when, who, and what.  So if you enter a room where a conversation took place some time ago, you can find it and, if you want, pocket it to keep track.  In a sense, you are subscribing to a feed.
  • Googling conversations in IRC and MUD – this is self evident so I won't explain.  A hack implementation can just generate a web page per thread of conversation and let Google index them.
  • NPC as IRC bots – enough said other than that they should be easily programmable, customizable, and deployable.
    • Fido, open the report.
    • Fido> Which report?  I have…
  • MUD objects with MIME types and custom renderers – images are displayed, audio and movies are played, spreadsheets are displayed using Excel, etc.  Thumbnails of these objects are displayed when the 'see' command is invoked.

Linux VMware Blues

If you are running a Linux guest under VMware like me and my blog's hyperlinks are green instead of blue, turn on subpixel font rendering to get the blues.

FYI, I am running RedHat 9 under VMware running on XP, primarily for development and testing.  For example, I needed to write a milter, so I initially wrote a C++ version using Eclipse running under the RH9 VMware guest.  The milter was talking to the sendmail server running inside the same virtual machine.  Eclipse CDT running inside the VM was rather difficult to work with, so I rewrote the milter in pure Java using Eclipse running on XP.

To debug, I configured the sendmail server running inside the VM to invoke the pure Java milter running under the Eclipse debugger outside the VM.  Then I sent both plain text and multipart MIME messages using Evolution, running inside the VM, as well as Outlook, running on another machine, to the sendmail server inside the VM, which in turn invoked the milter running outside the VM.

While all this might be confusing to some, it worked amazingly well.

Snowballed Blogger

A while back, I noticed that a fresh 'best web development language' war broke out, this time between Java and PHP.  I tuned out when it got ugly but noted that the war got started when a Friendster engineer posted about Friendster's switch from JSP to PHP along with some throwaway comments that many Java geeks interpreted as insults.

I didn't know that the engineer was Joyce Park (hey, a fellow Korean-American blogger) and that she got fired recently for blogging until I read the Red Herring article (aren't they supposed to be dead?).

The problem here is that most bloggers just don't realize that they have a powerful tool in their hands that could cause serious damage, whether intentional or not.  If Joyce had said what she posted to engineers she met in person, all there would have been is a heated discussion.  But posting the same on her blog created a big wave of controversy among geeks with Friendster as the centerpiece.

Although everything she wrote was indeed public information and the change in file name extension (.jsp to .php) made it clear to any geek that Friendster switched from JSP to PHP, Friendster could have paved over inquiries about the switchover with silence just as Google did with Orkut's use of ASP.NET.  But Joyce pushed Friendster into the middle of the PHP vs. Java battlefield when she wrote in a post titled Friendster goes PHP:

… we can now stop being a byword for unacceptably poky site performance…

What I don't understand is how she failed to see the consequences of her post.  The flamebait title and comment, along with her being an employee of Friendster, made it perfect Slashdot fodder.  Not only did she not see it then, she doesn't see it even now.

I am not saying Friendster was right to fire Joyce.  I think they overreacted and opened a supersize can of whupass on their own face.  But I think it's more important for corporate bloggers like Joyce to learn how to protect their employers from their blogging activities.  Pointing angry fingers at Friendster points in the wrong direction IMHO.  We need more learning and less lashing out.

Running Longhorn inside Quake

Longhorn will bring 3D to the desktop.  This means that windows will be hierarchical 3D objects that can be, for example, swivelled aside to make room on the desktop without hiding the window's content.  Window contents are 3D so buttons and controls are swivelled when the container window is swivelled.

I am not yet convinced that Longhorn's 3D window manager has much non-geeky value.  In the end, we are still simulating a deskbound world and therefore the GUI is constrained and contorted by the desktop metaphor.  Because our world cannot be translated into a language that consists purely of desktop lingo like documents, folders, pens, and magnifying glasses, we invented windows.

While I could go on and on about the limitations of the window metaphor, I am running short on time so I'll just get to my point, which is:

Longhorn desktop itself should swivel down to reveal a 3D world within which we can work 'directly' with persistent objects instead of limiting ourselves to windows.

Instead of squeezing real world size objects into a spatially meaningless hierarchy of folders, let's place them in rooms, buildings, and places to which we can navigate.

As an engineer, I realize fully that this sort of change is equivalent to a mindless auto executive telling the engineers to lower the dashboard by 5 inches.  But this subtle change will finally let computer users leverage their real world knowledge and experiences when working with computers.

Zen and Mental Wedgies

A nice quote on Zen via Marc:

"before studying Zen – man is man and mountains are mountains – yet things are confused."

"while studying Zen – man is no longer man and mountains are no longer mountains."

"after studying Zen – man is man and mountains are mountains – yet things are no longer confused."

My take on Zen is even more down to earth.  To me, Zen is just a tool to remove mental wedgies.  Just as brushing your teeth regularly keeps your teeth healthy and smelling fresh, practicing zazen regularly removes mental debris stuck in your mind before it can infest and grow into something more menacing.

Zen doesn't deliver truth or meaning of life, just a more comfortable perspective, a product created out of necessity by those who suffered enough to shave their heads thousands of years ago.  It won't help you if you are crazy, though, just as you can't brush your teeth if you can't hold a toothbrush.  That's what shrinks are for.  It will help you if your mind is tangled into a ball of mess by your own doing.

Thank goodness Zen masters don't charge like doctors do.

Bush’s Terror Error

If you think Bush will protect America from terrorists, read this report on how the $144 billion Bush wasted on the Iraq War could have been spent to protect America from terrorists.  Instead of fighting terror, Bush fought in error, a terrible error that cost us more than 1000 lives and all that money without showing any improvement in national security.

Phishing and Bouncing

Looks like the trick of using redirection CGIs at popular websites (described in Phishing with Google) is getting popular among phishers.  I just got a couple that use AOL's redir-complex CGI at:

http://r.aol.com/cgi/redir-complex?url=whereever

Note that phishers can use not just the redirecting CGIs, but also those CGIs that take a return URL as a parameter.  In fact, these types of CGIs are popular among financial institutions and single sign-on services.  For example, both Passport and 3D-Secure use them.
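One common way to keep such a CGI from bouncing visitors anywhere a phisher likes is to check the target against a list of hosts the site actually trusts.  The following is only a rough sketch of that check, not how AOL or anyone else does it; the host names and the function name are made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; a real site would list its own hosts here.
ALLOWED_HOSTS = {"www.example.com", "login.example.com"}


def is_safe_redirect(target, allowed_hosts=ALLOWED_HOSTS):
    """Accept only absolute http(s) URLs whose host is on the allowlist."""
    parts = urlsplit(target)
    if parts.scheme not in ("http", "https"):
        # Rejects javascript:, data:, and scheme-relative //host/ targets.
        return False
    # hostname ignores any user:password@ prefix, so a URL such as
    # http://www.example.com@evil.test/ is judged by evil.test.
    return parts.hostname in allowed_hosts
```

A redirect or return-URL handler would call is_safe_redirect() on the url parameter and fall back to a fixed landing page when it returns False.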

Syndication Scalability Problem

Scoble writes:

RSS is broken, is what happened. It's not scalable when 10s of thousands of people start subscribing to thousands of separate RSS feeds and start pulling down those feeds every few minutes (default aggregator behavior is to pull down a feed every hour).

Bandwidth usage was growing faster than MSDN's ability to pay for, or keep up with, the bandwidth. Terrabytes of bandwidth were being used up by RSS.

So, they are trying to attack the problem by making the feeds lighter weight. I don't like the solution (I've unsubscribed from almost all weblogs.asp.net feeds because they no longer provide full text) but I understand the issues.

I know of a major broadcaster that refuses to turn on RSS feeds because of this issue too. We need smarter aggregators and better defaults.

I think it's high time we addressed this problem and I think the upcoming BloggerCon is the right place to do it.

We need to 'teach' syndication clients to speak with servers and other clients at a more intelligent level than that of a spoiled child screaming GIVE ME GIVE ME endlessly.  We need more than just technical solutions.  We need to introduce ways to reward clients that improve scalability and punish bad clients that hog resources selfishly.

Update:

The following excerpt was pulled up from the comment section.

Here is a possible solution for a site like blogs.msdn.com that doesn't require P2P:

1. Each blog feed is divided into two feeds, a daily feed (DF) and an hourly feed (HF). There is also a master update notification feed (MF).

2. DF is updated once a day and has a permanent URL. HF is updated whenever a post is made and has a transient URL. MF is updated continually and has a permanent URL.

3. Both DF and HF embed a link to the MF.

4. MF contains transient links to updated HFs.

Since HF URLs are temporary and can only be found at MF, clients that ignore the link to MF in DF will only be able to pull DF.

Because DF has an update period of 24 hours, dumb yet well-behaved clients will hit it only once a day. Originating IPs of misbehaving clients can be identified and access can be slowed down enough to prevent abuse without refusing to serve entirely.

The main idea here is to require changes in clients to access the superior product (HF) and entice users away from the inferior product (DF) in one of several ways (slower speed, summary only, etc.).  If done right, users will flock to clients that give them access to the superior product over those that can only access the inferior ones.
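For what it's worth, the DF/HF/MF arrangement above can be sketched in a few lines of Python.  This is a toy in-memory model under my own assumptions: the URL paths, the class and method names, and the use of a random token to make HF URLs transient are all placeholders, and throttling of misbehaving IPs is left out.

```python
import secrets

MF_URL = "/mf"  # permanent master feed URL (the path is hypothetical)


class FeedSite:
    def __init__(self):
        self.daily = {}     # blog -> posts included as of the last daily roll
        self.pending = {}   # blog -> posts made since the last daily roll
        self.hf_token = {}  # blog -> token in the current transient HF URL

    def publish(self, blog, post):
        """A new post updates HF and moves it to a fresh transient URL."""
        self.pending.setdefault(blog, []).append(post)
        self.hf_token[blog] = secrets.token_urlsafe(8)

    def master_feed(self):
        """MF: lists the current transient HF link for each updated blog."""
        return {b: f"/hf/{b}/{t}" for b, t in self.hf_token.items()}

    def get_hourly(self, url):
        """Serve HF only at its current URL; a stale URL yields None (a 404)."""
        parts = url.strip("/").split("/")
        if len(parts) != 3 or parts[0] != "hf":
            return None
        _, blog, token = parts
        if self.hf_token.get(blog) != token:
            return None
        return {"mf": MF_URL, "posts": list(self.pending[blog])}

    def get_daily(self, blog):
        """DF: permanent URL, content changes only when roll_daily() runs."""
        return {"mf": MF_URL, "posts": list(self.daily.get(blog, []))}

    def roll_daily(self):
        """Once a day, fold pending posts into DF and retire all HF URLs."""
        for blog, posts in self.pending.items():
            self.daily.setdefault(blog, []).extend(posts)
        self.pending.clear()
        self.hf_token.clear()
```

A client that follows the MF link always finds the current HF URL; one that ignores MF can only ever reach DF, which is the whole point of the scheme.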

Now, coming up with the solution outlined above took only ten minutes of thinking.  I think we can come up with more possible solutions if we put our heads together.