Easier Dynamic Win32 API

When the Win32 API first came out, it wasn't as sprawling as it is now. As it expanded (exploded) in spurts across versions, architectures, and service packs, Windows developers increasingly had to deal with the headache of taking advantage of the latest features without breaking their software on older versions of Windows.

While one could now use deferred (delay) library loading creatively to address this problem, I've always handled it with manual LoadLibrary and GetProcAddress calls. Well, I finally got tired of doing this, so I tried the handy LateLoad, a small set of macros, similar in style to MFC's event framework macros, that creates a C++ class for each library and a C++ method for each procedure. Very nice.
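For comparison, here is roughly what the manual approach looks like for a single function. This is only a sketch: the ManualPsApi class and its method names are made up for illustration and are not part of LateLoad, and the non-Windows branch exists only so the snippet compiles elsewhere (it simply reports the API as unavailable).

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical wrapper (not part of LateLoad): resolves EnumProcesses from
// PSAPI.DLL at run time so the program still starts where the DLL is missing.
class ManualPsApi {
public:
    ManualPsApi() {
#ifdef _WIN32
        module_ = ::LoadLibraryA("PSAPI.DLL");
        if (module_ != nullptr) {
            enumProcesses_ = reinterpret_cast<EnumProcessesFn>(
                ::GetProcAddress(module_, "EnumProcesses"));
        }
#endif
    }

    ~ManualPsApi() {
#ifdef _WIN32
        if (module_ != nullptr) ::FreeLibrary(module_);
#endif
    }

    ManualPsApi(const ManualPsApi&) = delete;
    ManualPsApi& operator=(const ManualPsApi&) = delete;

    // True only if both the library and the procedure were found.
    bool available() const { return enumProcesses_ != nullptr; }

    // Same argument order as EnumProcesses: buffer, buffer size in bytes,
    // and an out parameter receiving the number of bytes written.
    bool enumProcesses(unsigned long* ids, unsigned long bytes,
                       unsigned long* bytesUsed) const {
        if (!available()) return false;  // caller falls back to another API
        return enumProcesses_(ids, bytes, bytesUsed) != 0;
    }

private:
#ifdef _WIN32
    using EnumProcessesFn = BOOL (WINAPI*)(DWORD*, DWORD, DWORD*);
    HMODULE module_ = nullptr;
#else
    // Placeholder type so the sketch compiles off Windows; never assigned.
    using EnumProcessesFn = int (*)(unsigned long*, unsigned long, unsigned long*);
#endif
    EnumProcessesFn enumProcesses_ = nullptr;
};
```

Multiply this by every function in every optional DLL and you can see why a macro that generates the class is attractive.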

Here are the LateLoad definitions for the set of Win32 functions one needs to enumerate running processes and modules across Win32 versions.

LATELOAD_BEGIN_CLASS(CPsApi, PSAPI.DLL, FALSE, FALSE)
LATELOAD_FUNC_3(FALSE, BOOL, STDAPICALLTYPE,
      EnumProcesses, DWORD*, DWORD, DWORD*)
LATELOAD_FUNC_4(FALSE, BOOL, STDAPICALLTYPE,
      EnumProcessModules, HANDLE, HMODULE*, DWORD, LPDWORD)
LATELOAD_FUNC_4(0, DWORD, STDAPICALLTYPE, GetModuleBaseNameA,
      HANDLE, HMODULE, LPSTR, DWORD)
LATELOAD_FUNC_4(0, DWORD, STDAPICALLTYPE, GetModuleFileNameExA,
      HANDLE, HMODULE, LPSTR, DWORD)
LATELOAD_END_CLASS()

LATELOAD_BEGIN_CLASS(CVdmDbg, VDMDBG.DLL, FALSE, FALSE)
LATELOAD_FUNC_3(0, INT, STDAPICALLTYPE, VDMEnumTaskWOWEx,
      DWORD, TASKENUMPROCEX, LPARAM)
LATELOAD_END_CLASS()

LATELOAD_BEGIN_CLASS(CKernel32, Kernel32.DLL, FALSE, FALSE)
LATELOAD_FUNC_2(NULL, HANDLE, STDAPICALLTYPE,
      CreateToolhelp32Snapshot, DWORD, DWORD)
LATELOAD_FUNC_2(FALSE, BOOL, STDAPICALLTYPE,
      Process32First, HANDLE, LPPROCESSENTRY32)
LATELOAD_FUNC_2(FALSE, BOOL, STDAPICALLTYPE,
      Process32Next, HANDLE, LPPROCESSENTRY32)
LATELOAD_END_CLASS()

Serving Large PDF Files

If you have large PDF files at your site, you might want to read this Microsoft technote explaining how IE handles content types associated with [Netscape] plug-ins (e.g. Acrobat) and this programmer-level workaround from Matt Raible. I suspect IE's weird triple request was meant to support some misbehaving yet popular plug-ins.

I am not sure if popular web servers like Apache and IIS come with built-in filters to address this problem. If not, then this opens up another golden shareware moment.

Java Bayesian

Is there a decent open source Java Bayesian package that is not GPL or similarly restricted from commercial use? I am aware of only Classifier4J. Preferably, it should be optimized for server applications and high performance.

Tagging and WinFS in the Enterprise

Tagging is useful at the personal level for categorizing data of any size and type without the constraints of hierarchy. While WinFS is more than this, adding tagging to a traditional file system will deliver most of the benefits WinFS offers at the user experience level.

But IMHO the real power of tagging can be hatched only at the workgroup level. This means introducing new ways (UI-wise) to expose and discover the tags others use, with the same ease with which children learn new words.

AJAX in Flash

As I pointed out in my AJAX post, I think the difficulty of writing AJAX applications makes it a poor web application platform, particularly since there are easier alternatives.

Flash, for example, is a better platform for some applications than AJAX because it offers similar capabilities (e.g. the equivalent of XMLHttpRequest in DHTML) and a comparable, if not better, level of availability, along with much better graphics capability. Flash tool developers such as Laszlo and Xamlon make it easy to develop interactive web applications. Just take a look at this Google Maps-like demo built over a weekend using Xamlon's upcoming tool.

Note that AJAX in Flash is inappropriate for web applications that manipulate the DHTML DOM extensively, and it has a number of issues that make it prohibitively expensive for uses beyond demos and small, tightly focused applications. For example, you can't build Photoshop with it without abandoning usability.

Beyond Flash, .NET looms with superior functionality and flexibility. The only thing it lacks is availability, in several senses.

Emulating Errors for Tag Convergence

One possible solution I see for the Tag Divergence problem is errors. When people communicate verbally, mistakes are often made and miscommunication results. What if such miscommunication is possible in a tag system? What if services like Flickr or Delicious are capable of confusion? This idea sounds crazy at first which is a good sign. 😉

Let's see. What if similar tags (in terms of edit distance or some other measure) were merged randomly? Tags like dog, dogs, dag, and dig would get confused (aka merged) into one. It doesn't matter which tag is chosen because the more common tag will eventually emerge. Problems stemming from an intentional tag confusion mechanism need to be minimized, but I don't think they are serious if the mechanism is constrained appropriately.

Accumulation of confusion over time has a kick of its own, so I think confusions have to be soft. I am still trying to figure out what soft means, so don't feel bad if you don't understand what I am saying.
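To make the idea a bit more concrete, here is a rough sketch of random confusion between near-identical tags. It assumes plain Levenshtein edit distance as the similarity measure; the confuse function, its parameters, and the probability knob are my own inventions for illustration, not something Flickr or Delicious offers.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Classic Levenshtein distance: insert, delete, and substitute each cost 1.
std::size_t editDistance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t substitute =
                prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, substitute});
        }
        prev.swap(cur);
    }
    return prev[b.size()];
}

// Hypothetical confusion step: with the given probability, replace a tag
// with a randomly picked known tag that is within maxDistance edits of it.
std::string confuse(const std::string& tag,
                    const std::vector<std::string>& knownTags,
                    std::size_t maxDistance, double probability,
                    std::mt19937& rng) {
    std::vector<std::string> candidates;
    for (const auto& other : knownTags) {
        if (other != tag && editDistance(tag, other) <= maxDistance) {
            candidates.push_back(other);
        }
    }
    std::bernoulli_distribution shouldConfuse(probability);
    if (candidates.empty() || !shouldConfuse(rng)) return tag;
    std::uniform_int_distribution<std::size_t> pick(0, candidates.size() - 1);
    return candidates[pick(rng)];
}
```

With a maximum distance of 1, dog can be confused with dogs, dag, or dig, but never with cat. A low probability is one crude way of keeping the confusion constrained.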

AJAX

I am playing with AJAX myself, but I see many problems with AJAX as the next-generation DHTML application platform. AJAX applications are more expensive to build, test, and update than traditional DHTML applications.

Frankly, I am not even sure whether the current crop of popular web browsers can support AJAX because they weren't built with the expectation that a single web page might stay up for as long as GUI applications do. When even small, carefully written DHTML apps can cause enough browser resource leaks to require frequent browser restarts, I think good stable AJAX applications will be rarer than the picture the recent hype paints.

And by the time engineers discover the cost of AJAX first-hand, .NET-based ClickOnce applications will look much more attractive than AJAX-based applications can ever be.

Risk by Proxy

As part of my fraud detection work, I've been looking at anonymous network proxies (aka anonymizers) as a source of risk. What is a proxy? A proxy is, in essence, a man-in-the-middle (MITM). If the MITM is a bad guy, then you've just invited a wolf in sheep's clothing into your house.

While there are many MITM attacks possible, including SSL certificate spoofing, the most lucrative attacks are the ones that keep the door open. For example, a proxy can inject a virus into any executable users download. Once the attackers are in, they can start harvesting passwords through keylogging or inject bogus certificates to monitor SSL traffic.

Come to think of it, this is a great way to deliver monitoring software onto hackers' desktops.

OPML Revisited

OPML is a simple, widely used, yet often misunderstood, XML format created by Dave Winer. IMHO, misunderstandings stem from overexposure to traditional ways of using XML. I must admit, I also laughed at OPML when I first looked at it years ago. But when I cocked my head (a technique anyone can learn from their dogs), it began to make a lot of sense.

This is what I saw:

Infoset:

An OPML document is a collection of objects.
An object may have properties and contents.
An object's properties are an unordered map of name/value pairs.
An object's contents are an ordered list of objects.

Syntax:

Objects are encoded as XML elements named 'outline'.
Properties are encoded as XML attributes.
Content objects are encoded as child XML elements.
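The infoset and syntax above can be sketched in a few lines of code. This is just my reading of the model, not an OPML library: the Outline struct and the toXml function are hypothetical names, and the sketch covers only 'outline' elements, not the rest of an OPML file.

```cpp
#include <map>
#include <string>
#include <vector>

// One OPML object: a map of properties plus an ordered list of contents.
// std::map keeps names sorted, which is fine since property order carries
// no meaning in this model.
struct Outline {
    std::map<std::string, std::string> properties;
    std::vector<Outline> contents;
};

// Escape the characters that cannot appear raw inside a double-quoted
// XML attribute value.
std::string escapeAttribute(const std::string& value) {
    std::string out;
    for (char c : value) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '"': out += "&quot;"; break;
            default: out += c;
        }
    }
    return out;
}

// Objects become 'outline' elements, properties become attributes, and
// content objects become child elements, in order.
std::string toXml(const Outline& node) {
    std::string xml = "<outline";
    for (const auto& property : node.properties) {
        xml += " " + property.first + "=\"" +
               escapeAttribute(property.second) + "\"";
    }
    if (node.contents.empty()) return xml + "/>";
    xml += ">";
    for (const auto& child : node.contents) xml += toXml(child);
    return xml + "</outline>";
}
```

Note how any markup inside a property value has to be escaped to fit into an attribute.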

Once you get this picture in your mind, you start to appreciate OPML more. Throw in the display and interaction semantics built into the format, along with the distributed object linking and embedding Dave often raves about, and you've got quite a beast of a language.

As to the question of who defines the properties, the answer is everybody does. OPML is a kind of Emergent Markup Language in that common properties are expected to emerge through industry practices rather than standardization through committees.

There are some shortcomings with OPML, though, which I would like to see addressed.

OPML Wiki

OPML needs a wiki where OPML developers can interact with each other and document how each of them is using OPML, so that standard or type-specific properties may emerge.

Structured Properties

One weakness of XML is that, while elements may be structured, attributes may not. Since properties are encoded as XML attributes in OPML, (semi) structured properties (e.g. HTML fragments) have to be escaped into attribute values at the cost of readability.

I think the need for a wiki is far more serious than the need for structured property support.

Fixing AutoLink

Enough people have already commented on what is wrong with Google's AutoLink, most of which I agree 100% with, so I won't delve into that. Instead, I'll write about how to fix it.

Opt-out solutions (e.g. a 'noautolink' meta tag) are useless because they still create chores for content providers where there were none before. Google's 'rel=nofollow' was a borderline case. AutoLink is not, because the effect of not opting out is greater and more direct.

Short of scrapping AutoLink, a solution that allows content providers to opt in is needed. One such solution is for AutoLink to link only anchor tags with a recognizable identifier in the 'class' attribute. For example, content producers who opt in can mark up the following content:

123 ABC St., San Jose, CA
ISBN 0123456789
Harry Potter

like this:

<a class='address'>123 ABC St., San Jose, CA</a>
<a class='isbn'>ISBN 0123456789</a>
<a class='book'>Harry Potter</a>

AutoLink should link the content only IF the class attribute contains one or more keywords (e.g. isbn, ssn, gps, person) it can handle. There should also be a generic autolink class so content authors can mark up a text fragment without knowing which class to use. So the above could have been written like this:

<a class='autolink'>123 ABC St., San Jose, CA</a>
<a class='autolink'>ISBN 0123456789</a>
<a class='autolink'>Harry Potter</a>

If the anchor tag is not appropriate, then I think the 'span' tag can be used with minimal visual side effects.
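The opt-in check itself is trivial. Here is a minimal sketch, assuming the class attribute is a whitespace-separated list of names; the shouldAutoLink function and the set of handled keywords are hypothetical and have nothing to do with how Google's toolbar actually works.

```cpp
#include <set>
#include <sstream>
#include <string>

// Returns true only if the element opted in: its class attribute contains
// the generic 'autolink' class or a keyword the tool knows how to handle.
bool shouldAutoLink(const std::string& classAttribute,
                    const std::set<std::string>& handledKeywords) {
    std::istringstream names(classAttribute);
    std::string name;
    while (names >> name) {  // class names are separated by whitespace
        if (name == "autolink" || handledKeywords.count(name) > 0) {
            return true;
        }
    }
    return false;  // no opt-in marker, so leave the content alone
}
```

Elements with no class attribute, or with unrelated classes, are never touched, which is the whole point of opt-in.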

Don Park's Weekly Habit

Well, sorta weekly.

Category: Technical