Tomcat and Log4J Conflict

Tomcat 5 is currently stable enough for me to use, but I think its use of the Jakarta Commons Logging framework is going to cause headaches for webapps using Log4J. For some reason, Tomcat fails to start if there is a webapp using Log4J. After several hours of fiddling with the configuration, I had to stub out all Log4J calls from a webapp to get it to work.

XForms 1.0

XForms 1.0 is finally a W3C Recommendation. No comment.

Firebird 0.7

Firebird 0.7 is out. While I prefer IE over Mozilla, I have taken a liking to Firebird because it is fast, easy to use (tab heaven), makes handy extensions easy to install, and has superior FTP and i18n support. I am seriously considering switching to Firebird as my main browser now. As a developer, I'll still be using IE to test my web pages, but Firebird is too useful to ignore any more, although there are still some odd kinks that need to be fixed. I sure hope Firebird development continues.

Fixed-URI for Site Metadata

There is a lot of discussion going on about how user agents (read: browsers) can locate site metadata. People are even arguing about what constitutes a site. Besides the discussion within the W3C TAG, RSS developers are discussing this topic with RSS feed discovery in mind. Consensus seems to be moving away from a robots.txt-style solution that uses a fixed URI.

Tim Berners-Lee wrote back in February:

The architecture of the web is that the space of identifiers on an http web site is owned by the owner of the domain name. The owner, "publisher", is free to allocate identifiers and define how they are served.

Any variation from this breaks the web.

Hogwash.

The Web is just not that brittle.
Other solutions are not as easy.
User agents should protect themselves from unexpected data.
People will not revolt if the W3C reserves some range of names, as long as the names are reasonably unique.

The simplest solution IMHO is to introduce a special file extension for metadata and a default file name for directory metadata.

For example, if the ".w3c" file extension is used for metadata and the default file name for directory metadata is the empty string, metadata for the resource "/application/foobar.html" can be found in "/application/foobar.w3c" and metadata for the path "/application/" can be found in "/application/.w3c".

Add to this a hierarchical inheritance rule which basically says that metadata not specific to a resource can be overridden by subpaths. For convenience's sake, subpaths starting with "_w3c" should be reserved.

Using this solution, my blog's RSS feed list can be located by fetching "https://blog.docuverse.com/.w3c". Problem solved.
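Here is a rough sketch of how a user agent might compute those metadata locations. The ".w3c" extension comes from the example above; the function names, the example.org URLs, and the "most specific first" lookup order are my own assumptions about how the inheritance rule would play out, not anything specified anywhere.

```python
from posixpath import splitext
from typing import List
from urllib.parse import urlsplit, urlunsplit

METADATA_EXT = ".w3c"  # extension from the example above, not a real standard


def metadata_url(url: str) -> str:
    """Map a resource or directory URL to the URL of its metadata file."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        # Directory: the default metadata file name is the empty string.
        meta_path = path + METADATA_EXT
    else:
        # Resource: swap the file extension for the metadata extension.
        root, _ext = splitext(path)
        meta_path = root + METADATA_EXT
    return urlunsplit((parts.scheme, parts.netloc, meta_path, "", ""))


def lookup_chain(url: str) -> List[str]:
    """Metadata URLs from most specific to site root (assumed override order)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    chain = []
    if not path.endswith("/"):
        chain.append(metadata_url(url))
        path = path.rsplit("/", 1)[0] + "/"
    while True:
        meta_path = path + METADATA_EXT
        chain.append(urlunsplit((parts.scheme, parts.netloc, meta_path, "", "")))
        if path == "/":
            break
        # Step up one directory level.
        path = path.rstrip("/").rsplit("/", 1)[0] + "/"
    return chain


chain = lookup_chain("https://example.org/application/foobar.html")
for candidate in chain:
    print(candidate)
```

A user agent would fetch the candidates in order and let values found earlier in the chain override those found later.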

To me, the current discussions are no different from discussions about where the toilet flush lever should be placed. Should it be on the right side because there are more right-handed people, or at the center to be fair? I say let the manufacturers place the damn lever anywhere convenient and noticeable. 'Users' will do the rest.

A Letter from Linus

In a Wired article, I just came across this copy of the e-mail from Linus that started the whole Linux movement.

Message-ID: 1991Aug25.205708.9541@klaava.helsinki.fi
From: torvalds@klaava.helsinki.fi (Linus Benedict Torvalds)
To: Newsgroups: comp.os.minix
Subject: What would you like to see most in minix?
Summary: small poll for my new operating system

Hello everybody out there using minix - I'm doing a (free) operating system (just a hobby, won't be big and professional like gnu) for 386 (486) AT clones. This has been brewing since april, and is starting to get ready. I'd like any feedback on things people like/dislike in minix, as my OS resembles it somewhat

Any suggestions are welcome, but I won't promise I'll implement them 🙂

Linus

I too was looking at Minix then and thinking that it would be fun to write a Unix clone for PCs. Hah!

SQLite and QDBM

SQLite

While LAMP (Linux, Apache, MySQL, PHP/Perl/Python) is still going strong as a web application platform, MySQL is being challenged as the default database by SQLite (home, download, wiki). SQLite is an embeddable SQL database engine, meaning it runs inside your program. Besides being embeddable, it has these attractive features:

Speed – SQLite is faster than MySQL (benchmark)
Code Size – just 25K lines of C
Data Size – much smaller backup file than MySQL
Data Storage – everything is stored in one file
Transactions – built in and on by default, unlike MySQL
Dependency – SQLite has no external dependencies
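
To make "embeddable", "one file", and "transactions" concrete, here is a minimal sketch using Python's standard sqlite3 module, which links the engine into the interpreter. The table, the file name, and the simulated failure are made up for illustration.

```python
import os
import sqlite3
import tempfile

# Hypothetical file name; the whole database lives in this one file.
db_path = os.path.join(tempfile.mkdtemp(), "blog.db")

conn = sqlite3.connect(db_path)  # no server process to start or connect to
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.commit()

# Using the connection as a context manager wraps the block in a transaction:
# commit on success, rollback if an exception escapes.
try:
    with conn:
        conn.execute("INSERT INTO posts (title) VALUES (?)", ("XForms 1.0",))
        raise RuntimeError("simulated failure before commit")
except RuntimeError:
    pass  # the insert above was rolled back

with conn:
    conn.execute("INSERT INTO posts (title) VALUES (?)", ("Firebird 0.7",))

titles = [row[0] for row in conn.execute("SELECT title FROM posts ORDER BY id")]
conn.close()

print(titles)                   # only the committed row survives
print(os.path.exists(db_path))  # the single database file on disk
```

Backing up the database amounts to copying that one file while nothing is writing to it.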

The upcoming PHP 5 will include and use SQLite as its default database engine. This PHP Internals message outlines the benefits SQLite brings to PHP. Interestingly, MySQL client libraries will no longer be bundled with PHP due to some licensing issues. I am not sure if this is a major trend in the making, but MySQL is taking the embeddable threat seriously enough to work on its own version of embeddable MySQL (mentioned in the October issue of Linux Magazine).

QDBM

If you don't need to use SQL, Mikio Hirabayashi's QDBM is an attractive xDBM-style (GDBM, NDBM, SDBM, Berkeley DB, etc.) database management library. This benchmark (PDF) compares QDBM with other xDBM libraries. Its main competitor is Berkeley DB, which also offers both hash table and B+ tree APIs. In comparison to Berkeley DB, QDBM has a nice speed/data-size ratio. The only problem is that QDBM is still in beta. Hopefully, this post will give the open source project more exposure and attract more resources to it.

Sender-side Spam Filtering

This is a non-crypto whack at the spam problem. It's half-baked at the moment, but I am sure you guys will provide the necessary heat to cook it fully or burn it to a crisp.

Today, e-mail senders have no way of knowing whether a message they sent has been erroneously flagged as spam on the receiver side by either receiver-SMTPs or SMTP clients. Being able to check whether my message is likely to be flagged as spam has some value to me. Starting with that idea, let's see if a solution comes together.

Spam-Filtered Outgoing Mail

A sender-SMTP that uses spam filters on outgoing messages returns messages flagged as spam or likely spam back to the sender instead of sending them, allowing the sender to revise the message or use another communication channel such as the telephone. The sender-SMTP is basically saying that the message is being returned because the chance of it getting through the spam filter on the receiving side is low, a valuable service IMHO.

A sender-SMTP can weed out spammers' mail accounts by monitoring the spam ratio on each account.

Filtering spam on the sender-side has two side-effects:

outgoing mail volume drops.
spam ratio decreases.

These effects will be visible to both receiver-SMTPs and mail recipients, meaning less spam for them. A sender-SMTP can also actively weed out spammers by monitoring the spam ratio on each mail account.
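
A toy sketch of the idea, to show how little machinery it needs. Everything here is an assumption for illustration: the keyword scorer is a stand-in for a real spam filter, and the thresholds, class name, and account names are made up.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Illustration only: a toy keyword scorer stands in for a real spam filter,
# and both thresholds are made-up numbers.
SPAM_WORDS = {"viagra", "lottery", "winner"}
BOUNCE_THRESHOLD = 0.2    # fraction of words that look spammy
ACCOUNT_SPAM_RATIO = 0.5  # accounts above this ratio get flagged


def spam_score(body: str) -> float:
    """Fraction of words in the body that appear in SPAM_WORDS."""
    words = body.lower().split()
    if not words:
        return 0.0
    return sum(w in SPAM_WORDS for w in words) / len(words)


@dataclass
class SenderSMTP:
    sent: dict = field(default_factory=lambda: defaultdict(int))
    bounced: dict = field(default_factory=lambda: defaultdict(int))

    def submit(self, account: str, body: str) -> str:
        """Filter an outgoing message; return it to the sender if it looks spammy."""
        if spam_score(body) >= BOUNCE_THRESHOLD:
            self.bounced[account] += 1
            return "returned-to-sender"
        self.sent[account] += 1
        return "relayed"

    def spam_ratio(self, account: str) -> float:
        total = self.sent[account] + self.bounced[account]
        return self.bounced[account] / total if total else 0.0

    def suspected_spammers(self) -> list:
        accounts = set(self.sent) | set(self.bounced)
        return sorted(a for a in accounts if self.spam_ratio(a) > ACCOUNT_SPAM_RATIO)


server = SenderSMTP()
results = [
    server.submit("alice", "Lunch on Friday? I will bring the slides."),
    server.submit("bulk", "lottery winner claim viagra now"),
    server.submit("bulk", "You are a lottery winner"),
]
print(results)
print(server.suspected_spammers())
```

Both side-effects fall out of `submit`: returned messages never leave, so volume drops, and what does leave has already passed a filter.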

Identifying sender-SMTP by IP address

To encourage sender-SMTPs to use spam-filters on outgoing mail, they have to be identified. One cheap solution is by IP address.

If sender-SMTPs are encouraged to have static IP addresses, receiver-SMTPs can identify sender-SMTPs and rate each accordingly, giving higher marks to those that seem to be filtering spam. Penalties for those that rate low can range from limiting the frequency of connections to limiting the volume per connection.

To encourage sender-SMTPs to use a static IP address, receiver-SMTPs can apply penalties to unknown sender-SMTPs. To avoid the penalty, sender-SMTPs must use a senderid assigned to the IP address on first connection.
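
The receiver side can be sketched just as roughly. The per-connection limits, the minimum history, and the "good" ratio below are made-up numbers, the class and method names are hypothetical, and the IP addresses come from the ranges reserved for documentation.

```python
from dataclasses import dataclass

# Made-up policy numbers, for illustration only.
MIN_HISTORY = 10     # messages needed before a sender counts as "known"
GOOD_RATIO = 0.1     # spam ratio below this earns the generous limit
UNKNOWN_LIMIT = 20   # messages per connection for unknown senders
GOOD_LIMIT = 1000    # messages per connection for well-rated senders
BAD_LIMIT = 5        # messages per connection for poorly rated senders


@dataclass
class SenderRecord:
    messages: int = 0
    spam: int = 0


class ReceiverSMTP:
    def __init__(self):
        self.records = {}  # sender IP address -> SenderRecord

    def record(self, ip: str, is_spam: bool) -> None:
        """Note one received message and whether the local filter flagged it."""
        rec = self.records.setdefault(ip, SenderRecord())
        rec.messages += 1
        rec.spam += int(is_spam)

    def max_messages_per_connection(self, ip: str) -> int:
        """Volume limit for a sender, based on its observed spam ratio."""
        rec = self.records.get(ip)
        if rec is None or rec.messages < MIN_HISTORY:
            return UNKNOWN_LIMIT  # unknown senders start out penalized
        ratio = rec.spam / rec.messages
        return GOOD_LIMIT if ratio < GOOD_RATIO else BAD_LIMIT


receiver = ReceiverSMTP()
for i in range(20):
    receiver.record("192.0.2.10", is_spam=(i == 0))    # 1 spam in 20
for i in range(20):
    receiver.record("198.51.100.7", is_spam=(i < 15))  # 15 spam in 20

limits = {
    ip: receiver.max_messages_per_connection(ip)
    for ip in ("192.0.2.10", "198.51.100.7", "203.0.113.99")
}
print(limits)
```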

Recipient Feedback

Receiver-SMTPs can append a URL to each message to collect recipient feedback, which can be used to differentiate good SMTPs from bad SMTPs masquerading as good ones. Feedback can be sent as part of the receiver-SMTP's response the next time the suspect sender-SMTP connects. The sender-SMTP can use the information to throttle back the suspect sender's mail volume.

I am not sure if the solution I just sketched will work or not, but it is definitely more scalable than TEN or SMTP4All. Please let me know what you think.

Update - 2003/10/14 12:32PM PST

Mitch Ratcliffe is looking at the spam problem from a similar angle:

Push the responsibility back onto the sources of spam, not the end-user who generally doesn't spam one iota.

Right on, Mitch.

Fade: Turning Pirates into Addicts

Fade, described in "'Subversive' code could kill off software piracy", is a new anti-piracy system with two notable features:

Virtual Scratches as Watermark

Fragments of 'subversive' code are laced into the CD in a way that makes them look like real scratches on CDs. Scratch-aware programs access the CD directly to see if the scratches are there. Since the current crop of CD copying software tries to correct scratches as it copies, missing scratches mean a pirated copy.

Pirated Copies as Demos

When a pirated copy is detected, the program silently starts degrading its functionality. The degradation speed is set slow enough for the unsuspecting pirate to become dependent on the program and, hopefully, be forced into a purchase. Degradation probably involves overwriting parts of the program on the hard disk.

While I don't think Fade will make a noticeable dent in piracy, I like the idea of using piracy as a marketing tool.

More on E-Mail

While I was getting a haircut today, I thought about the scalability problem of solutions like TEN and SMTP4All.

Digitally signing each message will reduce mail throughput significantly. Some of the throughput loss will be offset by the removal of spam, which will reduce message traffic by as much as 90%. The scalability problem still remains because mail traffic is not constant.

Add to this the cost of bi-directional authentication between the sender-SMTP and the receiver-SMTP. If the number of messages per session is high (e.g. mail traffic between AOL and MSN), the cost per message will be minimal. But I suspect the average number of messages per session is pretty low, meaning near one message per session.
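
To put the amortization argument in numbers, here is a back-of-the-envelope sketch. The 50 ms handshake and 5 ms sign-and-verify figures are invented purely to show the shape of the curve; I have not measured anything.

```python
def cost_per_message(handshake_ms: float, sign_verify_ms: float,
                     messages_per_session: int) -> float:
    """Per-message overhead: signing cost plus an equal share of the handshake."""
    return sign_verify_ms + handshake_ms / messages_per_session


# Invented figures: 50 ms per mutual-authentication handshake,
# 5 ms to sign and verify one message.
for n in (1, 2, 10, 100):
    print(n, cost_per_message(50, 5, n))
```

At one message per session the handshake dominates the per-message cost; at a hundred it nearly disappears.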

Newsreel Multimedia Archive

Amazing treasury of stills and video from 75 years of Newsreel. Low-res versions can be used freely. Hi-res 300 dpi versions can be licensed. I think the low-res versions are good enough for most web uses unless you need to zoom in on small sections. BTW, don't go there now because the site just opened and is swamped.

Yesterday, I was looking at the Hemera Photo-Objects series and got side-tracked as usual into thinking about value-added multimedia content. A Photo-Object is basically an image plus a mask. There is no reason for the image and the mask to be together. Likewise for image transforms. In a sense these are like add-ons for an image, and one could potentially search for all available add-ons given an image's URL or URI. The same can be done with audio. Given any audio, there are countless ways to transform it. Also, a song doesn't have to be played in just one way or have just one lyric per language.

The only problem is that there is no infrastructure for value-added content. There are plenty of tools to create it, but what do you do with it once you have it? You can't find it reliably using search engines, nor are there ways to buy or sell it.

Don Park's Weekly Habit

Well, sorta weekly.

Month: October 2003