Hibernate and TopLink

I've had a lot of trouble using TopLink with Tomcat. Most of the problems appeared to be related to foreign keys and reverse-engineering but, looking deeper, were caused by 'weavers' and classloading. TopLink intercepts calls to POJOs by generating dynamic proxy code and then weaving it in by playing switcheroo at the classloader level. This technique apparently conflicts with Tomcat's own way of loading classes. I spent many hours chasing and resolving those conflicts, only to face more later, before giving up. I haven't used OpenJPA but I've read elsewhere that it's even worse. I am sure they'll improve but, for now, stay away.

After giving up on TopLink, I tried Hibernate 3.3 and, despite a lot of initial head scratching, I am happy to report that it worked well without even a single problem. Hibernate's two downsides are that it needs a lot of third-party jars to get it working and that it's rather slow. But then I couldn't go far enough along with TopLink to even test performance so the downsides are bearable. I can always rewrite the DAO layer later using iBATIS or use JDBC directly. For now, ease of development matters the most.

JPA Rough Spots

While JPA is nice, here are some possible rough spots:
Some foreign key columns throw the TopLink JPA provider off in LAZY fetch mode, causing a TOPLINK-60 exception that complains mysteriously about a missing method. Apparently the cause is related to some complicated 'weaving' logic. A quick fix is to switch to EAGER for the offending column (use the missing getter method's name as a hint) and deal with the consequences later.
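The quick fix might look something like the mapping below. This is only a sketch: the Customer and PurchaseOrder entities and the customer_id column are made up for illustration, and it needs the JPA API (javax.persistence) on the classpath. Note that @ManyToOne is EAGER by default in JPA, so a LAZY setting on such a column was put there explicitly, possibly by a reverse-engineering tool.

```java
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.ManyToOne;

// Hypothetical entities; in a real project each would be a public class in its own file.
@Entity
class Customer {
    @Id
    private Long id;

    public Customer() {}
}

@Entity
class PurchaseOrder {
    @Id
    private Long id;

    // Was: @ManyToOne(fetch = FetchType.LAZY)
    // If the exception names a missing method related to getCustomer(),
    // this is the column to switch to EAGER.
    @ManyToOne(fetch = FetchType.EAGER)
    @JoinColumn(name = "customer_id")
    private Customer customer;

    public PurchaseOrder() {}

    public Customer getCustomer() {
        return customer;
    }
}
```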
Uncommitted Persist
Persisted but not yet committed POJO instances don't really exist in the database, which can lead to problems when you reference them from other POJO instances. The fix is to fiddle with CascadeType settings. For example, if A and B are both new instances and A.b references B, then make sure the cascade type of A.b is set appropriately (e.g. CascadeType.ALL), then just persist A. Since A.b is set to cascade, persisting A will cause B to be inserted first, then its surrogate primary key will be used to insert A. If A and B have a OneToMany relationship then you would use A.getB().add(b) instead of A.setB(b).
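Here is a minimal sketch of the single-valued A.b case. A and B are the placeholder names from the text, CascadeDemo and saveBoth are made-up names, and the EntityManager is assumed to come from whatever persistence unit you have configured. It needs the JPA API on the classpath.

```java
import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToOne;

// Placeholder entities; in a real project each would be a public class in its own file.
@Entity
class B {
    @Id
    @GeneratedValue
    private Long id; // surrogate primary key, assigned on insert

    public B() {}
}

@Entity
class A {
    @Id
    @GeneratedValue
    private Long id;

    // Cascading persist from A to B lets the provider insert B first,
    // then use B's generated key for A's foreign key column.
    @ManyToOne(cascade = CascadeType.ALL)
    private B b;

    public A() {}

    public void setB(B b) {
        this.b = b;
    }
}

class CascadeDemo {
    // Persists a new A that references a new B in a single transaction.
    static void saveBoth(EntityManager em) {
        em.getTransaction().begin();
        A a = new A();
        a.setB(new B()); // B is new and never persisted directly
        em.persist(a);   // cascades to B
        em.getTransaction().commit();
    }
}
```

Without the cascade setting, the provider would typically reject the commit because A points at an instance it knows nothing about.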
Frankly, I don't like all this behind-the-scenes cascading business so I try to avoid it whenever I can. But it's difficult to avoid during inserts, as the only alternative I can see is brute-forcing a commit and fetch at each step and dealing with rollback manually.

OPML and Namespace

Dave Winer asks XML experts: can we put a namespace declaration at the head of an OPML file without breaking processors? If not even a single existing OPML app may break, then the question can't be answered without actually counting the apps that break. But that can't be done because not everyone is paying attention.
My best informed guess is that this change will break some apps. A downside of simplicity in an XML-based data format is that the simplicity can mislead developers into task-specific hand-parsing instead of using fully compliant XML parsers.
I think it's more relevant to ask whether his immediate need requires namespace support. Looking at his proposed change (adding an ownerId element to the head section), I think just adding the element as-is will break fewer OPML apps than doing it the standard way using an XML namespace.
Standards are useful, but not when they hurt more than they help.
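To make the hand-parsing worry concrete, here is a contrived sketch. The two sample documents, the ext prefix, and the http://example.com/opml-ext namespace URI are all invented for illustration and are not from Dave's proposal. The hand-parser reads the OPML version with a regex that assumes version is the only attribute on the root element. The other reader uses a namespace-aware DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class OpmlHeadCheck {
    // Hypothetical namespace URI, used only for this illustration.
    static final String EXT_NS = "http://example.com/opml-ext";

    // ownerId added as a plain element.
    static final String PLAIN =
        "<opml version=\"2.0\"><head><title>t</title>"
        + "<ownerId>http://example.com/me</ownerId></head><body/></opml>";

    // ownerId added through a namespace declared on the root element.
    static final String NAMESPACED =
        "<opml version=\"2.0\" xmlns:ext=\"" + EXT_NS + "\"><head><title>t</title>"
        + "<ext:ownerId>http://example.com/me</ext:ownerId></head><body/></opml>";

    // Task-specific hand-parsing: assumes the root tag carries only a version attribute.
    private static final Pattern ROOT = Pattern.compile("<opml version=\"([^\"]+)\">");

    static String handParseVersion(String xml) {
        Matcher m = ROOT.matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    // A compliant parser does not care about extra attributes on the root element.
    static String domParseVersion(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getAttribute("version");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse OPML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("hand, plain:      " + handParseVersion(PLAIN));
        System.out.println("hand, namespaced: " + handParseVersion(NAMESPACED));
        System.out.println("DOM,  plain:      " + domParseVersion(PLAIN));
        System.out.println("DOM,  namespaced: " + domParseVersion(NAMESPACED));
    }
}
```

The hand-parser copes with the plain ownerId element but finds no version at all once the xmlns declaration appears on the root tag, while the DOM reader returns 2.0 for both documents. Real apps will fail in their own ways, of course. This only shows one plausible one.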

Bruce Schneier on Scanbone Culture

In response to news of pirated copies of the last Harry Potter book appearing on BitTorrent, Bruce Schneier wrote:

Anyone fan-crazed enough to read digital photographs of the pages a few days before the real copy comes out is also someone who is going to buy a real copy. And anyone who will read the digital photographs instead of the real book would have borrowed a copy from a friend. My guess is that the publishers will lose zero sales, and that the pre-release will simply increase the press frenzy.

Sounds reasonable, but reasons tend to crumble before adaptive ways of life and culture. Already, Asia has developed a sizable culture of photocopied books shared online. In Korea, photocopied books are called 스켄본 (Scanbone) and they have gone beyond BitTorrent and into private storage services. And it's irrelevant whether the readers are fans or not, because shortcuts can easily become a way of life.


Peter Yared, ex-CEO of ActiveGrid, came by my house today. We sat outside (great weather today) and talked about our common past (NetDynamics) and common near-term focus. We took turns demoing our stuff and talked about ways to help each other. I think he has a potential winner with wdgtbldr, his new venture for which he already has decent seed funding lined up, if he executes right. I am funding my own but I think Other People's Money could add the right kind of motivation and urgency I need to execute.
As to what his startup is about, its ugly name speaks for itself. ;-p

Chording Keyboard for iPhone and Mac?

I've read that the iPhone's virtual keyboard is not as bad as initially thought, but I wonder if Apple will implement a virtual version of Engelbart's 5-key chorded keyboard using multi-touch. If done right, users could just tap on the phone's glass surface with five fingers to type (fingers will have to be closer to each other due to lack of space but I don't see a major problem). I thought about hacking together a web version but, according to the iPhone developer docs, the iPhone browser doesn't deliver multi-touch events to webapps.
For Macs, I think the multi-touch mouse that Apple recently filed a patent for might be used.

Update Pings via EC2

I am not sure if I got the numbers right but, assuming a constant load of 20K users (not realistic but what the heck) pinging the server every 5 seconds (did anyone send me roses? no? did anyone send me flowers? no? and so on) with each ping weighing about 1K, the result would be around 10-20 terabytes of traffic per month, which would cost around $3-5K/month.
The other option is to get 10 dedicated servers, each with a 2T bandwidth/month budget and handling 2K concurrent users, which would handle the traffic and load handily. At $200 per server per month, the price tag is $2K/month, cheaper than EC2, but more manual labor is needed to scale up.
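Since I don't trust my own arithmetic, here is a rough check of the figures above. The assumptions are mine: 1K means 1,000 bytes, a month is 30 days, a terabyte is 10^12 bytes, and only the 1K per ping is counted. The class and method names are made up.

```java
final class PingTraffic {
    static final long SECONDS_PER_MONTH = 30L * 24 * 60 * 60; // assumes a 30-day month
    static final double BYTES_PER_TB = 1e12;                  // decimal terabyte

    // Pings arriving per second when every user pings once per interval.
    static long pingsPerSecond(long users, long intervalSeconds) {
        return users / intervalSeconds;
    }

    // Total ping traffic per month in bytes.
    static long bytesPerMonth(long users, long intervalSeconds, long bytesPerPing) {
        return pingsPerSecond(users, intervalSeconds) * bytesPerPing * SECONDS_PER_MONTH;
    }

    // Monthly cost in dollars of a fleet of flat-rate dedicated servers.
    static long dedicatedMonthlyCost(int servers, long dollarsPerServer) {
        return servers * dollarsPerServer;
    }

    public static void main(String[] args) {
        long users = 20_000;
        long intervalSeconds = 5;
        long bytesPerPing = 1_000; // "about 1K"

        double terabytes = bytesPerMonth(users, intervalSeconds, bytesPerPing) / BYTES_PER_TB;
        System.out.printf("pings per second: %d%n", pingsPerSecond(users, intervalSeconds));
        System.out.printf("traffic per month: %.2f TB%n", terabytes);
        System.out.printf("dedicated bandwidth budget: %d TB%n", 10 * 2);
        System.out.printf("dedicated cost: $%d/month%n", dedicatedMonthlyCost(10, 200));
    }
}
```

Under those assumptions the load is 4,000 pings per second and roughly 10.4 TB per month, which is the low end of my 10-20 terabyte guess. If each response also weighs about 1K, the total doubles and lands at the high end. Either way it fits within the 20T combined budget of the ten dedicated servers, though only barely in the doubled case.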
Considering that 20K concurrent users is in the ballpark of Second Life, I think these numbers are good, but they are specific to my apps, which are task-oriented. For presence-oriented apps like Pownce, the numbers would be much worse because they would have a massive number of simultaneous users, even at a lower ping rate (60 seconds for Pownce, I think).
BTW, I am notoriously bad with numbers. This post is a disguised attempt to uncross my eyes with readers' help. ;-p