Phishing and Bouncing

Looks like the trick of using redirection CGIs at popular websites (described in Phishing behind Google) is getting popular among phishers.  I just got a couple that use AOL's redir-complex CGI at:

http://r.aol.com/cgi/redir-complex?url=whereever

Note that phishers can use not just redirecting CGIs, but also any CGI that takes a return URL as a parameter.  In fact, these types of CGIs are popular among financial institutions and single sign-on services.  For example, both Passport and 3D-Secure use them.
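A detector for this trick has to look past the host name and into the query string.  Here is a minimal sketch of that idea; the class and method names are my own inventions, not any real scanner's API:

```java
import java.net.URI;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: flag links whose query parameters smuggle in a
// second URL -- the pattern that redirector CGIs and return-URL CGIs share.
public class RedirectCheck {
    // Returns every parameter value that itself parses as an http(s) URL.
    public static List<String> embeddedUrls(String link) throws Exception {
        List<String> found = new ArrayList<String>();
        String query = new URI(link).getRawQuery();
        if (query == null) return found;
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) continue;
            String value = URLDecoder.decode(pair.substring(eq + 1), "UTF-8");
            if (value.startsWith("http://") || value.startsWith("https://")) {
                found.add(value); // the real destination, not the bouncing host
            }
        }
        return found;
    }
}
```

Anything this returns is where the browser will actually end up, no matter how reputable the host in front of the question mark looks.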

Syndication Scalability Problem

Scoble writes:

RSS is broken, is what happened. It's not scalable when 10s of thousands of people start subscribing to thousands of separate RSS feeds and start pulling down those feeds every few minutes (default aggregator behavior is to pull down a feed every hour).

Bandwidth usage was growing faster than MSDN's ability to pay for, or keep up with, the bandwidth. Terrabytes of bandwidth were being used up by RSS.

So, they are trying to attack the problem by making the feeds lighter weight. I don't like the solution (I've unsubscribed from almost all weblogs.asp.net feeds because they no longer provide full text) but I understand the issues.

I know of a major broadcaster that refuses to turn on RSS feeds because of this issue too. We need smarter aggregators and better defaults.

I think it's high time we addressed this problem and I think the upcoming BloggerCon is the right place to do it.

We need to 'teach' syndication clients to speak with servers and other clients at a more intelligent level than that of a spoiled child endlessly screaming GIVE ME GIVE ME.  We need more than just technical solutions.  We need to introduce ways to reward clients that improve scalability and punish bad clients that selfishly hog resources.

Update:

The following excerpt was pulled up from the comments section.

Here is a possible solution for a site like blogs.msdn.com that doesn't require P2P:

1. Each blog feed is divided into two feeds, a daily feed (DF) and hourly feed (HF). There is also a master update notification feed (MF).

2. DF is updated once a day and has a permanent URL. HF is updated whenever a post is made and has a transient URL. MF is updated continually and has a permanent URL.

3. Both DF and HF embed a link to the MF.

4. MF contains transient links to updated HFs.

Since HF URLs are temporary and can only be found at MF, clients that ignore the link to MF in DF will only be able to pull DF.

Because DF has an update period of 24 hours, dumb but well-behaved clients will hit it only once a day.  Originating IPs of misbehaving clients can be identified and their access slowed down enough to prevent abuse without refusing to serve them entirely.

The main idea here is to require changes in clients to access the superior product (HF) and entice users away from the inferior product (DF) in one of several ways (slower speed, summary only, etc.).  If done right, users will flock to clients that give them access to the superior product over those that can only access the inferior one.
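The client-side logic of the scheme above can be sketched in a few lines.  Everything here is hypothetical (the class, the URL shapes, the in-memory map standing in for the MF document):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the DF/HF/MF scheme from the client's viewpoint.
// A smart client polls only the master feed (MF) and follows its transient
// links to updated hourly feeds (HF); a dumb client only ever sees the
// permanent, once-a-day daily feed (DF) URL.
public class FeedScheduler {
    // MF contents: blog name -> transient HF URL, present only when updated.
    private final Map<String, String> masterFeed = new HashMap<String, String>();

    // Server side: a new post mints a fresh transient HF URL in the MF.
    public void recordUpdate(String blog, String transientHfUrl) {
        masterFeed.put(blog, transientHfUrl);
    }

    // Smart client: ask the MF which HFs changed; null means skip the fetch.
    public String hourlyUrlFor(String blog) {
        return masterFeed.get(blog);
    }

    // Dumb client: fall back to the permanent daily URL, hit once per day.
    public String dailyUrlFor(String blog) {
        return "http://blogs.example.com/" + blog + "/daily.xml";
    }
}
```

The point of making the HF URL transient is visible here: a client that never reads the MF has no way to construct it, so the only thing it can hammer is the slow-moving DF.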

Now, coming up with the solution outlined above took only ten minutes of thinking.  I think we can come up with more possible solutions if we put our heads together.

Compiling away my weekends

I spent a good part of Sunday recompiling many popular open source libraries (e.g. iconv, libxml, libxslt, xmlsec, openssl, zlib) as Win32 static multithreaded libraries without a C runtime DLL name dependency (/MT) so I can use them in VC7.1 projects.  All because some guy at Microsoft decided to change the C runtime DLL name from the fixed MSVCRT.DLL to version-based names like MSVCR70.DLL.

When that happened, popular open source tools and libraries got stuck with Visual C++ 6.  Binaries of Python and all the libraries I mentioned above, for example, are built with VC6 and link to the C runtime library dynamically (meaning they require MSVCRT.DLL).  This means Win32 Python extensions can't be built with later versions of VC++ without walking into a lot of headaches.  The same problem stopped me from using prebuilt binaries of those libraries, so I had to rebuild them myself.

I bent a few things and had to guess at a few places, but nothing seems to be broken.  Only trouble is that I am using WinInet to fetch XML files from Internet and it ain't behaving too well.  WinHTTP is supposed to be better but I think I'll opt for cURL instead.  Of course, that means I'll have more compiling to do next weekend.  Urgh.

Update:

Heh.  I didn't realize the weekend wasn't over yet until my wife informed me that today was Labor Day.  To celebrate, I labored some more after a mini-vacation at the neighborhood pool with my wife and son.  cURL comes with Win32 project files, so it was a cinch to build.  I did have to turn on OpenSSL and zlib support to get HTTPS and gzip compression working (define the USE_OPENSSL, HAVE_ZLIB, and HAVE_ZLIB_H flags).  Now I can reliably pull compressed RSS feeds over HTTPS from a Win32 client.

BTW, none of these chores would have been necessary had I written it in Java or .NET, because both platforms have most of these libraries built in.  So why am I humping the sidewalk?  Well, there are still things you can do with C++ that you can't do with Java or .NET…
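To illustrate the "built in" point: the gzip handling that took a zlib rebuild on the C++ side ships with the Java runtime in java.util.zip.  A minimal round-trip sketch (class name mine):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Gzip compress/decompress with nothing but the standard library --
// the part of the feed-fetching pipeline that required recompiling
// zlib into cURL on Win32.
public class GzipRoundTrip {
    public static byte[] gzip(byte[] data) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(bos);
        gz.write(data);
        gz.close(); // flushes the gzip trailer
        return bos.toByteArray();
    }

    public static byte[] gunzip(byte[] data) throws Exception {
        GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data));
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = gz.read(buf)) != -1) bos.write(buf, 0, n);
        return bos.toByteArray();
    }
}
```

Pair that with the platform's HTTPS support and the whole weekend of compiling disappears, which is exactly the trade-off being grumbled about.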

BadPaddingException

If you are a Java engineer who ran into a BadPaddingException and ended up here by googling, here is the clue:

BadPaddingException is thrown not because the padding was actually 'bad'.  The exception is thrown because the data you are trying to decrypt is malformed, usually by errors in transport-related encoding/decoding code (e.g. Base64).  Check your encoders and decoders.  Also check whether you are doing a questionable conversion between byte arrays and Strings.
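A minimal sketch of the pipeline done right, using today's java.util.Base64 (class and method names are mine; ECB mode is used only to keep the demo short, not as a recommendation):

```java
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// The transport pipeline whose breakage produces BadPaddingException:
// encrypt -> Base64-encode -> (wire) -> Base64-decode -> decrypt.
// The byte[] <-> String hop must go through Base64 (or hex);
// new String(cipherBytes) silently mangles non-text bytes.
public class PaddingDemo {
    public static String roundTrip(String plaintext) throws Exception {
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();

        Cipher enc = Cipher.getInstance("AES/ECB/PKCS5Padding"); // ECB for demo only
        enc.init(Cipher.ENCRYPT_MODE, key);
        byte[] cipherBytes = enc.doFinal(plaintext.getBytes("UTF-8"));

        // Correct transport encoding: Base64 both ways.
        String wire = Base64.getEncoder().encodeToString(cipherBytes);
        byte[] received = Base64.getDecoder().decode(wire);

        Cipher dec = Cipher.getInstance("AES/ECB/PKCS5Padding");
        dec.init(Cipher.DECRYPT_MODE, key);
        return new String(dec.doFinal(received), "UTF-8");
    }
}
```

Replace the Base64 hop with new String(cipherBytes) and back, and the bytes reaching doFinal are no longer what left the encryptor; that garbage is what the padding check then rejects.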

For the rest of you, never mind.

Milter Smelter, Wash Out with gSoap

I spent the whole weekend writing a milter (sendmail filter).  Everything was going great until I started making some SOAP calls from within the milter and, wham, I started getting that stupid 'Expecting 5' message.  Urgh.  It's no fun working with a sensitive piece of crap.

BTW, gSoap is a great library for implementing a web service client or service in C or C++.  The only problem is that it's rather disorganized, so you'll have to waste some time figuring out what the hell is going on.

PHPEclipse

PHPEclipse is an Eclipse plugin that turns Eclipse into a PHP IDE.  I don't usually do PHP work, but a close friend of mine asked me to review his company's PHP-based website, so I had to go over a massive body of PHP code in the few hours I could spare.

While any text editor can be used to write PHP code, mere text editors are not enough when you don't have much time to cover a lot of code.  So I installed PHPEclipse and found it to be really nice.  It checks syntax and helps you trace and navigate call hierarchies easily.  I haven't tried its debugging capabilities, but I was delighted enough with just the features I used to recommend it to PHP developers.

BTW, I am not a PHP developer and I don't build websites for small businesses.  It's not that that is not a respectable business.  It's just that I don't like doing what millions of other developers can do with adequate competence.  Yes, I am a prima donna of sorts.

Phishing behind Google

I just received a phishing email purporting to be from PayPal.  No surprise there since I get many of them every day, but I looked closer at this one because it looked very professionally done.  I looked at the raw message and found this odd link:

This particular phisher is bouncing off Google to hide itself from domain name-based phishing detectors and scanners.  Clever.  Clicking on the link will open a browser to Google's URL search CGI, which will automatically redirect the browser to the phishing site at IP address 209.152.181.10.  This trick will bypass phishing detectors that examine only the domain name part of a URL to see if it looks suspicious.

So the lesson here for security developers is to look at all the parameters and to keep track of oh-so-helpful redirectors like Google.  Also, website developers should keep in mind that a helpful service is helpful to all, including the bad guys, and they might become unwitting partners in crime.  For lawyers, it's a new source of income, er, concern.

Open Source Inspectors

Open source is not inherently more secure than closed source.  If you have doubts about the preceding statement, Dare Obasanjo's The Myth of Open Source Security series of articles is a good place to start.

The two main problems I see with open source security are that a) there are no compelling incentives for open source developers to examine the code, and b) they have to examine everything.  Even if all the developers were coerced into doing so, not everyone would do a good job, and everyone is not the same as everything.

On the other hand, blackhats have compelling incentives to look at the code and they only need to look at a fraction of the code developers have to look at since they only need to find one vulnerability to hit paydirt.

While I agree with Dare on most points, I think his suggested solution of adopting software quality enhancing techniques and practices is unimplementable for most open source projects.  As software developers and managers, we tend to focus too much on how we do things and what we use to get things done, meaning the skills, techniques, and tools we use every day.  The open source movement is not about those things.  It's not about how or what but who: people doing things together.

The quality of open source software cannot be improved by asking people to wear straitjackets and drawing lines on the floor telling them where to go next.  Instead, we need to see the entire open source community as a global ecology and find subtle ways to change the antfarm environment so that the ants, er, people will naturally respond in the direction that improves the quality of goods they produce.

One such solution is the introduction of open source inspectors backed by inspector rating and reward systems.  An open source inspector is a software engineer whose responsibility is to inspect the quality of software.  Unlike developers, who tend to stay with a small stable of projects for extended periods of time, inspectors are gypsies who move from project to project.

Each inspector examines code for quality and security.  The result of an inspection is a report and a rating assertion signed by the inspector.  Rating assertions by an inspector ultimately affect the proficiency rating of the inspector: each bug or vulnerability discovered in code they inspected lowers their proficiency rating.

Achieving and maintaining a high proficiency rating is the lure, er, reward motivating inspectors to dedicate a substantial portion of their time to inspecting open source projects of their choosing pro bono.  If they are any good, they will find plenty of paying customers.
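The mechanics of a signed rating assertion are plain public-key signing.  A hypothetical sketch (the assertion format and class are inventions for illustration):

```java
import java.security.KeyPair;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical sketch of a signed rating assertion: the inspector signs
// a "project:version:rating" string so anyone can verify who vouched for
// what, and later-found vulnerabilities can be charged back to the
// inspector's proficiency rating.
public class RatingAssertion {
    public static byte[] sign(KeyPair inspector, String assertion) throws Exception {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initSign(inspector.getPrivate());
        s.update(assertion.getBytes("UTF-8"));
        return s.sign();
    }

    public static boolean verify(PublicKey inspector, String assertion, byte[] sig) throws Exception {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initVerify(inspector);
        s.update(assertion.getBytes("UTF-8"));
        return s.verify(sig);
    }
}
```

The signature binds the inspector's identity to a specific project and version, so a vulnerability found later in that exact release is attributable to exactly the inspectors who rated it.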

In summary, I am advocating the use of social engineering over software engineering to enhance open source security.  Designing, developing, debugging, and deploying social forces is the ultimate engineering profession IMHO.  The only problem with such a profession is that the lifecycles of such 'wares' literally mean lifecycles.

Getting a rise out of Reiser4

The last time I looked at ReiserFS was, I think, at least a couple of years ago.  It was a nice file system, but I didn't find any use for it.  Two years later, Reiser4 is released and I still can't find a good use for it, but it sure has an intriguing one-liner feature list that would give any geek a bit of excitement:

  • Reiser4 is the fastest filesystem, and here are the benchmarks.
  • Reiser4 is an atomic filesystem, which means that your filesystem operations either entirely occur, or they entirely don't, and they don't corrupt due to half occuring. We do this without significant performance losses, because we invented algorithms to do it without copying the data twice.
  • Reiser4 uses dancing trees, which obsolete the balanced tree algorithms used in databases (see farther down). This makes Reiser4 more space efficient than other filesystems because we squish small files together rather than wasting space due to block alignment like they do. It also means that Reiser4 scales better than any other filesystem. Do you want a million files in a directory, and want to create them fast? No problem.
  • Reiser4 is based on plugins, which means that it will attract many outside contributors, and you'll be able to upgrade to their innovations without reformatting your disk. If you like to code, you'll really like plugins….
  • Reiser4 is architected for military grade security. You'll find it is easy to audit the code, and that assertions guard the entrance to every function.

Dancing trees?  I gotta look into that algorithm sometime.  I wonder if variations of the algorithm will be called Disco or Samba?  ;-)  Hmm.  One of the testimonials is LivingXML, a native XML engine built on top of Reiser.  That's nice except LivingXML seems to be, well, dead.  Oh, well.

Perl vs. Java RegEx

Tim Bray compares Perl and Java regular expression performance, with Java performing twice as fast as Perl once output performance is factored out.  Fantastic.  I knew the Java regular expression library was fast, but I didn't know it was this fast.  Even more encouraging, there are even faster third-party regular expression libraries for Java.  I wonder if .NET 2.0 makes up for the lackluster RegEx performance in .NET 1.1.
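One caveat when benchmarking (or just using) Java regexes, separate from anything in Bray's test: compile the Pattern once and reuse it, since convenience calls like String.matches() recompile the expression every time.  A small sketch (names mine):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The usual Java regex performance rule: hoist Pattern.compile() out of
// the hot loop and reuse the compiled Pattern via a fresh Matcher.
public class RegexDemo {
    private static final Pattern WORD = Pattern.compile("\\b[a-z]+\\b");

    // Counts lowercase words in a line using the precompiled pattern.
    public static int countWords(String line) {
        Matcher m = WORD.matcher(line);
        int n = 0;
        while (m.find()) n++;
        return n;
    }
}
```

Skipping that hoist is one of the easier ways to make any regex engine look slower than it is, which is worth keeping in mind when two informal benchmarks disagree.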

Update:

Jeff Atwood is getting a completely different result (.NET RE faster by ~40%) from an informal benchmark I did a while back (.NET slower by ~60%).  BTW, I don't believe .NET RE is 20 times slower than Java RE.