These days, I am not tracking the Atom mailing list too closely due to the traffic volume (currently 7 times XML-DEV traffic) and lack of time, but Tim's FooCamp2004 post prompted me to read Sam's Vary: ETag post and comments.
While I like the cleverness of the solution, I have misgivings about how practical it really is. Aside from requiring Vary: ETag-aware clients to keep track of ETags, the solution requires a lot of server-side work for doubtful gain.
- Everyone seems to agree that low-traffic blogs won't see any noticeable gain.
- I don't think the large blog services like TypePad and Blogger will gain much either because such services must support tens of thousands of feeds, each of which must be sliced and diced at the expense of CPU load to reduce bandwidth. Parsing every feed for every request to figure out which subset to send back is not cheap. Even if a cache is used, frequent editing of recent posts will increase the CPU load noticeably.
- That leaves only feeds like the MSDN aggregated feed which will see noticeable bandwidth reduction at the expense of writing a custom Vary: ETag handler.
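
To make the per-request cost concrete, here is a minimal Python sketch of the slicing work the server would have to do. It is only an illustration under stated assumptions: the feed is a simplified, namespace-free toy format, the entries are listed newest first, and the ETag sent back by the client is assumed to name the newest entry it has already seen. The actual proposal may map ETags to feed state differently, and `slice_feed` is a hypothetical helper, not part of any real server.

```python
import xml.etree.ElementTree as ET

# Toy feed (assumption): no namespaces, entries ordered newest first.
FEED = """<feed>
  <title>Example</title>
  <entry><id>e3</id><title>Third</title></entry>
  <entry><id>e2</id><title>Second</title></entry>
  <entry><id>e1</id><title>First</title></entry>
</feed>"""


def slice_feed(feed_xml, client_etag):
    """Return the feed with only the entries newer than the one named by client_etag.

    If the ETag does not match any entry, the full feed is returned.
    """
    root = ET.fromstring(feed_xml)  # a full parse happens on every request
    entries = root.findall("entry")
    ids = [entry.findtext("id") for entry in entries]
    if client_etag in ids:
        # Drop the entry the client already has and everything older than it.
        for old in entries[ids.index(client_etag):]:
            root.remove(old)
    return ET.tostring(root, encoding="unicode")


sliced = slice_feed(FEED, "e2")
remaining = [e.findtext("id") for e in ET.fromstring(sliced).findall("entry")]
```

Even in this toy form, every request pays for a parse, a search and a reserialization, which is the work a large service would have to multiply across tens of thousands of feeds.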
A key problem is that XML is not an efficient format if you are doing a lot of search and extraction. Regular expressions can be used, but not reliably or fast enough unless the feed is preprocessed into a more palatable form (a canonical or proprietary regex-friendly format).
A similar but more practical solution might be to serve feeds as multipart MIME resources with sequenced parts. Each feed item becomes a MIME part, and the feed metadata is also a MIME part. An extra benefit is that binaries can be embedded and other content formats (e.g. RSS) can be supported as well.
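
A rough sketch of what such a multipart feed could look like, using Python's standard `email` package. The `X-Feed-Sequence` header, the `feed-metadata` Content-ID and the helper names are all made up for illustration; I am not aware of any existing convention for them.

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_feed(metadata_xml, items):
    """Package feed metadata and items as one multipart MIME message."""
    feed = MIMEMultipart("mixed")
    meta = MIMEText(metadata_xml, "xml", "utf-8")
    meta["Content-ID"] = "<feed-metadata>"  # hypothetical identifier
    feed.attach(meta)
    for seq, item_xml in enumerate(items, start=1):
        part = MIMEText(item_xml, "xml", "utf-8")
        part["X-Feed-Sequence"] = str(seq)  # hypothetical sequencing header
        feed.attach(part)
    return feed


def items_after(feed, last_seen):
    """Return the item bodies whose sequence number is greater than last_seen."""
    return [
        part.get_payload(decode=True).decode("utf-8")
        for part in feed.get_payload()
        # The metadata part has no sequence header, so it defaults to 0.
        if int(part.get("X-Feed-Sequence", 0)) > last_seen
    ]


feed = build_feed(
    "<feed><title>Example</title></feed>",
    ["<entry>a</entry>", "<entry>b</entry>", "<entry>c</entry>"],
)
# Round-trip through the wire format, as a client would see it.
parsed = message_from_string(feed.as_string())
new_items = items_after(parsed, 1)
```

The point of the sketch is that picking out the new items only requires reading part headers, not parsing any XML, and a binary attachment would simply be one more part.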