Syndication Scalability Problem

Scoble writes:

RSS is broken, is what happened. It's not scalable when 10s of thousands of people start subscribing to thousands of separate RSS feeds and start pulling down those feeds every few minutes (default aggregator behavior is to pull down a feed every hour).

Bandwidth usage was growing faster than MSDN's ability to pay for, or keep up with, the bandwidth. Terabytes of bandwidth were being used up by RSS.

So, they are trying to attack the problem by making the feeds lighter weight. I don't like the solution (I've unsubscribed from almost all weblogs.asp.net feeds because they no longer provide full text) but I understand the issues.

I know of a major broadcaster that refuses to turn on RSS feeds because of this issue too. We need smarter aggregators and better defaults.

I think it's high time we addressed this problem and I think the upcoming BloggerCon is the right place to do it.

We need to 'teach' syndication clients to speak with servers and other clients at a more intelligent level than that of a spoiled child screaming GIVE ME GIVE ME endlessly. We need more than just technical solutions. We need to introduce ways to reward clients that improve scalability and punish bad clients that hog resources selfishly.

Update:

The following excerpt was pulled from the comment section.

Here is a possible solution for a site like blogs.msdn.com that doesn't require P2P:

1. Each blog feed is divided into two feeds, a daily feed (DF) and an hourly feed (HF). There is also a master update notification feed (MF).

2. DF is updated once a day and has a permanent URL. HF is updated whenever a post is made and has a transient URL. MF is updated continually and has a permanent URL.

3. Both DF and HF embed a link to the MF.

4. MF contains transient links to updated HFs.

Since HF URLs are temporary and can only be found at MF, clients that ignore the link to MF in DF will only be able to pull DF.
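To make the scheme above a bit more concrete, here is a rough sketch of how the three feeds might interact for a single blog. Everything in it is an assumption for illustration: the FeedServer class, the URL patterns, and the dictionary "feeds" are hypothetical stand-ins, not how blogs.msdn.com or any real aggregator works.

```python
import secrets


class FeedServer:
    """Toy in-memory model of the DF/HF/MF scheme for one blog (hypothetical)."""

    def __init__(self, blog):
        self.blog = blog
        self.df_url = f"/{blog}/daily.xml"  # DF: permanent URL
        self.mf_url = "/master.xml"         # MF: permanent URL
        self.hf_url = None                  # HF: transient URL, set on first post
        self.posts = []
        self.daily_snapshot = []

    def publish(self, post):
        """Add a post and move HF to a fresh, unguessable URL."""
        self.posts.append(post)
        self.hf_url = f"/{self.blog}/hourly-{secrets.token_hex(8)}.xml"

    def roll_daily(self):
        """Once-a-day job: copy the current posts into DF."""
        self.daily_snapshot = list(self.posts)

    def get(self, url):
        """Serve a feed, or None if the URL is unknown or has expired."""
        if url == self.df_url:
            return {"posts": list(self.daily_snapshot), "master": self.mf_url}
        if url == self.mf_url:
            updated = {self.blog: self.hf_url} if self.hf_url else {}
            return {"updated": updated}
        if url is not None and url == self.hf_url:
            return {"posts": list(self.posts), "master": self.mf_url}
        return None


def dumb_client(server):
    """Only knows the permanent DF URL and ignores the MF link."""
    return server.get(server.df_url)["posts"]


def smart_client(server):
    """Follows the MF link found in DF to locate the current HF."""
    daily = server.get(server.df_url)
    master = server.get(daily["master"])
    hf_url = master["updated"].get(server.blog)
    if hf_url is None:
        return daily["posts"]
    return server.get(hf_url)["posts"]
```

A client that remembers an old HF URL gets nothing once the next post is published, so the only reliable path to fresh posts goes through MF.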

Because DF has an update period of 24 hours, dumb yet well-behaved clients will hit it only once a day. Originating IPs of misbehaving clients can be identified, and their access can be slowed down enough to prevent abuse without refusing to serve them entirely.
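The slow-down could be done in many ways. One possibility, sketched below, is a per-IP delay that doubles every time a client re-fetches DF before the 24-hour period is up. The Throttle class, the doubling rule, and the 30-second cap are all my own assumptions for illustration; the proposal itself only says that access should be slowed rather than refused.

```python
DF_PERIOD = 24 * 60 * 60  # DF update period in seconds


class Throttle:
    """Hypothetical per-IP slow-down for clients that poll DF too often."""

    def __init__(self, period=DF_PERIOD, max_delay=30.0):
        self.period = period
        self.max_delay = max_delay
        self.last_seen = {}  # ip -> timestamp of the previous request
        self.strikes = {}    # ip -> count of consecutive early requests

    def delay_for(self, ip, now):
        """Return how many seconds to stall this request before answering."""
        last = self.last_seen.get(ip)
        self.last_seen[ip] = now
        if last is None or now - last >= self.period:
            # First visit, or the client waited a full period: no penalty.
            self.strikes[ip] = 0
            return 0.0
        # Early request: the delay doubles with each strike, up to a cap.
        self.strikes[ip] = self.strikes.get(ip, 0) + 1
        return min(self.max_delay, 2.0 ** (self.strikes[ip] - 1))
```

Note that every request resets the clock in this sketch, so a client has to stay quiet for a full period before its penalty clears. The request is still served each time, only later.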

The main idea here is to require changes in clients to access the superior product (HF) and to entice users away from the inferior product (DF) by degrading it in one of several ways (slower speed, summary only, etc.). If done right, users will flock to clients that give them access to the superior product over those that can only access the inferior one.

Now, coming up with the solution outlined above took only ten minutes of thinking.  I think we can come up with more possible solutions if we put our heads together.
