Information Overload

April 13th, 2003 9:57 PM

I’m hitting the threshold of too much information from my blogroll, yet every day I find new feeds I’d like to monitor. This got me thinking about how I could sensibly start filtering my blogroll feeds so that I’d be able to passively monitor them without it becoming a full-time job.

I’m interested to see how statistical analysis (perhaps in the form of frequency analysis or Bayesian filters) might cut down on the posts that I’m not interested in. Specifically, I believe NetNewsWire is perfectly suited to this job if I could figure out a way to force hooks into it. All articles that NetNewsWire is aware of would become the data set, with articles I “Open in Browser” tagged as positive and all others tagged as negative (this seems like a good first cut, at least).
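To make the Bayesian option concrete, here is a minimal sketch of a naive Bayes filter in Python. It assumes the article text and an opened/not-opened flag can somehow be obtained from the reader, which is exactly the hook I don't have yet. The class name, the tokenizer and the labels are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Hypothetical two-class filter: 'opened' (positive) vs 'skipped' (negative)."""

    def __init__(self):
        self.word_counts = {"opened": Counter(), "skipped": Counter()}
        self.doc_counts = {"opened": 0, "skipped": 0}

    def train(self, text, opened):
        """Record one article; `opened` is True if it was opened in the browser."""
        label = "opened" if opened else "skipped"
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def score(self, text):
        """Return an estimate of P(opened | text), using add-one smoothing."""
        vocab = set(self.word_counts["opened"]) | set(self.word_counts["skipped"])
        total_docs = sum(self.doc_counts.values())
        log_probs = {}
        for label in ("opened", "skipped"):
            # Smoothed class prior, so an untrained filter returns 0.5.
            logp = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            # The extra +1 reserves probability mass for unseen words.
            denom = sum(self.word_counts[label].values()) + len(vocab) + 1
            for word in tokenize(text):
                logp += math.log((self.word_counts[label][word] + 1) / denom)
            log_probs[label] = logp
        # Turn the log-odds into a probability, clamped to avoid overflow.
        diff = log_probs["skipped"] - log_probs["opened"]
        diff = max(min(diff, 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

In use, every article would be passed to `train` once its fate is known, and new articles from passively tracked feeds would be kept only if `score` clears some threshold (0.5 is an arbitrary starting point, not a tuned value).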

At that point, I could keep NetNewsWire subscribed to my actively tracked blogroll, and set up a single, custom feed (generated locally) of articles from all passively tracked feeds that pass the filter. I haven’t had any luck in figuring out how to add those hooks to NetNewsWire (to run a script when an article is opened in a browser), which may indicate their absence from the feature set, or may indicate my poor understanding of and familiarity with AppleScript.
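The locally generated feed could look something like the following sketch, which writes the articles that pass the filter into a bare-bones RSS 2.0 document. It assumes the passively tracked articles have already been fetched and parsed into dictionaries with `title`, `link` and `summary` keys, and that `score` is any function returning a probability for a piece of text. The function name, channel metadata and threshold are all placeholders.

```python
import xml.etree.ElementTree as ET


def build_filtered_feed(articles, score, threshold=0.5):
    """Return an RSS 2.0 string containing only articles that pass the filter.

    `articles` is a list of dicts with 'title', 'link' and 'summary' keys
    (a hypothetical shape), and `score` maps text to a value in [0, 1].
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # Placeholder channel metadata for the single local feed.
    ET.SubElement(channel, "title").text = "Passively tracked feeds (filtered)"
    ET.SubElement(channel, "link").text = "http://localhost/"
    ET.SubElement(channel, "description").text = "Articles that passed the filter"

    for article in articles:
        text = article["title"] + " " + article["summary"]
        if score(text) < threshold:
            continue  # Filtered out: never shown in the reader.
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = article["title"]
        ET.SubElement(item, "link").text = article["link"]
        ET.SubElement(item, "description").text = article["summary"]

    return ET.tostring(rss, encoding="unicode")
```

The returned string would be written to a local file, and NetNewsWire would subscribe to that one file in place of the individual passive feeds.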

If this sort of scripting of NetNewsWire turns out to be possible, I believe very little (if any) application modification would be necessary to allow my blogroll to expand from 60+ subscriptions to several hundred. Actual results from statistical analysis may prove this theory wrong, but it looks to be worth a try.

Comments

this sounds like such a great idea. i’m also starting to suffer from info overload in my daily blog reports :) although, i’d be afraid of missing something new if i used a purely statistical approach. what happens when your interest shifts or you find a novel topic and your current statistical filter throws it out before it has a chance to learn it?

Posted by: gary on April 14th, 2003 11:20 PM

That’s why I’m advocating a dual-approach: a core set of actively tracked feeds that inform and constantly update the filter dataset, and a set of passively tracked feeds that are interesting enough to track, but not a big deal if you miss some articles. It might not be perfect, but it seems like a good first cut.

Posted by: kasei on April 15th, 2003 2:32 AM

ok. that definitely sounds more useful. i know i have feeds i usually just skim over—like metafilter, with its gazillion new entries per day :) so it’d be nice to have a filter run over those. this sounds very close to being a full-fledged knowledge database though… you might as well start feeding in your bookmarks, emails, im’s etc =P what was that emacs plugin you mentioned recently? that’d be a good candidate to integrate with your blog idea.

Posted by: gary on April 15th, 2003 2:34 PM