Sunday, May 20, 2012

The Internet Lacks Fresh-By Dating

Having been online since 1988, I fondly remember when navigating the Information Superhighway required arcane tools like Gopher, Telnet, and FTP, and discovering information demanded a hacker's ingenuity and persistence. I was ecstatic when, in 1999 or so, its popularity exploded. Email's much more fun when everyone you know has an email address! And the ability to answer most any question online amazes me more than it does more recent users. This superpower is actually fairly new. Back when there were only hundreds of web sites, we never dreamed it'd get so good so fast.

But there's a downside to this explosion, which worried me early on: the Internet's a pack rat. Like any library, it keeps everything and doesn't favor fresh information. Search for a Wayne Shorter performance, and you'll find yourself swimming through a decade's worth of gig announcements, most of them undated by year. Go to Shorter's own web site, and it may be stagnant, but it's hard to tell unless the info's dated by year...which it probably isn't.

I used to worry that the noise from so much stale information would make the Web less and less usable. It hasn't melted down yet, but I still expect that it eventually will. Right now, the Internet's a relatively young pack rat. But what happens by 2020 or 2030?

Google hasn't addressed the issue. There's no "freshness" search parameter, and adding "2012" as a search term mostly yields ancient pages auto-stamped with an updated "copyright 2012" footer. And it's hard to imagine how they could make such a feature work, because so few sites date their content, and those that do often omit the year. So the only way Google could tell fresh pages from stale ones would be by when each page last changed. But that's an awfully imprecise method in an age of dynamic web pages, where old content is framed by constantly updated ads, nav bars, and "latest tweets".

It's a major problem, and I hope it can be addressed. The best route would be for webmasters to get in the habit of diligently dating their content, year included, perhaps in the page metadata. Of course, this would be gamed by SEO*-minded webmasters. But, then again, what wouldn't?

* - SEO, or "Search Engine Optimization", is the sneaky means of fooling Google into paying more attention to your crappy content. Chowhound had a unique SEO strategy: we offered highly useful content, figuring that as people came to appreciate it, Google would up-rank it. Bwa-ha-ha...the old "quality" trick!
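
To make the metadata idea concrete, here's a minimal Python sketch of how a search tool might read a year-bearing date from a page and filter on it. It assumes the page uses the Open Graph `article:published_time` property with an ISO 8601 value; the helper names `published_date` and `is_fresh` and the 365-day cutoff are hypothetical, picked for illustration, and none of this describes how Google actually ranks anything.

```python
from datetime import date
from html.parser import HTMLParser


class PublishedDateParser(HTMLParser):
    """Remembers the first <meta property="article:published_time"> value."""

    def __init__(self):
        super().__init__()
        self.published = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.published is not None:
            return
        attributes = dict(attrs)
        # Assumption: the site dates itself via the Open Graph article:published_time property.
        if attributes.get("property") == "article:published_time" and attributes.get("content"):
            self.published = attributes["content"]


def published_date(html):
    """Return the page's publication date, or None if it is undated or unparseable."""
    parser = PublishedDateParser()
    parser.feed(html)
    if parser.published is None:
        return None
    try:
        # Keep only the YYYY-MM-DD part of an ISO 8601 timestamp.
        return date.fromisoformat(parser.published[:10])
    except ValueError:
        # A date like "May 20" with no year can't be placed in time at all.
        return None


def is_fresh(html, today, max_age_days=365):
    """Hypothetical freshness filter: undated pages are never counted as fresh."""
    published = published_date(html)
    return published is not None and (today - published).days <= max_age_days


page = ('<html><head><meta property="article:published_time" '
        'content="2012-05-20T09:00:00Z"></head></html>')
print(published_date(page))                                              # 2012-05-20
print(is_fresh(page, date(2012, 6, 1)))                                  # True
print(is_fresh("<p>Wayne Shorter, Friday 8pm</p>", date(2012, 6, 1)))    # False
```

Of course, a date in metadata is only as honest as the webmaster who writes it, which is exactly the gaming problem mentioned above.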

Update: a Googler replies

