Corante

About the Author
Stowe Boyd is a well-known media subversive, and an internationally recognized authority on real-time, collaborative and social technologies. His new blog is Message.

Get Real


May 16, 2005

What's Going On At Technorati?


Posted by Stowe Boyd

I had a strange episode last week. I have been trying to use Technorati tags (you may have noticed them at the bottom of recent posts), but even though I was, in some cases, using tags that other bloggers had already created and posted with (such as Les Blogs and Dodgeball), my posts weren't showing up at Technorati -- often not for days.

Last week, I finally emailed the nice people at Technorati. I got the following message from Adam Hertz (corrected from "Andy" at 1.30pm -- see comment):

[via email]

Stowe,

Thanks for reporting this. We had a glitch producing our tag page results. It's all fixed. We had the posts all along -- and as you can see now, you were the first!

Adam

By being first, he meant first user of the tag for Dodgeball.

So, apparently they had some sort of glitch where they weren't updating results of various database activities. I wondered what else might not be getting updated. While it's not scientific, I looked at recent references to Get Real after getting this email, and there seemed to be a long list of links that I hadn't noticed before -- and I look through that often, trying to find new voices that are building on themes we track at Get Real.

Even more interesting: Get Real has been rising rapidly in the Technorati rankings, climbing from around 8,000 at the turn of the year to a recent high of 4,017 or so. We had been stalled for weeks, which seemed odd. So I looked, and in just that morning, since I had reported the bug and received the message, Get Real had climbed about 600 places in the Technorati ranking, up to 3,416!

I also noted this morning that we are stalled again: Get Real has not moved up (or down) from 3,416 since last Thursday. I am happy to see that Get Real is the 3,416th most linked to blog, but I wonder about the stall: shouldn't these rankings be constantly moving up or down, based on new links being created? So, the question is, is there something going on at Technorati, where they have to go over and kick a server? Are they so backlogged with queued analysis tasks that things artificially stall? Did they run an update on Get Real alone, or the entire blogosphere? What's the story?

I have had a number of knowledgeable folks suggest that Technorati is having trouble scaling with the explosive growth of the blogosphere. It's a shame if it's true, because they provide an invaluable service, and with the growth of tags edging out blog categorization as a taxonomic mechanism, it is in the public interest that Technorati work. We are all coming to depend on it as a means of making sense of the world. Click on the tags at the foot of this story: at this moment, 9am ET Monday 16 May, this piece is not showing up on the Technorati pages associated with those tags, although I have test posted this three or four times, and manually pinged Technorati, as well.

I hope that someone like Google or Yahoo scoops them up and ensures that these core infrastructure mechanisms work as needed for the blogosphere.

[tags: , , , ]

Comments (3) + TrackBacks (0) | Category: Technology


COMMENTS

1. Adam Hertz on May 16, 2005 11:36 AM writes...

Stowe,

Thanks for your thoughtful comments, and your support for Technorati. Please forgive the length of this comment; you raise some important issues, and I felt they deserve a substantial response.

Our mantra at Technorati is Be Of Service. We take this really seriously. We're very proud that we've created a valuable service that people depend on every day. We strive for perfection when it comes to accuracy, and we try to stay as close to real-time as we can get. It's not easy, and we don't always measure up to these lofty goals.

Another value of our company is transparency and honesty. That's why I wrote you back so quickly with an explanation of the behavior you were seeing. I've worked at companies where responses to criticisms are "spun". That's not our style.

Having said that, we don't always go into excruciating detail in our responses. Our users just want our service to work; they don't necessarily care about the details. In this case though, perhaps I could have been more explicit. The problem was not that we hadn't indexed your post with the Dodgeball tag. It was a transient failure in the application that produces the tag results page. So as such, it wasn't a symptom of not "keeping up".

There is no doubt that keeping up with the growth of the blogosphere is a major technical challenge. You probably noticed that we just passed 10 million blogs; even more astoundingly, we tracked our 1 billionth link about a month ago. We are trying to do two things at once: index the blogosphere in real time, and maintain a deep historical database with up-to-date analytics. Doing either one of these alone would be hard enough. The combination is a tall order, but that's our mission. As far as I know, we're the only service that's trying.

To meet this challenge, we've created a solid architecture, secured the resources to buy the necessary hardware, and most importantly, assembled one of the finest teams I've ever had the privilege to work with.

Scaling is evolutionary. Sometimes it's just a matter of adding capacity to an existing system. Other times we need to redesign a system that has grown faster than we expected.

You reported two different problems in your post. The first concerns the delay in spidering new blog posts. Up until recently, our median time to index a new post was about five minutes. Mornings are very busy in the blogosphere, and lately we've started to fall behind between 6 AM and noon Pacific time. We are adding more capacity to address this. You should see us back in the zone soon.

The second problem you raised concerns link counts getting "stuck". In this case, we're redesigning the way we report on link counts. The most important thing is, we have all the underlying link data. The part we need to scale is how we calculate and report the counts. We currently do this by querying our database and caching the results. This worked fine for the first year or so, but at this point the caches are getting stale faster than we can recalculate the counts. So, we're replacing this with a system that increments counts as links are seen. Given that we need to keep track of counts for hundreds of millions of URLs, this needs to be a very robust system. We're happy with the design we've come up with, and we should be deploying the new system in the next few weeks.

I hope you find these explanations helpful. Thanks again for your attention and support, and for taking the time to write about us. We're working hard to keep you happy.

-A-

P.S. (it's Adam, not Andy :)
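
The change described in this comment, moving from recalculating link counts by query and caching the results to incrementing counts as links are seen, can be illustrated with a small sketch. This is not Technorati's actual design, which the comment does not detail. The class names, method names, and example URLs below are all hypothetical, and everything is held in memory purely for illustration; a system tracking hundreds of millions of URLs would presumably need durable, distributed storage.

```python
from collections import Counter


class CachedRecount:
    """Hypothetical 'query and cache' approach: counts are only as fresh
    as the last full recount over all stored links."""

    def __init__(self):
        self.links = set()   # (source_url, target_url) pairs
        self._cache = {}     # target_url -> count as of the last recount

    def add_link(self, source, target):
        # Storing a link does not touch the cache, so the cached
        # count for `target` is stale until recount() runs again.
        self.links.add((source, target))

    def recount(self):
        # Full scan of every stored link; cost grows with the data set.
        self._cache = dict(Counter(target for _, target in self.links))

    def count(self, target):
        return self._cache.get(target, 0)


class IncrementalCount:
    """Hypothetical 'increment as links are seen' approach: each new
    link updates the count immediately, with no full scan."""

    def __init__(self):
        self._seen = set()        # guards against counting a link twice
        self._counts = Counter()  # target_url -> current count

    def add_link(self, source, target):
        key = (source, target)
        if key in self._seen:
            return
        self._seen.add(key)
        self._counts[target] += 1

    def count(self, target):
        return self._counts[target]


cached = CachedRecount()
incremental = IncrementalCount()
for source in ("blog-a.example/post", "blog-b.example/post"):
    cached.add_link(source, "getreal.example")
    incremental.add_link(source, "getreal.example")

stale = cached.count("getreal.example")       # cache not yet rebuilt
fresh = incremental.count("getreal.example")  # already up to date
cached.recount()
rebuilt = cached.count("getreal.example")
```

In this sketch the cached count stays "stuck" at its old value until a recount completes, which matches the symptom reported in the post, while the incremental counter reflects each link as soon as it is recorded.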


2. Kim M. Bayne on May 16, 2005 12:21 PM writes...

Wow! Had I not found your entry, I may have continued to think I was blogging in the Twilight Zone regarding my Technorati indexing (or lack thereof). I pondered whether my blog had been banned from Technorati for some unknown reason, like overzealous pinging. Like you, I reference Technorati tags, but can't get the system to index in a timely manner. But splogs or spam blogs don't seem to have that problem. What's up with that?


3. Adam Hertz on May 16, 2005 08:38 PM writes...

Kim, please see my comments on your blog. For some reason, we classified your blog as spam and stopped updating it. I reclassified it just now, so you should be seeing your recent posts in our index in a little while. Sorry.

-A-


