May 26, 2004
SpeechBot - Indexing Audio Conversations
Posted by Gregory Narain
John Dowdell points to an interesting research project under way at HP Labs: SpeechBot. As the site describes it, "SpeechBot is a search engine for audio & video content that is hosted and played from other websites."
Digging a little deeper into the technical documentation for SpeechBot, I came across this summary:
SpeechBot (http://www.compaq.com/speechbot) is the first Internet search site for indexing streaming spoken audio on the web. Unlike previous attempts to index spoken audio on the Web, which have relied on either adjacent text, metadata, or hand supplied transcripts and close captions, SpeechBot uses automatic speech recognition technology to transcribe and index documents that do not have transcripts or other content information. The use of speech recognition permits the efficient and cost-effective indexing of thousands of hours of audio content, which were previously inaccessible. Because of this indexing, SpeechBot allows users to quickly search for relevant content in long audio documents and yields a high precision on first page-retrieved items.
SpeechBot indexes streaming media files based on their content, much as conventional search sites index ordinary Web pages by their text content. Like conventional search sites, SpeechBot does not store or serve the multimedia files themselves, but rather provides users with links. SpeechBot's current index has over 3200 shows, 3500 hours of audio and 20 million words. The index is continually updated using SpeechBot's highly scalable architecture.
SpeechBot was designed, in principle, to dynamically index streaming audio and other multimedia files that otherwise lack text transcripts. Unlike traditional text documents, audio and other multimedia documents have an additional dimension to account for: time. The interesting thing about SpeechBot is not just that it generates textual transcripts from streaming media sources but that it also indexes time- and format-specific metadata in a separate database. Keyword searches then use both databases to pinpoint the location of a reference in the stream itself.
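To make the two-database idea concrete, here is a minimal sketch in Python. It is not SpeechBot's actual implementation, which I haven't seen. The function names, the sample URLs, and the assumed recognizer output (a list of word and start-time pairs per stream) are all hypothetical, chosen only to illustrate how a word index and a separate time index could combine to answer a keyword search.

```python
from collections import defaultdict


def build_indexes(transcripts):
    """Build two lookup tables from recognizer output.

    transcripts: {stream_url: [(word, start_seconds), ...]}
    This input shape is an assumption made for illustration.
    """
    word_index = defaultdict(set)   # word -> streams that contain it
    time_index = defaultdict(list)  # (stream, word) -> offsets in seconds
    for url, words in transcripts.items():
        for word, start in words:
            key = word.lower()
            word_index[key].add(url)
            time_index[(url, key)].append(start)
    return word_index, time_index


def search(keyword, word_index, time_index):
    """Return (stream_url, offset_seconds) pairs for every spoken match."""
    key = keyword.lower()
    hits = []
    # First table: which streams mention the word at all.
    for url in sorted(word_index.get(key, ())):
        # Second table: where in each stream the word was spoken.
        for offset in time_index[(url, key)]:
            hits.append((url, offset))
    return hits


# Hypothetical sample data standing in for real transcription output.
transcripts = {
    "http://example.com/show1.rm": [
        ("privacy", 12.5), ("policy", 13.1), ("Privacy", 301.0),
    ],
    "http://example.com/show2.rm": [
        ("voicemail", 4.0), ("privacy", 88.2),
    ],
}

word_index, time_index = build_indexes(transcripts)
for url, offset in search("privacy", word_index, time_index):
    print(f"{url} at {offset:.1f}s")
```

A player could then open each link and seek straight to the returned offset rather than starting the stream from the beginning.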
This type of technology, however, is most interesting in a quite different context. The current assumption is that the greatest value lies in searching published media. Realistically, though, there seems to be an even greater opportunity on the horizon. Consider two fast-approaching trends: 1) the migration of both consumer and business phone services to IP-based technologies and 2) the growth of real-time communications tools such as IM, video conferencing, and application sharing. Both trends generate "streams" of content-rich media, though these are usually consumed immediately rather than persisted: "runtime media," so to speak.
Imagine applying this instead to searching your voicemail by keyword or, better yet, your online conversations with co-workers or friends. Unfortunately, the resources required to support SpeechBot are extensive, and any usage in this scenario would require not only deep pockets but overwhelming public trust. The cost is probably not the real obstacle for now, since privacy concerns alone would give critics a field day with this application. Just think -- as packets are to Carnivore, our real-time, online engagements are to SpeechBot. And we know how much everyone loves Carnivore.
Comments (2)
+ TrackBacks (0) | Category: Technology | Telecommunications
1. Duncan Lamb on May 28, 2004 10:02 AM writes...
Very interesting article. Reminds me a bit of Microsoft's OneNote, which allows you to record meetings while scribbling on a tablet PC, then search your notes. Upon finding the keyword you're looking for, you can start the recorded audio from the precise point in the meeting you wrote the note.
There's no auto-transcription however. Would be very interesting if they tried to sell this as an enterprise product to record project meetings, status updates, strategy reviews and the like, and I could see it being useful for technical documentation. Cool stuff.
2. Gregory Narain on May 28, 2004 09:18 PM writes...
Duncan,
I'm very intrigued by the OneNote applications. I had previously heard about the voice annotation features but I didn't realize that they were quite so extensive.
I checked out your blog and definitely think we agree that there is tremendous potential in better accessing this information. With PocketPCs getting 600MHz processors and up (not to mention Tablet PCs' full-on processors), it's only a matter of time before some of this moves to our pockets.
Greg