
What Is Audio Mining? January 6, 2011

Posted by hasnain110 in Uncategorized.

Audio mining approaches

There are two main approaches to audio mining.

Text-based indexing. Text-based indexing, also known as large-vocabulary continuous speech recognition (LVCSR), converts speech to text and then identifies words in a dictionary that can contain up to several hundred thousand entries. If a word or name is not in the dictionary, the LVCSR system will choose the most similar word it can find.

The system uses language understanding to create a confidence level for its findings. For findings with less than a 100 percent confidence level, the system offers other possible word matches, said Professor Dan Ellis, who leads Columbia University’s Laboratory for Recognition and Organization of Speech and Audio (http://labrosa.ee.columbia.edu).

Thus, an LVCSR system can improve its accuracy by storing alternative words that sound much like the word it recognized, although this approach also generates some false matches.
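To make the "closest dictionary word plus alternatives" idea concrete, here is a minimal sketch. It is only an illustration: it uses spelling similarity from Python's standard `difflib` module as a stand-in for acoustic similarity, whereas a real LVCSR engine compares acoustic and language-model scores. The word list, the misheard token "traiding", and the function name `closest_words` are all made up for this example.

```python
from difflib import SequenceMatcher

def closest_words(heard, dictionary, n=3):
    """Rank dictionary words by similarity to `heard`.

    Spelling similarity is used here only as a stand-in for the
    acoustic similarity a real recognizer would compute.
    """
    scored = [(SequenceMatcher(None, heard, word).ratio(), word)
              for word in dictionary]
    # Highest score first; break ties alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(word, score) for score, word in scored[:n]]

# Hypothetical dictionary and an out-of-dictionary token.
dictionary = ["trading", "training", "reading"]
candidates = closest_words("traiding", dictionary)

best_word, best_score = candidates[0]
print("Best match:", best_word)
for word, score in candidates[1:]:
    print("Alternative:", word, f"(similarity {score:.2f})")
```

Because the best score is below 1.0, the sketch keeps the runner-up words as alternatives, which mirrors how the system offers other possible matches when it is less than fully confident.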

Phoneme-based indexing. Phoneme-based indexing doesn’t convert speech to text but instead works only with sounds.

The system first analyzes and identifies sounds in a piece of audio content to create a phonetic index. It then uses a dictionary of several dozen phonemes to convert a user’s search term to the corresponding phoneme string. (A phoneme is the smallest unit of speech that distinguishes one utterance from another. For example, the spellings “ai,” “eigh,” and “ey” all represent the long “a” phoneme. Each language has a finite set of phonemes, and every spoken word is a sequence of phonemes.) Finally, the system looks for the search term’s phoneme string in the index.
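The three steps above (index the audio as phonemes, phoneticize the query, search the index) can be sketched roughly as follows. Everything here is an assumption for illustration: the tiny pronunciation table, the ARPAbet-style phoneme symbols, the timestamps, and the function names are invented, and a real product would get its index from an acoustic recognizer rather than a hand-written list.

```python
# Hypothetical pronunciation table using ARPAbet-style phoneme symbols.
PRONUNCIATIONS = {
    "audio": ["AO", "D", "IY", "OW"],
    "mining": ["M", "AY", "N", "IH", "NG"],
}

# Hypothetical phonetic index: (phoneme, start time in seconds),
# as if produced by analyzing a recording of "audio mining".
INDEX = [
    ("AO", 0.0), ("D", 0.1), ("IY", 0.2), ("OW", 0.3),
    ("M", 0.4), ("AY", 0.5), ("N", 0.6), ("IH", 0.7), ("NG", 0.8),
]

def phoneticize(term):
    """Convert a search term (one or more words) to a flat phoneme list."""
    phonemes = []
    for word in term.lower().split():
        phonemes.extend(PRONUNCIATIONS[word])
    return phonemes

def search(index, query):
    """Return the start times where the query phoneme string occurs."""
    symbols = [phoneme for phoneme, _ in index]
    n = len(query)
    return [index[i][1]
            for i in range(len(symbols) - n + 1)
            if symbols[i:i + n] == query]

print(search(INDEX, phoneticize("mining")))        # [0.4]
print(search(INDEX, phoneticize("audio mining")))  # [0.0]
```

This sketch only does exact matching. Real phonetic search has to tolerate recognition errors, so it scores approximate matches instead.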

“A phonetic system requires a more proprietary search tool because it must phoneticize the query term, then try to match it with the existing phonetic-string output,” ScanSoft’s Weideman said. This is considerably more complex than using one of the many existing text-based search tools.

Phoneme-based searches can result in more false matches than the text-based approach, particularly for short search terms, because many words sound alike or sound like parts of other words. For example, Weideman explained, a search for the word “ray” might get a match from within the word “trading.”
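The “ray” inside “trading” case is easy to reproduce. In the short sketch below, the phoneme spellings are ARPAbet-style transcriptions that I am assuming for illustration, and `find_phonemes` is a hypothetical helper, not part of any product.

```python
def find_phonemes(haystack, needle):
    """Return every position where `needle` occurs inside `haystack`."""
    n = len(needle)
    return [i for i in range(len(haystack) - n + 1)
            if haystack[i:i + n] == needle]

# Assumed ARPAbet-style transcriptions (illustrative only).
TRADING = ["T", "R", "EY", "D", "IH", "NG"]
RAY = ["R", "EY"]

hits = find_phonemes(TRADING, RAY)
print(hits)  # [1] -> "ray" is found starting at the second phoneme of "trading"
```

A two-phoneme query has many chances to line up with part of a longer word, which is why short search terms produce the most false matches.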

According to Ellis, it’s difficult for a phonetic system to accurately classify a phoneme except by recognizing the entire word that it is part of or by understanding that a language permits only certain phoneme sequences.

However, he added, phonetic indexing can still be useful if the analyzed material contains important words that are likely to be missing from a text system’s dictionary, such as foreign terms and names of people and places.

How the technology works

Text- and phoneme-based systems operate in much the same way, except that the former uses a text-based dictionary and the latter uses a phonetic dictionary.

The most important and complex component technology for audio mining is speech recognition. In these systems, explained University of Texas Assistant Professor Latifur R. Khan, “A speech recognizer converts the observed acoustic signal into the corresponding [written] representation of the spoken [words].”

Speech recognition software contains acoustic models of the way in which all phonemes are represented. Also, TMA’s Meisel said, there is a statistical language model that indicates how likely words are to follow each other in a specific language. By using these capabilities, as well as complex probability analysis, the technology can take a speech signal of unknown content and convert it to a series of words from the program’s dictionary.
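Here is a toy sketch of how an acoustic score and a statistical language model might be combined to pick the most likely word sequence. All the probabilities are invented for illustration, the bigram model is the simplest possible kind, and the names `ACOUSTIC`, `BIGRAM`, and `decode` are hypothetical. Real recognizers search over huge numbers of hypotheses rather than two hand-picked ones.

```python
import math

# Invented acoustic scores: how well each candidate matches the audio.
ACOUSTIC = {
    ("recognize", "speech"): 0.40,
    ("wreck", "a", "nice", "beach"): 0.45,
}

# Invented bigram probabilities P(next word | previous word).
# "<s>" marks the start of the utterance.
BIGRAM = {
    ("<s>", "recognize"): 0.01,
    ("recognize", "speech"): 0.2,
    ("<s>", "wreck"): 0.001,
    ("wreck", "a"): 0.1,
    ("a", "nice"): 0.01,
    ("nice", "beach"): 0.005,
}

def lm_log_prob(words, floor=1e-6):
    """Log probability of a word sequence under the bigram model."""
    total = 0.0
    previous = "<s>"
    for word in words:
        # Unseen word pairs get a small floor probability.
        total += math.log(BIGRAM.get((previous, word), floor))
        previous = word
    return total

def decode(candidates):
    """Pick the candidate with the best combined acoustic + language score."""
    return max(candidates,
               key=lambda words: math.log(candidates[words]) + lm_log_prob(words))

best = decode(ACOUSTIC)
print(" ".join(best))  # recognize speech
```

Even though the second candidate has the slightly better acoustic score in this made-up data, the language model considers its word sequence far less likely, so the combined score favors the first one.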

Khan noted that this process is more difficult with tonal languages, such as Chinese, in which tone changes the meaning of a word.

Some audio mining dictionaries are domain specific, for use by professionals in different fields, such as law or medicine. In any event, users can update dictionaries, usually manually but sometimes automatically by scanning Web sites or other sources into an audio mining product.
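As a rough idea of what an automatic dictionary update could look like, the sketch below scans a piece of text and lists the words that are missing from the current dictionary. The sample text, the starting word set, and the function name `new_words` are assumptions for this example. A real product would also need a pronunciation for each new word, which this sketch does not attempt.

```python
import re

def new_words(text, dictionary):
    """Return words found in `text` that are not yet in `dictionary`."""
    words = re.findall(r"[a-z']+", text.lower())
    return sorted(set(words) - set(dictionary))

# Hypothetical starting dictionary and scanned text from a legal site.
dictionary = {"the", "court", "issued", "a"}
scanned_text = "The court issued a subpoena."

additions = new_words(scanned_text, dictionary)
print(additions)  # ['subpoena']
dictionary.update(additions)
```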

Some products, such as ScanSoft’s AudioMining Development System, use XML’s ability to tag data so that it can be read by other XML-capable systems, ScanSoft’s Weideman noted. This lets the product export speech index information to other systems, he said.
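To show what exporting speech index information as XML might look like in general, here is a small sketch using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`speechIndex`, `entry`, `start`, `confidence`) and the sample data are my own invention; this is not ScanSoft's actual schema.

```python
import xml.etree.ElementTree as ET

def export_index(hits):
    """Serialize (word, start_seconds, confidence) tuples as an XML string."""
    root = ET.Element("speechIndex")  # hypothetical element name
    for word, start, confidence in hits:
        entry = ET.SubElement(
            root, "entry",
            start=f"{start:.2f}",
            confidence=f"{confidence:.2f}",
        )
        entry.text = word
    return ET.tostring(root, encoding="unicode")

# Invented sample index entries.
hits = [("audio", 0.0, 0.93), ("mining", 0.4, 0.88)]
xml_text = export_index(hits)
print(xml_text)

# Any XML-capable system can read the tagged data back.
parsed = ET.fromstring(xml_text)
words = [entry.text for entry in parsed.findall("entry")]
```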


By working with powerful host-system processors, large memories, and efficient algorithms, most audio mining technology provides high performance levels.

For example, Fast-Talk says its newest technology can index a one-hour audio file in five minutes and can search 30 hours of content per second in response to a specific, 10-phoneme search query on a host system running a 2.53-GHz Pentium CPU.
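It helps to turn those figures into "times faster than real time." The quick calculation below uses only the numbers quoted above; the function name `real_time_factor` is just a label I picked.

```python
def real_time_factor(audio_seconds, processing_seconds):
    """How many seconds of audio are handled per second of processing."""
    return audio_seconds / processing_seconds

# Indexing: one hour of audio in five minutes.
index_speedup = real_time_factor(60 * 60, 5 * 60)

# Searching: 30 hours of content per second.
search_speedup = real_time_factor(30 * 60 * 60, 1)

print(index_speedup)   # 12.0
print(search_speedup)  # 108000.0
```

So by these vendor figures, indexing runs about 12 times faster than real time, while searching the finished index runs about 108,000 times faster than real time.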





2. Manya Mayes - January 6, 2011

Hasnain, this is a great audio mining article and a really fascinating industry! I work in the text analytics arena and often interface with customers who use text analytics/text mining to find patterns/issues about their customers/brands/products in voice-to-text transcriptions. I’m really excited to hear of any improvements in transcription speed and indexing accuracy. One day (and perhaps not in the too distant future) it will be possible to perform real-time text analytics on customer voice data across multiple channels like internal databases and social media sites. Very cool. But, for now, the technology is already sufficiently advanced to take voice transcriptions and analyze them using text analytics techniques and more. And, even though the transcriptions are not 100% accurate (as you mention in your post), there is plenty to be learnt from the huge volumes of voice that companies capture from their customers.

hasnain110 - January 7, 2011

Hi Manya!

Thanks for the descriptive comments. Please do share more valuable information.


6. hitch mount bike rack - February 3, 2011

You wrote some good parts here. I searched for the topic and found plenty of people who agree with you.

hasnain110 - February 3, 2011

Thanks, I always welcome publishing useful information given by other people.

7. jack - October 21, 2011

I have found a good database for free examples of audio mining called mystro.


It’s an editor-friendly desktop application with over 7,000 music tracks. Thought you and your readers may like to actually hear some examples.


