Automatic speech recognition (ASR) software combines the art of speech with the art of transcription, and it has become the talk of the town. Speech recognition has been a dream since the good old days of Star Wars and other science fiction films and stories. Has that dream come true? Today it has been only partially fulfilled. Every company has joined the competition to offer the best speech recognition software to the world market. So what happened to the race? It reminds me of the tale of the hare and the tortoise: slow and steady seems to have won, yet it still has miles to go before it reaches the finish line. And what exactly is the goal of the race? Is it reaching the top, or reaching the people? That is the million-dollar question. Since cumulative revenue from speech recognition has started to dry up, its growth over time deserves a closer look; such an analysis would show a flattening curve that reflects the stagnation of research and development in this software.
Imagine investing a few thousand dollars a month in speech recognition software, only to find that it is not worth it: your dictations are misspelled, words are replaced and jumbled, and the meaning changes. What chaos would that create? The frustration at such moments is truly unbearable. No product or service is flawless; everything comes with its own pros and cons. Speech-to-text software is no exception. It has its own flaws, which limit its use to a small community. The technology needs more attention and research before it can catch up with languages that have developed over many thousands of years.
The history of the world's languages is long and seemingly endless. The languages we speak today are the product of a long evolution, shaped by the efforts of countless generations. Many animals communicate with one another, but only humans have formalized communication into a structured system of signals known as language. The cortical speech centers are another evolutionary feature that only humans possess, distinguishing the human brain from those of other animals. Speech recognition software, by contrast, has a very recent history. It does not need millennia to catch up, but it will likely need at least a few more decades to grasp even the basics of speech and of the languages spoken by different groups of people.
The main drawbacks of speech-to-text (audio-to-text) recognition software are:
- It cannot understand every word, even after the user has spent hours training it. Time is precious, after all; we only have 24 hours a day!
- Every punctuation mark, such as a comma, period, semicolon, or hyphen, must be dictated explicitly by the speaker wherever it is needed.
- Understanding context is another major weakness. Many words, especially in English, have several meanings and must be interpreted in the correct context to be transcribed accurately. In most cases, the software does not seem to understand the context.
- Homophones are another difficult problem for audio-to-text software. These are different words with the same pronunciation but different spellings and meanings, such as elicit/illicit, dessert/desert, their/there/they're, and flour/flower. Used in different contexts, they confuse the software and produce errors and unintentionally funny phrases and sentences.
- Another major black mark on speech recognition is that it cannot cope with the many accents found within a single language. The software struggles even with words spoken in a neutral accent, so how can it understand the different accents of people around the world?
In 1997, Bill Gates made a bold prediction: “In this 10-year period, I believe we will not only use the keyboard and mouse to interact, but during that time we will have perfected speech recognition and speech output well enough that they become a standard part of the interface.” Now three years of that decade have passed, and speech recognition is still at a primitive stage of use and development.
To conclude, human transcription still holds a bigger share of the market than audio-to-text software. Transcribers are not obsolete: their integrity, caliber, and industry experience keep them in demand.