Text-to-Speech: Is it as good as it sounds?

Published: 22 Nov 2023

In a word: yes and no!

Unlike Speech-to-Text (STT), Text-to-Speech (TTS) has a clear advantage: it synthesizes a voice from an unambiguous source. Assuming the text is correct, the “voice” will reproduce it more or less accurately. The same cannot be said for Speech-to-Text, where errors routinely occur because of the speaker’s accent, talking speed, slurred words, and so on.

However, the devil’s in the details. Text-to-Speech has challenges of its own, even when the source text is perfect (perfect as text, that is), such as:

  • foreign words
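  • names that are easily mispronounced
  • abbreviations / acronyms
  • dates, times & measures
  • homonyms (strictly speaking homographs, such as “lead” or “read”, which are spelled alike but pronounced differently)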

 

There are workarounds for all of the above, but as with Machine Translation, Speech-to-Text, and Optical Character Recognition (OCR), the time & manpower required often negate the very utility one hopes to gain from these technologies in the first place. Whether such an investment is justified depends largely on the intended end use.
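
To give a feel for what such a workaround involves, here is a minimal Python sketch of rule-based text normalization applied before the text reaches a TTS engine. The rule tables and the function name normalize_for_tts are hypothetical and purely illustrative; this is not the method of any particular TTS product, and a real project would need far larger tables.

```python
import re

# Hypothetical, deliberately tiny rule tables (assumptions for illustration).
ABBREVIATIONS = {"Dr.": "Doctor", "approx.": "approximately"}
ACRONYMS = {"OCR": "O C R"}
UNITS = {"km": "kilometers", "kg": "kilograms", "min": "minutes"}
MONTHS = {
    "Jan": "January", "Feb": "February", "Mar": "March", "Apr": "April",
    "May": "May", "Jun": "June", "Jul": "July", "Aug": "August",
    "Sep": "September", "Oct": "October", "Nov": "November", "Dec": "December",
}


def normalize_for_tts(text: str) -> str:
    """Rewrite text so that a TTS engine has less to guess at."""
    # Dates such as "22 Nov 2023": spell out the month name.
    text = re.sub(
        r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b",
        lambda m: f"{m.group(1)} {MONTHS[m.group(2)]} {m.group(3)}",
        text,
    )
    # Measures such as "5 km" or "5km": spell out the unit.
    text = re.sub(
        r"\b(\d+(?:\.\d+)?) ?(" + "|".join(UNITS) + r")\b",
        lambda m: f"{m.group(1)} {UNITS[m.group(2)]}",
        text,
    )
    # Abbreviations: expand when not preceded by another word character.
    for short, spoken in ABBREVIATIONS.items():
        text = re.sub(rf"(?<!\w){re.escape(short)}", spoken, text)
    # Acronyms: replace whole words only, here by spelling them out.
    for acronym, spoken in ACRONYMS.items():
        text = re.sub(rf"\b{re.escape(acronym)}\b", spoken, text)
    return text


print(normalize_for_tts("Dr. Lee scanned approx. 5 km of film with OCR on 22 Nov 2023."))
```

Every rule has to be written, tested, and maintained by someone, which is where the time & manpower go. Note that the sketch does nothing for foreign words, names, or homographs, since those depend on context rather than on simple lookups.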

 

For example, tuning Text-to-Speech for a 300-page novel might well be worth the investment, but it would certainly be a waste of time for a short document or web page. And even with the added cost, the result will still sound less than human, regardless of the latest improvements. Besides, audiobooks generated by Text-to-Speech might simply not be a very pleasant listening experience.

That said, Text-to-Speech is still a great tool despite its drawbacks. It may not be a perfect substitute for a human voice, at least not yet, but it fills a gap that would otherwise leave certain audiences with no means whatsoever of accessing the written word. Even if the intonation is a little off, or certain words are mispronounced, users are generally happy enough with the results. Increasingly human-sounding voices improve the experience further, and they will only get better over time. Better still, TTS apps are often free, with customizable voices to boot!

Today, the most common users of Text-to-Speech include…

  • readers / listeners on the go (audiobooks)
  • the visually impaired
  • non-native speakers who can understand but cannot read a foreign language
  • the speech impaired, who use it to deliver their message
  • producers of low-budget eLearning courses

 

Here at EQHO, we’ve definitely profited from Speech-to-Text technology, as we are often tasked with translating videos for which no script is provided. In the past, the audio had to be transcribed manually, but today we use the latest software to transcribe it and then have a human review the result, as there are always errors. Still, it’s a great technology that saves us a good deal of both time & cost.
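
As a rough illustration of a transcribe-then-review workflow, the Python sketch below orders transcript segments so that a reviewer sees the least certain ones first. It assumes the STT software reports a confidence score per segment; the Segment fields, the review_queue function, and the 0.85 threshold are all hypothetical and do not describe any specific tool or EQHO’s actual process.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float       # start time in seconds
    text: str          # machine transcript for this stretch of audio
    confidence: float  # 0.0 to 1.0, assumed to come from the STT engine


def review_queue(segments, threshold=0.85):
    """Return every segment, least confident first, with a flag for risky ones.

    Nothing is dropped: all segments still get a human review, but those
    below the (assumed) threshold are flagged for extra attention.
    """
    ordered = sorted(segments, key=lambda s: s.confidence)
    return [(s, s.confidence < threshold) for s in ordered]
```

The design choice here is to prioritize rather than filter, because even segments the software is confident about can contain errors.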

 

However, the same cannot be said for Text-to-Speech. By the time we have corrected the intonation and applied rules for abbreviations, acronyms, homonyms, etc., we could have provided a professional human voice just as easily & quickly, at a competitive price. As the technology improves, human voice talent may well become obsolete… but we’re not there yet, not by a long shot!
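
To show what those corrections look like in practice, here is a small Python sketch that builds SSML (Speech Synthesis Markup Language, a W3C standard that many TTS engines accept) by hand. The helper names are hypothetical, the bare speak root omits the version and namespace attributes that a standalone SSML document would carry, and engine support for individual elements varies, so treat this as an illustration rather than a recipe for any particular engine.

```python
from xml.sax.saxutils import escape, quoteattr


def text(s: str) -> str:
    """Plain text, escaped so that characters like & stay valid XML."""
    return escape(s)


def sub(written: str, spoken: str) -> str:
    """Read `written` aloud as `spoken`, e.g. an acronym spelled out."""
    return f"<sub alias={quoteattr(spoken)}>{escape(written)}</sub>"


def phoneme(written: str, ipa: str) -> str:
    """Force a pronunciation, e.g. to disambiguate a homograph."""
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(written)}</phoneme>'


def speak(*parts: str) -> str:
    """Wrap already-marked-up parts in a root element."""
    return "<speak>" + "".join(parts) + "</speak>"


ssml = speak(
    text("The "),
    sub("OCR", "O C R"),
    text(" output said the pipe was "),
    phoneme("lead", "lɛd"),  # the metal, not the verb
    text("."),
)
print(ssml)
```

Each tag stands for a decision a person had to make after listening to the raw output, and a long script can need hundreds of them.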

 

*Photo courtesy of Pexels.
