
Text-to-Speech: Is it as good as it sounds?

Published: 22 Nov 2023

In a word — Yes & No!

 

Unlike Speech-to-Text (STT), Text-to-Speech (TTS) has a clear advantage: it generates its voice from an unambiguous source. Assuming the text is correct, the “voice” will reproduce it more or less accurately. The same cannot be said for Speech-to-Text, where errors routinely occur for a variety of reasons, such as the speaker’s accent, talking speed, slurring of words, and so on.

 

However, the devil’s in the details. Text-to-Speech has challenges of its own, even when the source text is perfect (perfect as text, that is), such as:

  • foreign words
  • mispronunciation of names
  • abbreviations / acronyms
  • dates, times & measures
  • homonyms, particularly words spelled alike but pronounced differently (“read”, “lead”)
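
To give a sense of what handling a few of these involves, here is a minimal sketch in Python of rule-based text clean-up before the text reaches a speech engine. It is an illustration only, not a description of any particular TTS product: the lookup tables and the function name `normalize_for_tts` are hypothetical, and a real project would need far larger tables plus separate handling for names, dates, and foreign words.

```python
import re

# Hypothetical lookup tables, kept tiny for illustration.
ABBREVIATIONS = {"Dr.": "Doctor", "approx.": "approximately", "etc.": "et cetera"}
UNITS = {"km": "kilometers", "kg": "kilograms"}
SPELLED_ACRONYMS = {"TTS", "OCR"}


def normalize_for_tts(text: str) -> str:
    """Rewrite abbreviations, measures and acronyms into speakable words."""
    # Expand known abbreviations with a plain string replacement.
    for short, spoken in ABBREVIATIONS.items():
        text = text.replace(short, spoken)

    # Turn "5 km" or "5km" into "5 kilometers" for the units we know.
    def expand_unit(match: re.Match) -> str:
        return f"{match.group(1)} {UNITS[match.group(2)]}"

    unit_pattern = r"(\d+)\s?(" + "|".join(UNITS) + r")\b"
    text = re.sub(unit_pattern, expand_unit, text)

    # Space out listed acronyms so they are read letter by letter.
    def spell_out(match: re.Match) -> str:
        word = match.group(0)
        return " ".join(word) if word in SPELLED_ACRONYMS else word

    return re.sub(r"\b[A-Z]{2,}\b", spell_out, text)


print(normalize_for_tts("Dr. Lee ran approx. 5 km using TTS."))
```

Even this toy version needs a separate rule and a hand-maintained list for each category, which hints at how quickly the effort grows for a real document.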

 

There are workarounds for all of the above, but as with Machine Translation, Speech-to-Text, and Optical Character Recognition (OCR), the time & manpower required often negate the very utility one hopes to realize by using these technologies in the first place. Whether such an investment is justified depends largely on the intended end use.

 

For example, fine-tuning Text-to-Speech output for a 300-page novel might well be worth the investment, but doing the same for a short document or web page would certainly be a waste of time. And even with the added costs, the result will still sound less than human, whatever the latest improvements. Besides, audiobooks generated by Text-to-Speech might simply not be a very pleasant listening experience.

 

 

That said, Text-to-Speech is still a great tool despite its drawbacks. It may not be the perfect substitute for humans, at least not yet, but it fills a gap that would otherwise leave certain audiences with no means whatsoever to access the written word. And if the intonation is a little off, or certain words are mispronounced, the user is often still happy enough with the results. Advances in human-sounding voices enhance the experience further, and it will only get better over time. Even better, TTS apps are often free, with customizable voices to boot!

 

Today, the most common users of Text-to-Speech include…

  • readers / listeners on the go (audiobooks)
  • the visually impaired
  • non-native speakers who can understand but cannot read a foreign language
  • the speech impaired, to deliver their message
  • producers of low-budget eLearning courses

 

Here at EQHO, we’ve definitely benefited from Speech-to-Text technology, as we are often tasked with translating videos where no script is provided. In the past, the audio had to be transcribed manually, but today we use the latest software to transcribe it, followed by a human review, as there are always errors. Still, it’s a great technology that saves us a lot of time & cost.

 

However, the same cannot be said for Text-to-Speech. By the time we correct the intonation and apply rules for abbreviations, acronyms, homonyms, etc., we could have provided a professional human voice just as easily & quickly, at competitive prices. As the technology improves over time, human voice talent may well become obsolete… but we’re not there yet, not by a long shot!

 

*Photo courtesy of Pexels.
