Text-to-Speech: Is it as good as it sounds?

Published: 22 Nov 2023

In a word — Yes & No!

 

Unlike Speech-to-Text (STT), Text-to-Speech (TTS) holds a clear advantage: it generates a voice from an unambiguous source. Assuming the text is correct, the “voice” will reproduce it more or less accurately. The same cannot be said for Speech-to-Text, where errors routinely occur for a variety of reasons, such as the speaker’s accent, speaking speed, or slurred words.

 

However, the devil’s in the details. Text-to-Speech has its own challenges, even when the source text is perfect (as text, that is), such as:

  • foreign words
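  • names, which are frequently mispronounced
  • abbreviations / acronyms
  • dates, times & measures
  • homonyms, or more precisely homographs (words spelled alike but pronounced differently, such as “lead” or “read”)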

 

There are workarounds for all of the above, but as with Machine Translation, Speech-to-Text, and Optical Character Recognition (OCR), the time & manpower required often negate the very utility one hopes to gain by using these technologies in the first place. Whether the investment is justified depends on the intended end use.
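To give a feel for what such a workaround involves, here is a minimal sketch in Python of a pre-processing step that rewrites abbreviations and dates into the words a voice should actually say. The abbreviation table, the function names, and the choice of ISO-style dates (YYYY-MM-DD) are all assumptions made for illustration, not a description of any particular TTS product.

```python
import re

# Hypothetical lookup table: every project would have to maintain its own.
ABBREVIATIONS = {
    "Dr.": "Doctor",
    "approx.": "approximately",
    "km": "kilometres",
}

MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def expand_abbreviations(text: str) -> str:
    """Replace each known abbreviation with its spoken form."""
    for abbr, spoken in ABBREVIATIONS.items():
        # Only match the abbreviation as a standalone token,
        # so "km" inside a longer word is left alone.
        pattern = r"(?<!\w)" + re.escape(abbr) + r"(?!\w)"
        text = re.sub(pattern, spoken, text)
    return text


def expand_iso_dates(text: str) -> str:
    """Rewrite dates such as 2023-11-22 as '22 November 2023'."""
    def to_words(match: re.Match) -> str:
        year = match.group(1)
        month = int(match.group(2))
        day = int(match.group(3))
        # Leave anything that is not a plausible date untouched.
        if not (1 <= month <= 12 and 1 <= day <= 31):
            return match.group(0)
        return f"{day} {MONTHS[month - 1]} {year}"

    return re.sub(r"\b(\d{4})-(\d{2})-(\d{2})\b", to_words, text)


def normalize(text: str) -> str:
    """Run every rewrite rule before the text is sent to a TTS engine."""
    return expand_iso_dates(expand_abbreviations(text))


print(normalize("Dr. Lee ran approx. 5 km on 2023-11-22."))
# Doctor Lee ran approximately 5 kilometres on 22 November 2023.
```

Even this toy version hints at the cost: each new abbreviation, date format, or unit needs its own rule, and someone still has to check the result by ear.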

 

For example, applying Text-to-Speech to a 300-page novel might well be worth the investment, but it is almost certainly a waste of time for a short document or web page. And even with the added cost, the result will still sound less than human, regardless of the latest improvements. Besides, audiobooks generated with Text-to-Speech might simply not be a very pleasant listening experience.

 

 

That said, Text-to-Speech is still a great tool despite its drawbacks. It may not be a perfect substitute for humans, at least not yet, but it fills a gap that would otherwise leave certain audiences with no means whatsoever to access the written word. Even if the intonation is a little off, or certain words are mispronounced, the user is usually happy enough with the results. Voices keep sounding more human, so the experience will only get better over time. Even better, TTS apps are often free, with customizable voices to boot!

 

Today, the most common users of Text-to-Speech include…

  • readers / listeners on the go (audiobooks)
  • the visually impaired
  • non-native speakers who can understand but cannot read a foreign language
  • the speech impaired, to deliver their message
  • producers of low-budget eLearning courses

 

Here at EQHO, we’ve definitely profited from Speech-to-Text technology, as we are often tasked with translating videos where no script is provided. In the past, the audio had to be transcribed manually. Today we use the latest software to transcribe it, followed by a human review, as there are always errors. Still, it’s a great technology that saves us both time & cost.

 

However, the same cannot be said for Text-to-Speech. By the time we correct the intonation and apply rules for abbreviations, acronyms, homonyms, etc., we could have provided a professional human voice just as easily & quickly, at a competitive price. As the technology improves, human voices may well become obsolete… But we’re not there yet, not by a long shot!
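For the curious, many TTS engines accept such rules as SSML (Speech Synthesis Markup Language, a W3C standard). The sketch below, in Python, shows roughly what the markup for a single sentence looks like. The helper names are ours and purely illustrative, the IPA transcription is an assumption about the intended pronunciation, and whether a given engine honours every tag is something to verify against its documentation.

```python
from xml.sax.saxutils import escape, quoteattr


def sub(written: str, spoken: str) -> str:
    """SSML <sub>: read `spoken` aloud wherever `written` appears."""
    return f"<sub alias={quoteattr(spoken)}>{escape(written)}</sub>"


def phoneme(written: str, ipa: str) -> str:
    """SSML <phoneme>: force a pronunciation, given here in IPA."""
    return (
        f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>'
        f"{escape(written)}</phoneme>"
    )


def speak(*parts: str, lang: str = "en-US") -> str:
    """Wrap already-marked-up parts in the SSML root element."""
    return (
        '<speak version="1.1" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>" + "".join(parts) + "</speak>"
    )


# One sentence, one acronym, one homograph: three manual decisions.
ssml = speak(
    sub("TTS", "text to speech"),
    escape(" engines often misread the metal "),
    phoneme("lead", "lɛd"),
    escape("."),
)
print(ssml)
```

Multiply those manual decisions by every ambiguous word in a script, and the time advantage over a human voice talent shrinks quickly.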

 

*Photo courtesy of Pexels.
