
Text-to-Speech: Is it as good as it sounds?

Published: 22 Nov 2023

In short: yes and no!

 

Unlike Speech-to-Text (STT), Text-to-Speech (TTS) has a clear advantage: it generates a synthetic voice from an unambiguous source. Assuming the text is correct, the “voice” will reproduce it more or less accurately. The same cannot be said for Speech-to-Text, where errors routinely occur for a variety of reasons, such as the speaker’s accent, speaking speed, or slurred words.

 

However, the devil’s in the details. Text-to-Speech faces challenges even when the source text is perfect (as text, that is), such as:

  • foreign words
  • names, which are easily mispronounced
  • abbreviations / acronyms
  • dates, times & measures
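  • homonyms, specifically homographs such as “read” or “lead”, which are spelled alike but pronounced differently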

 

There are workarounds for all of the above, but as with Machine Translation, Speech-to-Text, and Optical Character Recognition (OCR), the time & manpower required often negate the very utility one hopes to gain by using these technologies in the first place. Whether the investment is justified depends largely on the intended end use.
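To give a sense of what such a workaround involves, here is a minimal sketch of a text-normalization pass that expands abbreviations and ISO-style dates before the text reaches a TTS engine. The lookup table and the function names are hypothetical and chosen for illustration only; a real project would need far larger, language-specific rule sets, which is exactly where the time & manpower go.

```python
import re

# Hypothetical lookup table; real projects maintain much larger ones.
ABBREVIATIONS = {
    "Dr.": "Doctor",
    "approx.": "approximately",
    "km": "kilometers",
}

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def expand_abbreviations(text):
    """Replace each known abbreviation when it stands alone as a token."""
    for abbr, full in ABBREVIATIONS.items():
        # Lookarounds prevent matches inside longer words.
        pattern = r"(?<!\w)" + re.escape(abbr) + r"(?!\w)"
        text = re.sub(pattern, full, text)
    return text


def expand_iso_dates(text):
    """Rewrite YYYY-MM-DD dates as 'D Month YYYY' so they are read naturally."""
    def spell_out(match):
        year, month, day = match.group(1), int(match.group(2)), int(match.group(3))
        if not 1 <= month <= 12:
            return match.group(0)  # leave anything that is not a valid month
        return f"{day} {MONTHS[month - 1]} {year}"

    return re.sub(r"\b(\d{4})-(\d{2})-(\d{2})\b", spell_out, text)


def normalize_for_tts(text):
    """Apply all normalization rules in a fixed order."""
    return expand_iso_dates(expand_abbreviations(text))


print(normalize_for_tts("Dr. Lee ran approx. 5 km on 2023-11-22."))
```

Even this toy version shows the problem: every new abbreviation, date format, or unit needs its own rule, and each rule has to be checked against the text it might accidentally change.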

 

For example, preparing a 300-page novel for Text-to-Speech might well be worth the investment, but doing the same for a short document or web page is almost certainly a waste of time. And even with the added cost, the result will still sound less than human, regardless of the latest improvements. Besides, audiobooks generated by Text-to-Speech may simply not be a very pleasant listening experience.

 

 

That said, Text-to-Speech is still a great tool regardless of its drawbacks. It may not be the perfect substitute for humans, at least not yet, but it still fills a gap that would otherwise leave certain audiences with no means whatsoever to access the written word. And if the intonation is a little off, or certain words are mispronounced, users are often still happy enough with the results. Advances in human-sounding voices enhance the experience further, and the voices will only get better over time. Better still, TTS apps are often free, with customizable voices to boot!

 

Today, the most common users of Text-to-Speech include…

  • readers / listeners on the go (audiobooks)
  • the visually impaired
  • non-native speakers who can understand but cannot read a foreign language
  • the speech impaired, to deliver their message
  • producers of low-budget eLearning courses

 

Here at EQHO, we’ve definitely profited from Speech-to-Text technology, as we are often tasked with translating videos for which no script is provided. In the past, the audio had to be transcribed manually, but today we use the latest software to transcribe it, followed by a human review, as there are always errors. Still, it’s a great technology that saves us a lot of both time & cost.

 

However, the same cannot be said for Text-to-Speech. By the time we have corrected the intonation and applied rules for abbreviations, acronyms, homonyms, etc., we could have provided a professional human voice just as easily & quickly, at competitive prices. As the technology improves over time, human voices may well become obsolete… But we’re not there yet, not by a long shot!
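For readers curious what “applying rules” can look like in practice, many TTS engines accept SSML (Speech Synthesis Markup Language, a W3C standard), in which an acronym can be given a spoken alias and a homograph an explicit pronunciation. The sketch below builds such markup by hand. The helper names are hypothetical, this is not a description of EQHO’s own workflow, and support for individual SSML elements varies by engine.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NAMESPACE = "http://www.w3.org/2001/10/synthesis"


def text(plain):
    """Escape plain text so characters like & and < are valid inside SSML."""
    return escape(plain)


def sub(written, spoken):
    """Have the engine say `spoken` wherever `written` appears in the text."""
    return f"<sub alias={quoteattr(spoken)}>{escape(written)}</sub>"


def phoneme(written, ipa):
    """Pin down the pronunciation of a word using IPA symbols."""
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(written)}</phoneme>'


def speak(*parts, lang="en-US"):
    """Wrap already-marked-up parts in the SSML root element."""
    body = "".join(parts)
    return (
        f'<speak version="1.1" xmlns="{SSML_NAMESPACE}" xml:lang="{lang}">'
        f"{body}</speak>"
    )


ssml = speak(
    text("We ran "),
    sub("OCR", "optical character recognition"),
    text(" on the pages she "),
    phoneme("read", "rɛd"),  # past tense, so it must not sound like "reed"
    text(" yesterday."),
)
print(ssml)
```

Every acronym and every ambiguous word has to be found and tagged like this, usually by a person who then listens to the result, which is why the effort adds up so quickly.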

 

*Photo courtesy of Pexels.
