Communications using a speech-To-Text-To-speech pipeline

Rafael Dantas Lero, Chris Exton, Andrew Le Gear

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, we discuss the feasibility of using an automated speech-to-speech pipeline to encode voice samples instead of regular voice codecs, in situations that require high data compression under high packet loss. To analyse the advantages of using speech-to-text transcription as a voice encoder and text-to-speech synthesis as a decoder, and to compare this approach against a standard PCM A-law codec, we measured the error rate of user-transcribed sentences based on the Semantically Unpredictable Sentences (SUS) test. Some of the PCM speech samples were also played in a way that simulated poor network conditions, specifically 5% and 10% packet loss; these conditions were added to compare the performance of the speech-to-text method with that of standard voice codecs. Additionally, we evaluated how similar the transcribed messages were to the ground truth by measuring the Levenshtein distance between the sentences and between their Double Metaphone phonetic representations. We conclude that it may be feasible to use speech-to-text as a codec, as the results do not show a significant difference between the synthesised voice and the real voice.
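The similarity measures named in the abstract can be sketched in Python. This is a minimal sketch, assuming a word-level error rate (word edits divided by the number of reference words) and a character-level similarity normalised by the longer string; the abstract does not state either normalisation, and the names `levenshtein`, `word_error_rate` and `similarity` are hypothetical. The Double Metaphone comparison would apply the same `levenshtein` function to the phonetic codes produced by a Double Metaphone implementation, which is not included here.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b.

    Works on strings (character level) or lists of words (word level).
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute x with y (free if equal)
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, transcription: str) -> float:
    """Hypothetical metric: word-level edits per reference word."""
    ref = reference.lower().split()
    hyp = transcription.lower().split()
    return levenshtein(ref, hyp) / max(len(ref), 1)


def similarity(reference: str, transcription: str) -> float:
    """Hypothetical metric: 1.0 for identical strings, lower as edits grow."""
    longest = max(len(reference), len(transcription), 1)
    return 1.0 - levenshtein(reference, transcription) / longest
```

For example, a listener who hears the hypothetical SUS-style sentence "the table walked through the blue truth" and writes "the table walked through a blue truth" makes one substitution in seven words, a word error rate of about 0.14.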
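The PCM A-law baseline and its degraded variants can be sketched as follows. This is a minimal sketch, assuming signed 16-bit linear input samples encoded with the G.711 A-law segment table, 20 ms packets of 160 samples at 8 kHz, and lost packets replaced with A-law silence; the abstract does not describe how the 5% and 10% loss conditions were actually produced, and `simulate_packet_loss` and its parameters are hypothetical.

```python
import random
from typing import List, Tuple

# G.711 A-law segment end points on the 13-bit magnitude (sample >> 3).
SEG_AEND = (0x1F, 0x3F, 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF)


def linear_to_alaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as an 8-bit A-law code."""
    val = sample >> 3  # keep the 13 most significant bits
    if val >= 0:
        mask = 0xD5  # sets the sign bit and inverts the even bits
    else:
        mask = 0x55  # inverts the even bits only
        val = -val - 1  # magnitude of the negative value
    seg = next((i for i, end in enumerate(SEG_AEND) if val <= end), 8)
    if seg >= 8:  # out of range: clip to the largest code
        return 0x7F ^ mask
    quant = (val >> 1) if seg < 2 else (val >> seg)
    return ((seg << 4) | (quant & 0x0F)) ^ mask


def alaw_to_linear(code: int) -> int:
    """Decode an 8-bit A-law code to the centre of its quantisation interval."""
    code ^= 0x55
    t = (code & 0x0F) << 4
    seg = (code & 0x70) >> 4
    if seg == 0:
        t += 8
    elif seg == 1:
        t += 0x108
    else:
        t = (t + 0x108) << (seg - 1)
    return t if code & 0x80 else -t


def simulate_packet_loss(codes: List[int], loss_rate: float,
                         packet_size: int = 160,
                         seed: int = 0) -> Tuple[List[int], int]:
    """Hypothetical loss model: drop whole packets independently at random.

    A lost packet is replaced by 0xD5, the A-law code for a zero sample.
    Returns the degraded stream and the number of packets lost.
    """
    rng = random.Random(seed)
    out: List[int] = []
    lost = 0
    for start in range(0, len(codes), packet_size):
        packet = codes[start:start + packet_size]
        if rng.random() < loss_rate:
            out.extend([0xD5] * len(packet))
            lost += 1
        else:
            out.extend(packet)
    return out, lost
```

Calling `simulate_packet_loss(codes, 0.05)` or `simulate_packet_loss(codes, 0.10)` would give the two loss conditions, and a fixed seed keeps a listening test repeatable. Many VoIP receivers apply packet loss concealment (for example, G.711 Appendix I) instead of inserting silence, so this model is closer to a worst case.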

Original language: English
Title of host publication: 2019 International Conference on Wireless and Mobile Computing, Networking and Communications, WiMob 2019
Publisher: IEEE Computer Society
ISBN (Electronic): 9781728133164
Publication status: Published, Oct 2019
Event: 15th International Conference on Wireless and Mobile Computing, Networking and Communications, WiMob 2019, Barcelona, Spain
Duration: 21 Oct 2019 to 23 Oct 2019

Publication series

Name: International Conference on Wireless and Mobile Computing, Networking and Communications
Volume: 2019-October
ISSN (Print): 2161-9646
ISSN (Electronic): 2161-9654

Conference

Conference: 15th International Conference on Wireless and Mobile Computing, Networking and Communications, WiMob 2019
Country/Territory: Spain
City: Barcelona
Period: 21/10/19 to 23/10/19

Keywords

  • codec
  • mobile
  • speech-to-text
  • SUS
  • text-to-speech
  • VoIP
