Enhanced output-based perceptual measure for predicting subjective quality of speech

A. E. Mahdi, D. Picovici

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper presents an enhanced version of a non-intrusive measure for assessing the speech quality of voice communication systems and evaluates its performance. The new measure, which uses only the output of the system, is based on measuring perception-based objective auditory distances between the voiced parts of the output (processed) speech whose quality is to be evaluated and appropriately matching references extracted from one of four pre-formulated codebooks, chosen according to the estimated pitch values of those voiced parts. The codebooks are formed by optimally clustering a large number of parametric speech vectors extracted from a database of clean speech records. The measured auditory distances are then mapped into equivalent subjective Mean Opinion Scores (MOS). The required clustering and matching processes are carried out using an efficient data-mining tool known as the Self-Organizing Map (SOM). Short-time Bark spectrum analysis is used to achieve a perception-based, speaker-independent parametric representation of the speech. Reported evaluation results show that the proposed enhanced speech quality assessment method provides quality scores that are highly correlated with MOS obtained by formal subjective listening tests.
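
The scoring pipeline summarised in the abstract (pitch-dependent codebook selection, nearest-reference matching, distance-to-MOS mapping) can be illustrated with a minimal sketch. This is not the authors' implementation: the pitch band edges, the use of Euclidean distance, the linear MOS mapping with its coefficients, and all function names are assumptions made for illustration only. The feature vectors stand in for Bark-spectrum parameters, and the codebooks stand in for SOM-trained reference vectors.

```python
import math

# Hypothetical pitch band edges in Hz. The abstract states only that one of
# four codebooks is chosen by estimated pitch, not where the boundaries lie.
PITCH_EDGES_HZ = (120.0, 170.0, 220.0)


def select_codebook(pitch_hz):
    """Return the index (0..3) of the codebook for an estimated pitch."""
    return sum(pitch_hz >= edge for edge in PITCH_EDGES_HZ)


def nearest_distance(frame, codebook):
    """Distance from a frame vector to its closest codebook entry.

    Euclidean distance is an assumption; the abstract does not name the
    auditory distance that is actually used.
    """
    return min(math.dist(frame, ref) for ref in codebook)


def mean_auditory_distance(frames, pitches, codebooks):
    """Average nearest-reference distance over the voiced frames."""
    distances = [
        nearest_distance(frame, codebooks[select_codebook(pitch)])
        for frame, pitch in zip(frames, pitches)
    ]
    return sum(distances) / len(distances)


def distance_to_mos(distance, slope=-1.0, intercept=4.5):
    """Hypothetical linear mapping, clipped to the 1..5 MOS scale."""
    return max(1.0, min(5.0, intercept + slope * distance))
```

In practice the mapping coefficients would be fitted against MOS values from subjective listening tests; the defaults above are placeholders.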

Original language: English
Title of host publication: 13th European Signal Processing Conference, EUSIPCO 2005
Pages: 700-703
Number of pages: 4
Publication status: Published - 2005
Event: 13th European Signal Processing Conference, EUSIPCO 2005 - Antalya, Turkey
Duration: 4 Sep 2005 → 8 Sep 2005

Publication series

Name: 13th European Signal Processing Conference, EUSIPCO 2005

Conference

Conference: 13th European Signal Processing Conference, EUSIPCO 2005
Country/Territory: Turkey
City: Antalya
Period: 4/09/05 → 8/09/05
