TY - GEN
T1 - A vocal tract visualisation tool for a computer-based speech training aid for hearing-impaired individuals
AU - Mahdi, Abdulhussain E.
PY - 2008
Y1 - 2008
N2 - This paper describes a computer-based software prototype tool for visualisation of the vocal tract during speech articulation by means of a mid-sagittal view of the human head. The vocal tract graphics are generated by estimating both the area functions and the formant frequencies from the acoustic speech signal. First, the speech production process is assumed to follow an autoregressive model. Using linear prediction analysis, the vocal tract area functions and the first three formants are estimated. The estimated area functions are then mapped to corresponding mid-sagittal distances and displayed as 2D vocal tract lateral graphics. The mapping process is based on a simple numerical algorithm and an accurate reference grid derived from x-ray data for the pronunciation of a number of English vowels uttered by different speakers. To compensate for possible errors in the estimated area functions due to variation in vocal tract length between speakers, the first two sectional distances are determined from the three formants. Experimental results show high correlation with x-ray data and PARAFAC analysis. The tool also displays other speech parameters that are closely related to the production of intelligible speech and would therefore be useful as a visual feedback aid in speech training for hearing-impaired individuals.
KW - Articulatory training
KW - Linear prediction coding
KW - Speech production
KW - Vocal tract models
UR - http://www.scopus.com/inward/record.url?scp=55649096859&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:55649096859
SN - 9789898111180
T3 - BIOSIGNALS 2008 - Proceedings of the 1st International Conference on Bio-inspired Systems and Signal Processing
SP - 153
EP - 158
BT - BIOSIGNALS 2008 - Proceedings of the 1st International Conference on Bio-inspired Systems and Signal Processing
T2 - BIOSIGNALS 2008 - 1st International Conference on Bio-inspired Systems and Signal Processing
Y2 - 28 January 2008 through 31 January 2008
ER -