Language shapes our daily lives and plays a critical role in expressing our feelings and desires to the world around us. Learning a second language enables people to gain a deeper understanding of diverse cultures. However, traditional learning approaches suffer from several limitations, such as a shortage of qualified teachers and a lack of flexibility in the location and time of teaching. To address these limitations, Computer-Assisted Language Learning (CALL) has become a popular teaching tool for second-language learning in recent years. An important challenge in CALL is how to accurately assess speech quality using acoustic techniques. In this study, we investigate methods for assessing both pronunciation quality and prosody quality. In particular, (i) for pronunciation assessment, we examine the limitations of existing methods based on Goodness of Pronunciation (GOP) and then validate the effectiveness of using the whole set of posteriors from the acoustic model as the evaluation metric; (ii) for prosody assessment, we propose two new algorithms, namely dynamic time warping (DTW)-based F0 similarity evaluation and alignment-based break similarity evaluation, to explore the tone and the pauses of prosody, respectively. This work further proposes a deep neural network (DNN) framework for both pronunciation and prosody assessment. The framework uses the whole set of posteriors as input features for pronunciation, and Gaussian mixture model (GMM)-smoothed prosody similarity as input features for prosody. Experimental results demonstrate the effectiveness of the proposed framework, which is capable of capturing the inherent relationships between the degree of nativeness and the identified features.
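To make the idea of DTW-based F0 similarity concrete, the following is a minimal, hypothetical sketch and not the method used in this work. It assumes that unvoiced frames are marked with F0 = 0, that contours are compared on a mean-normalized log-F0 scale so that speakers with different pitch ranges remain comparable, that the local cost is the absolute difference, and that the accumulated DTW cost is mapped to a similarity in (0, 1]. All function names and the normalization and similarity mapping are illustrative assumptions.

```python
import math


def normalize_f0(f0):
    """Hypothetical preprocessing: drop unvoiced frames (F0 <= 0), convert Hz
    to log scale, and subtract the mean to remove speaker pitch level."""
    voiced = [math.log(x) for x in f0 if x > 0]
    if not voiced:
        raise ValueError("contour has no voiced frames")
    mean = sum(voiced) / len(voiced)
    return [v - mean for v in voiced]


def dtw_distance(a, b):
    """Classic DTW with absolute-difference local cost and steps
    (1, 0), (0, 1), (1, 1). Returns the accumulated cost of the best
    alignment divided by len(a) + len(b)."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m] / (n + m)


def f0_similarity(learner_f0, reference_f0):
    """Assumed mapping of DTW distance to (0, 1]; 1.0 means the normalized
    contours align with zero cost."""
    dist = dtw_distance(normalize_f0(learner_f0), normalize_f0(reference_f0))
    return 1.0 / (1.0 + dist)


if __name__ == "__main__":
    reference = [200, 210, 220, 0, 230, 220]          # Hz, 0 = unvoiced frame
    learner_high = [2 * x for x in reference]         # same shape, one octave up
    learner_flat = [150, 150, 150, 0, 150, 150]       # monotone rendition
    print(f0_similarity(learner_high, reference))     # close to 1.0
    print(f0_similarity(learner_flat, reference))     # lower similarity
```

Because the log-F0 contour is mean-normalized, a learner who reproduces the reference melody at a different pitch level still scores close to 1.0, while a monotone rendition scores lower. DTW is used so that differences in speaking rate do not by themselves penalize the contour match.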