| From a biological standpoint, speech and tool use are the most essential differences between human beings and other animals. Language has been an important communication tool since the birth of humankind: it is natural, convenient, concise, accurate, and efficient. Speech is the external representation of language and directly reflects our thinking. Speech sounds are produced by the human vocal organs, carry very rich information, and are an important means of expressing thought and communicating emotion. Phonetics, which emerged from the human exploration of language, is a branch of linguistics that studies the sounds of human speech. The whispered speech studied in this paper falls within phonetics. Whispering is one of the most common forms of information exchange in daily life. Compared with normal phonation, its mode of sound production is unique: the vocal cords do not vibrate, and the sound is instead produced by a special frictional excitation. In modern times, with the continuous improvement of communication facilities and technology, whispered speech has moved from early theoretical research to increasingly broad practical application. Because of how it is produced, whispering protects personal privacy well and does not disturb the normal activities of other people [1], so whispered speech has important research significance. This paper focuses on whispered speech detection. Endpoint detection is the preprocessing stage of whispered speech recognition and greatly affects recognition accuracy. In this paper, whispered speech is detected with the energy-to-zero-crossing ratio and Empirical Mode Decomposition in both silent and noisy backgrounds. Three basic acoustic features (pitch, intensity, and formants) are presented for vowels produced in two ways: normally and whispered. Machine learning classification algorithms are used to distinguish normal from whispered speech, and a comparison of eight classifiers shows that K-nearest neighbors performs best. A web-based display program for whispered speech detection with offline training is developed. The effect of segment duration (3 s, 4 s, 5 s, 6 s, and mixed durations) on the detection experiments is studied, and segment duration is found to have essentially no influence on the classification of whispered and normal speech. Finally, the Baidu API and HTK are used to implement speech recognition and isolated-word speech recognition. |
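
As a rough illustration of endpoint detection based on an energy-to-zero-crossing ratio, the sketch below computes, for each frame, the short-time energy divided by the zero-crossing count and marks frames whose ratio exceeds a threshold estimated from the leading background frames. The exact ratio definition, the frame length, the hop size, and the threshold rule are assumptions made for this example; they are not taken from the paper, and the function names are hypothetical.

```python
import numpy as np

def energy_zero_ratio(signal, frame_len=400, hop=160):
    """Per-frame short-time energy divided by (zero-crossing count + 1).

    The "+ 1" avoids division by zero; this particular form of the
    ratio is an assumption for illustration.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    ratios = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        energy = np.sum(frame ** 2)
        # Count sign changes between consecutive samples.
        signs = np.signbit(frame)
        zero_crossings = np.count_nonzero(signs[1:] != signs[:-1])
        ratios[i] = energy / (zero_crossings + 1)
    return ratios

def detect_endpoints(signal, threshold_scale=3.0, noise_frames=5, **frame_args):
    """Return (first, last) active frame indices, or None if nothing is active.

    The threshold is threshold_scale times the mean ratio of the first
    noise_frames frames, which are assumed to contain background only.
    """
    ratios = energy_zero_ratio(signal, **frame_args)
    threshold = threshold_scale * np.mean(ratios[:noise_frames])
    active = np.flatnonzero(ratios > threshold)
    if active.size == 0:
        return None
    return int(active[0]), int(active[-1])

# Synthetic example: weak background noise with a louder noise-like burst
# standing in for a whispered segment between samples 6000 and 10000.
rng = np.random.default_rng(0)
demo = 0.001 * rng.standard_normal(16000)
demo[6000:10000] = 0.1 * rng.standard_normal(4000)
endpoints = detect_endpoints(demo)
```

A single fixed threshold like this is only reasonable for stationary background noise; noisy conditions would call for a more robust rule.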