| Crying is a unique language of infants and an important way for them to transmit information. Infant crying expresses a wide range of psychological and physiological needs, so studying it can help people understand what a cry means and thereby care for infants better. The subject of this paper comes from a company's demand for infant crying analysis. The company intends to collect a large amount of infant crying data for analysis, but the collected recordings almost always contain adult speech. For privacy protection, the company needs to detect adult speech in the infant crying audio stream and remove it effectively. In response to this need, this paper studies speech detection based on an LSTM network, on a GMM, and on a combined LSTM-GMM-RNN model. The study aims to recognize adult speech in the audio stream, which is of great practical significance for protecting user privacy. Taking infant crying analysis as the research background, this paper focuses on the privacy protection problem that arises during infant crying data collection and carries out a study of adult speech detection. The detailed research work is as follows: 1) The company's infant crying database and adult speech database are analyzed through time-domain waveforms and spectrograms; the signal differences between infant crying and adult speech are summarized by listening to audio streams that contain both; and the audio features that discriminate infant crying from adult speech are analyzed. 2) Five feature sets are extracted as audio features: MFCC, MFCC+energy, MFCC+pitch, PLP, and PLP+energy. A deep neural network with two LSTM layers is constructed and used as the classification model, and speech detection experiments are carried out on each feature set. 3) Three speech detection schemes are constructed based on the GMM: speech detection based on an infant crying GMM, speech detection based on an adult speech GMM, and speech detection based on the combination of the infant crying GMM and the adult speech GMM. 4) To further improve the accuracy of speech detection, an RNN is used to combine the recognition results of the LSTM network and the GMM for classification, and a speech detection algorithm based on this LSTM-GMM-RNN model is proposed. Compared with the detection algorithms based on the LSTM network alone and on the GMM alone, the detection accuracy of this algorithm is greatly improved. The three proposed speech detection algorithms, based on the LSTM network, the GMM, and the LSTM-GMM-RNN model respectively, can all detect adult speech in infant crying audio streams well. Once adult speech is removed from the infant audio streams, the proposed algorithms provide effective privacy protection in the data collection procedure. |
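The third GMM scheme, which combines the infant crying GMM with the adult speech GMM, can be illustrated with a small sketch. The abstract does not say how the two models are combined, so the frame-level log-likelihood ratio used here is an assumption, as are the diagonal covariances, the function names `diag_gmm_loglik` and `detect_adult_speech`, the zero threshold, and the toy two-dimensional parameters. In practice the feature vectors would be the MFCC or PLP features and the GMM parameters would be trained on the two databases.

```python
import math

def diag_gmm_loglik(x, weights, means, variances):
    """Log-likelihood of one feature vector x under a diagonal-covariance GMM."""
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            # Log-density of a 1-D Gaussian, summed over feature dimensions.
            log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        comp_logs.append(log_p)
    # Log-sum-exp over mixture components for numerical stability.
    m = max(comp_logs)
    return m + math.log(sum(math.exp(c - m) for c in comp_logs))

def detect_adult_speech(frames, cry_gmm, speech_gmm, threshold=0.0):
    """Return True for each frame whose log-likelihood ratio favors adult speech.

    Assumption: the two models are combined by a log-likelihood ratio test;
    the source does not specify the combination rule.
    """
    labels = []
    for x in frames:
        llr = diag_gmm_loglik(x, *speech_gmm) - diag_gmm_loglik(x, *cry_gmm)
        labels.append(llr > threshold)
    return labels

# Hypothetical toy models: (weights, means, variances), not trained values.
cry_gmm = ([0.5, 0.5], [[4.0, 4.0], [5.0, 3.0]], [[1.0, 1.0], [1.0, 1.0]])
speech_gmm = ([1.0], [[0.0, 0.0]], [[1.0, 1.0]])

frames = [[0.1, -0.2], [4.5, 3.5], [0.3, 0.4]]
labels = detect_adult_speech(frames, cry_gmm, speech_gmm)
print(labels)
```

Frames labeled `True` would be the ones to remove before the crying data is stored. The single-model schemes correspond to thresholding only one of the two log-likelihoods instead of their difference.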