
Research and Implementation of Report-oriented Voice Activity Detection

Posted on: 2023-05-04  Degree: Master  Type: Thesis
Country: China  Candidate: N Liu
GTID: 2558306914473234  Subject: Software engineering
Abstract/Summary:
Voice is the most common, convenient, and effective means of human communication. In recent years, with scientific progress and the rapid development of Internet technology, human-computer interaction products have become increasingly widespread, and everyday life has become increasingly intelligent. Voice has become an important channel of communication between humans and machines and accounts for a growing share of human-computer interaction. Speech processing technology is the basis of voice interaction, giving computers the ability to detect, process, and respond to voice commands. Voice activity detection (VAD) is the front end of speech processing: it aims to detect the start and end points of the speech segments in a continuous audio signal and to provide correct speech segments for subsequent processing, and it plays an important role in improving the performance of a speech system.

This work is based on a company's work-report speech reporting system. Work reports often mix Chinese and English sentences and contain professional vocabulary. In view of this characteristic, this thesis studies a report-oriented VAD algorithm. The main contributions are as follows. First, a report speech dataset mixing Chinese and English is built for work reports. Second, an improved joint algorithm based on combined features is designed. A denoising autoencoder (DAE) based on convolutional Long Short-Term Memory (CLSTM) serves as the speech enhancement module; it learns the mapping between noisy and clean speech and outputs enhanced features. The initial features are combined with the output of the encoder part and fed into the VAD module for classification; this module is a convolutional, Long Short-Term Memory, fully connected deep neural network (CLDNN). The two modules are trained jointly using a dynamic weighted average method. Comparative experiments show that the proposed algorithm has some advantages in classification accuracy and generalization ability. Ablation experiments show that both the combination of the initial features with the intermediate features of the encoder and the dynamic weighted average method contribute to the best performance of the model.

Finally, based on the improved model, a report-oriented VAD auxiliary annotation system is designed. It implements speech endpoint annotation for work reports through manual annotation assisted by model pre-annotation, and it meets the requirements of the downstream functions of the company's report system. By improving the existing joint algorithm, this thesis improves the classification ability of the VAD algorithm, which offers some reference value for subsequent research. Functional tests indicate that the system has practical value.
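The abstract does not spell out how the dynamic weighted average is computed. As a rough illustration only, the sketch below assumes it resembles the Dynamic Weight Average scheme known from multi-task learning, in which a task whose loss is falling more slowly receives a larger weight in the next epoch. The function names `dwa_weights` and `joint_loss`, the temperature value, and the two-task setup (enhancement loss and VAD loss) are all hypothetical and are not taken from the thesis.

```python
import math


def dwa_weights(loss_history, num_tasks=2, temperature=2.0):
    """Compute per-task loss weights from the two most recent epochs.

    loss_history is a list of per-epoch sequences of task losses, for
    example [(enh_loss, vad_loss), ...]. Losses are assumed to be positive.
    The returned weights sum to num_tasks.
    """
    # With fewer than two recorded epochs there is no loss ratio yet,
    # so every task gets the same weight.
    if len(loss_history) < 2:
        return [1.0] * num_tasks

    prev, prev_prev = loss_history[-1], loss_history[-2]
    # A ratio near or above 1 means the task's loss is barely decreasing.
    ratios = [p / pp for p, pp in zip(prev, prev_prev)]
    exps = [math.exp(r / temperature) for r in ratios]
    total = sum(exps)
    # Softmax over the ratios, rescaled so the weights sum to num_tasks.
    return [num_tasks * e / total for e in exps]


def joint_loss(enh_loss, vad_loss, weights):
    """Weighted sum of the enhancement loss and the VAD loss."""
    return weights[0] * enh_loss + weights[1] * vad_loss
```

For example, with `loss_history = [(1.0, 1.0), (0.5, 1.0)]` the enhancement loss halved while the VAD loss stalled, so `dwa_weights` returns a larger weight for the VAD task than for the enhancement task.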
Keywords/Search Tags: voice activity detection, speech enhancement, dynamic weight average, auxiliary annotation system