Research And Implementation Of Non-specific Human Speech Recognition For Service Robots

Posted on: 2021-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: X Xu
GTID: 2428330614458605
Subject: Integrated circuit engineering
Abstract/Summary:
With the introduction of the "Made in China 2025" initiative and the development of artificial intelligence technology, a growing number of service robots have entered people's daily lives and production, where they play an important role. Among the technologies related to service robots, speech recognition has become one of the most critical, and in recent years the application of deep learning to speech recognition has become an active research topic. This thesis focuses on voice-controlled service robots. First, current speech noise reduction methods are studied and improved. Second, the end-to-end speech recognition model based on connectionist temporal classification (CTC) is studied and improved. Last, a speech-controlled service robot system is constructed and implemented, and its feasibility and practicability are verified in a real environment.

In real speech environments the signal-to-noise ratio (SNR) decreases, so traditional spectral subtraction and Wiener filtering perform worse, leaving residual noise and speech distortion. To address this, the thesis proposes a new method based on an autoencoder Wasserstein generative adversarial network. In this method, real noisy speech is passed through the generator, and the discriminator and generator assist each other during training until the generator produces clean speech. The results show that the method effectively improves speech noise reduction in real speech environments, and that the generated clean speech has improved quality and intelligibility.

The thesis also studies deep convolutional neural networks (DCNN), which consist mainly of stacked CNN layers. As the number of network layers increases, the model suffers from vanishing gradients and degraded network performance. To solve these issues, the thesis proposes an improved residual network bidirectional long short-term memory (ResNet-BLSTM) model, which takes spectrogram features as input and introduces a residual network and a bidirectional long short-term memory network so that it can learn the contextual information of speech. The results show that, compared with the DCNN model, the ResNet-BLSTM model reduces the word error rate by 2.52% in the Chinese-language experiment, and is more generally applicable and more robust.

Finally, the thesis builds a speech-controlled service robot system on Jetson Nano and tests the recognition rate of the corresponding speech commands in a real speech environment to verify the robustness and practicability of the system.
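
For readers unfamiliar with the word error rate used to compare the DCNN and ResNet-BLSTM models, the following is a minimal sketch of how the metric is commonly computed: the edit distance (substitutions, deletions, insertions) between a reference transcript and a recognized transcript, divided by the reference length. The function name `word_error_rate` and the token-list interface are assumptions made for illustration. The abstract does not state how the thesis tokenizes Chinese text or whether the 2.52% reduction is absolute or relative, so this is not the author's evaluation code.

```python
def word_error_rate(reference, hypothesis):
    """Return edit_distance(reference, hypothesis) / len(reference).

    Both arguments are sequences of tokens (words, or characters for
    Chinese text). Hypothetical helper, not taken from the thesis.
    """
    ref = list(reference)
    hyp = list(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the reference tokens
    # processed so far and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, ref_token in enumerate(ref, start=1):
        curr = [i]  # turning i reference tokens into nothing takes i deletions
        for j, hyp_token in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_token == hyp_token else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference command `"turn on the light".split()` with the recognized `"turn off the light".split()` gives one substitution over four reference words, so the function returns 0.25.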
Keywords/Search Tags:service robots, speech denoising, speech recognition, generative adversarial networks, DCNN