
Sensitive Information Processing Within Audio Based On Speech Recognition Technology

Posted on: 2024-09-18
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Zhang
GTID: 2568306917491714
Subject: Applied statistics
Abstract/Summary:
In recent years, with the rapid development of the Internet, it has become very easy to spread highly sensitive information online, such as pornographic, gun- and explosive-related, reactionary, and political content. While the Internet provides a wealth of learning material to young people, it also poses hazards to their healthy growth that cannot be ignored. Some anchors may unconsciously utter uncivil or unhealthy words during voice broadcasts, and some unscrupulous people even spread harmful information by embedding sensitive words in audio. Compared with sensitive information in plain text, content disseminated in this way is easily overlooked, so technical means are needed to prevent the spread of such sensitive audio. Most of the current literature focuses on identifying and filtering sensitive words in text or web pages, and little of it studies how to handle sensitive information in speech; existing work tends to study either speech recognition or sensitive-information filtering and detection in isolation. This thesis combines the two: targeting audio found on the Web and in daily life, it detects whether the audio contains sensitive information, classifies the audio accordingly, and then processes the sensitive audio in order to reduce its spread. The main work of this thesis is as follows.

First, the PaddleSpeech model is used for speech recognition. In noisy environments or with loud background sound, its text recognition performance degrades. Therefore, before applying PaddleSpeech, this thesis adds a speech enhancement step that uses spectral subtraction and wavelet threshold denoising to process audio with complex background sound, a simple improvement to the PaddleSpeech pipeline intended to raise recognition accuracy. The results show that the word accuracy of the improved PaddleSpeech model increases by 0.66% and reaches 98.14%, which is 3.36% higher than that of the SpeechRecognition method commonly used in Python, so the enhancement step does improve PaddleSpeech's recognition accuracy.

Second, after speech recognition, a word vocabulary is built from the recognized text, and naive Bayes is used to classify the text, as well as the audio, as sensitive or not. The accuracy, recall, and F1 score of naive Bayes for sensitive text classification are 95.23%, 94.16%, and 93.68%, respectively, which are 2.39%, 1.85%, and 1.15% higher than those of the KNN algorithm and 1.87%, 2.33%, and 1.26% higher than those of logistic regression. The accuracy of sensitive audio classification is 96.25%. Overall, naive Bayes gives the best classification performance of the three algorithms.

Third, for sensitive audio, the Aho-Corasick (AC) automaton, the DFA filtering algorithm, and pyttsx text-to-speech are combined to replace the sensitive information in the audio and generate new speech without sensitive words. Compared with complex speech encryption techniques, this method is simple in concept and effective. The accuracy of sensitive word filtering with the AC automaton reaches 95.87%, 5.27% higher than that of the DFA algorithm, so the AC automaton is chosen for sensitive word filtering, while pyttsx accurately converts the filtered, non-sensitive text into non-sensitive audio. This effectively reduces the spread of sensitive audio and is of great significance for maintaining a healthy and civilized Internet environment.
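Spectral subtraction, one of the two enhancement methods named above, estimates the noise magnitude spectrum from segments assumed to contain no speech and subtracts it from every frame. The abstract does not give the thesis's frame length, window, overlap, or noise-estimation rule, so the following is only a minimal Python sketch under simplifying assumptions: non-overlapping rectangular frames, a naive DFT in place of an FFT, the first `noise_frames` frames treated as noise only, negative magnitudes clipped to zero, and the noisy phase reused. The function names are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT; keeps the real part because the input signal is real."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def spectral_subtraction(signal, frame_len, noise_frames):
    """Magnitude spectral subtraction on non-overlapping rectangular frames.

    Assumption: the first `noise_frames` frames contain noise only.
    Trailing samples that do not fill a whole frame are dropped.
    """
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    spectra = [dft(f) for f in frames]
    if noise_frames < 1 or noise_frames > len(spectra):
        raise ValueError("noise_frames must be between 1 and the number of frames")
    noise = spectra[:noise_frames]
    # Average noise magnitude in each frequency bin, taken from the leading frames
    noise_mag = [sum(abs(s[k]) for s in noise) / len(noise) for k in range(frame_len)]
    cleaned = []
    for spec in spectra:
        new_spec = []
        for k, x in enumerate(spec):
            mag = max(abs(x) - noise_mag[k], 0.0)              # subtract, clip at zero
            new_spec.append(cmath.rect(mag, cmath.phase(x)))  # reuse the noisy phase
        cleaned.extend(idft(new_spec))
    return cleaned
```

A practical implementation would use overlapping windowed frames and an FFT, and often adds an over-subtraction factor and a spectral floor to limit the "musical noise" that plain clipping leaves behind.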
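Wavelet threshold denoising decomposes the signal into approximation and detail coefficients, shrinks the small detail coefficients that are assumed to be dominated by noise, and reconstructs the signal. The abstract does not state which wavelet, decomposition depth, or threshold rule the thesis uses. The sketch below therefore assumes a single-level orthonormal Haar transform, soft thresholding, and the universal threshold t = sigma * sqrt(2 ln N), with sigma estimated as the median absolute detail coefficient divided by 0.6745. All function names are hypothetical.

```python
import math
import statistics


def haar_forward(signal):
    """One-level orthonormal Haar transform: returns (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Exact inverse of haar_forward."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out


def soft_threshold(x, t):
    """Shrink x toward zero by t; values with |x| <= t become zero."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def wavelet_denoise(signal):
    approx, detail = haar_forward(signal)
    # Robust noise estimate from the detail band (median absolute value / 0.6745)
    sigma = statistics.median(abs(d) for d in detail) / 0.6745
    # Universal threshold (assumed rule; the thesis may use another)
    t = sigma * math.sqrt(2 * math.log(len(signal)))
    return haar_inverse(approx, [soft_threshold(d, t) for d in detail])
```

In practice a wavelet library such as PyWavelets would supply multi-level decompositions and other wavelet families; a single Haar level is used here only to keep the example self-contained.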
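The abstract reports word accuracy without defining it. A common definition, assumed here rather than taken from the thesis, is one minus the token-level edit distance (substitutions, deletions, and insertions) divided by the number of reference tokens. For Chinese transcripts each character can serve as a token, which is why this sketch accepts plain strings as well as word lists; the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def word_accuracy(ref, hyp):
    """1 - (S + D + I) / N under the assumed definition; N = reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return 1 - edit_distance(ref, hyp) / len(ref)
```

Under this definition the value can drop below zero when the recognizer inserts many spurious tokens, so it is not bounded like a classification accuracy.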
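The abstract says only that naive Bayes is applied to the recognized text, not which variant. As an assumption, the sketch below uses multinomial naive Bayes with add-one (Laplace) smoothing on pre-tokenized text; Chinese word segmentation is assumed to happen upstream, and the class name and toy training data are hypothetical. Each document is scored by its log prior plus the sum of the log likelihoods of its words, and the highest-scoring class is returned.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for words, label in zip(docs, labels):
            self.word_counts[label].update(words)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(counts.values()) for c, counts in self.word_counts.items()}
        return self

    def predict(self, words):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.class_counts:
            score = math.log(self.class_counts[c] / n_docs)  # log prior
            for w in words:
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


docs = [["buy", "gun", "now"], ["gun", "explosive"],
        ["weather", "today", "nice"], ["study", "math", "today"]]
labels = ["sensitive", "sensitive", "normal", "normal"]
model = NaiveBayes().fit(docs, labels)
print(model.predict(["gun", "sale"]))  # sensitive
```

Summing logarithms rather than multiplying probabilities avoids floating-point underflow on long transcripts, and smoothing keeps a single unseen word-class pair from forcing a probability of zero.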
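The Aho-Corasick automaton builds a trie of all sensitive words, adds failure links so that a mismatch falls back to the longest proper suffix that is still a trie prefix, and then scans the text in a single pass without restarting at each position, so overlapping occurrences are all found. The sketch below, with hypothetical class and method names, masks every matched span with `*`. The abstract does not describe the thesis's replacement policy; there, the filtered text is then passed to pyttsx to synthesize clean audio.

```python
from collections import deque


class AhoCorasick:
    def __init__(self, words):
        self.goto = [{}]  # trie transitions for each state
        self.fail = [0]   # failure link for each state
        self.out = [0]    # length of the longest word ending at each state (0 = none)
        for w in words:
            self._add(w)
        self._build_failure_links()

    def _add(self, word):
        s = 0
        for ch in word:
            if ch not in self.goto[s]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append(0)
                self.goto[s][ch] = len(self.goto) - 1
            s = self.goto[s][ch]
        self.out[s] = max(self.out[s], len(word))

    def _build_failure_links(self):
        # Breadth-first, so every shorter suffix state is finished first
        queue = deque(self.goto[0].values())
        while queue:
            s = queue.popleft()
            for ch, t in self.goto[s].items():
                f = self.fail[s]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(ch, 0)
                # Inherit matches that end here via the failure link
                self.out[t] = max(self.out[t], self.out[self.fail[t]])
                queue.append(t)

    def mask(self, text, mask_char="*"):
        hidden = [False] * len(text)
        s = 0
        for i, ch in enumerate(text):
            while s and ch not in self.goto[s]:
                s = self.fail[s]
            s = self.goto[s].get(ch, 0)
            # Hide the longest word ending at i (covers shorter ones too)
            for j in range(i - self.out[s] + 1, i + 1):
                hidden[j] = True
        return "".join(mask_char if h else c for c, h in zip(text, hidden))


print(AhoCorasick(["he", "she", "his", "hers"]).mask("ushers"))  # u*****
```

With the words he, she, his, and hers, the text ushers becomes u***** because the overlapping matches she, he, and hers are all found in the same pass. The same code works on Chinese text, since it operates on individual characters.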
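The DFA filter used as the comparison baseline is, in many sensitive-word filtering implementations, a nested-dictionary trie that is walked from each start position to find the longest match, after which the scan skips past the masked span. The abstract does not describe the thesis's variant, so this sketch of that common approach, including the hypothetical `END` sentinel and function names, is an assumption.

```python
END = "__end__"  # hypothetical sentinel key marking the end of a sensitive word


def build_dfa(words):
    """Nested-dict trie: each key is a character, END marks a complete word."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def dfa_mask(text, root, mask_char="*"):
    out = list(text)
    i = 0
    while i < len(text):
        node, j, match_end = root, i, -1
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if END in node:
                match_end = j  # longest match found so far starting at i
        if match_end > 0:
            out[i:match_end] = [mask_char] * (match_end - i)
            i = match_end  # skip past the masked span
        else:
            i += 1
    return "".join(out)


print(dfa_mask("ushers", build_dfa(["he", "she", "his", "hers"])))  # u***rs
```

On this toy input the DFA scan yields u***rs: after masking she it resumes at r and misses the overlapping hers, which an Aho-Corasick scan would also mask. This illustrates one way the two filters can differ; the abstract does not say whether it accounts for the 5.27% accuracy gap reported in the thesis.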
Keywords/Search Tags: speech recognition, sensitive information, PaddleSpeech, Naive Bayes, sensitive word filtering