Research On Endogenous Security In Speech Recognition

Posted on: 2024-05-06
Degree: Master
Type: Thesis
Country: China
Candidate: W S Zhai
Full Text: PDF
GTID: 2568306908483484
Subject: Computer Science and Technology
Abstract/Summary:
With the development of deep learning and neural networks, automatic recognition is being applied ever more widely and brings great convenience to daily life. Among its applications, speech recognition plays an important role in human-computer interaction: it is reshaping how people interact with machines, many technology companies have launched speech recognition products, and the field is flourishing. The emergence of adversarial samples, however, poses a serious challenge to the security of speech recognition. An attacker only needs to add small perturbations to an utterance; although imperceptible to the human ear, these perturbations can make a speech recognition system output a wrong result, which is a significant security risk. At present, research on adversarial samples is concentrated in the image domain, while work in the speech domain is still relatively sparse and immature. Research on speech adversarial samples covers three main areas: generation, defense, and detection. This thesis studies generation and defense: first, it proposes an adversarial training algorithm for the speech domain; second, it modifies the traditional genetic algorithm and proposes a three-population parallel genetic algorithm for generating speech adversarial samples. The main work and contributions are as follows:

(1) A new adversarial training method is proposed that improves basic iterative method (BIM) adversarial training in the speech domain. By analyzing how adversarial training updates the input parameters and the model weights, the method simplifies traditional BIM adversarial training from two steps (first generating speech adversarial samples, then training on them) into one step. Since the total number of iterations is the same as in normal training of a speech recognition model, computational cost is reduced. Experiments with a Chinese speech recognition model and a Chinese dataset show that adversarial training time is shortened to 1/8 of the time required by standard BIM adversarial training, and that the model's resistance to attacks is also stronger than with BIM adversarial training.

(2) The traditional genetic algorithm is improved, and a three-population parallel genetic algorithm is used to generate adversarial samples in the speech domain for the first time. The algorithm maintains one main population and two auxiliary populations, each of which performs genetic operations independently. After each round of genetic operations, the two auxiliary populations pass their best individuals to the main population, which replaces its inferior individuals with the received ones, so that the population evolves while its size remains stable. To broaden the search range, the two auxiliary populations adopt different search strategies: one uses a large crossover probability and a small mutation probability, and the other uses a small crossover probability and a large mutation probability. Using the Speech Commands Classification model, and after several rounds of parameter tuning, the adversarial samples generated by the three-population genetic algorithm reached an attack success rate of 99.6%. The generated samples were also of high quality and difficult to distinguish by ear.
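The population scheme described in (2) can be illustrated with a minimal Python sketch. It is not the thesis implementation. The selection, crossover, and mutation operators, the probability values, the main population's settings, and the names `evolve`, `migrate`, and `three_population_ga` are all assumptions made for illustration. The `fitness` argument stands in for whatever score the attack would obtain from a speech model, and each individual is a bounded perturbation vector.

```python
import random


def evolve(pop, fitness, p_cross, p_mut, eps, rng):
    """One generation: elitism, tournament selection, uniform crossover, mutation."""
    children = [max(pop, key=fitness)[:]]  # carry the best individual over unchanged
    while len(children) < len(pop):
        # Binary tournament selection of two parents (an assumed operator).
        a = max(rng.sample(pop, 2), key=fitness)
        b = max(rng.sample(pop, 2), key=fitness)
        child = a[:]
        if rng.random() < p_cross:
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
        # Gaussian mutation per gene, clipped to the perturbation bound [-eps, eps].
        child = [
            max(-eps, min(eps, g + rng.gauss(0.0, eps / 4))) if rng.random() < p_mut else g
            for g in child
        ]
        children.append(child)
    return children


def migrate(main, aux_a, aux_b, fitness):
    """Replace the worst individuals of main with the best of each auxiliary population."""
    migrants = [max(aux_a, key=fitness)[:], max(aux_b, key=fitness)[:]]
    survivors = sorted(main, key=fitness, reverse=True)[: len(main) - len(migrants)]
    return survivors + migrants  # population size is unchanged


def three_population_ga(fitness, dim, eps, pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)

    def new_population():
        return [[rng.uniform(-eps, eps) for _ in range(dim)] for _ in range(pop_size)]

    main, aux_a, aux_b = new_population(), new_population(), new_population()
    for _ in range(generations):
        # The three populations evolve independently; probabilities are illustrative.
        main = evolve(main, fitness, 0.7, 0.05, eps, rng)
        aux_a = evolve(aux_a, fitness, 0.9, 0.01, eps, rng)  # large crossover, small mutation
        aux_b = evolve(aux_b, fitness, 0.3, 0.30, eps, rng)  # small crossover, large mutation
        main = migrate(main, aux_a, aux_b, fitness)
    return max(main, key=fitness)
```

With a toy fitness such as `lambda v: -sum((g - 0.01) ** 2 for g in v)`, calling `three_population_ga(fitness, dim=8, eps=0.05)` returns the best perturbation found in the main population. In a real attack, the fitness would instead query the target speech model, which this sketch does not attempt.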
Keywords: speech recognition, speech adversarial samples, adversarial training, three-population parallel genetic algorithm