
Local Self-Attention CTC-Based Speech Recognition

Posted on: 2022-05-20    Degree: Master    Type: Thesis
Country: China    Candidate: H Z Deng    Full Text: PDF
GTID: 2518306320990749    Subject: Computer technology
Abstract/Summary:
With the rapid development of information technology, research on automation and artificial intelligence keeps growing. Automatic speech recognition, which is closely tied to everyday life, is one widely deployed example. The acoustic model is the foundation of an automatic speech recognition system and the main lever for improving its performance, so it has become a focus of research, with attention-based acoustic models at its core. Speech exhibits pauses of varying length and differences in speaking rate between individuals, which makes the input hard to align with the output; connectionist temporal classification (CTC) is an effective method for this alignment problem.

Building on a self-attention CTC model for automatic speech recognition, this thesis proposes two methods to improve accuracy on long input sequences. The first is a local self-attention CTC (LSA-CTC) model, which applies a sliding-window mechanism directly in the self-attention layers without any additional deep neural network layers, and stacks multiple latency-controlled layers over fixed-length blocks of the input to obtain a larger receptive field. This models long-range temporal context effectively and supports real-time decoding. Two enhancement methods, WordPiece modelling and scheduled sampling, are also applied to this model.

The second is a Gaussian-kernel self-attention CTC (GSA-CTC) model, which adds several convolutional layers before the self-attention layers, introduces a Gaussian kernel into self-attention to exploit its shift invariance for modelling relative position, and concatenates the raw frame index with the input features (frame indexing) to improve accuracy on long inputs.

On the AISHELL-1 dataset, the proposed LSA-CTC model is compared with other strong models, and experiments show that its character error rate is lower. The two enhancement methods above, applied to the proposed model and verified on this public dataset, yield results that surpass the comparison models. GSA-CTC is compared with other state-of-the-art models on a Japanese dataset, and the results show that the proposed model performs better on both long and short inputs. Both methods improve the handling of the alignment problem.
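The sketch below is a minimal illustration, under my own assumptions rather than the thesis's actual implementation, of the two attention ideas described above: a sliding-window (local) mask on scaled dot-product self-attention, and an optional shift-invariant Gaussian bias on the attention scores that depends only on the relative frame distance. The function and parameter names (local_self_attention, window, gaussian_sigma) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def local_self_attention(q, k, v, window=8, gaussian_sigma=None):
    """Single-head self-attention restricted to a sliding window (hypothetical sketch).

    q, k, v:        (T, d) arrays for one utterance.
    window:         number of frames attended to on each side of position t.
    gaussian_sigma: if given, subtract a Gaussian penalty (i - j)^2 / (2*sigma^2)
                    from the scores, a shift-invariant bias toward nearby frames
                    illustrating the Gaussian-kernel (GSA) idea.
    """
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)               # (T, T) scaled dot-product scores

    idx = np.arange(T)
    dist = np.abs(idx[:, None] - idx[None, :])  # |i - j| frame distance

    # Sliding-window mask: positions outside the local window get -inf.
    scores = np.where(dist <= window, scores, -np.inf)

    if gaussian_sigma is not None:
        # Shift-invariant Gaussian bias depending only on relative position.
        scores = scores - (dist ** 2) / (2.0 * gaussian_sigma ** 2)

    weights = softmax(scores, axis=-1)
    return weights @ v                          # (T, d) context vectors

# Toy usage on random "frames".
T, d = 50, 16
rng = np.random.default_rng(0)
x = rng.standard_normal((T, d))
out = local_self_attention(x, x, x, window=4, gaussian_sigma=3.0)
print(out.shape)  # (50, 16)
```

Because the mask and the Gaussian bias depend only on the frame offset |i - j|, the cost of attention grows with the window size rather than the full utterance length, which is what makes a local formulation attractive for long inputs and streaming decoding.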
Keywords/Search Tags:Acoustic Model, CTC, Self-Attention Mechanism, Gaussian Kernel