
Local Self-Attention CTC-Based Speech Recognition

Posted on: 2022-05-20    Degree: Master    Type: Thesis
Country: China    Candidate: H Z Deng    Full Text: PDF
GTID: 2518306320990749    Subject: Computer technology
Abstract/Summary:
With the rapid development of information technology, research on automation and artificial intelligence keeps growing. Automatic speech recognition, which is closely tied to everyday life, is one widely deployed example. The acoustic model is the foundation of an automatic speech recognition system and the main lever for improving its performance, so it has become a focus of research, with attention-based acoustic models at its core. Speech exhibits pauses of varying length and differences in speaking rate between individuals, which makes the input hard to align with the output; connectionist temporal classification (CTC) is an effective method for this alignment problem.

Building on a self-attention CTC model for automatic speech recognition, this thesis proposes two methods to improve accuracy on long input sequences. The first is a local self-attention CTC (LSA-CTC) model, which applies a sliding-window mechanism directly in the self-attention layers without any additional deep neural network layers, and stacks multiple latency-controlled layers over fixed-length blocks of the input to obtain a larger receptive field. This models long-range temporal context effectively and supports real-time decoding. Two enhancement methods, WordPiece modelling and scheduled sampling, are also applied to this model.

The second is a Gaussian-kernel self-attention CTC (GSA-CTC) model, which adds several convolutional layers before the self-attention layers, introduces a Gaussian kernel into self-attention to exploit its shift invariance for modelling relative position, and concatenates the raw frame index with the input features (frame indexing) to improve accuracy on long inputs.

On the AISHELL-1 dataset, the proposed LSA-CTC model is compared with other strong models, and experiments show that its character error rate is lower. The two enhancement methods above, applied to the proposed model and verified on this public dataset, yield results that surpass the comparison models. GSA-CTC is compared with other state-of-the-art models on a Japanese dataset, and the results show that the proposed model performs better on both long and short inputs. Both methods improve the handling of the alignment problem.
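The sketch below is a minimal illustration, under my own assumptions rather than the thesis's actual implementation, of the two attention ideas described above: a sliding-window (local) mask on scaled dot-product self-attention, and an optional shift-invariant Gaussian bias on the attention scores that depends only on the relative frame distance. The function and parameter names (local_self_attention, window, gaussian_sigma) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def local_self_attention(q, k, v, window=8, gaussian_sigma=None):
    """Single-head self-attention restricted to a sliding window (hypothetical sketch).

    q, k, v:        (T, d) arrays for one utterance.
    window:         number of frames attended to on each side of position t.
    gaussian_sigma: if given, subtract a Gaussian penalty (i - j)^2 / (2*sigma^2)
                    from the scores, a shift-invariant bias toward nearby frames
                    illustrating the Gaussian-kernel (GSA) idea.
    """
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)               # (T, T) scaled dot-product scores

    idx = np.arange(T)
    dist = np.abs(idx[:, None] - idx[None, :])  # |i - j| frame distance

    # Sliding-window mask: positions outside the local window get -inf.
    scores = np.where(dist <= window, scores, -np.inf)

    if gaussian_sigma is not None:
        # Shift-invariant Gaussian bias depending only on relative position.
        scores = scores - (dist ** 2) / (2.0 * gaussian_sigma ** 2)

    weights = softmax(scores, axis=-1)
    return weights @ v                          # (T, d) context vectors

# Toy usage on random "frames".
T, d = 50, 16
rng = np.random.default_rng(0)
x = rng.standard_normal((T, d))
out = local_self_attention(x, x, x, window=4, gaussian_sigma=3.0)
print(out.shape)  # (50, 16)
```

Because the mask and the Gaussian bias depend only on the frame offset |i - j|, the cost of attention grows with the window size rather than the full utterance length, which is what makes a local formulation attractive for long inputs and streaming decoding.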
Keywords/Search Tags:Acoustic Model, CTC, Self-Attention Mechanism, Gaussian Kernel