Research On Silent Speech Recognition Of Surface Electromyography Based On Transformer

Posted on: 2024-06-21
Degree: Master
Type: Thesis
Country: China
Candidate: R Song
Full Text: PDF
GTID: 2530306932955789
Subject: Biomedical engineering
Abstract/Summary:
Silent speech recognition (SSR) refers to decoding speech intention from non-acoustic signals, such as electroencephalogram, electromyogram, or motion information of the articulatory organs generated during pronunciation, in order to realize silent communication. As a complementary modality to speech recognition, this technology can effectively overcome noise interference and protect the privacy of communication. In addition, it offers a possible means of communication for patients with speech disabilities. Surface electromyography (sEMG) is an electrophysiological signal recorded from muscle contractions via surface electrodes. Its non-invasive acquisition and low sensitivity to noise interference allow the sEMG technique to be widely applied. SSR can therefore be implemented with sEMG by collecting the signals generated by the facial and neck muscles during pronunciation and decoding the speech content from them, which provides an alternative and efficient solution for silent communication.

sEMG-based SSR on a limited number of isolated words has been widely investigated over the years. However, these studies classify isolated words with machine learning or deep learning algorithms and do not describe the long-term or short-term sequential semantic information between and within words, so they cannot meet the demands of continuous language communication. Although a few researchers have developed sequence decoding methods based on hybrid models to perform continuous and natural SSR, limitations remain: hybrid models are difficult to design, the sequential modules cannot all be jointly trained and optimized, and phoneme alignment is challenging. To overcome these problems, this thesis first presents a method for sEMG-based SSR that performs sequential decoding at the Chinese character level using a Transformer. The proposed method characterizes the sequential semantic information in the sEMG signals well enough to decode them accurately into character-level sequences. On this basis, a streaming SSR method using a Transformer-Transducer, with high performance and low time delay, was then designed to promote the practical application of sEMG-based SSR. The main contents and achievements of this thesis can be summarized as follows:

1. A sequential decoding method based on Transformer for sEMG-based SSR was proposed. According to the physiological and anatomical structure of the articulation-related muscles of the face and neck, a flexible high-density sEMG electrode array was designed to record, in 64 channels, the sEMG signals generated by muscle activity during silent speech. Then 33 representative phrases from daily application scenarios were selected to form a corpus. A total of 8 subjects without any speech disabilities participated in the sEMG data recording experiments. All phrase-level sEMG samples were obtained by data preprocessing. Each sEMG sample was converted into a sequence of time frames by a feature sequence vectorization module, and the sequence was fed into a Transformer model to obtain sequential character-level decisions. An optimization module containing a language model then adjusted these character-level decisions toward natural language to produce the final character sequence. The proposed method achieved the lowest character error rate, 5.14 ± 3.28%, and the highest phrase recognition accuracy, 96.37 ± 2.06%, significantly outperforming other common classification methods and sequence decoding methods. The experimental results demonstrated that the proposed method is effective for sEMG-based SSR.

2. A streaming SSR method based on Transformer-Transducer was investigated. Streaming SSR means that recognition results can be fed back immediately while the sEMG data stream is still being processed. Building on the Transformer-based decoding method for SSR described above, a streaming SSR method was further designed based on Transformer-Transducer. More specifically, a phrase-level sEMG sample was first converted into a sequence of time frames by a feature sequence vectorization module. The sequence was then segmented into small chunks with a fixed number of frames, and these chunks were fed sequentially into the Transformer-Transducer model in a streaming manner. The final character sequence was streamed out by the Transformer-Transducer model. In particular, the effect of limiting the scope of the attention context on recognition performance was explored. The experimental results showed that the proposed algorithm provides significant performance improvements in character-level decoding and phrase recognition, and that it achieves a balance between high performance and low delay. This study demonstrated the feasibility of the proposed algorithm and can guide the design and implementation of a real-time SSR system.
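
The chunked streaming input and the character error rate mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function names (`chunk_frames`, `visible_frames`, `character_error_rate`), the assumption that the attention context is limited on the left side only and counted in whole chunks, and the definition of the character error rate as edit distance divided by reference length are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import List, Sequence, Tuple


def chunk_frames(frames: Sequence, chunk_size: int) -> List[list]:
    """Split a frame sequence into consecutive chunks of a fixed number of frames.

    The last chunk may be shorter when the sequence length is not a multiple
    of chunk_size (an assumption; padding would be another option).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(frames[i:i + chunk_size]) for i in range(0, len(frames), chunk_size)]


def visible_frames(chunk_index: int, chunk_size: int,
                   left_chunks: int, num_frames: int) -> Tuple[int, int]:
    """Return the half-open frame range [start, end) a chunk may attend to.

    Hypothetical rule: a chunk sees itself plus `left_chunks` previous chunks
    and no future frames, which bounds the delay to one chunk.
    """
    start = max(0, (chunk_index - left_chunks) * chunk_size)
    end = min(num_frames, (chunk_index + 1) * chunk_size)
    return start, end


def edit_distance(reference: Sequence, hypothesis: Sequence) -> int:
    """Levenshtein distance: substitutions, insertions, and deletions cost 1."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            cost = 0 if ref_item == hyp_item else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance between character sequences, normalized by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

With these definitions, a smaller `left_chunks` value reduces the amount of history each chunk can use, which is one way to study the trade-off between recognition performance and delay that the abstract describes.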
Keywords/Search Tags: surface electromyography, silent speech recognition, sequential decoding, Transformer, Transducer