Research On Intelligent Speaker Recognition Based On Flexible Sensors

Posted on: 2024-05-30
Degree: Master
Type: Thesis
Country: China
Candidate: W L Zheng
GTID: 2568306938951479
Subject: Computer Science and Technology
Abstract/Summary:
Speaker recognition is a biometric authentication technology based on features of the human voice. It currently suffers from performance degradation and security threats caused by noise. Compared with a close-talk microphone, a flexible throat microphone built on a flexible sensor is more robust to noise; compared with a hard throat microphone, it is more portable, smaller, and fits the human body better. It promises to replace both, enabling robust speaker recognition in noisy environments such as battlefields, public places, and construction sites. It can also help people with specific forms of dysarthria complete speaker recognition, thus promoting the health and independence of people with disabilities. However, speaker recognition based on flexible throat microphones is still in its infancy. Existing research on speech recognition and speaker recognition with hard throat microphones indicates that speaker recognition with flexible throat microphones faces two key problems: a lack of data sets and data degradation. To address these two problems, the main work of this thesis is as follows:

(1) The speech data sets needed for the research are collected with a flexible throat microphone. A flexible throat microphone data acquisition system was built for this purpose. The acquisition system and the acquisition process were studied first, and visualization software was then designed to work together with a data screening mechanism. The result is the Flexible Throat Microphone Speech (FTM-S) data set, and data quality experiments were designed to verify that FTM-S is effective for voice applications based on machine learning algorithms. Following the same data set construction paradigm, a more realistic data set, Flexible Throat Microphone Throat Speech (FTM-TS), was constructed to support the training of the machine learning models in this thesis and, through its open-source release, to provide validation for subsequent algorithm research.

(2) To address the degradation of flexible throat microphone data, the Flexible Throat Microphone Data Augmentation (FTMDA) method is proposed. Specifically, based on an analysis of the data characteristics, three techniques for flexible throat microphone speech signals are studied: band-pass filtering, frequency pre-weighting, and endpoint detection. A data augmentation quality verification experiment was then designed to verify that the method alleviates the degradation of flexible throat microphone speech data.

(3) To further address the overfitting caused by data degradation and the lack of data sets, the Flexible Throat Microphone Supervised Contrastive Learning (FTMSCL) algorithm for speaker recognition is proposed. This work studies a supervised contrastive loss function for flexible throat microphone speaker recognition and explores the influence of key hyperparameters on the performance of the speaker recognition system. Furthermore, the FTMDA method is combined with four close-talk microphone speech data augmentation methods to study their effects on the performance of the FTMSCL algorithm. Comparison of the FTMSCL experimental results with those of traditional and recent algorithms verifies the superiority of FTMSCL in addressing data set scarcity and data degradation.

(4) A Transformer-based Network for Flexible Throat Microphone Speaker Recognition (TFTMSR-Net) is proposed. The network uses the Transformer’s self-attention mechanism to attend to local and global speaker features simultaneously. A dual Transformer module extracts the positional information of the flexible throat microphone speech data to focus on local features, and aggregates the local features to obtain global features related to the speaker. In addition, a multi-resolution feature encoder is proposed to obtain aggregated features with multiple levels of semantic information. Experiments on the FTM-TS data set show that TFTMSR-Net achieves good speaker recognition performance.
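
To make the supervised contrastive loss mentioned in item (3) more concrete, the following is a minimal Python sketch of the commonly used supervised contrastive formulation: embeddings of the same speaker are pulled together and embeddings of different speakers are pushed apart. It is not the FTMSCL implementation from the thesis. The function names, the temperature value of 0.1, and the toy two-dimensional embeddings are all assumptions made for illustration.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("embedding must be non-zero")
    return [x / norm for x in vector]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Illustrative supervised contrastive loss (hypothetical helper).

    embeddings: list of equal-length float lists, one per utterance
    labels:     list of speaker IDs, same length as embeddings
    Returns the mean loss over anchors that have at least one positive,
    or 0.0 if no anchor has a positive.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        # Positives: other utterances from the same speaker as the anchor.
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all non-anchor samples.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        # Average negative log-probability of the positives.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Toy example: two utterances per speaker, made-up 2-D embeddings.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matched = supervised_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
print(loss_matched < loss_mismatched)
```

In this toy setup the loss is lower when the speaker labels agree with the geometry of the embeddings than when they do not. The temperature is one example of the kind of hyperparameter whose influence the abstract says was explored, although the abstract does not state which hyperparameters were actually studied.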
Keywords/Search Tags:flexible sensor, throat microphone, speaker recognition, speech recognition, deep learning