At the end of 2019, a novel coronavirus disease (COVID-19) broke out suddenly and spread quickly across the world, posing a severe challenge to human health. The COVID-19 virus is highly contagious, and all countries have invested substantial human and material resources in preventing and controlling the epidemic. Because detection is an important step in controlling the epidemic, how to detect the disease more efficiently, economically and safely has attracted broad research interest worldwide. In medicine, the commonly used and effective method is nucleic acid detection, which must be carried out at a designated place. With the development of artificial intelligence, speech recognition, image recognition and related technologies have also been widely used in recent years to assist medical staff in diagnosing diseases. Detecting COVID-19 from audio signals can be done remotely, which avoids cross-infection during testing and shortens the detection time, and it has therefore become an important research direction. Current acoustic COVID-19 detection systems have two main limitations: 1) the amount of COVID-19 audio data available for training neural networks is rather limited, and 2) the environmental background noise recorded by the microphones heavily degrades the sound quality as well as the detection accuracy. To overcome these limitations, in this thesis we propose a COVID-19 detection system for audio signals based on pre-training and signal enhancement, where the pre-training method aims at better utilization of low-resource COVID-19 sound data and multichannel Wiener filter (MWF) based signal enhancement is used as a front end to improve the sound quality.

Deep learning based COVID-19 classification requires a large amount of sound data for network training, which has so far been rather limited. To make full use of the available data and improve the modeling efficiency of the networks, we propose a COVID-19 detection method that combines supervised pre-training and self-supervised pre-training. With the pre-trained model, high-level representations of the COVID-19 audio signals can be learned. Experimental results on the DiCOVA dataset show that the proposed method achieves the best detection performance for speech and multimodal signals and the second-best performance for respiratory signals.

In noisy environments, the well-known MWF is a commonly used noise reduction method, but its performance depends heavily on the estimation of the covariance matrices and on the assignment of the reference microphone. In this thesis we therefore first analyze theoretically how the output signal-to-noise ratio (SNR) relates to the rank of the signal covariance matrix, and show the linear relation between the SNR gain and the input SNR gap between microphone pairs. Based on this analysis, we then propose a reference microphone selection method that maximizes the input SNR and has a much lower time complexity than existing reference microphone selection approaches. Experimental results show that COVID-19 detection performance degrades severely under noisy conditions and can be significantly improved by the proposed noise reduction method.
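The idea of choosing the reference microphone by maximizing the input SNR can be illustrated with a minimal sketch. It is not the implementation used in this thesis: the function names `select_reference_mic` and `mwf_weights`, the assumption that the noisy and noise-only covariance matrices are already estimated, the textbook MWF form w = R_yy^{-1} R_xx e_ref, and the toy three-microphone numbers are all introduced here purely for illustration.

```python
import numpy as np


def select_reference_mic(R_yy, R_nn):
    """Pick the microphone with the highest estimated input SNR (hypothetical helper).

    R_yy: (M, M) covariance of the noisy microphone signals.
    R_nn: (M, M) covariance of the noise alone.
    """
    noise_power = np.real(np.diag(R_nn))
    # Per-microphone speech power is the noisy power minus the noise power.
    speech_power = np.maximum(np.real(np.diag(R_yy)) - noise_power, 0.0)
    input_snr = speech_power / np.maximum(noise_power, 1e-12)
    return int(np.argmax(input_snr)), input_snr


def mwf_weights(R_yy, R_nn, ref):
    """Textbook MWF weights w = R_yy^{-1} R_xx e_ref, with R_xx = R_yy - R_nn."""
    R_xx = R_yy - R_nn
    return np.linalg.solve(R_yy, R_xx[:, ref])


# Toy example (assumed values): one source seen through gains a, uncorrelated noise.
a = np.array([1.0, 0.5, 0.2])
R_xx_true = np.outer(a, a.conj())      # rank-1 speech covariance
R_nn = np.diag([0.5, 0.05, 0.1])       # noise power at each microphone
R_yy = R_xx_true + R_nn

ref, input_snr = select_reference_mic(R_yy, R_nn)
w = mwf_weights(R_yy, R_nn, ref)       # enhanced signal would be w^H y
print("selected reference microphone:", ref)
print("per-microphone input SNR:", input_snr)
```

In this toy setup the loudest microphone (index 0) is also the noisiest, so the selection falls on a different microphone. The selection itself only reads the diagonals of the two covariance matrices, which is one plausible reading of why such a criterion is cheap; the thesis's own complexity analysis is not reproduced here.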