Laser coherent speech detection systems are based on laser Doppler vibration measurement. They reconstruct speech signals by measuring the small displacements that sound vibrations cause in surrounding objects. Traditional electronic microphones detect sound pressure. Laser coherent speech detection systems, by contrast, need no direct contact, can detect over long distances and are easy to conceal. This makes them particularly valuable in information acquisition and national security. However, the measurement signal emitted by a laser coherent speech detection system is easily affected by both the equipment itself and the detection environment, so the detected coherent signal is typically very noisy. As a result, the coherent signal is difficult to demodulate and the detected speech is of poor quality. This is especially true in long-distance detection, where the measured laser echo is very weak. The intelligibility and clarity of the detected speech are then very low, which makes it hard to understand the speech and obtain the required information. To address the low demodulation accuracy of the coherent signal and the poor quality of the detected speech in long-distance laser coherent speech detection, this work studies two key technologies in depth: coherent signal demodulation and speech enhancement. The main research contents and innovations are as follows.

(1) Based on the working principle and composition of the long-distance laser coherent speech detection system, the mathematical expression of the digital coherent signal output is derived. Four all-digital demodulation methods that can run in real time are given: Hilbert arctangent, Hilbert differential cross-multiplication, I/Q arctangent and I/Q differential cross-multiplication. The method best suited to long-distance laser coherent speech measurement systems is then identified. Both simulations and physical experiments show that the chosen I/Q arctangent demodulation method needs only a low sampling rate and offers good real-time performance, high precision and strong robustness to noise. It therefore meets the requirements for real-time demodulation of coherent signals in long-distance laser coherent speech measurement systems.

(2) The noise in the detected speech is analyzed and classified. Because impulsive noise occurs at random locations and is short in duration, an impulsive noise location and bilateral interpolation method based on Linear Predictive Coding (LPC) is proposed. To locate the impulsive noise, the detected speech signal is first preprocessed with an LPC decorrelation operation. This weakens the influence of the background noise and of the speech underlying the impulsive noise, and raises the ratio of the impulsive noise to the rest of the signal, which significantly improves location accuracy. A bilateral LPC interpolator then encodes the unpolluted samples on both sides of the impulsive noise and uses them to replace the noisy samples. By making full use of the correlation between the samples before and after the impulsive noise, the bilateral interpolator significantly improves the accuracy of the interpolated speech. Experiments show that the LPC-based impulsive noise location method is much more accurate than traditional methods, and that bilateral LPC interpolation yields more accurate interpolated speech with less distortion in the enhanced result. The method thus meets the need of long-distance laser coherent speech detection systems to enhance low-SNR detected speech.

(3) Because slowly varying noise is relatively stable and persistent, the distributions of the short-time Discrete Fourier Transform (DFT) coefficients of laser-detected speech and of slowly varying noise are analyzed statistically. On this basis, Minimum Mean Square Error (MMSE) spectral subtraction under a Laplace distribution is proposed to remove the slowly varying noise, and the optimal estimates of the MMSE spectral subtraction parameters are derived under the Laplace assumption. A double-threshold discrimination method based on short-time logarithmic energy and short-time zero-crossing rate separates slowly varying noise intervals from speech intervals, which improves the accuracy of the a priori SNR estimate. Experiments show that the proposed Laplace MMSE spectral subtraction enhances speech better than traditional over-subtraction and MMSE spectral subtraction. It enhances the detected speech signal while reducing distortion, meeting the noise reduction requirements of weak, long-distance laser-detected speech.

(4) A speech enhancement method based on a Cycle-consistent Generative Adversarial Network (CycleGAN), called SE-CycleGAN, is presented for detected speech that has been saved offline, and the network structures of its generator and discriminator are given. By combining the different characteristics of the signals, the generator network gains stronger generation ability. SE-CycleGAN adds an identity mapping loss to the adversarial loss and the cycle-consistency loss, so that the feature information of the noisy speech remains essentially unchanged by the conversion. In laser coherent speech detection applications, only a limited amount of noisy detected speech can be acquired, and there is usually no corresponding clean speech. Paired labeled datasets for training conventional deep-learning speech enhancement networks therefore cannot be formed, and SE-CycleGAN solves this problem. Experiments show that, when trained on unpaired speech training sets, the proposed model outperforms traditional speech enhancement methods and shows a strong ability to suppress noise. SE-CycleGAN thus makes speech enhancement and data generation possible without paired labeled training sets.

(5) For very weak coherent signals with a very low signal-to-noise ratio, a novel laser speech phase demodulation method based on Generative Adversarial Networks (PDGAN) is proposed. PDGAN models the mapping from the originally detected, noise-polluted optical coherent signal to clean speech, so it demodulates an enhanced speech signal directly. The complex coherent signal demodulation and speech enhancement processes are thereby integrated into a single network. PDGAN uses a U-shaped generator as the demodulation network, fusing the high-level and low-level features of the coherent signal for more detailed feature extraction. The generator also receives additional information so that it generates samples with specific properties. The loss function adopts the least-squares loss and adds a sparsity factor that controls the gap between the demodulated speech signal and the clean speech. This better balances the discriminator and the generator and makes training more stable. Experiments show that PDGAN can demodulate speech from coherent signals with very low SNR, and that its output quality is better than that of traditional demodulation followed by speech enhancement. This research meets the demodulation requirements of low-SNR coherent signals over long distances and can form the basis of new approaches to coherent signal demodulation in precise speech detection and measurement.
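The four demodulation schemes compared in item (1) can be summarized under a commonly used heterodyne model. The model below is an assumption for illustration, not the expression derived in this work. It uses carrier frequency $f_c$, sampling rate $f_s$, amplitude $A$, wavelength $\lambda$ and target displacement $d[n]$, and assumes a round-trip optical path. The quadrature pair $(I, Q)$ comes either from I/Q mixing or from the real and imaginary parts of the Hilbert analytic signal after the carrier is removed.

```latex
\begin{aligned}
x[n] &= A\cos\bigl(2\pi f_c n / f_s + \varphi[n]\bigr), & \varphi[n] &= \frac{4\pi\, d[n]}{\lambda},\\
I[n] &= A\cos\varphi[n], & Q[n] &= A\sin\varphi[n],\\
\hat\varphi_{\mathrm{atan}}[n] &= \operatorname{unwrap}\bigl(\operatorname{atan2}(Q[n],\, I[n])\bigr),\\
\hat\varphi_{\mathrm{DCM}}[n] &= \hat\varphi_{\mathrm{DCM}}[n-1] + \frac{I[n-1]\,Q[n] - Q[n-1]\,I[n]}{I[n]^2 + Q[n]^2}.
\end{aligned}
```

The DCM numerator equals $A^2\sin(\varphi[n]-\varphi[n-1])$ and the denominator equals $A^2$. Each update therefore approximates the phase increment only when the per-sample increment is small. The arctangent form has no such small-increment requirement beyond what unwrapping needs.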
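A minimal sketch of I/Q arctangent and I/Q differential cross-multiplication demodulation, assuming the quadrature pair is already available and the phase model above holds. The function names (`iq_arctangent`, `iq_dcm`, `phase_to_displacement`) are hypothetical, and the round-trip relation $d = \lambda\varphi/(4\pi)$ is an assumption.

```python
import math


def unwrap(phases):
    """Remove 2*pi jumps between consecutive wrapped phase samples."""
    out = [phases[0]]
    for prev, cur in zip(phases, phases[1:]):
        step = cur - prev
        step -= 2 * math.pi * round(step / (2 * math.pi))  # fold the step into [-pi, pi]
        out.append(out[-1] + step)
    return out


def iq_arctangent(i_sig, q_sig):
    """I/Q arctangent demodulation: unwrapped atan2(Q, I)."""
    return unwrap([math.atan2(q, i) for i, q in zip(i_sig, q_sig)])


def iq_dcm(i_sig, q_sig, initial_phase=0.0):
    """I/Q differential cross-multiplication: accumulate A^2*sin(dphi) / A^2."""
    phase = [initial_phase]
    for n in range(1, len(i_sig)):
        cross = i_sig[n - 1] * q_sig[n] - q_sig[n - 1] * i_sig[n]  # A^2 * sin(phi[n] - phi[n-1])
        power = i_sig[n] ** 2 + q_sig[n] ** 2                      # A^2
        phase.append(phase[-1] + cross / power)
    return phase


def phase_to_displacement(phase, wavelength):
    """Assumed round-trip relation d = lambda * phi / (4 * pi)."""
    return wavelength * phase / (4 * math.pi)


if __name__ == "__main__":
    fs, tone_hz, depth = 8000, 5.0, 6.0  # depth > pi, so unwrapping is exercised
    truth = [depth * math.sin(2 * math.pi * tone_hz * n / fs) for n in range(fs)]
    i_sig = [math.cos(p) for p in truth]
    q_sig = [math.sin(p) for p in truth]
    for name, est in (("arctangent", iq_arctangent(i_sig, q_sig)), ("DCM", iq_dcm(i_sig, q_sig))):
        print(name, max(abs(a - b) for a, b in zip(est, truth)))
```

On this noiseless toy signal the arctangent estimate matches the true phase to rounding error. The DCM estimate carries a small bias from the $\sin(\Delta\varphi)\approx\Delta\varphi$ approximation, which shrinks as the sampling rate rises.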
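For the Hilbert variants in item (1), the quadrature component is built from a single real heterodyne signal. The sketch below forms the analytic signal with a naive DFT: it zeroes negative-frequency bins and doubles positive ones, which is the same construction `scipy.signal.hilbert` uses. It then subtracts an assumed known carrier ramp. The helper names are hypothetical, and a real-time system would use an FFT or FIR Hilbert filter rather than an $O(N^2)$ DFT.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT; adequate for a short illustrative frame."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def analytic_signal(x):
    """x + j*Hilbert{x}: keep DC (and Nyquist), double positive bins, zero negative bins."""
    n = len(x)
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
    for k in range(1, (n + 1) // 2):
        weights[k] = 2.0
    spectrum = dft(x)
    return dft([s * w for s, w in zip(spectrum, weights)], inverse=True)


def unwrap(phases):
    """Remove 2*pi jumps between consecutive wrapped phase samples."""
    out = [phases[0]]
    for prev, cur in zip(phases, phases[1:]):
        step = cur - prev
        step -= 2 * math.pi * round(step / (2 * math.pi))
        out.append(out[-1] + step)
    return out


def hilbert_arctangent(x, carrier_hz, fs):
    """Unwrapped analytic-signal phase minus the (assumed known) carrier ramp."""
    total = unwrap([cmath.phase(v) for v in analytic_signal(x)])
    return [p - 2 * math.pi * carrier_hz * n / fs for n, p in enumerate(total)]


if __name__ == "__main__":
    n_samples, fs, carrier = 256, 256.0, 64.0
    truth = [0.5 * math.sin(2 * math.pi * 2 * n / n_samples) for n in range(n_samples)]
    x = [math.cos(2 * math.pi * carrier * n / fs + p) for n, p in enumerate(truth)]
    print(max(abs(a - b) for a, b in zip(hilbert_arctangent(x, carrier, fs), truth)))
```

Feeding the real and imaginary parts of `analytic_signal` (after carrier removal) into a DCM update instead of `atan2` would give the Hilbert DCM variant.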
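The LPC decorrelation step in item (2) can be illustrated as prediction-error filtering. Fit LPC coefficients (here with the Levinson-Durbin recursion on the autocorrelation), compute the residual, and flag samples whose residual exceeds a multiple of a robust noise estimate. This is a sketch under assumptions, not the thesis's detector. The order, the 6-sigma threshold, the median-based sigma and the AR(2) test signal are all hypothetical choices.

```python
import random
import statistics


def autocorr(x, max_lag):
    n = len(x)
    return [sum(x[t] * x[t + k] for t in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Yule-Walker solution: returns a[0..p-1] with x[n] ~ sum_k a[k] * x[n-1-k]."""
    a = [0.0] * (order + 1)  # a[0] unused; a[i] multiplies x[n-i]
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coefficient
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k  # prediction error power at order i
    return a[1:]


def lpc_residual(x, coeffs):
    """Decorrelated (prediction-error) signal; the first p samples are set to 0."""
    p = len(coeffs)
    return [0.0 if n < p else x[n] - sum(c * x[n - 1 - k] for k, c in enumerate(coeffs))
            for n in range(len(x))]


def locate_impulses(x, order=10, threshold=6.0):
    """Flag samples whose LPC residual exceeds threshold * robust sigma (median/0.6745)."""
    residual = lpc_residual(x, levinson_durbin(autocorr(x, order), order))
    sigma = statistics.median([abs(v) for v in residual]) / 0.6745
    return [n for n, v in enumerate(residual) if abs(v) > threshold * sigma]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.0, 0.0]
    for _ in range(2000):
        x.append(1.6 * x[-1] - 0.8 * x[-2] + rng.gauss(0.0, 1.0))  # AR(2) stand-in for speech
    x = x[2:]
    for pos in (500, 1200):
        x[pos] += 20.0  # synthetic clicks
    print(locate_impulses(x))
```

A click at sample $m$ also leaks into the residual of the following `order` samples through the predictor, so detections cluster in $[m, m+p]$. The clean-sample index range then comes from the earliest flag in each cluster.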
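Bilateral interpolation, as described in item (2), uses the clean samples on both sides of a corrupted run. One common way to realize this is sketched below: fit a predictor to the left context and run it forward, fit one to the time-reversed right context and run it backward, then cross-fade the two predictions across the gap. This is not necessarily the thesis's interpolator. It swaps in least-squares (covariance-method) LPC, and the context length, order and linear cross-fade are hypothetical choices. Order 2 suits the single-sinusoid demo; speech would need a higher order.

```python
import math


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (aug[r][n] - sum(aug[r][c] * sol[c] for c in range(r + 1, n))) / aug[r][r]
    return sol


def lpc_covariance(seg, order):
    """Least-squares (covariance-method) LPC: seg[n] ~ sum_k a[k] * seg[n-1-k]."""
    rows = range(order, len(seg))
    phi = [[sum(seg[n - 1 - i] * seg[n - 1 - j] for n in rows) for j in range(order)]
           for i in range(order)]
    rhs = [sum(seg[n] * seg[n - 1 - i] for n in rows) for i in range(order)]
    return solve(phi, rhs)


def extrapolate(context, coeffs, steps):
    """Run the all-pole predictor forward for `steps` samples past `context`."""
    buf = list(context)
    for _ in range(steps):
        buf.append(sum(c * buf[-1 - k] for k, c in enumerate(coeffs)))
    return buf[len(context):]


def bilateral_interpolate(x, start, stop, order=2, context=64):
    """Replace x[start:stop] by cross-fading forward and backward LPC predictions."""
    gap = stop - start
    left = x[start - context:start]
    right_reversed = x[stop:stop + context][::-1]
    forward = extrapolate(left, lpc_covariance(left, order), gap)
    backward = extrapolate(right_reversed, lpc_covariance(right_reversed, order), gap)[::-1]
    out = list(x)
    for i in range(gap):
        w = (i + 1) / (gap + 1)  # weight shifts from the left prediction to the right one
        out[start + i] = (1.0 - w) * forward[i] + w * backward[i]
    return out


if __name__ == "__main__":
    clean = [math.sin(2 * math.pi * 0.03 * n + 0.3) for n in range(400)]
    noisy = [v + 5.0 if 200 <= n < 220 else v for n, v in enumerate(clean)]
    fixed = bilateral_interpolate(noisy, 200, 220)
    print(max(abs(a - b) for a, b in zip(fixed, clean)))
```

Cross-fading lets each prediction dominate near the context it was fitted on, which is where it is most reliable.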
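The double-threshold discrimination in item (3) can be sketched in the classic endpoint-detection style. Frames whose short-time log energy exceeds a high threshold seed a speech region. The region then grows outward while neighbouring frames exceed a low energy threshold or a zero-crossing-rate threshold, which catches weak unvoiced sounds. The thresholds, frame sizes and function names are hypothetical, and the extension rule is one simple variant rather than the thesis's exact procedure.

```python
import math


def frame_features(x, frame_len, hop):
    """Per-frame (log10 energy, zero-crossing rate)."""
    feats = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        log_energy = math.log10(sum(v * v for v in frame) + 1e-12)
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        feats.append((log_energy, crossings / (frame_len - 1)))
    return feats


def double_threshold_vad(feats, e_high, e_low, z_high):
    """Seed speech at frames above e_high, then extend while energy > e_low or ZCR > z_high."""
    n = len(feats)
    speech = [False] * n
    i = 0
    while i < n:
        if feats[i][0] <= e_high:
            i += 1
            continue
        lo, hi = i, i
        while lo > 0 and (feats[lo - 1][0] > e_low or feats[lo - 1][1] > z_high):
            lo -= 1
        while hi < n - 1 and (feats[hi + 1][0] > e_low or feats[hi + 1][1] > z_high):
            hi += 1
        for k in range(lo, hi + 1):
            speech[k] = True
        i = hi + 1
    return speech


if __name__ == "__main__":
    fs = 8000
    x = [0.01 * math.sin(2 * math.pi * 50 * n / fs) for n in range(4800)]  # slow, weak "noise"
    for n in range(1600, 3200):
        x[n] += math.sin(2 * math.pi * 300 * n / fs)                       # "speech" burst
    print(double_threshold_vad(frame_features(x, 160, 160), 0.0, -1.0, 0.05))
```

Frames left as `False` are treated as noise-only and can feed the noise spectrum estimate used for the a priori SNR.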
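Item (3) links the noise/speech decision to the a priori SNR estimate. The sketch below shows that link in a generic form. The noise power spectrum is updated only in frames flagged as noise, and the a priori SNR $\xi$ uses the well-known decision-directed rule. The Laplacian MMSE gain derived in this work is not reproduced here; the Wiener gain $\xi/(1+\xi)$ stands in for it. The smoothing constants, the $\xi$ floor and the function name are hypothetical. The sketch also assumes the first frame is noise-only.

```python
def enhance_gains(noisy_power, is_noise, alpha=0.98, noise_smooth=0.9, xi_min_db=-25.0):
    """Per-frame, per-bin spectral gains with decision-directed a priori SNR.

    noisy_power: list of frames, each a list of |Y_k|^2 values.
    is_noise:    list of booleans from a noise/speech detector.
    """
    xi_min = 10 ** (xi_min_db / 10)
    noise_psd = list(noisy_power[0])        # assumption: first frame is noise-only
    prev_clean = [0.0] * len(noise_psd)     # |S_hat|^2 from the previous frame
    gains = []
    for frame, noise_flag in zip(noisy_power, is_noise):
        if noise_flag:                      # update the noise estimate only in noise frames
            noise_psd = [noise_smooth * lam + (1 - noise_smooth) * p
                         for lam, p in zip(noise_psd, frame)]
        frame_gains = []
        for k, p in enumerate(frame):
            gamma = p / noise_psd[k]        # a posteriori SNR
            xi = alpha * prev_clean[k] / noise_psd[k] + (1 - alpha) * max(gamma - 1.0, 0.0)
            xi = max(xi, xi_min)            # decision-directed a priori SNR with a floor
            g = xi / (1.0 + xi)             # Wiener gain, standing in for the Laplacian MMSE gain
            frame_gains.append(g)
            prev_clean[k] = g * g * p
        gains.append(frame_gains)
    return gains


if __name__ == "__main__":
    frames = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [100.0, 1.0], [1.0, 1.0]]
    flags = [True, True, True, False, True]
    for row in enhance_gains(frames, flags):
        print([round(g, 4) for g in row])
```

A more accurate noise/speech decision keeps speech energy out of `noise_psd`. That is the mechanism by which the detector improves the $\xi$ estimate.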
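The SE-CycleGAN objective in item (4) combines three generator-side terms, and the sketch below shows only how they fit together on unpaired samples. The terms are an adversarial term, a cycle-consistency term (noisy → clean → noisy and clean → noisy → clean) and an identity-mapping term that penalizes a generator for altering input already in its target domain. Generators and discriminators are plain callables here, not networks. The least-squares adversarial form and the weights `lambda_cyc=10`, `lambda_id=5` follow common CycleGAN defaults and are assumptions, not values from this work.

```python
def mean(values):
    return sum(values) / len(values)


def l1(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return mean([abs(x - y) for x, y in zip(a, b)])


def lsgan_generator_term(scores):
    """Least-squares adversarial term: push discriminator scores toward 1 ("real")."""
    return mean([(s - 1.0) ** 2 for s in scores])


def cyclegan_generator_loss(noisy, clean, g_n2c, g_c2n, d_clean, d_noisy,
                            lambda_cyc=10.0, lambda_id=5.0):
    """Adversarial + cycle-consistency + identity-mapping loss on unpaired samples."""
    fake_clean = g_n2c(noisy)
    fake_noisy = g_c2n(clean)
    adversarial = (lsgan_generator_term(d_clean(fake_clean))
                   + lsgan_generator_term(d_noisy(fake_noisy)))
    cycle = l1(g_c2n(fake_clean), noisy) + l1(g_n2c(fake_noisy), clean)
    identity = l1(g_n2c(clean), clean) + l1(g_c2n(noisy), noisy)
    return adversarial + lambda_cyc * cycle + lambda_id * identity


if __name__ == "__main__":
    noisy, clean = [1.0, -2.0, 3.0], [0.5, 0.5]  # unpaired toy "waveforms"
    always_real = lambda x: [1.0] * len(x)
    print(cyclegan_generator_loss(noisy, clean, list, list, always_real, always_real))
```

Note that `noisy` and `clean` need not correspond or even share a length. Every L1 term compares a signal only with another signal from the same domain, which is what allows training without paired data.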
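The PDGAN loss in item (5) pairs a least-squares adversarial loss with a term that controls the gap between demodulated and clean speech. The sketch below is one plausible reading, and every part of it is an assumption. It uses the standard least-squares GAN targets (1 for clean speech, 0 for generator output) and treats the "sparsity factor" as the weight on an L1 distance. The weight `sparse_weight=100.0` and both function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def lsgan_discriminator_loss(d_real, d_fake):
    """0.5*E[(D(clean)-1)^2] + 0.5*E[D(demodulated)^2]."""
    return 0.5 * mean([(s - 1.0) ** 2 for s in d_real]) + 0.5 * mean([s ** 2 for s in d_fake])


def pdgan_generator_loss(d_fake, demodulated, clean, sparse_weight=100.0):
    """Least-squares adversarial term plus a weighted L1 gap to the clean speech.

    Interpreting the "sparsity factor" as an L1 weight is an assumption of this sketch.
    """
    adversarial = 0.5 * mean([(s - 1.0) ** 2 for s in d_fake])
    l1_gap = mean([abs(a - b) for a, b in zip(demodulated, clean)])
    return adversarial + sparse_weight * l1_gap
```

The squared-error targets give non-vanishing gradients even for samples the discriminator classifies confidently, which is the usual argument for least-squares losses stabilizing GAN training. The L1 term anchors the demodulated waveform to the clean reference.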