As a new entry point for human-computer interaction, voice interfaces have become an indispensable component of personal devices such as smartphones and voice assistants. These devices sense sound with microphones, which convert changes in air pressure into electrical signals; however, single-microphone devices struggle to pick up speech in noisy environments and suffer from further limitations such as susceptibility to voice replay attacks and the cocktail-party problem.

In this thesis, we propose VoLe, a transfer-learning-based millimeter-wave (mmWave) acoustic sensing system. VoLe uses an mmWave radar to sense vibrations at the target speaker's throat and recovers the speech content even when the acoustic channel is obstructed, thereby overcoming the limitations of conventional microphones and enriching the diversity of audio-sensing devices. The main contributions of this thesis are as follows.

First, a mathematical model is built to study the intrinsic relationship between audio signals and mmWave signals, and a dynamic clutter-removal method based on variational mode decomposition (VMD) is proposed, enabling VoLe to extract micron-level speech features from mmWave signals mixed with noise.

Second, VNet is designed to accurately reconstruct a time-domain speech signal from the band-limited RF signal: the speech waveform is synthesized from the estimated vibration representation by a modified neural-network vocoder without losing phase information. In addition, transfer learning is introduced to reduce the model's dependence on locally collected training data and to significantly improve its generalization ability.

Finally, the VoLe system is implemented on a commercial radar device. Quantitative and qualitative evaluations show that VoLe achieves high-quality sound-source separation and speech enhancement across different scenes, distances, angles, and sound-loudness levels.
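To make the VMD-based clutter-removal idea concrete, the following is a minimal sketch of variational mode decomposition applied to separating a slow clutter component from a speech-band vibration component. This is an illustrative toy implementation, not the thesis's actual pipeline: the mode count `K`, bandwidth penalty `alpha`, the initialization of the center frequencies, and the synthetic clutter/vibration signals are all assumptions made for demonstration.

```python
import numpy as np

def vmd(signal, K=2, alpha=2000.0, tau=0.0, n_iter=200):
    """Minimal variational mode decomposition (after Dragomiretskiy & Zosso, 2014).

    Returns (modes, omega): K narrow-band time-domain modes and their
    normalized center frequencies in [0, 0.5]. A mode whose center
    frequency falls in the speech band can be kept, while low-frequency
    dynamic clutter modes are discarded.
    """
    N = len(signal)
    f_hat = np.fft.fft(signal)
    # Keep only the positive-frequency half (analytic-signal trick).
    f_hat_plus = np.copy(f_hat)
    f_hat_plus[N // 2:] = 0
    freqs = np.arange(N) / N  # normalized frequency axis

    u_hat = np.zeros((K, N), dtype=complex)   # mode spectra
    omega = np.linspace(0.05, 0.45, K)        # initial center frequencies (assumed)
    lam = np.zeros(N, dtype=complex)          # Lagrange multiplier (dual ascent)

    for _ in range(n_iter):
        for k in range(K):
            # Residual spectrum with all other modes removed.
            residual = f_hat_plus - (u_hat.sum(axis=0) - u_hat[k]) + lam / 2
            # Wiener-filter-style update centered on omega[k].
            u_hat[k] = residual / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Re-center omega[k] at the mode's spectral center of gravity.
            power = np.abs(u_hat[k, : N // 2]) ** 2
            omega[k] = np.sum(freqs[: N // 2] * power) / (np.sum(power) + 1e-12)
        lam = lam + tau * (f_hat_plus - u_hat.sum(axis=0))

    # Back to the time domain: real part of the doubled half-spectrum inverse.
    modes = np.real(np.fft.ifft(2 * u_hat, axis=1))
    return modes, omega
```

In this toy setting, mixing a 2 Hz "clutter" sinusoid with a 50 Hz "vibration" sinusoid at a 1 kHz sampling rate and running `vmd` with `K=2` yields one mode locked near each tone, so the speech-band mode can be selected by its center frequency.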