| With continued socioeconomic development and rising living standards, people pay increasing attention to disease prevention and health monitoring. However, as demand for convenient and fast health monitoring grows, the drawbacks of traditional contact-based physiological measurement methods are becoming increasingly apparent. Video-based remote physiological measurement has therefore attracted growing attention, and it has great application value in fields such as affective computing, non-contact health monitoring, and telemedicine. The main work of this thesis comprises the following two parts. First, this thesis proposes a heart rate estimation model based on a spatio-temporal map and a spatial attention mechanism. Most existing methods predict physiological signals from raw videos. Because raw videos contain a large amount of irrelevant background content, these methods are time-consuming and inefficient. This thesis therefore proposes an improved hand-crafted spatio-temporal map, which effectively represents physiological information by excluding irrelevant background content and applying color space conversion. In addition, this thesis proposes a plug-and-play spatial attention module, which combines spatial information to enhance the extraction of physiological features by adaptively adjusting the weights of different spatial positions and color channels. Finally, the method predicts heart rate and physiological signals simultaneously, and multi-task learning further improves the robustness and performance of the network. Second, this thesis proposes a heart rate estimation model based on random patch cropping and a decomposition-reconstruction strategy, which builds on the first work. Existing methods generally fall into two groups. The first focuses on mining subtle blood volume pulse signals from face videos, while lacking
explicit modeling of the noise that dominates face video content. Such methods are susceptible to noise and may generalize poorly to unseen scenarios. The second focuses on modeling noisy data directly, which yields suboptimal performance because such severe random noise lacks regularity. This thesis proposes a decomposition-reconstruction network that focuses on modeling physiological features, together with a novel periodic loss function that constrains the inherent periodicity of physiological features in face videos. Furthermore, an effective data augmentation strategy is proposed to synthesize augmented samples with different degrees of noise influence. Experiments are conducted on datasets including VIPL-HR, PURE, and UBFC-rPPG. Ablation experiments and comparison tests verify the strong performance of this work. |
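
The abstract does not say how the spatio-temporal map is constructed, so the following is only a minimal sketch of one common way such a map can be built, not the thesis's actual procedure. It assumes a pre-cropped face clip, a boolean skin mask that marks non-background pixels, a fixed grid of blocks, a BT.601 RGB-to-YUV conversion, and per-row min-max scaling. The names `rgb_to_yuv` and `spatio_temporal_map`, the grid size, and the choice of YUV are all hypothetical.

```python
import numpy as np


def rgb_to_yuv(rgb):
    """Convert RGB values to YUV with BT.601 coefficients (assumed color space)."""
    m = np.array([[0.299, 0.587, 0.114],
                  [-0.14713, -0.28886, 0.436],
                  [0.615, -0.51499, -0.10001]])
    return rgb @ m.T


def spatio_temporal_map(frames, mask, grid=(5, 5)):
    """Build a (blocks, T, 3) map from a face clip.

    frames: (T, H, W, 3) RGB face crops.
    mask:   (H, W) boolean array, True on pixels to keep (e.g. skin).
    Each block contributes one row: the mean color of its masked pixels
    per frame, converted to YUV and min-max scaled over time.
    """
    T, H, W, _ = frames.shape
    rows, cols = grid
    ys = np.linspace(0, H, rows + 1, dtype=int)
    xs = np.linspace(0, W, cols + 1, dtype=int)
    stmap = np.zeros((rows * cols, T, 3))
    for r in range(rows):
        for c in range(cols):
            block_mask = mask[ys[r]:ys[r + 1], xs[c]:xs[c + 1]]
            if not block_mask.any():
                continue  # background-only block: row stays zero
            block = frames[:, ys[r]:ys[r + 1], xs[c]:xs[c + 1], :]
            block = block.astype(np.float64)
            # Average only the masked pixels, giving one RGB triple per frame.
            mean_rgb = block[:, block_mask, :].mean(axis=1)
            stmap[r * cols + c] = rgb_to_yuv(mean_rgb)
    # Min-max scale each block/channel over the time axis.
    lo = stmap.min(axis=1, keepdims=True)
    hi = stmap.max(axis=1, keepdims=True)
    return (stmap - lo) / np.maximum(hi - lo, 1e-8)
```

Under these assumptions, a clip of `T` frames becomes a small image-like array of shape `(rows * cols, T, 3)`, so a network sees only averaged facial color traces rather than full frames with background.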