
Research on Identification Methods for Characteristic Populations Based on Acoustic Characteristics and Visual Behaviors

Posted on: 2024-07-09
Degree: Master
Type: Thesis
Country: China
Candidate: S Zhou
Full Text: PDF
GTID: 2568307073968299
Subject: Computer Science and Technology
Abstract/Summary:
Since the outbreak of the novel coronavirus in 2019, methods for identifying and counting characteristic populations have been increasingly recognized and applied in medicine, public health management, and public safety management. Capturing typical visual behaviors of characteristic populations through image acquisition and using deep neural networks for sample construction, training, and feature-identification modeling have become the mainstream research approach. However, beyond visual behaviors, acoustic features such as coughing and crying are also important characteristics when considering populations affected by COVID-19 or infants. This thesis presents a method for identifying characteristic populations based on acoustic features and visual behaviors, integrating image and acoustic data for multimodal feature normalization and decision-making. The approach effectively improves both the efficiency and the accuracy of identifying characteristic populations. The main contributions of this work are as follows:

1. In the acoustic modality, to address the insufficient representation of a single feature and the redundant-information interference caused by simply stacking different features, a sound classification method based on multi-channel features and a hybrid attention model is proposed. Sound signals are transformed into time-frequency representations, various filters are used to extract spectral features, and the features are reconstructed into three-channel feature maps. A hybrid classification model consisting of channel and time-frequency attention modules is then introduced. The multi-channel features exploit the complementarity between features to compensate for the limitations of any single representation, while the time-frequency attention module focuses on the more informative regions of the time and frequency domains. The method effectively suppresses background noise, eliminates redundancy, and improves convergence speed and classification accuracy. Comparative experiments on several public sound recognition datasets demonstrate its effectiveness.

2. In the visual modality, to address the weight imbalance caused by the choice of adjacent nodes in graph-convolution-based behavior recognition, a behavior recognition method based on multi-scale adjacency matrices and adaptive attention is proposed. Multi-scale adjacency matrices represent non-adjacent nodes in skeleton sequences and their connectivity. Spatial attention models the complex spatial correlations between different joint positions, and temporal attention captures the dynamic temporal correlations between frames, enhancing the dynamic spatiotemporal modeling capability for skeleton data.

3. For multimodal fusion, an optimal-matching weighted fusion discriminant algorithm is proposed. The weights are adaptively adjusted according to the recognition accuracy of the individual models, fusing the acoustic and visual modalities at the decision level. A microphone array module is additionally incorporated for sound source localization. Finally, a characteristic-population identification system integrating acoustic and visual fusion is designed and implemented.
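The abstract does not specify which spectral features form the three channels or how the channel attention is computed. A minimal NumPy sketch of the general idea, under assumed choices (linear, log-compressed, and delta spectrograms as the three channels, and a softmax squeeze-style channel reweighting), might look like:

```python
import numpy as np

def stft_mag(x, n_fft=256, hop=128):
    # Frame the signal with a Hann window and take the magnitude spectrum.
    frames = np.stack([x[i:i + n_fft] for i in range(0, len(x) - n_fft + 1, hop)])
    return np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)).T  # (freq, time)

def three_channel_features(x):
    # Stack complementary time-frequency views into a 3-channel feature map
    # (assumed channels; the thesis may use different filterbank features).
    spec = stft_mag(x)                                           # linear magnitude
    log_spec = np.log1p(spec)                                    # log-compressed
    delta = np.diff(log_spec, axis=1, prepend=log_spec[:, :1])   # temporal delta
    return np.stack([spec, log_spec, delta])                     # (3, freq, time)

def channel_attention(feat):
    # Squeeze each channel to a scalar, softmax the scalars, reweight channels.
    s = feat.mean(axis=(1, 2))
    w = np.exp(s - s.max())
    w /= w.sum()
    return feat * w[:, None, None], w

x = np.random.default_rng(0).standard_normal(4000)
feat = three_channel_features(x)
att, w = channel_attention(feat)
```

In a trained model the attention weights would be produced by learned layers rather than a fixed softmax over channel means; the sketch only shows the data flow from waveform to reweighted multi-channel feature map.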
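For contribution 2, one common way to build multi-scale adjacency matrices over a skeleton graph is to binarize powers of the (self-loop-augmented) adjacency matrix, so scale k connects joints up to k hops apart. The sketch below assumes that construction plus a plain spatial graph-convolution layer; the thesis's exact formulation and its adaptive attention are not reproduced here:

```python
import numpy as np

def normalize_adj(A):
    # Symmetric normalization: D^{-1/2} A D^{-1/2}.
    d = A.sum(axis=1)
    d_inv = np.where(d > 0, d ** -0.5, 0.0)
    return d_inv[:, None] * A * d_inv[None, :]

def multi_scale_adjacency(A, scales=3):
    # Powers of (A + I) reach nodes up to k hops away; binarize to keep
    # connectivity only, then normalize each scale separately.
    n = A.shape[0]
    mats, Ak = [], np.eye(n)
    for _ in range(scales):
        Ak = ((Ak @ (A + np.eye(n))) > 0).astype(float)
        mats.append(normalize_adj(Ak))
    return np.stack(mats)  # (scales, n, n)

def graph_conv(X, mats, W):
    # One spatial graph-convolution layer: sum over scales of A_k X W_k.
    return sum(mats[k] @ X @ W[k] for k in range(len(mats)))

# Toy 5-joint chain skeleton: 0-1-2-3-4.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
mats = multi_scale_adjacency(A, scales=3)
rng = np.random.default_rng(1)
out = graph_conv(rng.standard_normal((5, 8)), mats, rng.standard_normal((3, 8, 16)))
```

Note that joints 0 and 2 are unconnected at scale 1 but connected at scale 2, which is exactly the non-adjacent-node connectivity the multi-scale matrices are meant to expose.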
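The decision-level fusion of contribution 3 can be illustrated with a simple accuracy-weighted average of per-model class distributions. The abstract calls its method an "optimal matching weighted fusion discriminant algorithm" without giving the weight rule; the sketch below assumes weights proportional to each model's validation accuracy, which captures only the basic adaptive-weighting idea:

```python
import numpy as np

def fuse_decisions(probs_by_model, val_accuracies):
    # Weight each model's class probabilities by its validation accuracy,
    # normalized so the weights sum to 1 (decision-level fusion).
    w = np.asarray(val_accuracies, dtype=float)
    w /= w.sum()
    probs = np.stack(probs_by_model)         # (n_models, n_classes)
    return (w[:, None] * probs).sum(axis=0)  # fused class distribution

# Hypothetical outputs of the acoustic and visual models over 3 classes.
acoustic = np.array([0.7, 0.2, 0.1])
visual = np.array([0.4, 0.5, 0.1])
fused = fuse_decisions([acoustic, visual], val_accuracies=[0.90, 0.80])
```

Here the acoustic model's higher validation accuracy lets its prediction (class 0) dominate the fused decision even though the visual model prefers class 1.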
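The abstract mentions a microphone array for sound source localization without naming the algorithm. A standard technique for the underlying time-difference-of-arrival (TDOA) estimate between two microphones is GCC-PHAT, sketched here as an illustrative stand-in rather than the thesis's actual method:

```python
import numpy as np

def gcc_phat_delay(x, y, fs):
    # GCC-PHAT: whiten the cross-spectrum so only phase (i.e. delay)
    # information remains, then locate the cross-correlation peak.
    n = len(x) + len(y)
    X = np.fft.rfft(x, n)
    Y = np.fft.rfft(y, n)
    R = Y * np.conj(X)
    R /= np.abs(R) + 1e-12
    cc = np.fft.irfft(R, n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    # Positive result: y lags x (the source is closer to the x microphone).
    return (np.argmax(np.abs(cc)) - max_shift) / fs

fs = 8000
rng = np.random.default_rng(2)
x = rng.standard_normal(1024)
y = np.concatenate([np.zeros(7), x])[:len(x)]  # simulate a 7-sample delay
tau = gcc_phat_delay(x, y, fs)
```

Given TDOAs for several microphone pairs and the array geometry, the source direction can then be triangulated.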
Keywords/Search Tags: Acoustic characteristics, Visual behavior, Characteristic population, Identification method, Microphone array