
Research On Key Problems Of Scene Recognition For Micro-Video

Posted on: 2021-01-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Guo
Full Text: PDF
GTID: 1368330602982462
Subject: Computer Science and Technology
Abstract/Summary:
Micro-videos are short videos that users record with mobile phones, computers, and cameras, upload to social media platforms, and share with other users. In recent years, with the development of social media, micro-videos have become an important part of multimedia and have a significant impact on people's lives. Scene recognition, the analysis and understanding of the scene in an image or video, is an important topic in computer vision. Although research on scene recognition has a long history, micro-video scene recognition is a new research area. It helps mine the intrinsic value of micro-videos and increase their usability, and it can provide technologies and methods for micro-video-related industries. The study of micro-video scene recognition therefore has both scientific significance and high application value.

Compared with traditional scene recognition, micro-video scene recognition has the following characteristics:
1) Huge amount of data. Social media platforms have hundreds of millions of users, who upload several videos to their accounts every day. This huge amount of data can support scene recognition.
2) Shortness. A micro-video lasts only a few seconds or a few minutes. This makes it difficult to capture scene information, which increases the difficulty of micro-video scene recognition.
3) Rich social attributes. The text in comments and tags can be used as one modality for micro-video scene recognition.

These characteristics provide many opportunities for the study of micro-video scene recognition. At the same time, there are many challenges:
1) Noise. Micro-videos are user-generated and subjective, and there are no rules governing how they are produced, so the data contain noise.
2) Weak correlation between modalities. The text in comments and tags can serve as one modality, together with the visual information of frames and the audio information. However, because micro-videos are produced subjectively and freely, the correlation among visual, audio, and text information is weak.
3) Inconsistency. Content within the same scene differs significantly across users, even when the users' intentions are the same.
4) Weak scene semantics of some modalities. Owing to the influence of noise and user intention, some visual, audio, or text information is only weakly related to scene semantics, which makes the scene category difficult to classify.
5) Data imbalance. Popular scenes have large amounts of data, while unpopular scenes have little.

To address these challenges, the studies in this thesis are as follows:
1) To address the weak correlation between modalities, a complementary multi-modal fusion method is proposed for micro-video scene retrieval. This method makes full use of the semantic complementarity of multiple modalities. The features of the modalities are concatenated into a single vector, which is nonlinearly transformed by a multi-layer perceptron to learn the correlation between each feature dimension and the semantics. Finally, a supervised hash learning method learns hash codes that preserve discriminability and the similarity between pairs of samples. This method improves the efficiency and precision of micro-video scene retrieval.
2) To address the inconsistency within the same scene, a consistency semantics learning method is proposed for micro-video scene classification. In this method, a deep scene feature extraction method is used to extract spatial features of the micro-video scene. An attention mechanism is used in the scene feature representation to extract scene-related content from frames. A Long Short-Term Memory (LSTM) model is used for temporal feature extraction. A two-branch framework and a supervised learning mechanism are used to learn a consistent feature representation. This method improves the accuracy of micro-video scene classification.
3) To address the weak scene semantics of some modalities, an enhanced multi-modal semantic learning method is proposed for micro-video scene classification. In this method, a modality with strong semantics enhances the semantics of a modality with weak semantics by minimizing the semantic distance between them. The enhanced feature representation of the weak modality is fused with its feature representation before enhancement. Finally, the modalities are fused through an adaptive weight learning method. This method improves the accuracy of micro-video scene classification.
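The retrieval pipeline described for the complementary multi-modal fusion method (concatenate the modality features, transform them with a multi-layer perceptron, then hash the result for retrieval) can be sketched roughly as below. This is only an illustrative sketch and not the thesis's implementation: the function names, feature dimensions, and code length are hypothetical, the weights are random rather than learned, and simple sign thresholding stands in for the supervised hash learning step, whose loss the abstract does not specify.

```python
import math
import random


def random_layer(n_in, n_out, rng):
    """Build one untrained (weights, bias) layer.

    Assumption: a real system would learn these weights under supervision.
    """
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-1.0, 1.0) * scale for _ in range(n_in)]
               for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def mlp_forward(x, layers):
    """Apply each layer as an affine map followed by a tanh nonlinearity."""
    for weights, bias in layers:
        x = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, bias)]
    return x


def hash_code(visual, audio, text, layers):
    """Concatenate modality features, transform them, and binarize by sign."""
    fused = list(visual) + list(audio) + list(text)  # simple concatenation
    hidden = mlp_forward(fused, layers)
    return [1 if h >= 0 else 0 for h in hidden]


def hamming(a, b):
    """Count the positions where two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def retrieve(query_code, database_codes, top_k=3):
    """Return indices of the top_k database codes closest in Hamming distance."""
    ranked = sorted(range(len(database_codes)),
                    key=lambda i: hamming(query_code, database_codes[i]))
    return ranked[:top_k]


# Toy usage with hypothetical feature sizes: 8 visual, 4 audio, 4 text dims.
rng = random.Random(0)
layers = [random_layer(16, 12, rng), random_layer(12, 8, rng)]
videos = [([rng.gauss(0, 1) for _ in range(8)],
           [rng.gauss(0, 1) for _ in range(4)],
           [rng.gauss(0, 1) for _ in range(4)]) for _ in range(5)]
codes = [hash_code(v, a, t, layers) for v, a, t in videos]
print(retrieve(codes[0], codes))  # the query itself ranks first (distance 0)
```

Comparing short binary codes by Hamming distance is what makes hash-based retrieval cheap. In a trained system, the supervised objective would shape the codes so that videos of the same scene end up close together.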
Keywords/Search Tags:Micro-video Understanding, Scene Recognition, Multi-modal Fusion, Deep Learning, Modality Enhancement