With the development of online video platforms, an emerging type of user-generated comment called DanMu has become increasingly popular. Unlike traditional comments, DanMu is overlaid on the video content and synchronized with specific playback times. This form of commenting enriches interaction between users and provides new impetus for research in related fields. However, low-quality DanMu not only degrades user experience but also limits the performance of downstream tasks, so detecting low-quality DanMu efficiently and accurately has become an urgent problem. Because DanMu semantics are highly idiosyncratic, existing methods based solely on text semantic representation often perform poorly, which makes introducing additional modalities necessary. Among these, visual information is particularly important for detecting low-quality DanMu, as it can effectively alleviate the herd effect in DanMu text. In addition, users exhibit specific eye-tracking behavior patterns when watching videos with DanMu, and different eye-tracking patterns often correspond to users' emotional judgments of the visual focus; how to exploit these patterns to improve detection performance, however, remains largely unexplored.

In response to these problems and challenges, this thesis uses human eye-tracking patterns to detect abnormal DanMu. We collected an eye-tracking dataset to explore the cognitive patterns in human viewing behavior, and then designed a multimodal abnormal DanMu detection framework that incorporates eye-tracking features. The main research content and contributions of this thesis are as follows:

(1) This thesis proposes a novel multimodal abnormal DanMu detection framework that incorporates eye-tracking features. The framework extracts pattern information from eye-tracking data sequences through sequential pattern mining, and uses the eye-tracking patterns corresponding to each DanMu to assist the semantic representation of the DanMu text. The eye-tracking-enhanced DanMu representation is then fused with the corresponding video keyframe features, effectively improving the model's ability to detect abnormal DanMu. Additionally, this thesis designs an efficient fusion method for the eye-tracking and text modalities, which mitigates the optimization imbalance between the two.

(2) No publicly available dataset containing eye-tracking features previously existed for the abnormal DanMu detection task. This thesis collects and releases a labeled video DanMu dataset containing eye-tracking data, providing a convenient resource for researchers who wish to introduce human cognitive patterns into abnormal DanMu detection.

(3) Based on the above methods, extensive comparative and ablation experiments were conducted on the collected real-world dataset. The proposed abnormal DanMu detection framework outperforms the baseline models on almost all classification metrics, demonstrating the importance of introducing human cognitive patterns through the eye-tracking modality. The modality ablation experiments also verify the effectiveness of the visual modality. In the comparison of fusion methods, the fusion method proposed in this thesis is significantly better than the baseline methods in both convergence and performance. Lastly, to address the limited scalability of the eye-movement representation method used, we improved the sequential representation method with attention mechanisms and verified the effectiveness of this approach.
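The abstract does not specify which sequential pattern mining algorithm is used to extract pattern information from the eye-tracking sequences. As a minimal illustrative sketch only — the gaze-event alphabet and all names here are hypothetical, and exhaustive enumeration stands in for an efficient miner such as PrefixSpan — frequent gaze-event subsequences can be mined and turned into a per-DanMu feature vector roughly as follows:

```python
from collections import Counter
from itertools import combinations

def is_subseq(pat, seq):
    """True if pat occurs in seq as a (possibly gapped) subsequence."""
    it = iter(seq)
    return all(sym in it for sym in pat)

def mine_patterns(sequences, min_support=2, max_len=2):
    """Frequent subsequence mining by exhaustive enumeration.

    Every subsequence of length <= max_len that occurs in at least
    min_support of the input gaze-event sequences is kept as a pattern.
    (A real system would use an efficient miner such as PrefixSpan.)
    """
    counts = Counter()
    for seq in sequences:
        subseqs = set()
        for length in range(1, max_len + 1):
            for idx in combinations(range(len(seq)), length):
                subseqs.add(tuple(seq[i] for i in idx))
        counts.update(subseqs)  # count each pattern once per sequence
    return sorted(p for p, c in counts.items() if c >= min_support)

def pattern_features(seq, patterns):
    """Binary indicator vector: which mined patterns appear in seq."""
    return [1 if is_subseq(p, seq) else 0 for p in patterns]

# Hypothetical gaze-event alphabet: F = fixation on the DanMu region,
# S = saccade, R = regression back to the DanMu.
seqs = [["F", "S", "F"], ["F", "F", "R"], ["S", "F", "R"]]
pats = mine_patterns(seqs, min_support=2, max_len=2)
vec = pattern_features(["F", "S", "F"], pats)
```

The resulting binary vector is one plausible way such pattern information could be attached to the text representation of each DanMu, e.g. by concatenation before fusion.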
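The final contribution replaces the mined-pattern representation with an attention-based sequential representation, for which the abstract gives no details. The general idea — learn a weighted pooling over the eye-tracking sequence instead of matching a fixed pattern vocabulary — can be sketched with simple additive attention (NumPy only; the scoring vector `w` is an assumption and would be learned jointly with the model in practice):

```python
import numpy as np

def attention_pool(H, w):
    """Attention pooling over an eye-tracking sequence.

    H : (T, d) array, one feature vector per gaze event.
    w : (d,) scoring vector (learned in practice, random here).
    Returns the attention-weighted representation and the weights.
    """
    scores = H @ w           # (T,) relevance score per gaze event
    scores -= scores.max()   # stabilize the softmax
    alpha = np.exp(scores)
    alpha /= alpha.sum()     # attention weights, sum to 1
    return alpha @ H, alpha  # (d,) pooled sequence representation

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 4))  # 5 gaze events, 4-dim features each
z, alpha = attention_pool(H, rng.normal(size=4))
```

Unlike a fixed pattern vocabulary, this pooling handles unseen sequences of any length, which matches the scalability motivation stated above.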