| With the rapid development of the Internet and intensifying competition in the media, media professionals and self-media editors often use clickbait news to attract attention and clicks and thereby increase revenue. Clickbait news has two essential characteristics: luring headlines and low similarity between headlines and target contents. Given the huge volume and dynamic nature of online news, automated clickbait detection methods are indispensable. Traditional machine learning-based methods suffer from heavy feature engineering and poor performance, while some deep learning-based approaches consider only part of the news content or directly fuse all of the information. In this paper, to better address the online clickbait problem, we conduct a comprehensive analysis of the semantic patterns and detection methods of clickbait news in the social media domain. We design novel models based on the two characteristics of clickbait and the click behavior of readers, and achieve better performance than state-of-the-art baselines on two datasets. The main work and innovations are as follows: (1) Considering both characteristics of clickbait news (luring headlines and low similarity between headlines and target contents) and the behavior of readers when they judge a news item to be clickbait, we propose a combined model, "LSACD", which simultaneously detects the luring degree of headlines and computes the similarity between headlines and target contents. In addition, it formulates the judging behavior of readers as a novel adaptive weighting mechanism. Because the model is derived from the definition of clickbait and from reader behavior, it can obtain better detection performance. (2) To better model long Chinese news bodies, we propose a graph neural network-based clickbait detection model named "HDM-GNN". In this model, a long document is represented as a tree-structured graph; a hierarchical text modeling method extracts semantic features at the chapter, paragraph, and sentence levels; and a graph neural network captures the interactions among these semantic levels together with the structural features of the document. The model can thus effectively understand the main events described in the news, which improves detection performance. (3) We construct a new real-world Chinese clickbait dataset containing nearly 8,000 high-click-rate news items collected from major social media news platforms in China. After pre-processing, all samples are manually annotated by several annotators, yielding a high-quality Chinese clickbait dataset. We conduct an in-depth analysis of the semantic pattern characteristics of this dataset and of the existing open English clickbait dataset, and use multiple semantic pattern features together with simple statistical features as input to machine learning algorithms for classifying clickbait news. (4) We carry out extensive comparative experiments on both the Chinese and English datasets. The compared methods include a variety of statistical machine learning algorithms widely used in industry and deep learning models that have achieved excellent performance in the field. The experimental results verify that the proposed LSACD and HDM-GNN models achieve better detection performance than the most advanced models in the field. |
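
To illustrate the tree-structured document graph described in contribution (2), the following is a minimal sketch and not the HDM-GNN implementation. The `Node` class, the `build_tree_graph` function, and the nested-list input format (chapters containing paragraphs containing sentences) are hypothetical names assumed here for illustration; the hierarchical encoder and the graph neural network are omitted.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: int
    level: str  # one of "document", "chapter", "paragraph", "sentence"
    text: str   # headline or sentence text; empty for structural nodes


def build_tree_graph(headline, chapters):
    """Build a tree graph from a nested document (assumed input format).

    chapters: list of chapters, each a list of paragraphs,
              each paragraph a list of sentence strings.
    Returns (nodes, edges), where edges are (parent_id, child_id) pairs.
    """
    nodes = []
    edges = []

    def add(level, text, parent=None):
        # Create a node and link it to its parent, if it has one.
        node = Node(len(nodes), level, text)
        nodes.append(node)
        if parent is not None:
            edges.append((parent, node.node_id))
        return node.node_id

    root = add("document", headline)
    for chapter in chapters:
        chapter_id = add("chapter", "", root)
        for paragraph in chapter:
            paragraph_id = add("paragraph", "", chapter_id)
            for sentence in paragraph:
                add("sentence", sentence, paragraph_id)
    return nodes, edges
```

In such a sketch, each node would later receive a feature vector from a text encoder, and the `edges` list would define the neighborhoods over which a graph neural network passes messages between the sentence, paragraph, and chapter levels.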