The Implementation And Application Of Anti-Textual Spam Algorithm Based On Knowledge Graph

Posted on: 2020-12-28
Degree: Master
Type: Thesis
Country: China
Candidate: G H Xie
GTID: 2518305735486784
Subject: Master of Engineering
Abstract/Summary:
Spam appears on almost all online social platforms. It includes advertising, pornography, and violent content, and it seriously degrades the social ecosystem of these platforms. Weibo, a social platform with more than 400 million monthly active users, is also affected, so improving the spam recognition rate for Weibo text is important for maintaining a healthy social environment. This thesis focuses on improving the recognition of advertising spam text.

Advertising spam text contains many commodity entities, brand names, and other proper nouns, and understanding the semantics behind them can help improve anti-spam performance. This thesis proposes using entity conceptualization to mine the semantics behind short text: the commodity knowledge graph JdGraph is used to map the proper nouns in advertising spam text to sets of related concepts, and these concepts are used to represent the text content. Building on entity conceptualization, concept word embeddings and text word embeddings, both obtained from pre-trained word embeddings, are concatenated into a "concepts-short text" embedding, which is fed into KPCNN, the convolutional neural network algorithm designed in this thesis, for feature learning. Experiments are carried out on a Weibo short text dataset, and the categorization performance is validated with a controlled-variable method: the algorithm is varied while the feature inputs are held fixed, and the feature inputs are varied while the algorithm is held fixed. The results show that a knowledge graph is an effective way to mine the semantic meaning of short text. With entity conceptualization, the categorization accuracy of KPCNN is 7% higher than that of the best-performing baseline algorithm.
Keywords/Search Tags: anti-textual spam, short text categorization, neural network language model, entity conceptualization, knowledge graph
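The entity-conceptualization step described in the abstract can be illustrated with a small sketch. This is not the thesis implementation: the `TOY_CONCEPTS` dictionary is a hypothetical stand-in for a JdGraph lookup, `toy_embedding` is a hash-based placeholder for pre-trained word vectors, and joining concepts and text along the sequence axis (concepts first) is an assumption, since the abstract does not specify the concatenation axis. The resulting matrix is the kind of input a convolutional text classifier such as KPCNN might consume.

```python
import hashlib

# Hypothetical toy stand-in for a commodity knowledge graph lookup
# (the thesis uses JdGraph; these entries are invented for illustration).
TOY_CONCEPTS = {
    "iphone": ["smartphone", "electronics"],
    "nike": ["sportswear", "brand"],
}

EMBED_DIM = 4  # assumed toy dimension; real pre-trained vectors are much larger


def toy_embedding(token, dim=EMBED_DIM):
    """Deterministic placeholder vector derived from a hash of the token.

    Stands in for a lookup into pre-trained word embeddings.
    """
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [digest[i] / 255.0 for i in range(dim)]


def conceptualize(tokens, concept_map=TOY_CONCEPTS):
    """Map each known entity to its concepts, keeping order and dropping duplicates."""
    concepts = []
    for tok in tokens:
        for concept in concept_map.get(tok.lower(), []):
            if concept not in concepts:
                concepts.append(concept)
    return concepts


def concept_short_text_embedding(tokens):
    """Build a (len(concepts) + len(tokens)) x EMBED_DIM matrix.

    Assumption: concept vectors are placed before the text vectors along
    the sequence axis to form the "concepts-short text" embedding.
    """
    concepts = conceptualize(tokens)
    return [toy_embedding(t) for t in concepts + tokens]


tokens = ["buy", "iPhone", "cheap"]
matrix = concept_short_text_embedding(tokens)
print(conceptualize(tokens))
print(len(matrix), len(matrix[0]))
```

In a full pipeline the matrix would be padded to a fixed length and passed to convolution and pooling layers; that part is omitted here because the abstract does not describe the KPCNN architecture in detail.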