Research On Spam Recognition Based On Microblog

Posted on:2020-05-26

Degree:Master

Type:Thesis

Country:China

Candidate:R Liu

GTID:2428330599451304

Subject:Engineering

Abstract/Summary:

Natural language processing has long been a key research topic. Identifying spam in Chinese short texts is important both for the user experience and for platform maintenance. This thesis analyzes Chinese text processing and natural language processing methods and then carries out the following work. First, it improves classification performance by modifying the input and output layers of the classifier, and verifies the improvement experimentally. Second, it summarizes the regularities of spam and proposes a multi-feature fusion method for computing text similarity. Finally, it combines the two methods to design a spam filtering system. The main contributions are as follows:

(1) For content identification, we improve classification performance by changing the input and output layers. At the input of the classification algorithm, we build a vectorization model of Chinese semantics: we first obtain a matrix of dependency arcs between words using Yamada's dependency parsing method, and then decompose this matrix to obtain the text vector. At the output, the pooling layer and the fully connected layer of the CNN are replaced by chunk-max pooling and hierarchical softmax, respectively.

(2) For account identification, we observe that spam accounts send an abnormal number of messages and that the texts of those messages are highly similar, and we propose a recognition method based on these characteristics of suspicious users. The method first checks whether the message volume is abnormal within a window of a set size. If it is, the similarity of the messages sent in the abnormal time period is computed with the multi-feature model; if the similarity exceeds the threshold, the messages are classified as spam.

(3) We design a spam filtering system based on the two algorithms above. When an account sends a message, the platform automatically obtains the information source and the message content. The identification method based on suspicious-user characteristics is applied to the information source, and the improved CNN classifier is applied to the message content.

In the classification experiments, we compare the approach with common classification and identification methods on two measures: classification accuracy and model training time. Text similarity is compared by computing the cosine of the angle between text vectors. The experimental results show that the proposed algorithm achieves better spam recognition performance.
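The account-level check in contribution (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the default window size, the message-count limit, and the similarity threshold are all hypothetical, and a plain character-count cosine similarity stands in for the multi-feature fusion model, whose details the abstract does not give.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between character-count vectors of two texts.

    Assumption: character counts are a simple stand-in for the
    multi-feature representation described in the thesis.
    """
    va, vb = Counter(text_a), Counter(text_b)
    dot = sum(va[ch] * vb[ch] for ch in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_suspicious(messages, window=60.0, max_count=3, sim_threshold=0.8):
    """Flag an account whose messages are both bursty and near-identical.

    messages: list of (timestamp_in_seconds, text), sorted by timestamp.
    window, max_count, sim_threshold: hypothetical tuning parameters.
    """
    start = 0
    for end in range(len(messages)):
        # Shrink the window from the left until it spans at most `window` seconds.
        while messages[end][0] - messages[start][0] > window:
            start += 1
        burst = messages[start:end + 1]
        # Step 1: is the message volume in this window abnormal?
        if len(burst) > max_count and len(burst) >= 2:
            texts = [text for _, text in burst]
            # Step 2: mean pairwise similarity of the texts in the burst.
            sims = [
                cosine_similarity(a, b)
                for i, a in enumerate(texts)
                for b in texts[i + 1:]
            ]
            if sum(sims) / len(sims) > sim_threshold:
                return True
    return False
```

Under these assumptions, an account that posts four copies of the same advertisement within a few seconds is flagged, while an account that posts a few unrelated messages hours apart is not. The similarity step runs only after the volume check fails, which mirrors the two-stage order described in the abstract and avoids pairwise comparisons for ordinary accounts.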

Keywords/Search Tags:

Convolutional Neural Network, Text Categorization, Dependency Parsing Analysis, Spam, Text Similarity

Related items

1	Research On The Parallelization Of Text Categorization Based On Convolution Neural Network
2	Study On Text Categorization Method Based On Graph Convolutional Networks
3	Research And Its Application On Chinese Text Categorization Algorithm Based On CHI And Convolutional Neural Network
4	Research On Text Similarity Algorithm Based On WMD Distance
5	Deep Learning For Short Text Semantic Similarity Measures
6	Study On Term Semantic Relationship And Its Application In Text Categorization
7	Research On The Method Of Text Categorization Based On Semantic Similarity
8	Research On The Method Of Short Text Categorization Based On Topical Similarity
9	Research On Chinese Text Classification Based On Deep Learning Theory
10	Sentiment Analysis On Short Text