Research Of Network Pyramid Scheme Based On NLPIR

Posted on:2020-09-25

Degree:Master

Type:Thesis

Country:China

Candidate:P Y Mu

GTID:2428330623956609

Subject:Software engineering

Abstract/Summary:

At present, network pyramid schemes have become a major scourge that hinders social development. With the advent of the big data era, the explosive growth of data on the network has given pyramid selling a new channel of transmission, and network pyramid selling has emerged accordingly. These schemes spread quickly through covert channels, their disseminators are scattered across the country or even abroad, and electronic evidence is difficult to obtain, so the network provides fertile soil for offenders. At the same time, the schemes pose considerable challenges to the regulatory and enforcement authorities responsible for cracking down on them. Many network pyramid scheme sites spread their information online in the form of text, so effectively mining massive volumes of text and determining which texts are network pyramid scheme texts has become an urgent need.

In classifying network pyramid scheme text, the diversity of text features generates a large amount of noisy data, so the training texts cannot fit the distribution of the entire feature space well, and traditional classification algorithms are not reliable for accurately identifying such text. In addition, network pyramid scheme text is irregularly formatted, so the quality of text preprocessing directly affects the classification results. Feature selection also affects classification accuracy, yet some obvious surface features can hardly represent the characteristics of network pyramid scheme text.

Based on the characteristics described in such texts, namely high returns, high rebates, hierarchical salaries, and diverse text topics, this study proposes a joint topic model, Paragraph Vector Latent Dirichlet Allocation (PV_LDA). The model takes the paragraph as its minimum processing unit and generates "high-interest rate" and "hierarchical salary" topic distribution matrices from network pyramid scheme text. Gibbs sampling is then used to derive a "pyramid scheme" topic distribution matrix represented by these two features, which a classifier uses for classification. Around these core technical points, the research covers the following three parts:

(1) After the text is preprocessed with NLPIR, a topic model based on LDA summarizes the characteristic information in the text through clustering. Gibbs sampling and iterative estimation of the model parameters are used to obtain the topic distribution matrix from network pyramid scheme text effectively.
(2) Fusing the two topic features that represent network pyramid schemes, and thereby improving the generalization ability of the resulting topic distribution matrix, is one of the key focuses of the research. The method combines the two classes of topic distribution matrix by means of the Hadamard product and an additional joint residual vector, making the generated joint topic distribution matrix more representative of network pyramid scheme text.
(3) The study comprehensively considers the classifiers' evaluation metrics and, through cross-experiment comparison, selects a text classifier with high accuracy and fast processing speed.

Experiments show that the proposed topic model captures the characteristics of network pyramid schemes more reasonably and preserves the model's generalization ability while still accounting for the effectiveness of topic mining.
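
Step (1) relies on Gibbs sampling to estimate LDA's topic distribution matrix. The sketch below is a generic collapsed Gibbs sampler for standard LDA in Python with NumPy; it is not the thesis's PV_LDA implementation. It assumes each paragraph has already been segmented (for example by NLPIR) into a list of tokens and treats each paragraph as one document, in line with the paragraph-level processing unit described above. The function names `build_corpus` and `lda_gibbs` and the hyperparameter defaults (`alpha`, `beta`, `n_iter`) are illustrative assumptions.

```python
import numpy as np


def build_corpus(paragraph_tokens):
    """Map pre-segmented paragraphs (lists of tokens) to lists of integer word ids."""
    vocab = {}
    # setdefault assigns the next free id the first time a token is seen
    docs = [[vocab.setdefault(tok, len(vocab)) for tok in para] for para in paragraph_tokens]
    return docs, vocab


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for standard LDA.

    Returns (theta, phi): the document-topic matrix (D x K) and topic-word matrix (K x V).
    """
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))    # tokens of each topic in each document
    nkw = np.zeros((n_topics, vocab_size))   # occurrences of each word under each topic
    nk = np.zeros(n_topics)                  # total tokens assigned to each topic
    # Random initial topic for every token
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts
                ndk[d, k] -= 1
                nkw[k, w] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = k | z_-i, w) up to a constant
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                # Add the token back under its newly sampled topic
                z[d][i] = k
                ndk[d, k] += 1
                nkw[k, w] += 1
                nk[k] += 1
    theta = (ndk + alpha) / (ndk.sum(axis=1, keepdims=True) + n_topics * alpha)
    phi = (nkw + beta) / (nk[:, None] + vocab_size * beta)
    return theta, phi


if __name__ == "__main__":
    # Toy, already-segmented paragraphs (hypothetical tokens)
    paragraphs = [
        ["high", "return", "rebate", "return"],
        ["level", "salary", "downline", "level"],
        ["rebate", "high", "return"],
        ["salary", "level", "downline"],
    ]
    docs, vocab = build_corpus(paragraphs)
    theta, phi = lda_gibbs(docs, n_topics=2, vocab_size=len(vocab), n_iter=100)
    print(theta.round(2))
```

Each row of `theta` is one paragraph's topic distribution, so the stacked rows form a document-topic matrix of the kind the abstract refers to as a topic distribution matrix.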
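
Step (2) fuses two topic distribution matrices with a Hadamard product and a joint residual vector, but the abstract does not define the residual term. The following sketch is therefore hypothetical: it assumes both matrices have the same shape (documents by topics), takes the residual to be the element-wise mean of the two inputs scaled by an assumed `residual_weight`, and renormalizes each row so it remains a probability distribution. The function name `fuse_topic_matrices` is illustrative.

```python
import numpy as np


def fuse_topic_matrices(theta_a, theta_b, residual_weight=0.5):
    """Hypothetical fusion of two document-topic matrices of identical shape (D x K).

    theta_a: e.g. the "high-interest rate" topic distribution matrix
    theta_b: e.g. the "hierarchical salary" topic distribution matrix
    """
    theta_a = np.asarray(theta_a, dtype=float)
    theta_b = np.asarray(theta_b, dtype=float)
    if theta_a.shape != theta_b.shape:
        raise ValueError("the Hadamard product needs matrices of identical shape")
    hadamard = theta_a * theta_b              # element-wise (Hadamard) product
    residual = 0.5 * (theta_a + theta_b)      # ASSUMED residual term: mean of the inputs
    joint = hadamard + residual_weight * residual
    # Renormalize each row so every document again has a topic distribution
    return joint / joint.sum(axis=1, keepdims=True)
```

One plausible motivation for a residual term of this kind: a pure Hadamard product of two probability vectors drives a topic's weight toward zero whenever either input assigns it little mass, while the added residual retains part of each input's signal.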
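
Step (3) selects a classifier by comparing accuracy and processing speed across cross experiments. The sketch below interprets this as k-fold cross-validation, which is an assumption, and uses two simple stand-in classifiers (nearest centroid and k-nearest neighbours) implemented in NumPy; the abstract does not name the classifiers that were actually compared. Inputs are assumed to be rows of a fused topic distribution matrix with non-negative integer class labels, and all function names are illustrative.

```python
import time

import numpy as np


def nearest_centroid(X_train, y_train, X_test):
    """Assign each test row to the class whose mean training vector is closest."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]


def knn(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training rows (labels must be non-negative ints)."""
    dists = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = y_train[nearest]                  # shape (n_test, k)
    return np.array([np.bincount(row).argmax() for row in votes])


def cross_validate(classifier, X, y, n_folds=5, seed=0):
    """Return (mean fold accuracy, total wall-clock seconds) for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    accuracies = []
    start = time.perf_counter()
    for i, test_idx in enumerate(folds):
        # Train on every fold except the i-th, evaluate on the i-th
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        predictions = classifier(X[train_idx], y[train_idx], X[test_idx])
        accuracies.append(np.mean(predictions == y[test_idx]))
    elapsed = time.perf_counter() - start
    return float(np.mean(accuracies)), elapsed


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy stand-in for fused topic vectors: class 1 leans on topic 1
    X = np.vstack([rng.dirichlet([8, 1, 1], size=20), rng.dirichlet([1, 8, 1], size=20)])
    y = np.array([0] * 20 + [1] * 20)
    for name, clf in [("nearest centroid", nearest_centroid), ("3-NN", knn)]:
        acc, secs = cross_validate(clf, X, y)
        print(f"{name}: accuracy={acc:.3f}, time={secs * 1000:.2f} ms")
```

Reporting accuracy and elapsed time side by side mirrors the two criteria the abstract names; a real comparison would add further metrics and larger, labeled corpora.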

Keywords/Search Tags:

Network Pyramid Scheme, Topic Mining, Topic Model, LDA model, Text Classification

Related items

1	The Text Categorization And Structure Of Theme Words Network Based On Topic Models
2	Research And Application Of Topic Evolution Model Based On LDA
3	Research On Classificational Model Of Text Sentiment Based On Topic
4	Research On ’Topic+View’ Extraction Method Based On WSO-LDA For Micro Blog Topic
5	Topic Analysis And Recommendation System Based On Scientific Research Documents
6	Network News Hot-Topic Detection And Discovery Based On Cloud Platform
7	Research On Short Text Topic Information Mining Technology
8	Text Classification Algorithm Based On Chinese And English Topic Space
9	Short Text Topic Mining Based On W-BTM And Text Classification Application
10	Research And Application Of Text Classification Model Combining Character Features And Topic Features