Data Mining Of Text Information In Online Public Opinion Review With R

Posted on:2018-09-20

Degree:Master

Type:Thesis

Country:China

Candidate:J L Lv

Full Text:PDF

GTID:2348330533465252

Subject:Probability theory and mathematical statistics

Abstract/Summary:

The development and application of the internet provide a convenient medium for people to obtain and exchange information, making the internet a source of massive and varied data. News clients let users access news easily and, at the same time, express their own views and spread information. This can also become a breeding ground for rumors, which may aggravate social conflict, disturb social order and, in serious cases, lead to illegal or criminal acts. The views of internet users are the raw material of online public opinion; they are complex and large in volume. Studying them statistically with data mining methods therefore has both theoretical and practical value.

In this dissertation, taking micro-blog users' comments on telecommunications fraud as an example, the mining of Chinese text is studied with the R language, which offers powerful plotting and data analysis functions and a rich set of extension packages. Using a series of unsupervised and supervised learning techniques, a regression model and a classification model for the micro-blog users' comments are established.

First, the collected comments are segmented into words. A Chinese text corpus is constructed and cleaned: Chinese stop words and punctuation are removed, and a threshold is set to reduce the sparsity of the corpus. From the cleaned data, a document-term matrix is constructed, which serves as the basis for the subsequent analysis.

Then, the posting time and the number of likes attached to the text data are separated out, and a basic analysis of the time series plot is carried out. Using the document-term matrix, a regression analysis of the number of likes is performed with a multiple linear regression model, a decision tree model and a random forest model. The three models are compared by means of a user-defined function, and the best one is selected. The best model is then applied to predict the number of likes, so as to improve the accuracy of the trend analysis of users' comments.

Finally, the document-term matrix is used to group the text data by unsupervised learning. Eight topics are identified from the results of cluster analysis and a mixed topic model. Classifiers are then built by applying support vector machine, random forest and maximum entropy learning methods to the labeled document-term matrix. These models are used to predict the classes of the unlabeled documents, and a trend analysis of the micro-blog users' comments is given.
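The cleaning steps described in the abstract (stop-word and punctuation removal, a sparsity threshold, then a document-term, or document-entry, matrix) can be illustrated with a minimal sketch. The thesis itself works in R and its code is not shown here, so this is only an assumption-laden illustration in Python: the function name `build_dtm`, the parameter `max_sparsity`, the toy English tokens and the stop-word set are all hypothetical, and the comments are assumed to be already segmented into tokens.

```python
from collections import Counter


def build_dtm(docs, stop_words, max_sparsity=0.9):
    """Build a document-term count matrix from pre-segmented documents.

    docs         -- list of token lists (segmentation is assumed done already)
    stop_words   -- set of tokens to discard (hypothetical list)
    max_sparsity -- a term is kept only if the fraction of documents that
                    do NOT contain it is at most this value
    """
    # Drop stop words and tokens with no letter or digit (pure punctuation).
    cleaned = [
        [t for t in doc if t not in stop_words and any(ch.isalnum() for ch in t)]
        for doc in docs
    ]

    # Document frequency: in how many documents each term occurs.
    n_docs = len(cleaned)
    doc_freq = Counter()
    for doc in cleaned:
        doc_freq.update(set(doc))

    # Apply the sparsity threshold to shrink the vocabulary.
    vocab = sorted(t for t, df in doc_freq.items() if 1 - df / n_docs <= max_sparsity)
    index = {t: j for j, t in enumerate(vocab)}

    # Fill the count matrix: one row per document, one column per kept term.
    matrix = [[0] * len(vocab) for _ in cleaned]
    for i, doc in enumerate(cleaned):
        for t in doc:
            j = index.get(t)
            if j is not None:
                matrix[i][j] += 1
    return vocab, matrix


# Toy, made-up comments (not data from the thesis).
toy_docs = [
    ["telecom", "fraud", "is", "rampant", "!"],
    ["fraud", "calls", "again", "!"],
    ["police", "warn", "about", "fraud"],
]
toy_stop_words = {"is", "about", "again"}

vocab, matrix = build_dtm(toy_docs, toy_stop_words, max_sparsity=0.5)
print(vocab, matrix)
```

With `max_sparsity=0.5` only terms present in at least half of the toy documents survive, which is the kind of vocabulary reduction a sparsity threshold is meant to achieve; the resulting rows could then feed a regression or classification model.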

Keywords/Search Tags:

Internet public opinions, collection and processing of text data, document clustering and classification, analysis of network comment

Related items

1	Application Of Intelligent Information Processing Technology In The Analysis Of Internet Public Opinions
2	Research On Data Processing Technology For Commodity Comment Text
3	Internet Public Opinion Monitoring And Analysis System To Achieve
4	Research Of Police Public Opinions Analysis System
5	Research Of Short Text Classification And Clustering In Public Opinion Analysis
6	Studies On Clustering Analysis And Visualization For Public Opinions Of Discrete Text About Certain Topic
7	Visual Analysis For Fast Understanding Of Document Collection
8	Research On Topic Detection And Tracking In Internet Public Opinion
9	The implementation of dynamic document organization using the integration of text clustering and text categorization
10	Research And Implementation Of Network Public Opinion Monitoring And Analysis System Based On Text Mining