Research On The Application Of Feature Screening And Clustering Algorithm In Text Mining

Posted on: 2022-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: D C Li
GTID: 2518306539953329
Subject: Applied Statistics
Abstract/Summary:
With the rapid development of the Internet, data is not only accumulating at an ever faster rate but also taking increasingly diverse forms. Because text is the most frequently used form of data, automatically extracting valuable information from massive text collections is a worthwhile research topic. This thesis first summarizes the significance and current state of text data mining and explains the basic workflow and common methods of text analysis. Next, it improves a dimensionality reduction method based on the marginal Bayes classifier: building on the original method, the first stage of dimensionality reduction takes into account that different misclassifications may incur different losses, and linear discriminant analysis then performs the second stage. Experiments showed that this method not only reduces the dimensionality of text data effectively but also retains as much of the information useful for classification as possible. The thesis also proposes a grid-based SVM nearest-neighbor clustering algorithm, in which an SVM model learns the boundaries of the clusters and similar points are merged dynamically according to the KNN algorithm. In numerical simulations on data with complex structure, both the NMI and the ARI of this algorithm were above 0.9, higher than those of the other algorithms compared, which suggests that the algorithm has an advantage on such data. Finally, the proposed dimensionality reduction method and clustering algorithm were applied to classify the sentiment of reviews of the Huawei MateBook X Pro on the JD platform. The classification loss was at least 0.17 lower than that of the other algorithms, which supports the effectiveness of the proposed approach on practical problems.
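The abstract evaluates clustering quality with NMI (normalized mutual information) and ARI (adjusted Rand index). As a rough illustration of what these two scores measure, the following is a minimal sketch using their standard textbook definitions. It is not taken from the thesis: the function names are hypothetical, and the arithmetic-mean normalization for NMI is an assumption, since the abstract does not say which variant was used.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(truth, pred):
    """ARI from the contingency table of two labelings (1.0 = identical partitions)."""
    n = len(truth)
    if n < 2:
        return 1.0  # no pairs of points to compare
    cells = Counter(zip(truth, pred))  # contingency counts n_ij
    rows = Counter(truth)              # true cluster sizes
    cols = Counter(pred)               # predicted cluster sizes
    index = sum(comb(c, 2) for c in cells.values())
    row_pairs = sum(comb(c, 2) for c in rows.values())
    col_pairs = sum(comb(c, 2) for c in cols.values())
    expected = row_pairs * col_pairs / comb(n, 2)  # chance-level agreement
    max_index = (row_pairs + col_pairs) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (index - expected) / (max_index - expected)


def normalized_mutual_info(truth, pred):
    """NMI with arithmetic-mean normalization (an assumption, see lead-in)."""
    n = len(truth)
    cells = Counter(zip(truth, pred))
    rows = Counter(truth)
    cols = Counter(pred)
    mutual_info = sum(
        c / n * log(c * n / (rows[t] * cols[p])) for (t, p), c in cells.items()
    )
    h_truth = -sum(c / n * log(c / n) for c in rows.values())
    h_pred = -sum(c / n * log(c / n) for c in cols.values())
    mean_entropy = (h_truth + h_pred) / 2
    return 1.0 if mean_entropy == 0 else mutual_info / mean_entropy


# Illustrative labels only; these are not data from the thesis.
truth = [0, 0, 1, 1]
same_partition = [1, 1, 0, 0]   # same grouping, cluster names swapped
unrelated = [0, 1, 0, 1]        # grouping independent of the truth
```

Both scores depend only on how points are grouped, not on the cluster names, so `same_partition` scores 1.0 on both measures even though its labels are swapped. For `unrelated`, NMI is 0 and ARI is negative, since ARI subtracts the agreement expected by chance.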
Keywords: Text Data Mining, Text Data Feature Screening, Marginal Bayes Classifier, Grid Clustering