Research On The Application Of Feature Screening And Clustering Algorithm In Text Mining

Posted on:2022-02-26

Degree:Master

Type:Thesis

Country:China

Candidate:D C Li

GTID:2518306539953329

Subject:Applied Statistics

Abstract/Summary:

With the rapid development of the Internet, data is not only accumulating ever faster but also taking increasingly diverse forms. Because text is the most frequently used form of data, automatically extracting valuable information from massive text collections is a worthwhile research topic. This thesis first summarizes the significance and current state of text data mining, and then explains the basic workflow and common methods of text analysis. Next, it improves a dimensionality reduction method based on the marginal Bayes classifier: the first stage of dimensionality reduction extends the original method by taking into account that different misclassifications may incur different losses, and the second stage applies linear discriminant analysis. Experiments show that this method not only reduces the dimensionality of text data effectively but also retains as much classification-relevant information as possible. The thesis also proposes a grid-based SVM nearest-neighbor clustering algorithm, in which an SVM model learns the boundaries of the clusters and similar points are merged dynamically according to the KNN algorithm. In numerical simulations on data with complex structure, the algorithm achieves NMI and ARI values above 0.9, higher than those of the other algorithms compared, which suggests that it has an advantage on such data. Finally, the proposed dimensionality reduction method and clustering algorithm are applied to sentiment classification of reviews of the Huawei MateBook X Pro on the JD platform. The classification loss is at least 0.17 lower than that of the other algorithms, which demonstrates the effectiveness of the proposed approach on a practical problem.
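
The NMI (normalized mutual information) and ARI (adjusted Rand index) values cited in the abstract score a clustering against reference labels. The following is a minimal sketch of how these two scores are commonly computed, not the thesis's own code. The function names are hypothetical, and the NMI here is normalized by the arithmetic mean of the two entropies, which is an assumption because the abstract does not state which variant was used.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting agreement between two partitions, corrected for chance."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treat as perfect agreement
    # Contingency table: how many points share each (true, predicted) label pair.
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (sum_cells - expected) / (max_index - expected)


def _entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(labels_true, labels_pred):
    """Mutual information divided by the mean entropy of the two partitions."""
    n = len(labels_true)
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    mi = sum(
        (c / n) * log(c * n / (count_true[a] * count_pred[b]))
        for (a, b), c in joint.items()
    )
    mean_entropy = (_entropy(labels_true) + _entropy(labels_pred)) / 2
    if mean_entropy == 0:
        return 1.0  # both partitions are a single cluster
    return mi / mean_entropy


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    found = [1, 1, 1, 0, 0, 0]  # same grouping, different label names
    print(adjusted_rand_index(truth, found))
    print(normalized_mutual_info(truth, found))
```

Both scores depend only on how points are grouped, not on the label names, so a clustering that recovers the reference grouping under different cluster IDs still scores 1. Values near 0 indicate agreement no better than chance, which gives context for the reported values above 0.9.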

Keywords/Search Tags:

Text Data Mining, Text Data Feature Screening, Marginal Bayes Classifier, Grid Clustering

Related items

1	Data Mining Systems And Their Applications - Improve The Performance Of The Naive Bayes Text Classifier, Associated Characteristics
2	Text Classification Method Based On Unsupervised Clustering And Naive Bayesian Classifier
3	Research And Application On The Technology Of Web Text Mining
4	The Cluster Analysis On WEB Text Mining
5	Key Techniques Of Text Mining On Criminal Cases
6	The Research And Implementation Of Automatic Text Categorization For Chinese Web Documents
7	Research On Text Mining Based On MapReduce
8	The Study And Application Of Web Text Data Mining Technology Based On The Approximate Pages Clustering Algorithm
9	Analysis Of Laptop Network Scoring Based On Text Mining
10	Text Mining Method And Application