
Text Feature Selection And Clustering Based On Full Covering Granular Computing

Posted on: 2019-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: X J Zou
GTID: 2348330569479537
Subject: Information and Communication Engineering
Abstract/Summary:
In today's information explosion, the volume of text is growing at an exponential rate, and people are immersed in an ocean of information. Quickly and accurately extracting the content of interest from such large volumes of data is a major challenge. Sorting text manually is costly and slow, so solving the problem with machine learning has become a hot research topic. Text clustering, an unsupervised machine learning method, is one of the key technologies in text mining and is widely used in automatic document collection and search engines. Text feature selection is one of its preprocessing steps. Existing text feature selection methods are inefficient, and text clustering algorithms that choose their initial cluster centers at random suffer from low clustering accuracy. To address these problems, this dissertation proposes an improved feature selection algorithm and an improved text clustering algorithm.

Granular computing is a new approach to key problems in machine learning and text mining. While preserving the value and information contained in the data, it can significantly reduce the dimensionality of the data, which makes it an effective tool for processing large-scale text data. Full covering granular computing is a special case of granular computing that comprises full covering theory, information granulation, and granularity calculation, and it offers a new approach to text feature selection and text clustering.

The main research work of this dissertation includes:

1. A text feature selection method based on full covering granular computing. The TFIDF (Term Frequency Inverse Document Frequency) algorithm is extended with key factors, namely word position, word frequency, and part of speech, to give the TFIDF_SP (Term Frequency Inverse Document Frequency_Speech and Place) algorithm. A bLDA (background Latent Dirichlet Allocation) topic model computes the semantic information. TFIDF_SP and bLDA are then combined linearly to obtain a set of feature words that match the text content. Finally, information granulation is applied to this set of feature words, and knowledge reduction of the full covering granulation is performed while keeping the text information unchanged, which yields a more compact feature set. Experiments show that, compared with other feature selection algorithms, the features selected by the proposed algorithm are more consistent with the actual meaning expressed by the text.

2. A K-medoids text clustering algorithm based on full covering granular computing. The Single-Pass algorithm first clusters the text set coarsely. A candidate set of initial cluster centers is then selected from the coarse clusters using full covering granular computing theory, and the initial cluster centers are chosen from this candidate set according to density and the maximum-minimum distance criterion. Experiments show that, compared with other improved K-medoids algorithms, the initial cluster centers selected in this dissertation are closer to the actual cluster centers, so the clustering quality is better. In addition, the combination of the improved feature selection algorithm and the improved clustering algorithm is compared with the combination of the improved feature selection algorithm and a traditional clustering algorithm, and the results show the feasibility and effectiveness of both the feature selection algorithm and the clustering algorithm.
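The initialization pipeline described for the clustering algorithm (coarse Single-Pass clustering, then density and maximum-minimum distance selection of initial centers) can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. The full covering granular computing step is not reproduced: as a stand-in, the first document of each coarse cluster is used as a candidate center. The function names, the similarity threshold, the density definition (sum of cosine similarities to all documents), and the toy documents are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass(docs, threshold):
    """Coarse clustering in one pass over the documents.

    Each document joins the existing cluster whose first document is most
    similar to it, provided the similarity reaches `threshold`; otherwise
    it opens a new cluster. Returns a list of lists of document indices.
    """
    clusters = []
    for i, doc in enumerate(docs):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(doc, docs[cluster[0]])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters


def max_min_centers(docs, candidates, k):
    """Choose up to k initial centers from the candidate indices.

    Assumed density: sum of cosine similarities to all documents.
    The densest candidate is taken first; each further center is the
    candidate whose minimum distance (1 - cosine) to the centers chosen
    so far is largest.
    """
    def density(i):
        return sum(cosine(docs[i], d) for d in docs)

    centers = [max(candidates, key=density)]
    while len(centers) < min(k, len(candidates)):
        rest = [c for c in candidates if c not in centers]
        nxt = max(
            rest,
            key=lambda c: min(1.0 - cosine(docs[c], docs[s]) for s in centers),
        )
        centers.append(nxt)
    return centers


# Toy documents as term-weight dicts (hypothetical data).
docs = [
    {"cat": 1.0, "pet": 1.0},
    {"cat": 1.0, "pet": 0.5},
    {"stock": 1.0, "market": 1.0},
    {"stock": 0.8, "market": 1.0},
    {"goal": 1.0, "match": 1.0},
]
clusters = single_pass(docs, threshold=0.5)
candidates = [c[0] for c in clusters]  # stand-in for the granular step
centers = max_min_centers(docs, candidates, k=3)
```

The selected indices would then seed an ordinary K-medoids run in place of randomly chosen medoids, which is the role the abstract assigns to this initialization step.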
Keywords/Search Tags:full covering granular computing, feature selection, text clustering, K-medoids clustering