Research And Application Of Multilayer Feature Selection Algorithm Based On Clustering

Posted on: 2016-11-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Zhu
GTID: 2308330464454267
Subject: Software engineering
Abstract/Summary:
As document retrieval capabilities have improved, more and more users have become accustomed to quickly retrieving the documents they need from the China National Knowledge and Digital Library. However, in an era of accelerating knowledge renewal, new topics, new things, and new disciplines are appearing in large numbers, and the variety and quantity of information are growing greatly. As a result, the volume of scientific literature is increasing at a near-exponential rate each year. Faced with such a mass of scientific literature, readers not only spend a great deal of time searching, but the large number of new concepts also makes it difficult to collect the required literature comprehensively.
Text feature selection is an important part of scientific literature classification, and the quality of the feature selection algorithm directly affects the final performance of the classification system. Feature selection is therefore a key technology that constrains the performance of scientific literature classification. This thesis introduces the basic background of scientific literature classification and focuses on several common feature selection algorithms: mutual information (MI), information gain (IG), the chi-square statistic (χ²), document frequency (DF), and expected cross entropy (ECE). It then constructs four mining models based on the structural characteristics of scientific literature and proposes a text feature selection algorithm for scientific literature, Multi LM-FE. The method combines the K-means algorithm and the Apriori algorithm: K-means is applied to the first three of the four mining models, and Apriori is applied to the fourth. To improve the accuracy of the text feature selection algorithm, the thesis proposes correcting the inter-object distance function with the entropy method so that high-quality initial cluster centers can be selected, and handling interfering information during dynamic clustering so that fewer iterations are needed, thereby achieving high accuracy and good performance.
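Two of the common scores named above, DF and the chi-square statistic, can be computed from document counts alone. The following is a minimal sketch of that computation, not the thesis's implementation: the function names `chi_square` and `score_terms`, and the assumption that documents arrive as already-tokenized lists, are illustrative choices made here.

```python
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for one (term, class) pair.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that lack the term
    n00: documents outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        # A term present in every document (or a class covering every
        # document) carries no discriminative information.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def score_terms(docs, labels, target):
    """Return (document frequency, chi-square score) per term.

    docs:   list of token lists (hypothetical, pre-tokenized input)
    labels: one class label per document
    target: the class whose association with each term is scored
    """
    n_docs = len(docs)
    n_target = sum(1 for y in labels if y == target)
    df = Counter()         # documents containing the term
    df_target = Counter()  # documents of the target class containing it
    for tokens, y in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            if y == target:
                df_target[term] += 1

    scores = {}
    for term, total in df.items():
        n11 = df_target[term]
        n10 = total - n11
        n01 = n_target - n11
        n00 = n_docs - n_target - n10
        scores[term] = chi_square(n11, n10, n01, n00)
    return df, scores
```

Sorting the terms by score and keeping the top entries gives a simple filter-style selector. Note that chi-square is symmetric: a term that never occurs in the target class scores as highly as one that occurs only in it.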
The improved Multi LM-FE algorithm raises both the efficiency and the precision of text feature selection. Compared with previous methods, it is a substantial improvement and is especially well suited to scientific literature.
Finally, building on this research into the automatic classification of scientific literature, the thesis develops a scientific literature classification system with the following modules: document segmentation, Chinese word processing, stop-word processing, feature selection, weight calculation, text representation, and classification, among others. Experiments compare classifier performance under the feature selection algorithms listed above. The results show that the proposed text feature selection algorithm yields better classification performance with both the KNN classifier and the SVM classifier, which indicates that the method is effective and feasible for scientific literature classification.
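The module chain described above (stop-word processing, feature selection, weight calculation, text representation, classification) can be sketched in a few lines. This is only an illustration under stated assumptions: whitespace splitting stands in for Chinese word processing, the stop list is a toy one, TF-IDF with a smoothed IDF is assumed as the weighting scheme, and KNN uses cosine similarity. None of these choices, nor the function names, are taken from the thesis.

```python
import math
from collections import Counter

# Hypothetical toy stop list; a real system would load a full one.
STOP_WORDS = {"the", "of", "a", "and", "in"}


def preprocess(text, vocabulary=None):
    """Tokenize, drop stop words, and optionally keep only selected features."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    if vocabulary is not None:
        tokens = [t for t in tokens if t in vocabulary]
    return tokens


def fit_idf(token_lists):
    """Smoothed inverse document frequency for every training term."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}


def vectorize(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in training are ignored."""
    return {term: tf * idf[term]
            for term, tf in Counter(tokens).items() if term in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_predict(query, train_vectors, train_labels, k=3):
    """Majority vote among the k training vectors most similar to query."""
    ranked = sorted(zip(train_vectors, train_labels),
                    key=lambda pair: cosine(query, pair[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

A typical call sequence is `preprocess` on each training text, `fit_idf` on the resulting token lists, `vectorize` for training and query documents alike, and then `knn_predict`. Passing a `vocabulary` to `preprocess` is where the output of a feature selection step would plug in.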
Keywords/Search Tags: Scientific literature, Feature selection, K-means algorithm, Apriori algorithm, Text categorization