Research And Application Of Multilayer Feature Selection Algorithm Based On Clustering

Posted on:2016-11-14

Degree:Master

Type:Thesis

Country:China

Candidate:Y L Zhu

Full Text:PDF

GTID:2308330464454267

Subject:Software engineering

Abstract/Summary:

With improvements in document retrieval, more and more users are accustomed to searching the China National Knowledge Infrastructure (CNKI) and digital libraries to quickly obtain the documents they need. In an era of accelerating knowledge renewal, however, new topics, new phenomena, and new disciplines are appearing in large numbers, and the variety and quantity of information are growing rapidly, so the volume of scientific literature increases at a near-exponential rate each year. Faced with such a mass of literature, readers not only spend a great deal of time searching but, because of the large number of new concepts, also fail to collect the literature they need comprehensively. Text feature selection is an important part of scientific literature classification: the quality of the feature selection algorithm directly affects the final performance of the classification system, which makes feature selection a key technology limiting classification performance. This thesis introduces the basic background of scientific literature classification and focuses on several common feature selection algorithms: mutual information (MI), information gain (IG), the χ² statistic (CHI), document frequency (DF), and expected cross entropy (ECE). It then constructs four mining models based on the structural characteristics of scientific literature and proposes a text feature selection algorithm for scientific literature, Multi LM-FE. The method combines the K-means and Apriori algorithms: K-means is applied to the first three of the four mining models, and Apriori to the fourth. To improve the accuracy and performance of the algorithm, the thesis corrects the distance function between objects by the entropy method in order to select high-quality initial cluster centers, and it handles interfering information during dynamic clustering in order to reduce the number of iterations.
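
As a rough illustration of what an entropy-corrected distance function might look like, the sketch below uses the standard entropy weight method: features whose values vary more across objects receive larger weights in a weighted Euclidean distance. The abstract does not give the thesis's actual formulation, so the function names `entropy_weights` and `weighted_distance`, the normalization, and the choice of Euclidean distance are all assumptions made for illustration.

```python
import math


def entropy_weights(rows):
    """Return one weight per column using the entropy weight method.

    Assumption: `rows` is a non-empty list of equal-length lists of
    non-negative numbers (e.g. term weights per document). This is the
    textbook entropy method, not necessarily the thesis's variant.
    """
    n = len(rows)
    m = len(rows[0])
    diversities = []
    for j in range(m):
        column = [row[j] for row in rows]
        total = sum(column)
        if total == 0 or n < 2:
            # A column with no mass (or a single object) carries no information.
            diversities.append(0.0)
            continue
        entropy = 0.0
        for value in column:
            if value > 0:
                p = value / total
                entropy -= p * math.log(p)
        entropy /= math.log(n)  # scale entropy into [0, 1]
        # Clamp to guard against tiny negative values from rounding.
        diversities.append(max(0.0, 1.0 - entropy))
    total_diversity = sum(diversities)
    if total_diversity == 0:
        # Every column is uniform: fall back to equal weights.
        return [1.0 / m] * m
    return [d / total_diversity for d in diversities]


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))
```

Under these assumptions, a column that is identical for every object gets a weight near zero and so contributes almost nothing to the distance used when choosing initial cluster centers or assigning objects to clusters.
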
The improved Multi LM-FE algorithm raises both the efficiency and the precision of text feature selection. Compared with previous methods, it is a substantial improvement and is particularly well suited to scientific literature. Finally, building on this research into automatic classification of scientific literature, the thesis develops a classification system with the following modules: document segmentation, Chinese word segmentation, stop-word removal, feature selection, weight calculation, text representation, and classification. Experiments compare classifier performance under the feature selection algorithms listed above. The results show that the proposed algorithm yields better classification performance with both the KNN and SVM classifiers, indicating that the method is effective and feasible for scientific literature classification.
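
For context on the baseline methods the abstract compares against, here is a minimal sketch of one of them, the χ² (CHI) statistic, which scores a term by how strongly its presence depends on a category. It is not the Multi LM-FE algorithm itself; the function names `chi_square`, `chi_square_scores`, and `select_top_k`, and the representation of a document as a set of tokens, are assumptions made for illustration.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic from a 2x2 term/category contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        # The term or the category is constant across documents: no signal.
        return 0.0
    return n * (a * d - c * b) ** 2 / denominator


def chi_square_scores(docs, labels, category):
    """Score every term in `docs` against one category.

    Assumption: `docs` is a list of token sets and `labels` is a
    parallel list of category names.
    """
    vocabulary = set()
    for tokens in docs:
        vocabulary.update(tokens)
    scores = {}
    for term in vocabulary:
        a = b = c = d = 0
        for tokens, label in zip(docs, labels):
            has_term = term in tokens
            in_category = label == category
            if has_term and in_category:
                a += 1
            elif has_term:
                b += 1
            elif in_category:
                c += 1
            else:
                d += 1
        scores[term] = chi_square(a, b, c, d)
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]
```

A feature selector of this kind would be run once per category, and the selected terms would then feed the weighting and text-representation steps before a KNN or SVM classifier is trained.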

Keywords/Search Tags:

Scientific literature, Feature selection, K-means algorithm, Apriori algorithm, Text Categorization

Related items

1	Multi-class Scientific Literature Automatic Categorization System
2	Theoretical Analysis And Algorithm Study On Feature Selection For Text Categorization
3	Research Of Text Categorization Based On The Theme Mining And Covering Algorithm
4	Design And Implementation Of Kazak Text Categorization System
5	The Research Of Text Representation And Feature Selection In Text Categorization
6	Text Representation Model And Feature Selection Algorithm
7	Research And Its Application On Chinese Text Categorization Algorithm Based On CHI And Convolutional Neural Network
8	Research On Multi-Topic Partition Method For Scientific And Technical Literature Set Based On Surface Text Information
9	Text Categorization Algorithm Based On Machine Learning
10	Improved Feature Selection Algorithm And Its Application In Text Categorization