
Research on Literature Feature Mining and Classification

Posted on: 2013-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: N Kong
GTID: 2248330374472740
Subject: Computer application technology
Abstract/Summary:
As science and technology develop, the disciplines covered by the scientific literature become deeper and more finely divided. The sheer volume of literature and data overwhelms researchers, who can hardly find the papers they care about quickly. Intelligent retrieval of scientific literature is badly needed to improve both the efficiency of researchers and the utilization of the literature. On this basis, this thesis studies feature mining and classification of scientific literature.

In the part on subject characteristics, public and private characteristics are identified through corpus statistics. Together they define the interdisciplinary and disciplinary status of scientific literature. A statistics tool was developed in the Jython programming language to study statistical methods for disciplinary characteristics. The tool offers two matching modes (regular expressions and word traversal) and two counting scopes (the whole PubMed file or each abstract). The regular expressions used in this thesis were validated with this tool.

In the part on literature classification, four abstract files for "cheek", "chin", "eyebrows", and "eyelids" were downloaded from PubMed as the object of the feature mining and classification research. Features of the literature files are extracted with two term-statistic methods, "not all alphabet" and "not only the first letter in uppercase". A rough set tool is used to reduce the counted terms. This yields two feature sets, of dimension 607 and 202 for the two term-statistic methods respectively. Feature vectors are constructed from these two feature sets.
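
The two term-statistic methods can be illustrated with a minimal Python sketch. The abstract does not define the criteria precisely, so the readings used here are assumptions: "not all alphabet" is taken to mean a term that contains a letter plus at least one non-letter character (such as "IL-2"), and "not only the first letter in uppercase" is taken to mean a term with an uppercase letter after its first character (such as "mRNA"). The function names, the punctuation stripping, and the sample feature terms are hypothetical and are not taken from the thesis tool.

```python
import re
from collections import Counter


def tokens(text):
    """Split on whitespace and strip surrounding punctuation (assumed tokenization)."""
    for raw in re.findall(r"\S+", text):
        token = raw.strip(".,;:()[]\"'")
        if token:
            yield token


def not_all_alphabet(token):
    # Assumed reading: at least one letter and at least one non-letter character.
    return re.search(r"[A-Za-z]", token) is not None and not token.isalpha()


def not_only_first_upper(token):
    # Assumed reading: an uppercase letter occurs after the first character.
    return re.search(r"[A-Z]", token[1:]) is not None


def count_terms(text, predicate):
    """Count every term in the text that satisfies the given criterion."""
    return Counter(t for t in tokens(text) if predicate(t))


def feature_vector(text, feature_terms, predicate):
    """Map one abstract to term counts, ordered by a fixed list of feature terms."""
    counts = count_terms(text, predicate)
    return [counts[term] for term in feature_terms]


abstract = "IL-2 and p53 levels; DNA and mRNA in the Cheek (IL-2)."
print(count_terms(abstract, not_all_alphabet))
print(count_terms(abstract, not_only_first_upper))
print(feature_vector(abstract, ["IL-2", "p53", "TNF"], not_all_alphabet))
```

In the thesis the list of feature terms comes from the rough set reduction; in this sketch it is simply passed in by hand.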
The feature vectors are divided evenly into two halves: one is used to train the classifiers and the other to test them. Four-target classifiers and combined two-target classifiers are built on decision trees, artificial neural networks, and support vector machines respectively. The experimental results indicate that the four-target artificial neural network classifiers perform best, and that the classifiers using the first term-statistic method outperform those using the second.
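
The even train/test division can be sketched as follows. The abstract does not say how the halves were chosen, so the seeded random shuffle here is an assumption, and `split_in_half` and `accuracy` are hypothetical helper names. The `predict` argument stands in for any trained classifier (decision tree, neural network, or support vector machine); no classifier is implemented in this sketch.

```python
import random


def split_in_half(samples, seed=0):
    """Shuffle labeled samples and split them into two halves (assumed random split)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    middle = len(shuffled) // 2
    return shuffled[:middle], shuffled[middle:]


def accuracy(predict, test_set):
    """Fraction of (vector, label) pairs for which predict(vector) equals label."""
    if not test_set:
        raise ValueError("test_set is empty")
    correct = sum(1 for vector, label in test_set if predict(vector) == label)
    return correct / len(test_set)


# Toy data: each sample is (feature_vector, target label).
samples = [([i], "cheek" if i % 2 == 0 else "chin") for i in range(10)]
train_set, test_set = split_in_half(samples, seed=1)
print(len(train_set), len(test_set))
```

Fixing the seed keeps the split reproducible, so that classifiers built on different feature sets can be compared on the same halves.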
Keywords/Search Tags: Literature Mining, Literature Classification, Rough Set, Decision Tree, Artificial Neural Network, Support Vector Machine