Research Of Literature Feature Mining And Classification

Posted on:2013-02-19

Degree:Master

Type:Thesis

Country:China

Candidate:N Kong

Full Text:PDF

GTID:2248330374472740

Subject:Computer application technology

Abstract/Summary:

As science and technology develop, the disciplines covered by the scientific literature are becoming deeper and more finely divided. The sheer volume of scientific literature and data burdens researchers, who can rarely find the papers they need quickly. Intelligent retrieval of scientific literature is badly needed to improve both the efficiency of researchers and the utilization of the literature. On this basis, this thesis studies the mining and classification of scientific literature.

In the part on subject characteristics, public and private characteristics are identified through corpus statistics. Together, the public and private characteristics define the interdisciplinary and disciplinary status of scientific literature. A statistics tool was developed in the Jython programming language to study statistical methods for disciplinary characteristics. The tool supports regular-expression matching and word traversal, and offers two statistics options: over a whole PubMed file or over each abstract. The regular expressions used in this thesis were confirmed with this tool.

In the part on literature classification, four abstract files for "cheek", "chin", "eyebrows", and "eyelids" were downloaded from PubMed as the objects of the feature mining and classification study. Features of the literature files are extracted using two term statistic methods, "not all alphabet" and "not only the first letter in uppercase". A rough set tool is used to reduce the resulting terms. This yields two feature sets, with 607 and 202 dimensions for the two term statistic methods respectively. Feature vectors are constructed from these two feature sets. The feature vectors are divided evenly into two halves, one used to train the classifiers and the other to test them. Four-target classifiers and two-target combination classifiers are built on decision trees, artificial neural networks, and support vector machines respectively. The experimental results indicate that the four-target artificial neural network classifiers perform best, and that the classifiers using the first term statistic method outperform those using the second.
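
The two term statistic methods and the feature vector construction can be illustrated with a minimal Python sketch. The abstract does not give the actual regular expressions, so the patterns below are assumptions: "not all alphabet" is read as "the term is not made up solely of letters", and "not only the first letter in uppercase" is read as "the term has an uppercase letter somewhere after its first character". The function names, the sample sentence, and the three-term feature set are hypothetical, and the rough set reduction step is not shown.

```python
import re
from collections import Counter

# Assumed patterns; the thesis abstract does not state the real ones.
ALL_LETTERS = re.compile(r"[A-Za-z]+")
UPPERCASE = re.compile(r"[A-Z]")


def tokens(text):
    """Split on whitespace and strip surrounding punctuation."""
    stripped = (raw.strip(".,;:()[]\"'") for raw in text.split())
    return [t for t in stripped if t]


def not_all_alphabetic(term):
    """Assumed reading of 'not all alphabet': term is not letters only."""
    return ALL_LETTERS.fullmatch(term) is None


def upper_beyond_first(term):
    """Assumed reading of 'not only the first letter in uppercase':
    an uppercase letter occurs after the first character."""
    return UPPERCASE.search(term[1:]) is not None


def term_counts(text, keep):
    """Count the terms in text that pass the given filter."""
    return Counter(t for t in tokens(text) if keep(t))


def feature_vector(text, feature_terms, keep):
    """Map one abstract to term counts over a fixed feature set."""
    counts = term_counts(text, keep)
    return [counts[t] for t in feature_terms]


# Hypothetical sample text and feature set, for illustration only.
abstract = "IL-6 and TNF-alpha levels rose; mRNA of p53 was measured in Cheek tissue."
features = ["IL-6", "p53", "mRNA"]

vector_a = feature_vector(abstract, features, not_all_alphabetic)
vector_b = feature_vector(abstract, features, upper_beyond_first)
```

Under these assumptions, each abstract becomes one fixed-length count vector per term statistic method. In the thesis the feature sets have 607 and 202 dimensions after rough set reduction, and half of the vectors train the classifiers while the other half test them.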

Keywords/Search Tags:

Literature Mining, Literature Classification, Rough Set, Decision Tree, Artificial Neural Network, Support Vector Machine

Related items

1	A Comparative And Analysis Research Of Neural Network In The Classification Algorithm Of Data Mining
2	Research Achievement Oriented Literature Management And Mining
3	Artificial Intelligence And Big Data Technology Literature Information Mining
4	Research On Intelligent Recognition Method For Academic Literature Contents Based On Text Mining
5	Design And Implementation Of The Technical Text Categorization System
6	The Research On The Relationship Between Literature Publishing And Literature Development On Transition
7	Research On Support Vector Machine Based On Rough Set And Its Application
8	Association Based Evaluating System For Entities Of Literature
9	Electrocardiogram Classification Based On Rough Sets And Support Vector Machine
10	Study On Text Classification Based On Rough Set And Support Vector Machine