Automatic Categorization Of Chinese Journal Papers Based On Machine Learning

Posted on: 2014-02-08  Degree: Master  Type: Thesis
Country: China  Candidate: P Ye
GTID: 2248330395495921  Subject: Information Science
Abstract/Summary:
As electronic journals multiply, the number of articles they publish is growing rapidly, and classifying those articles sensibly has become an urgent problem. At big-data scale, traditional manual categorization cannot keep up; applying automatic categorization methods to journal articles can address this effectively.
Machine learning has driven rapid progress in automatic categorization, and this thesis applies it to journal articles. I draw the experimental samples from the China Knowledge Resource Integrated Database and, after cleaning, divide them into a training set and a test set. The experiments follow the supervised-learning principle of "learn first, then test": a classifier is first trained on the training set and then used to classify the test set, and its predictions are compared with the true classes to assess whether machine learning suits the automatic categorization of journal articles.
The two machine learning algorithms used are the support vector machine (SVM) and the BP (backpropagation) neural network. Experiments compare them on accuracy, number of training samples, and experiment time; the SVM proves the more suitable, and further comparative experiments identify the SVM parameters best suited to this work.
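The "learn first, then test" protocol described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy English samples, the class labels, and the simple term-overlap classifier, which stands in for the SVM and BP neural network that the thesis actually uses. Real Chinese article text would also need word segmentation before it could be split into terms.

```python
from collections import Counter, defaultdict


def train(samples):
    """Learn one term-frequency profile per class from (text, label) pairs."""
    profiles = defaultdict(Counter)
    for text, label in samples:
        profiles[label].update(text.lower().split())
    return profiles


def predict(profiles, text):
    """Return the class whose profile shares the most term occurrences with the text."""
    words = text.lower().split()
    # sorted() makes tie-breaking deterministic (alphabetical by label).
    return max(sorted(profiles), key=lambda label: sum(profiles[label][w] for w in words))


def correct_rate(profiles, test_samples):
    """Fraction of test samples whose predicted class equals the true class."""
    hits = sum(predict(profiles, text) == label for text, label in test_samples)
    return hits / len(test_samples)


# Hypothetical toy data; the thesis samples come from a journal database.
training = [
    ("support vector machine text classification", "computing"),
    ("neural network training algorithm", "computing"),
    ("library catalogue subject indexing", "library science"),
    ("journal collection and library users", "library science"),
]
testing = [
    ("vector machine algorithm", "computing"),
    ("library subject catalogue", "library science"),
]

model = train(training)             # step 1: learn from the training set
rate = correct_rate(model, testing)  # step 2: classify the test set and compare
print(f"correct rate: {rate:.2%}")
```

The same two-step shape applies whichever classifier is plugged in: only `train` and `predict` would change, while the comparison against the true classes stays the same.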
Comparing the two algorithms and tuning the SVM parameters gives the best machine learning setup for the remaining experiments. With that setup, the experimental samples are built from article information in the China Knowledge Resource Integrated Database: the title, keywords, and abstract. Combining these three feature sources improves the results of the automatic categorization experiments, and comparative experiments yield an appropriate set of weights for them.
Journal articles are traditionally categorized with the Chinese Library Classification, whose scheme and categories are too complex to use directly in an automatic categorization system. I therefore use hierarchical classification to convert the Chinese Library Classification into a three-layer category system and run categorization experiments on each layer. From the first layer to the third, the accuracy rates are 95.05%, 92.89%, and 89.02%, and the integrated accuracy is close to 80%, a considerable experimental result. This demonstrates the feasibility of machine learning for the automatic categorization of journal articles and offers a new way of thinking about the problem.
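Two ideas from the passage above can be sketched in a few lines: weighting terms by the field they come from, and combining per-layer accuracy into one overall figure. The weights below are hypothetical, since the abstract does not state the values the thesis settled on. Multiplying the three layer accuracies is likewise only an assumption about how the integrated rate was obtained; it presumes that each layer's rate is measured on articles the previous layer classified correctly.

```python
from collections import Counter
from math import prod

# Hypothetical field weights, chosen only for illustration.
FIELD_WEIGHTS = {"title": 3.0, "keywords": 2.0, "abstract": 1.0}


def weighted_terms(article, weights=FIELD_WEIGHTS):
    """Sum each term's occurrences across fields, scaled by the field's weight."""
    scores = Counter()
    for field, weight in weights.items():
        for term in article.get(field, "").lower().split():
            scores[term] += weight
    return scores


def integrated_rate(layer_rates):
    """Overall accuracy if an article must be classified correctly at every layer."""
    return prod(layer_rates)


# Toy article record (hypothetical content).
article = {
    "title": "svm text categorization",
    "keywords": "svm categorization",
    "abstract": "we test svm on journal text",
}
features = weighted_terms(article)

# Layer accuracies reported in the abstract.
combined = integrated_rate([0.9505, 0.9289, 0.8902])
print(f"integrated rate: {combined:.2%}")
```

Under that multiplication assumption the result is about 78.6%, which is consistent with the "close to 80%" figure reported in the abstract.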
Keywords/Search Tags: Machine Learning, Journal Papers, Automatic Text Categorization, Support Vector Machine, Hierarchical Classification Method