
Design And Implementation Of The Technical Text Categorization System

Posted on: 2014-03-18 | Degree: Master | Type: Thesis
Country: China | Candidate: J C Liu | Full Text: PDF
GTID: 2268330425958711 | Subject: Computer technology
Abstract/Summary:
With the rapid development of science and technology, scientific literature has become an important source of information for a growing number of people. Classifying scientific literature is meticulous academic work that has traditionally been done by hand. Extracting potentially valuable information quickly and accurately from large volumes of text has become a hot research topic, so research and applications in this area have practical significance.

This thesis studies how to build an automatic classification system for scientific literature using the KNN and SVM classification algorithms on top of the Vector Space Model. The main work is as follows:

(1) The thesis analyzes the distinctive language style and format of scientific literature, including the title, abstract, keywords, and body text. The title, abstract, and keywords express the core content in few words and so reflect the main ideas of the article. The modules of the automatic classification system should therefore be designed with these format characteristics fully in mind.

(2) The thesis studies and describes in detail the key technologies of the system, including text preprocessing, feature representation, feature selection, classification algorithms, and evaluation criteria.

(3) The thesis designs and implements the automatic classification system for scientific literature with the KNN and SVM classification algorithms. It describes in detail the system's design method, overall architecture, and processing procedure, and implements the text preprocessing, feature selection, weight calculation, classifier training, classification, and evaluation modules, among others.

(4) The function and performance of the system are tested. The results show that it can meet the everyday demands of automatically classifying scientific literature and that it improves the accuracy and recall of classification.
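
The pipeline summarized above (preprocessing, weight calculation in a Vector Space Model, KNN classification, evaluation) can be illustrated with a small sketch. This is not the thesis's implementation: the abstract does not give its weighting formula, field handling, or parameters. The sketch assumes TF-IDF weights with smoothed IDF, a hypothetical `title_weight` that counts title terms more heavily than abstract terms, cosine similarity, similarity-weighted voting with `k=3`, and per-class precision and recall as the evaluation criteria. The SVM classifier is omitted, and all function names and the toy documents are invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Minimal preprocessing: lowercase and split on whitespace.
    return text.lower().split()


def term_counts(doc, title_weight=3.0):
    # Assumption: title terms count title_weight times, abstract terms once.
    counts = Counter()
    for tok in tokenize(doc["title"]):
        counts[tok] += title_weight
    for tok in tokenize(doc["abstract"]):
        counts[tok] += 1.0
    return counts


def fit_idf(count_list):
    # Smoothed inverse document frequency learned from the training set.
    n = len(count_list)
    df = Counter()
    for counts in count_list:
        df.update(counts.keys())
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf(counts, idf):
    # Sparse, L2-normalized TF-IDF vector; terms unseen in training are dropped.
    vec = {t: c * idf[t] for t, c in counts.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def knn_predict(query, train_vecs, train_labels, k=3):
    # Rank training documents by similarity and let the top k vote,
    # each vote weighted by its similarity to the query.
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter()
    for vec, label in ranked[:k]:
        votes[label] += cosine(query, vec)
    return votes.most_common(1)[0][0]


def precision_recall(y_true, y_pred, label):
    # Per-class precision and recall.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy corpus, invented for illustration only.
train = [
    {"title": "support vector machine kernels",
     "abstract": "margin based learning for text classification", "label": "cs"},
    {"title": "neural network training",
     "abstract": "gradient descent learning algorithm", "label": "cs"},
    {"title": "protein folding dynamics",
     "abstract": "molecular simulation of protein structure", "label": "bio"},
    {"title": "gene expression analysis",
     "abstract": "dna sequencing of cell samples", "label": "bio"},
]
test = [
    {"title": "learning algorithm for text",
     "abstract": "classification with kernels", "label": "cs"},
    {"title": "protein structure in cell",
     "abstract": "gene and dna analysis", "label": "bio"},
]

train_counts = [term_counts(d) for d in train]
idf = fit_idf(train_counts)
train_vecs = [tfidf(c, idf) for c in train_counts]
train_labels = [d["label"] for d in train]

predictions = [knn_predict(tfidf(term_counts(d), idf), train_vecs, train_labels)
               for d in test]
true_labels = [d["label"] for d in test]
cs_precision, cs_recall = precision_recall(true_labels, predictions, "cs")
```

A real system would replace the whitespace tokenizer with proper preprocessing (stop-word removal and, for Chinese text, word segmentation) and would add a feature selection step before weighting; both are left out here to keep the sketch short.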
Keywords/Search Tags: Science Literature, Text Classification, Vector Space Model, Feature Selection, K-nearest Neighbor, Support Vector Machine