
The Method Of Text Categorization Scheme Selection And Development Of A Prototype System

Posted on: 2007-10-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y Fu
GTID: 2178360212457452
Subject: Systems Engineering
Abstract/Summary:
This thesis is based on a project funded by the National Natural Science Foundation of China (NSFC). The relevance analysis of NSFC projects is based on the content of the project proposals, and text categorization is a basic problem in that analysis. Text categorization involves a range of techniques, such as text modeling, categorization algorithms, feature selection, and term weighting, and different cases call for different schemes. The purpose of this thesis is to analyze different types of text, to identify and summarize feasible schemes, and to design a procedure for finding the optimal scheme. We then design and implement a prototype text categorization system that integrates several common text categorization techniques. Taking texts with different category structures as examples, we use experimental data to test the schemes and obtain the optimal one.

First, we analyze the characteristics of different text categorization techniques, including text modeling, categorization algorithms, feature selection methods, and term weighting. By studying their definitions in depth and drawing on the literature, we analyze their advantages and disadvantages.

Second, we analyze the characteristics of the text sets and construct text categorization schemes. We then analyze the feasible schemes for each kind of text, and establish standard evaluation indices and a method for selecting the optimal scheme.

Third, we describe the analysis and design of the prototype text categorization system and explain its key categorization algorithms. The design follows from an analysis of the text categorization process and also takes security, fault tolerance, and ease of maintenance into account. We focus on the system's architecture design, module design, and implementation.

Finally, we implement the prototype system and use two different kinds of experimental text sets to test the performance of the feasible schemes.
Based on the experimental results and the performance of the different schemes, we identify the optimal scheme for each kind of text category structure.
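The abstract does not name the concrete techniques the prototype integrates, so the following is only a minimal sketch of how "schemes" could be compared and the best one selected. It assumes a bag-of-words text model, two feature selection methods (document frequency and chi-square), three term weightings (binary, TF, and a smoothed TF-IDF), a centroid classifier with cosine similarity, and macro-averaged F1 as the evaluation index. The toy corpus, the helper names (`evaluate`, `train_centroids`, etc.), and the choice of k = 4 features are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Hypothetical toy corpus of (text, category) pairs; not data from the thesis.
TRAIN = [
    ("protein gene cell expression", "bio"),
    ("gene sequence cell biology", "bio"),
    ("neural network training data", "cs"),
    ("algorithm data structure network", "cs"),
]
TEST = [("cell gene protein", "bio"), ("network algorithm training", "cs")]

def tokenize(text):
    """Text model: bag of lowercase whitespace-separated tokens."""
    return text.lower().split()

def df_scores(docs, labels):
    """Feature selection by document frequency (labels are ignored)."""
    return Counter(t for d in docs for t in set(d))

def chi2_scores(docs, labels):
    """Feature selection by the chi-square statistic, maximized over categories."""
    n, sets = len(docs), [set(d) for d in docs]
    scores = {}
    for t in set().union(*sets):
        best = 0.0
        for c in set(labels):
            a = sum(t in s and y == c for s, y in zip(sets, labels))      # term present, in c
            b = sum(t in s and y != c for s, y in zip(sets, labels))      # term present, not in c
            e = sum(t not in s and y == c for s, y in zip(sets, labels))  # term absent, in c
            d = n - a - b - e                                             # term absent, not in c
            denom = (a + e) * (b + d) * (a + b) * (e + d)
            if denom:
                best = max(best, n * (a * d - e * b) ** 2 / denom)
        scores[t] = best
    return scores

def weight(tokens, vocab, idf, scheme):
    """Term weighting (binary, tf, or tfidf), restricted to the selected vocabulary."""
    tf = Counter(t for t in tokens if t in vocab)
    if scheme == "binary":
        return {t: 1.0 for t in tf}
    if scheme == "tf":
        return {t: float(f) for t, f in tf.items()}
    return {t: f * idf[t] for t, f in tf.items()}  # "tfidf"

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def train_centroids(vectors, labels):
    """Categorization algorithm: one mean vector (centroid) per category."""
    sums, counts = defaultdict(Counter), Counter(labels)
    for vec, y in zip(vectors, labels):
        sums[y].update(vec)
    return {y: {t: w / counts[y] for t, w in s.items()} for y, s in sums.items()}

def macro_f1(gold, pred):
    """Evaluation index: F1 per category, averaged over categories."""
    f1s = []
    for c in set(gold) | set(pred):
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(f1s) / len(f1s)

def evaluate(selector, scheme, k=4):
    """Train and test one scheme: (feature selector, term weighting, k features)."""
    docs, labels = [tokenize(x) for x, _ in TRAIN], [y for _, y in TRAIN]
    scores = selector(docs, labels)
    vocab = set(sorted(scores, key=lambda t: (-scores[t], t))[:k])
    n, df = len(docs), Counter(t for d in docs for t in set(d))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}  # smoothed IDF
    centroids = train_centroids([weight(d, vocab, idf, scheme) for d in docs], labels)
    pred = []
    for text, _ in TEST:
        vec = weight(tokenize(text), vocab, idf, scheme)
        pred.append(max(sorted(centroids), key=lambda y: cosine(vec, centroids[y])))
    return macro_f1([y for _, y in TEST], pred)

SELECTORS = {"df": df_scores, "chi2": chi2_scores}
WEIGHTINGS = ("binary", "tf", "tfidf")

results = {(s, w): evaluate(SELECTORS[s], w) for s, w in product(SELECTORS, WEIGHTINGS)}
best = max(results, key=results.get)
print("best scheme:", best, "macro-F1:", round(results[best], 3))
```

On a corpus this small every scheme may tie; the grid only becomes informative on real text sets with different category structures. Other feature selectors (such as information gain) or classifiers (such as kNN or naive Bayes) could be compared the same way by adding entries to the grid.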
Keywords/Search Tags: Text categorization algorithm, Feature selection, Term weighting, Text modeling, Pattern design