The Research And Implementation Of Text Auto-categorization System

Posted on: 2004-05-01
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Yao
GTID: 2168360092992088
Subject: Computer software and theory
Abstract/Summary:
The title of my graduate thesis is "The Research and Implementation of Text Auto-categorization System". This project is supported by the National Natural Science Foundation of China under Grant No. 60173014 and the Beijing Municipal Natural Science Foundation under Grant No. 4022003. This thesis summarizes the research I carried out as a graduate student.

With the rapid development of the Internet, information resources have been greatly enriched. More and more information is delivered across the world through the Internet, and more and more information is gathered on it. Judging by current trends, the network will become the main source from which people obtain information. However, the Internet is poorly organized, and the sheer volume and disorder of its information make it increasingly difficult to find the information one is interested in.

Information categorization is necessary for locating information accurately and quickly. Text is the main carrier of information on the Internet, so a good text auto-categorization system can organize and manage information effectively and support effective information extraction.

To study text categorization techniques, we first built a text auto-categorization system as a testing platform. This system achieves good results on "the 4 Universities Data Set". We tested various feature selection algorithms and text categorization algorithms and compared their characteristics. Based on an analysis of the experimental results for the feature selection algorithms, we propose a measure for feature selection and use it to evaluate existing feature selection methods. From these experimental results we summarize a principle of feature selection. Following this principle, we propose a new feature selection algorithm, evaluate it, and compare it with other feature selection algorithms.
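As an illustration of what a feature selection algorithm of the kind compared here does, the following is a minimal sketch of chi-square term scoring, one commonly used existing method. It is not the new algorithm or the measure proposed in this thesis, which the abstract does not specify. The function names `chi_square_scores` and `select_features` and the toy documents are hypothetical and chosen only for this example.

```python
from collections import Counter


def chi_square_scores(docs, labels, target):
    """Score each term by its chi-square statistic against one category.

    docs:   list of token lists (hypothetical toy input)
    labels: list of category names, one per document
    target: the category that terms are scored against
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    df_pos = Counter()  # documents of the target category containing the term
    df_all = Counter()  # all documents containing the term
    for tokens, y in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            df_all[term] += 1
            if y == target:
                df_pos[term] += 1

    scores = {}
    for term, df in df_all.items():
        a = df_pos[term]    # in target category, contains term
        b = df - a          # other categories, contains term
        c = n_pos - a       # in target category, lacks term
        d = n - n_pos - b   # other categories, lacks term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        # A zero denominator means the term or category carries no contrast.
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_features(docs, labels, target, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels, target)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    # Toy corpus, invented for illustration only.
    docs = [
        ["course", "homework"],
        ["course", "exam"],
        ["faculty", "research"],
        ["faculty", "course"],
    ]
    labels = ["course", "course", "faculty", "faculty"]
    print(select_features(docs, labels, "course", 2))
```

Note that the chi-square statistic is symmetric: a term that occurs only outside the target category scores as highly as one that occurs only inside it, which is one of the characteristics that distinguishes it from other selection measures.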
Keywords/Search Tags: text categorization, feature selection