Research Of Hierarchical Text Categorization System Based On VSM And Rule Matching

Posted on: 2007-09-14; Degree: Master; Type: Thesis
Country: China; Candidate: Z T Bai; Full Text: PDF
GTID: 2178360212955244; Subject: Information Science
Abstract/Summary:
With the popularization and rapid development of the Internet, digitized electronic information resources have become abundant and circulate rapidly. Organizing and processing this enormous volume of information effectively is a major challenge of the Internet era. Automatic classification of information by category is a current research focus in both library and information science and computer science, and a large amount of research has been carried out on it. However, practical and feasible systems are rare. This paper systematically studies the methodologies and relevant technologies for feature extraction, representation, and categorization of large-scale digital resources, and provides a solid basis and a feasible scheme for the automatic processing of digital information resources.
Development of the categorization system focuses on the following aspects. Effective methods of feature extraction and selection are proposed, and a way of expressing feature weights is worked out, in particular the building and maintenance of the keyword glossary. Ways are explored to integrate two different categorization methods, statistical and rule-based, so as to bring the advantages of both into play and to improve the efficiency and accuracy of the categorizers. The differences and common points of flat (linear) categorization and hierarchical categorization are analyzed; the advantages of hierarchical categorization are shown and its feasibility is demonstrated through experiments. The practicability of automatic categorization technologies in a real environment is studied, and the problems in their realization are solved. This paper presents the solutions to these problems, together with the processing algorithms and flow charts and the relevant data structures.
To address the problems encountered during the study, several new algorithms and concepts are presented, building on existing results in the relevant fields. Drawing on the principle of rotated keywords, in combination with the relevant statistical models, the original word-extraction dictionary is compressed and optimized in both the positive and negative directions, so as to reduce the number of dimensions while still expressing themes accurately.
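As a rough illustration of how rule matching and a vector space model (VSM) might be combined in a hierarchical categorizer, the Python sketch below first checks keyword rules and otherwise descends a category tree by cosine similarity. It is not the thesis's implementation: the category tree, the centroid texts, the rule set, the rule-before-VSM ordering, and the names `tf_vector`, `cosine`, `TREE`, `RULES`, and `classify` are all assumptions made for this example, and raw term frequencies stand in for whatever weighting scheme the thesis uses.

```python
import math
from collections import Counter


def tf_vector(text):
    """Build a raw term-frequency vector from lowercased whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def node(centroid_text, children=None):
    """Helper for the hypothetical tree: a centroid vector plus child nodes."""
    return {"centroid": tf_vector(centroid_text), "children": children or {}}


# Hypothetical two-level category tree (assumption, not from the thesis).
TREE = {
    "science": node(
        "research experiment theory physics biology cell energy",
        {
            "physics": node("quantum energy particle force physics"),
            "biology": node("cell gene protein species biology"),
        },
    ),
    "sports": node(
        "match team player score league football tennis",
        {
            "football": node("goal team football league"),
            "tennis": node("racket serve tennis set"),
        },
    ),
}

# Hypothetical rules: if every keyword occurs, assign the path directly.
RULES = [({"quantum", "particle"}, ["science", "physics"])]


def classify(text):
    """Return a category path: rule match first, else top-down VSM descent."""
    vec = tf_vector(text)
    tokens = set(vec)
    for required, path in RULES:
        if required <= tokens:
            return list(path)
    path, level = [], TREE
    while level:
        # Pick the most similar child at this level; ties go to the first entry.
        name, best = max(level.items(), key=lambda kv: cosine(vec, kv[1]["centroid"]))
        path.append(name)
        level = best["children"]
    return path


print(classify("the quantum particle experiment"))
print(classify("gene and protein in the cell"))
```

In this sketch a document is compared only against the children of the node chosen at the previous level, which is the property that distinguishes hierarchical from flat categorization. A real system would need a threshold for rejecting low-similarity matches, which is omitted here.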
Keywords/Search Tags: Text Categorization, VSM, Hierarchical Text Categorization, Classification Rules, Rough Set