
Design & Implementation Of Automatic Classification System Based On The Knowledge Base

Posted on: 2008-07-24
Degree: Master
Type: Thesis
Country: China
Candidate: S Wang
Full Text: PDF
GTID: 2178360242478821
Subject: Systems Engineering
Abstract/Summary:
With the rapid development of computer and Internet technology, the volume of text information on the Internet has grown enormously. This abundance leaves users in a dilemma: because the data lack order, the resources are difficult to use. A simple and efficient means of organizing the information is clearly needed so that users can quickly obtain what they need. Automatic text classification is such a means, and it has become an important technology of great practical value. It now receives much attention in the field of information processing, and a range of methods, including statistical and machine learning approaches, have been applied to it.

Based on empirical data, this thesis presents the design and implementation of an automatic classification system for Chinese text, covering knowledge base construction, automatic word segmentation, feature selection, and automatic classification. The thesis has five chapters.

Chapter One briefly introduces the background and significance of the research, the state of related research in China and abroad, and the main content and structure of the thesis.

Chapter Two describes the classification knowledge base, which is built on the Chinese Library Classification (CLC) system. Its data come from the CLC, CNKI, and the index data of Xiamen University Library.

Chapter Three introduces several popular word segmentation algorithms. Given the technical difficulties of Chinese word segmentation, the thesis adopts a dictionary-based segmentation strategy and describes in detail the organization of the dictionary, the segmentation algorithm, the calculation of feature weights, and the combination of features. The identification of new words is also introduced briefly.

In Chapter Four, the author improves the feature weight calculation and proposes an automatic classification algorithm based on the concept assembled theory. The thesis adopts an improved Dice coefficient as the measure of relatedness; to account for the influence of keyword weights, the author introduces two parameters into the formula. Under the concept assembled theory, the class with the maximum weight is taken as the best match for the document to be classified.

Chapter Five presents the design of the knowledge-base-driven automatic classification system for Chinese text. The test results are analyzed, the shortcomings are noted, and basic ideas for revision are stated.
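
As a rough illustration of the pipeline summarized above (dictionary-based segmentation, then scoring each knowledge-base class and choosing the highest), a minimal Python sketch follows. It assumes forward maximum matching as the segmentation strategy, which the abstract does not confirm, and it uses the plain Dice coefficient 2|A∩B| / (|A| + |B|). The abstract does not specify the improved Dice coefficient or its two parameters, so they are not reproduced here. The dictionary, class labels, keyword sets, and function names are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical toy dictionary and knowledge base, for illustration only.
DICTIONARY: Set[str] = {"计算机", "自动", "分类", "文本", "图书馆", "索引"}
KNOWLEDGE_BASE: Dict[str, Set[str]] = {
    "computing": {"计算机", "文本", "自动", "分类"},
    "library": {"图书馆", "分类", "索引"},
}


def forward_max_match(text: str, dictionary: Set[str], max_len: int = 3) -> List[str]:
    """Segment text by taking the longest dictionary word at each position."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is emitted as a fallback when nothing matches.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def dice(a: Set[str], b: Set[str]) -> float:
    """Plain Dice coefficient between two term sets (not the thesis's variant)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def classify(text: str, dictionary: Set[str], knowledge_base: Dict[str, Set[str]]) -> str:
    """Return the class whose keyword set scores highest against the text."""
    terms = {t for t in forward_max_match(text, dictionary) if t in dictionary}
    scores = {label: dice(terms, keywords) for label, keywords in knowledge_base.items()}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    sample = "计算机自动分类文本"
    print(forward_max_match(sample, DICTIONARY))
    print(classify(sample, DICTIONARY, KNOWLEDGE_BASE))
```

In this sketch every term counts equally; a weighted variant would replace the set sizes with sums of per-term weights, which is where the keyword weights mentioned in the abstract would presumably enter.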
Keywords/Search Tags: Knowledge base, Automatic indexing, Automatic classification