Design & Implementation Of Automatic Classification System Based On The Knowledge Base

Posted on:2008-07-24

Degree:Master

Type:Thesis

Country:China

Candidate:S Wang

GTID:2178360242478821

Subject:Systems Engineering

Abstract/Summary:

With the rapid development of computer and Internet technology, the volume of text information on the Internet has grown enormously. This abundance, however, puts users in a dilemma: because the data lack order, the resources are difficult to use. A simple and efficient means of organizing the information is needed so that users can quickly obtain what they need. Automatic text classification is such a means, and it has become an important technology of great practical value. It now receives much attention in the field of information processing, and methods drawn from statistics and machine learning have been applied to it.

Based on empirical data, this thesis presents the design and implementation of a system for the automatic classification of Chinese text, covering construction of the knowledge base, automatic word segmentation, feature selection, and automatic classification. The thesis has five chapters.

Chapter One briefly introduces the background and significance of the research, the state of related research in China and abroad, and the main content and structure of the thesis.

Chapter Two describes the author's classification knowledge base, which is built on the Chinese Library Classification (CLC) system. The data come from the CLC, CNKI, and the index data of Xiamen University Library.

Chapter Three introduces several popular word segmentation algorithms. Given the technical difficulties of Chinese word segmentation, the thesis adopts a dictionary-based segmentation strategy and describes in detail the organization of the dictionary, the segmentation algorithm, the calculation of feature weights, and the combination of features. The identification of new words is also introduced briefly.

Chapter Four improves the feature weight calculation and proposes an automatic classification algorithm based on the concept assembled theory. An improved Dice coefficient is adopted as the measure of relatedness; to account for influences on keyword weight, the author introduces two parameters into the formula. Under the concept assembled theory, the class with the maximum weight is taken as the best classification for the document to be classified.

Chapter Five presents the design of the knowledge-based automatic classification system for Chinese text. The test results are analyzed, and the system's shortcomings and basic ideas for revision are stated.
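
The dictionary-based segmentation and Dice-coefficient matching summarized above can be illustrated with a rough sketch. It is not the thesis's implementation: the abstract does not give the improved Dice formula or its two parameters, so the standard Dice coefficient is used instead, and forward maximum matching is assumed as one common dictionary-based segmentation strategy. The names `segment`, `dice`, `classify`, `DICTIONARY`, and `KNOWLEDGE_BASE`, along with the toy data, are hypothetical.

```python
def segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary word."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def dice(a, b):
    """Standard Dice coefficient between two sets: 2|A & B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def classify(text, dictionary, knowledge_base):
    """Return the best-scoring category and the score of every category."""
    # Keep only segmented tokens that are dictionary words (drops stray characters).
    terms = set(segment(text, dictionary)) & dictionary
    scores = {cat: dice(terms, kws) for cat, kws in knowledge_base.items()}
    return max(scores, key=scores.get), scores


# Toy data (assumption): a tiny dictionary and two categories with keyword sets.
DICTIONARY = {"自动", "分类", "知识库", "图书馆", "经济", "新闻"}
KNOWLEDGE_BASE = {
    "TP": {"自动", "分类", "知识库"},  # illustrative CLC-style class label
    "F": {"经济", "新闻"},             # illustrative CLC-style class label
}

best, scores = classify("基于知识库的自动分类", DICTIONARY, KNOWLEDGE_BASE)
print(best, scores)
```

In this sketch every keyword carries equal weight. A weighted variant, such as the one the abstract attributes to Chapter Four, would replace the set sizes in `dice` with sums of keyword weights.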

Keywords/Search Tags:

Knowledge base, Automatic indexing, Automatic classification

Related items

1	Study On The Theory & Practice Of Automatic Indexing Of WWW Science And Technology Information Resources
2	The Design & Practice Of Network Based Intelligent Knowledge Service System
3	A Study On The Automatic Classification Of Cultural Resources Of Minority Nationalities
4	Automatic Commonsense Knowledge Base Construction And Completion For Chinese
5	Research Of Automatic Indexing In Economic Bibliographical Database
6	Research And Implementation On Automatic Indexing Method Of Texts
7	Research On Automatic Indexing System Of Economic News
8	Research Of Automatic Knowledge Base Construction Based On Hierarchical Multi-labels
9	Research On Short Question Classification Based On Automatic Question And Answering
10	A Research To CRF-based Automatic Subject Indexing For Chinese Books