Research On Chinese Text Classification Technology

Posted on: 2008-11-25  Degree: Master  Type: Thesis
Country: China  Candidate: Q Wang  Full Text: PDF
GTID: 2178360215980735  Subject: Computer application technology
Abstract/Summary:
With the rapid spread of the Internet, network resources have become a shared, worldwide store of information. Information resources are now interconnected to an unprecedented degree, and the volume of openly available information in this distributed space is growing exponentially. How to use computers to process this information has become a focus of research in recent years. Depending on emphasis, the work falls into several fields: information retrieval, information extraction, text classification, text summarization, and others. Chinese text classification is one of the most important of these problems and has attracted close attention because of its potential application value.

Data mining offers many classification methods. We ran experiments comparing the performance of several of them (the Boolean model, the vector space model, and BP neural networks), and we studied how parameters such as vector dimension and threshold setting affect each model. The results provide a theoretical basis for improving the algorithms.

Among the text classification methods based on statistical language models, we mainly studied the maximum entropy model and the decision tree model. The ID3 algorithm is widely used in information filtering. We implemented a text classification program based on ID3 that classifies Chinese text effectively, and the experiments indicate that decision trees are an effective classification technique. The thesis also proposes an improvement to the ID3 algorithm.

Zipf's law first revealed the relationship between word frequency and word rank in Western languages, and it has broad application value. Our experiments verify that Zipf's law also holds for the distribution of Chinese words, and the thesis proposes a method for estimating and verifying its parameters.
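
The abstract does not say how the Zipf parameters were estimated. As an illustration only, one common approach is to assume the form f(r) = C / r^alpha and fit a straight line to log-frequency against log-rank by ordinary least squares. The function name `fit_zipf`, the parameter names `alpha` and `C`, and the choice of least squares are assumptions made for this sketch and are not taken from the thesis.

```python
import math


def fit_zipf(frequencies):
    """Estimate (alpha, C) in f(r) = C / r**alpha.

    Hypothetical helper: fits log f = log C - alpha * log r by ordinary
    least squares. `frequencies` is any iterable of word counts; ranks
    are assigned here by sorting the counts in descending order.
    """
    ranked = sorted((f for f in frequencies if f > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive frequencies")

    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(freq) for freq in ranked]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The fitted line has slope -alpha and intercept log C.
    return -slope, math.exp(intercept)
```

For Chinese text, the counts would come from a word-segmented corpus, for example `collections.Counter(tokens).values()` where `tokens` is the segmenter's output. Segmentation itself is outside this sketch. A fitted `alpha` close to 1 on real data would be consistent with the classical form of Zipf's law.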
Keywords/Search Tags: Text Classification, Vector Space Model, Maximum Entropy Model, ID3 Algorithm, Zipf Law