A Study of Chinese Text Classification Based on Semantic Concepts

Posted on: 2007-06-21
Degree: Master
Type: Thesis
Country: China
Candidate: W P Chen
GTID: 2178360182483076
Subject: Circuits and Systems
Abstract/Summary:
Text classification is the foundation and core of text mining. As a focus of data mining and web mining in recent years, it plays an important role in many areas, such as traditional information retrieval, the construction of web index systems, and web information retrieval.

In this work, we first survey common methods for the key problems of text classification, describe the core techniques and system structure of a text classification system, and briefly outline its applications. We then concentrate on a Chinese text classification system based on semantic concepts, covering text preprocessing, feature extraction, disambiguation and semantic dimensionality reduction of features, the feature vector space representation, and the training of the text classifier.

In this system, the text is first preprocessed: word segmentation and tagging, stop word removal, feature extraction, and document frequency counting.

The emphasis of the system is disambiguation and the reduction of feature semantic concepts, with semantic concept analysis treated as an extension of the vector space model (VSM). To make documents as orthogonal to one another as possible, we use HowNet to process the concepts in each document through disambiguation and concept reduction, so that the keywords of a text can be expressed in a smaller semantic space and related texts lie closer together in the new semantic space.

The system uses a support vector machine (SVM) to classify texts. SVM is a popular and fast classification method. The classifier is trained on labeled texts, which yields a fixed store of classification knowledge. The feature vector of a text to be classified is then given as input, and the classification result is obtained from that knowledge store.

Finally, we ran a closed test on the semantic-concept-based Chinese text classification system. The experimental results show that this method achieves higher accuracy and recall.
Keywords/Search Tags: Text classification, SVM, Feature extraction, Semantic concept, Disambiguation and semantic dimensionality reduction of features
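
The pipeline summarized in the abstract (stop word removal, document frequency counting, mapping words to concepts, vector space representation, SVM training and classification) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the documents are assumed to be already segmented, the `WORD_TO_CONCEPT` table is a toy stand-in for a HowNet lookup that assigns exactly one concept per word (so no disambiguation is performed), and the classifier is a small linear SVM trained by batch subgradient descent on the hinge loss rather than whatever SVM implementation the author used.

```python
from collections import Counter

# Hypothetical stop-word list and word-to-concept table. The table is a toy
# stand-in for a HowNet lookup; none of these entries come from the thesis.
STOP_WORDS = {"的", "了", "在"}
WORD_TO_CONCEPT = {
    "足球": "sport", "篮球": "sport", "比赛": "sport",
    "股票": "finance", "银行": "finance", "利率": "finance",
}

def preprocess(tokens):
    """Drop stop words from an already segmented document."""
    return [t for t in tokens if t not in STOP_WORDS]

def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df

def to_concepts(tokens):
    """Map each word to its concept; unmapped words are kept unchanged
    (an assumption of this sketch)."""
    return [WORD_TO_CONCEPT.get(t, t) for t in tokens]

def vectorize(tokens, vocabulary):
    """Build a term-frequency vector over a fixed, ordered vocabulary."""
    counts = Counter(tokens)
    return [float(counts[term]) for term in vocabulary]

def train_linear_svm(xs, ys, epochs=200, lr=0.1, lam=0.01):
    """Batch subgradient descent on the L2-regularised hinge loss."""
    w = [0.0] * len(xs[0])
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # only margin violators contribute to the hinge term
                for j, xj in enumerate(x):
                    grad_w[j] -= y * xj / n
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 or -1 according to the sign of the decision function."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

segmented_docs = [
    ["足球", "比赛", "的"],
    ["篮球", "比赛", "了"],
    ["股票", "在", "银行"],
    ["银行", "利率"],
]
labels = [1, 1, -1, -1]  # +1 = sports, -1 = finance (toy labels)

cleaned = [preprocess(d) for d in segmented_docs]
word_df = document_frequency(cleaned)
concept_docs = [to_concepts(d) for d in cleaned]
vocabulary = sorted({c for d in concept_docs for c in d})
vectors = [vectorize(d, vocabulary) for d in concept_docs]

w, b = train_linear_svm(vectors, labels)
new_doc = to_concepts(preprocess(["篮球", "的", "比赛"]))
prediction = predict(w, b, vectorize(new_doc, vocabulary))
```

In this toy data the six distinct words collapse into two concept dimensions, which is the effect the abstract describes: texts on the same topic that share few or no words end up close together in the smaller concept space.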