
The Text Classification Research Of Chinese Technology Text

Posted on: 2007-04-08    Degree: Master    Type: Thesis
Country: China    Candidate: P Chen    Full Text: PDF
GTID: 2178360212477459    Subject: Systems Engineering
Abstract/Summary:
As science and technology develop alongside society, scientific fields are becoming more diverse and modern. Researchers therefore need efficient, complete, and convenient access to scientific information more urgently than ever. Against this background, research on classifying Chinese technology texts has both high theoretical value and good application prospects.

This thesis studies the problem systematically, taking into account the distinctive format and language style of Chinese technology texts. The work covers three areas: preprocessing, feature extraction, and classification algorithms. Its emphasis falls on two of them: natural language processing and a classification algorithm based on a level-classification (hierarchical) model.

Preprocessing comprises two steps: basic text data preprocessing and Chinese word segmentation. Feature extraction comprises feature representation and feature optimization, with the emphasis on feature representation.

For natural language processing, the thesis builds a new model. At the syntactic level, it proposes a new chunk analysis strategy based on part-of-speech judgment rules, which carries out phrase-level analysis in a divide-and-conquer manner. At the semantic level, because Chinese technology texts are strongly domain-specific, it builds a domain concept-tree model and, on that basis, performs concept analysis to further resolve synonymy. At the pragmatic level, it proposes a context analysis method based on the similarity and the degree of association between words. Experiments show that using domain concepts as feature items works better than the others: the macro-averaged F1 is 79.35% and the micro-averaged F1 is 88.00%.

For the classification algorithm, the thesis proposes a new level-classification model. Its basic idea consists of three steps. First, it carries out three-layer partitioned processing. Second, it groups the most closely correlated elements together for classification. Third, according to the requirements of different classification tasks, it selects and merges classes by the threshold value of...
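
The abstract reports both a macro-averaged and a micro-averaged F1. As a reading aid, the sketch below shows how the two averages are commonly computed for single-label classification. It is not the thesis's evaluation code: the function names, the class labels, and the toy data are assumptions made up for illustration. Macro-averaging takes the mean of the per-class F1 scores, so every class counts equally; micro-averaging pools the counts over all classes first, so frequent classes dominate.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there are no counts at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    # Macro: average the per-class F1 scores.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical toy data: one large class and one small class.
y_true = ["physics"] * 8 + ["biology"] * 2
y_pred = ["physics"] * 8 + ["physics", "biology"]
macro, micro = macro_micro_f1(y_true, y_pred)
print(f"macro F1 = {macro:.4f}, micro F1 = {micro:.4f}")
```

In this toy data the small class is classified worse than the large one, so the macro average comes out lower than the micro average. A gap in the same direction appears in the reported figures (79.35% macro against 88.00% micro), which may indicate uneven performance across classes, although the abstract does not say so.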
Keywords/Search Tags:Natural language processing, Level-classification model, Classification arithmetic