Research On Chinese Text Categorization Algorithms Based On Technology Text

Posted on: 2008-06-02
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Wang
GTID: 2178360212995315
Subject: Computer software and theory
Abstract/Summary:
Text categorization first arose in text information retrieval systems. However, text data has grown so quickly that traditional methods are no longer suited to large-scale text categorization. Text data mining has emerged in response, and text categorization has become an increasingly important research area within it.

Technical text has distinctive characteristics in both content and format. Yet, as a special kind of text, it has received little attention in categorization research, even as the need for technical text categorization grows steadily. In view of this, we propose a study of the categorization of computer technical text.

First, Chinese text categorization algorithms are studied in terms of their application and classification performance. The categorization approach, text preprocessing methods, feature selection, and feature representation methods of the various algorithms are analyzed. On this basis, theory and methods for evaluating and comparing the algorithms are put forward.

Second, the special characteristics of technical text are analyzed, and algorithms for keyword-based representation and selection are put forward. The title, abstract, and keywords of a technical text concisely reflect its most important content and contain little irrelevant description. The algorithms select keywords from this information instead of performing Chinese word segmentation.

Third, a Chinese text categorization algorithm based on technical text is put forward, and hierarchical categorization of computer technical text is realized. Building on the characteristics of technical text, it selects keyword sets from the main information and applies a hierarchical model construction algorithm and an automatic text classification algorithm to Chinese technical text categorization. This algorithm can improve categorization precision.

Experimental results show that the algorithms proposed in this paper are more efficient than current ones and achieve much higher precision and recall.
Keywords/Search Tags:Text categorization, Vector space model, Word segmentation, Feature selection, Weight
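
The abstract does not give the algorithm itself, so the following is only a minimal Python sketch of the general idea it describes: keywords are taken from the title, abstract, and keyword list by matching against a term lexicon (so no word segmentation step is needed), and the weighted keyword vector is then classified by descending a category tree. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the field weights, the substring matching, the cosine similarity measure, the names `keyword_vector` and `classify`, the lexicon, and the category tree.

```python
import math
from collections import Counter

# Hypothetical field weights: terms in the title or keyword list count more
# than terms in the abstract. The values are illustrative, not from the thesis.
FIELD_WEIGHTS = {"title": 3.0, "keywords": 2.0, "abstract": 1.0}


def keyword_vector(doc, lexicon):
    """Weight each lexicon term by its occurrences in the main fields.

    Terms are found by substring matching against a fixed lexicon, so no
    Chinese word segmentation step is needed.
    """
    vec = Counter()
    for field, weight in FIELD_WEIGHTS.items():
        text = doc.get(field, "")
        for term in lexicon:
            hits = text.count(term)
            if hits:
                vec[term] += weight * hits
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(vec, node):
    """Descend the category tree, picking the most similar child at each level."""
    path = []
    while node.get("children"):
        node = max(node["children"], key=lambda c: cosine(vec, c["profile"]))
        path.append(node["name"])
    return path


# Toy lexicon and two-level category tree (assumed data for illustration).
lexicon = ["数据库", "索引", "神经网络", "分类", "路由", "协议"]
tree = {
    "name": "计算机",
    "children": [
        {
            "name": "软件",
            "profile": {"数据库": 1, "索引": 1, "神经网络": 1, "分类": 1},
            "children": [
                {"name": "数据库", "profile": {"数据库": 1, "索引": 1}},
                {"name": "人工智能", "profile": {"神经网络": 1, "分类": 1}},
            ],
        },
        {"name": "网络", "profile": {"路由": 1, "协议": 1}},
    ],
}
doc = {
    "title": "基于神经网络的文本分类",
    "keywords": "神经网络;分类",
    "abstract": "本文研究文本分类算法。",
}

vec = keyword_vector(doc, lexicon)
path = classify(vec, tree)
print(path)
```

In this toy setup each category profile is a hand-written term vector. In practice the profiles would be built from labeled training documents, and the lexicon from the keyword fields of the corpus.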