
Text Classification Based On Confidence Machine

Posted on: 2012-03-31  Degree: Master  Type: Thesis
Country: China  Candidate: N G Chen  Full Text: PDF
GTID: 2248330395485440  Subject: Software engineering
Abstract/Summary:
With the steady growth of social informatization and networking, the Internet has become the world's largest information resource. However, the rapid expansion of information overwhelms users with massive amounts of irrelevant data, and classifying data effectively has become one of the challenging topics in information technology. Traditional text classification methods, which are assessed solely on the correctness of the final result, can no longer meet current classification requirements. This dissertation therefore studies a more precise text classification mechanism based on the confidence machine.

The dissertation first elaborates confidence machine theory. The confidence machine is a new classification method that evolved from algorithmic randomness theory. It attaches a confidence value to each classification result, and it has two concrete variants: the inductive confidence machine and the transductive confidence machine. The classification properties of the transductive confidence machine are then studied in the context of text classification. Experiments on the Sogou Lab manually edited text categorization corpus show that the transductive confidence machine can effectively distinguish texts of different categories and produce a confidence value for each classification result.

Because the KIII model can perform a spatial transformation of feature vectors, a new text classification algorithm combining the KIII model and the confidence machine is proposed. The algorithm applies the KIII model to transform the space of the text features and then classifies the transformed features with the confidence machine. Experimental results show that, compared with K nearest neighbor and the KIII model alone, the proposed algorithm provides more extensive classification information and achieves superior performance.

Finally, to address the efficiency of large-scale text classification, a text categorization model that combines the confidence machine with a clustering algorithm is suggested: the texts are clustered before the confidence machine classifies them. The experimental results show that this proposed model performs well.
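To make the idea of attaching a confidence value to a classification result concrete, the following is a minimal sketch of a transductive confidence machine. It is not the dissertation's implementation: the k-nearest-neighbor nonconformity score (distance to same-class neighbors divided by distance to other-class neighbors), the function names, and the definitions of confidence and credibility are assumptions taken from the common textbook formulation, and documents are assumed to be already converted to numeric feature vectors.

```python
import math


def _knn_sum(point, others, k):
    """Sum of distances from `point` to its k nearest vectors in `others`."""
    dists = sorted(math.dist(point, o) for o in others)
    return sum(dists[:k])


def nonconformity(i, X, y, k):
    """Assumed k-NN strangeness of example i: same-class / other-class distance."""
    same = [X[j] for j in range(len(X)) if j != i and y[j] == y[i]]
    other = [X[j] for j in range(len(X)) if y[j] != y[i]]
    d_same = _knn_sum(X[i], same, k)
    d_other = _knn_sum(X[i], other, k)
    # A zero other-class distance means the example sits on a rival class.
    return d_same / d_other if d_other > 0 else math.inf


def tcm_predict(X_train, y_train, x_new, k=1):
    """Try every candidate label for x_new and compute its p-value.

    Returns (predicted label, confidence, credibility, p-values per label).
    """
    p_values = {}
    for label in sorted(set(y_train)):
        # Transductive step: tentatively add x_new with this label.
        X = list(X_train) + [x_new]
        y = list(y_train) + [label]
        scores = [nonconformity(i, X, y, k) for i in range(len(X))]
        # Fraction of examples at least as strange as the new one.
        p_values[label] = sum(s >= scores[-1] for s in scores) / len(X)
    ranked = sorted(p_values, key=p_values.get, reverse=True)
    best = ranked[0]
    second_p = p_values[ranked[1]] if len(ranked) > 1 else 0.0
    # Confidence: 1 minus the second-largest p-value; credibility: the largest.
    return best, 1.0 - second_p, p_values[best], p_values


# Toy 2-D "feature vectors" standing in for two text categories (made-up data).
X_train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y_train = ["sports"] * 3 + ["finance"] * 3
label, confidence, credibility, p_values = tcm_predict(X_train, y_train, (0.5, 0.5))
print(label, confidence, credibility)
```

Because every candidate label requires recomputing the scores over the whole training set, the cost grows quickly with corpus size, which is consistent with the motivation given above for clustering the texts before confidence machine classification.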
Keywords/Search Tags: algorithmic randomness, confidence machine, text classification, KIII model, clustering algorithm