A Semi-supervised SVM Text Classification Research Based On Information Entropy Weighted Denoising

Posted on: 2015-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Li
GTID: 2298330431993053
Subject: Computer application technology
Abstract/Summary:
As information on the network grows rapidly and takes a variety of forms, text categorization, the task of classifying that information, makes it easier to search. Chinese text classification technology is mature, and classification systems are widely used across many fields and industries. However, new data is messy and unstructured, so finding the information one needs in a large volume of such data is difficult; research on and development of classification systems is therefore very important.

Real data contains many kinds of noise, and noisy data lowers classification accuracy, so processing the noise improves data quality. Current methods for handling noisy data first identify the noisy samples and then delete them, which may discard important data. This thesis instead uses an information-entropy-weighted method to detect and reclassify noisy data, improving both data quality and classification accuracy.

The research addresses two aspects. First, building on the traditional SVM classification algorithm, it uses a semi-supervised approach to expand the training set. A classifier trained on only a few labeled samples has poor precision; the semi-supervised approach exploits a large number of unlabeled samples to improve classifier performance, and experimental results show that it improves classification accuracy. Second, it incorporates denoising: after the unlabeled samples are classified, some of them may carry wrong labels, and the information-entropy-weighted denoising method reduces the influence of these noisy samples on classifier performance.
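The semi-supervised expansion of the training set can be illustrated with a self-training loop. This is a minimal sketch under stated assumptions, not the thesis's implementation: a linear SVM is trained with a Pegasos-style subgradient method (swapped in so the example needs only the Python standard library), labels are -1/+1, and in each round the unlabeled samples farthest from the separating hyperplane are assumed to be the most reliable and are added with their predicted labels. The names `train_linear_svm`, `decision`, `self_train` and the parameters `lam`, `epochs`, `rounds`, `per_round` are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient training of a linear SVM (labels in {-1, +1}).

    Assumption for brevity: the bias is a constant 1.0 feature appended to each
    vector, so it is regularized together with the other weights.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1:  # hinge loss is active: step toward the sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def decision(w, x):
    """Signed score of x; its magnitude grows with distance from the hyperplane."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def self_train(X_lab, y_lab, X_unlab, rounds=3, per_round=2):
    """Hypothetical self-training loop that expands the labeled training set."""
    X, y = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(rounds):
        if not pool:
            break
        w = train_linear_svm(X, y)
        # Assumed confidence criterion: largest |decision value| first.
        pool.sort(key=lambda x: abs(decision(w, x)), reverse=True)
        picked, pool = pool[:per_round], pool[per_round:]
        for x in picked:
            X.append(x)
            y.append(1 if decision(w, x) >= 0 else -1)  # pseudo-label
    return train_linear_svm(X, y), X, y
```

Distance from the hyperplane serves as the confidence proxy here because an SVM does not output class probabilities directly. The pseudo-labels this loop adds are exactly the samples that may be wrong and that a denoising step would re-examine.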

Experimental results confirm that the denoising step improves classifier accuracy. The corpus consists of 2000 × 10 texts collected from Internet sources such as Sina and Sohu. The method is tested and compared with an SVM algorithm without denoising. The results show that the method detects noisy data, reclassifies it, and adds it back to the training set, improving the performance of the classification system.
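The detect, reclassify, and rejoin step can be sketched as follows. The abstract does not give the exact weighting formula, so this sketch assumes one plausible scheme: each pseudo-labeled sample is compared with its k nearest trusted (originally labeled) samples, and the normalized Shannon entropy H = -Σ p_c log p_c / log C of their label distribution (C classes) yields a weight 1 - H, so samples in clean neighborhoods count fully and samples in mixed neighborhoods count less. A sample whose pseudo-label disagrees with the neighborhood majority is flagged as noise and relabeled rather than deleted. The function names and the parameter `k` are hypothetical.

```python
import math
from collections import Counter


def normalized_entropy(probs):
    """Shannon entropy of a class distribution, scaled to [0, 1] by log(#classes)."""
    if len(probs) < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def entropy_weighted_denoise(trusted, pseudo, k=3):
    """Hypothetical entropy-weighted noise detection and reclassification.

    trusted: list of (vector, label) pairs with reliable labels.
    pseudo:  list of (vector, pseudo_label) pairs produced by the classifier.
    Returns a list of (vector, label, weight, was_noise) tuples.
    """
    classes = sorted({lab for _, lab in trusted})
    out = []
    for x, plabel in pseudo:
        # Assumed neighborhood: the k trusted samples closest in Euclidean distance.
        nearest = sorted(trusted, key=lambda s: math.dist(x, s[0]))[:k]
        votes = Counter(lab for _, lab in nearest)
        probs = [votes[c] / len(nearest) for c in classes]
        # Low entropy (agreeing neighbors) gives a weight close to 1.
        weight = 1.0 - normalized_entropy(probs)
        majority = max(classes, key=lambda c: votes[c])
        is_noise = majority != plabel
        # Noisy samples are relabeled and kept, not deleted.
        label = majority if is_noise else plabel
        out.append((x, label, weight, is_noise))
    return out
```

Relabeling instead of deleting follows the abstract's point that discarding noisy samples may lose important data. The returned weights could then be passed as per-sample weights to an SVM trainer that supports them, for example the `sample_weight` argument of scikit-learn's `SVC.fit`.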
Keywords/Search Tags: SVM, Denoising, Entropy