Research On Progressively Semi-supervised Text Classification Based On Markov Random Walk

Posted on: 2013-03-25
Degree: Master
Type: Thesis
Country: China
Candidate: X P Chen
Full Text: PDF
GTID: 2298330377459850
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet and data storage technology, large amounts of data have accumulated in scientific research and everyday life. Extracting useful information by analyzing and mining this data has gradually become a common need in almost every field. Traditional machine learning methods typically use either labeled data alone or unlabeled data alone. In practical problems, however, labeled and unlabeled data coexist, so how to use both effectively has become a widely shared concern. As a key technology for addressing this problem, semi-supervised learning has attracted considerable attention in machine learning and data mining.

According to its purpose, semi-supervised learning can be divided into two categories: semi-supervised classification and semi-supervised clustering. Its main idea is to combine a small labeled training set with a large set of unlabeled samples to improve classification performance. This thesis focuses on semi-supervised classification.

Algorithms based on Markov random walks can represent data in a low-dimensional, probabilistic form and have strong learning ability, so they are widely used in semi-supervised learning problems. This thesis first presents a Semi-supervised Text Classification model based on Markov Random Walk (referred to as SMRW), which improves on the traditional classification model based on Markov random walk. When calculating the transition probability of a sample to be labeled, the model considers only samples of the corresponding category and ignores samples of the other classes; it also uses a decay function to restrain the transition probability at different numbers of walk steps. Experimental results on the 20 Newsgroups dataset show that the model has better classification performance.

With only a few labeled training samples, the classifier obtained in the initial training phase performs poorly, and the errors caused by samples misclassified during the iterative process of semi-supervised learning continue to be amplified in subsequent iterations, reducing the accuracy of the model. To address this issue, a progressively semi-supervised text classification model based on Markov random walk (referred to as PSMRW) is proposed. It combines progressive learning with semi-supervised classification, attempting to correct the errors generated during the iterative semi-supervised learning process and thereby improve prediction accuracy. Experimental results on the 20 Newsgroups dataset show that the proposed method can improve the accuracy of semi-supervised classification.
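
The class-restricted, decayed random walk described for SMRW can be illustrated with a small sketch. This is not the thesis's implementation: the exponential decay `gamma ** t`, the toy similarity matrix `W`, the step count, and all function names are assumptions made for illustration, and the abstract does not specify the actual decay function or how document similarity is computed. For each class, the sketch sums only the probability mass that a walk from the unlabeled sample places on labeled samples of that class, weighting later steps less.

```python
def row_normalize(weights):
    """Turn a non-negative similarity matrix into a row-stochastic transition matrix."""
    result = []
    for row in weights:
        total = sum(row)
        result.append([w / total if total > 0 else 0.0 for w in row])
    return result


def step(dist, transition):
    """Advance a probability distribution over nodes by one random-walk step."""
    n = len(transition)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]


def class_scores(start, transition, labels, steps=3, gamma=0.5):
    """Score each class for node `start`.

    Only mass landing on labeled nodes of a class counts toward that class;
    step t is weighted by gamma ** t (an assumed exponential decay).
    """
    dist = [1.0 if i == start else 0.0 for i in range(len(transition))]
    scores = {c: 0.0 for c in set(labels.values())}
    for t in range(1, steps + 1):
        dist = step(dist, transition)
        for node, c in labels.items():
            scores[c] += (gamma ** t) * dist[node]
    return scores


def classify(start, transition, labels, steps=3, gamma=0.5):
    """Return the class with the highest decayed walk score (ties broken by sort order)."""
    scores = class_scores(start, transition, labels, steps, gamma)
    return max(sorted(scores), key=scores.get)


# Hypothetical toy data: four documents, two labeled (0 -> "A", 3 -> "B").
W = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.2, 0.1],
    [0.1, 0.2, 0.0, 0.8],
    [0.0, 0.1, 0.8, 0.0],
]
labels = {0: "A", 3: "B"}
P = row_normalize(W)

for doc in (1, 2):
    print(doc, classify(doc, P, labels), class_scores(doc, P, labels))
```

In this toy graph, document 1 is most similar to the labeled "A" document and document 2 to the labeled "B" document, so the walk assigns them accordingly. A smaller `gamma` makes the decision depend more on direct neighbors, while a larger one lets longer paths contribute.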
Keywords/Search Tags: semi-supervised classification, progressive learning, Markov random walk, iterating