
Research On Method Of Chinese Text Classification Based On Cloud Computing

Posted on: 2013-11-15
Degree: Master
Type: Thesis
Country: China
Candidate: H J Zhou
Full Text: PDF
GTID: 2248330395985085
Subject: Digital media
Abstract/Summary:
With the development of the Internet and the growth of its user base, the volume of Chinese text on the Internet is growing quickly, and extracting valuable information from massive data has become an important problem that needs to be solved. Current text classification methods fall into two kinds: knowledge engineering and statistical learning. Knowledge-engineering methods depend mainly on rules defined by domain experts and decide which class a text belongs to by matching the text against those rules. Statistical learning uses texts as training material, has the computer extract classification rules from them, and then applies these rules to classify unknown texts automatically. Recently, statistical learning has become the main approach to text classification. However, it is constrained by computing speed and memory, especially when large volumes of text are processed.

To address this problem in statistical-learning-based text classification, this thesis applies cloud computing technology, starting from the computation and memory bottlenecks. Cloud computing makes it easy to scale computing and storage resources, and the thesis studies how to classify text using the Map/Reduce data processing model. The classifier used is the support vector machine (SVM), which has advantages over other methods on linearly non-separable and small-sample problems. In essence, the SVM algorithm transforms the text classification problem into an inequality-constrained quadratic programming problem that seeks the largest margin under geometric constraints.
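For reference, the inequality-constrained quadratic program behind an SVM is usually written in the textbook soft-margin form below. The notation is standard and assumed here rather than taken from the thesis: x_i is the feature vector of text i, y_i in {-1, +1} is its class label, w and b define the separating hyperplane, the xi_i are slack variables, and C is a penalty parameter. Minimizing the norm of w maximizes the geometric margin 2/||w||.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i \\
\text{s.t.}\quad & y_i\,(w^{\top}x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1,\dots,n
\end{aligned}
```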
The improvement to the SVM algorithm proposed in this thesis converts the inequality constraints of the quadratic program into equality constraints, which makes the solving process simpler.

This study focuses on how to use the open-source Hadoop cloud computing system to build a cloud platform, and how to use the MapReduce model to implement the improved SVM classification algorithm on that platform. The final experimental results show that the new algorithm outperforms the standard SVM algorithm in preprocessing efficiency.
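The abstract does not give the exact formulation of the improved algorithm. As a rough illustration only, the sketch below assumes it resembles a linear least-squares SVM: minimize (1/2)||w||^2 + (gamma/2) * sum(e_i^2) subject to the equality constraints y_i (w . x_i + b) = 1 - e_i. Under that assumption the quadratic program reduces to a linear system whose coefficients are sums over training samples, so each map task can compute partial sums on its shard and a reduce step can add them. All function names and the parameter gamma are hypothetical, and the map and reduce steps are simulated in-process rather than run on Hadoop.

```python
from functools import reduce


def map_shard(shard):
    """Map step: partial sums A = sum(z z^T) and c = sum(y z) for one shard,
    where z is the feature vector x with a constant 1 appended for the bias."""
    d = len(shard[0][0]) + 1
    A = [[0.0] * d for _ in range(d)]
    c = [0.0] * d
    for x, y in shard:
        z = list(x) + [1.0]
        for i in range(d):
            c[i] += y * z[i]
            for j in range(d):
                A[i][j] += z[i] * z[j]
    return A, c


def reduce_stats(p, q):
    """Reduce step: element-wise sum of two partial results."""
    A = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(p[0], q[0])]
    c = [a + b for a, b in zip(p[1], q[1])]
    return A, c


def solve(A, c):
    """Solve A x = c by Gaussian elimination with partial pivoting."""
    n = len(c)
    M = [row[:] + [rhs] for row, rhs in zip(A, c)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def train(shards, gamma=10.0):
    """Combine shard statistics, then solve the regularized linear system."""
    A, c = reduce(reduce_stats, map(map_shard, shards))
    for i in range(len(c) - 1):  # penalize w only; the bias term is left free
        A[i][i] += 1.0 / gamma
    theta = solve(A, c)
    return theta[:-1], theta[-1]


def predict(x, w, b):
    """Classify one feature vector by the sign of the decision function."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1
```

In a text-classification setting each x would be a document's feature vector (for example term weights), and each shard would be the block of documents handled by one map task. Because the reduce step is a plain sum, the result does not depend on how the documents are split across shards.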
Keywords/Search Tags: Cloud Computing, Text, Support Vector Machine, MapReduce