
Research And Implementation Of Chinese Web-page Classification Based On Web Data-mining

Posted on: 2013-11-19
Degree: Master
Type: Thesis
Country: China
Candidate: W M Gan
GTID: 2248330362970896
Subject: Computer application technology
Abstract/Summary:
In today's information age, the network has become the main channel through which people obtain all kinds of information. Web pages, the main carrier of network information, are already enormous in number, grow every day, and are mixed in content. To organize and analyse them effectively, classifying them by content has become the primary task. However, problems that come with the development of web pages degrade the performance of classification systems.

This thesis studies and analyses web-page classification technology in depth, summarizes its shortcomings, and then researches and improves the classification system with respect to the two problems that most affect its performance: noise and speed. Web-page noise severely degrades the accuracy of classification results, so this thesis treats web-page purification as an independent module and applies it within the classification system. The purification method combines structural rules with semantic rules and filters noise effectively. To increase the speed of the classification system, this thesis adopts an SVM algorithm based on a polynomial kernel function. During training, an improved binary-tree training method based on the hypersphere decision radius is used, which raises training speed. This thesis also optimizes how the decision function is computed during classification, which reduces the amount of calculation and lowers the time complexity of classification. By speeding up both training and classification, the working efficiency of the system is improved.

Finally, this thesis implements the system and tests both the whole system and each of its modules. The results show that the system is practical and effective.
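The abstract does not list the concrete purification rules, so the following is only a minimal sketch of how structural and semantic rules can be combined, not the thesis's own rule set. It assumes the page has already been split into text blocks, each tagged with its enclosing HTML element and the portion of its text that is link text. The tag list `NOISE_TAGS`, the 0.5 link-density threshold, the `Block` type and the title-overlap test are all hypothetical choices made for illustration; whitespace tokenization is also an assumption, and real Chinese pages would need a word segmenter instead.

```python
from dataclasses import dataclass
from typing import List, Set

# Hypothetical structural rule set: elements that usually hold navigation, ads or scripts.
NOISE_TAGS = {"script", "style", "nav", "footer", "aside", "form"}


@dataclass
class Block:
    tag: str        # enclosing HTML element of the text block
    text: str       # visible text of the block
    link_text: str  # part of `text` that sits inside <a> elements


def passes_structure_rules(block: Block, max_link_density: float = 0.5) -> bool:
    """Structural rules: drop noise elements, empty blocks and link-heavy blocks."""
    if block.tag in NOISE_TAGS or not block.text.strip():
        return False
    link_density = len(block.link_text) / len(block.text)
    return link_density <= max_link_density


def tokens(text: str) -> Set[str]:
    # Assumption: whitespace tokenization; Chinese text would need word segmentation.
    return {t.lower() for t in text.split()}


def passes_semantic_rules(block: Block, title_terms: Set[str], min_overlap: int = 1) -> bool:
    """Semantic rule (hypothetical): keep blocks that share terms with the page title."""
    return len(tokens(block.text) & title_terms) >= min_overlap


def purify(title: str, blocks: List[Block]) -> List[Block]:
    """Keep only blocks that pass both the structural and the semantic rules."""
    title_terms = tokens(title)
    return [b for b in blocks
            if passes_structure_rules(b) and passes_semantic_rules(b, title_terms)]
```

Applying the cheap structural rules first means the semantic comparison only runs on blocks that survive them.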
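The abstract names a binary-tree training method based on the hypersphere decision radius but does not give the tree-building rule. One common reading is a chain-shaped tree: approximate each class by a hypersphere, split one class off at each node, and train that node only on the classes not yet split off. A k-class problem then needs k - 1 binary classifiers (one-vs-one needs k(k - 1)/2), and deeper nodes train on fewer samples. The sketch below assumes that reading. The centroid-plus-farthest-sample sphere, the "largest radius first" ordering, and the nearest-centroid `centroid_trainer` (a stand-in for a polynomial-kernel SVM node) are hypothetical simplifications.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Decision = Callable[[Vector], float]  # score > 0 means "x belongs to this node's class"


def centroid(points: List[Vector]) -> List[float]:
    return [sum(p[i] for p in points) / len(points) for i in range(len(points[0]))]


def radius(points: List[Vector]) -> float:
    # Approximate enclosing hypersphere: centre = class centroid, radius = farthest sample.
    c = centroid(points)
    return max(math.dist(p, c) for p in points)


def build_tree(data: Dict[str, List[Vector]],
               train_node: Callable[[List[Vector], List[Vector]], Decision]
               ) -> Tuple[List[Tuple[str, Decision]], str]:
    # Hypothetical ordering rule: split off the class with the largest radius first.
    order = sorted(data, key=lambda label: radius(data[label]), reverse=True)
    nodes = []
    for k, label in enumerate(order[:-1]):
        # Each node trains only on classes not yet split off, so deeper nodes see less data.
        rest = [p for other in order[k + 1:] for p in data[other]]
        nodes.append((label, train_node(data[label], rest)))
    return nodes, order[-1]  # the last remaining class becomes the final leaf


def predict(nodes: List[Tuple[str, Decision]], leaf_label: str, x: Vector) -> str:
    # Walk down the chain; the first node that accepts x decides its class.
    for label, decide in nodes:
        if decide(x) > 0:
            return label
    return leaf_label


def centroid_trainer(pos: List[Vector], neg: List[Vector]) -> Decision:
    # Stand-in for an SVM node: positive when x is nearer the positive-class centroid.
    cp, cn = centroid(pos), centroid(neg)
    return lambda x: math.dist(x, cn) - math.dist(x, cp)


if __name__ == "__main__":
    data = {
        "sports": [[0, 0], [0, 1], [1, 0]],
        "finance": [[10, 10], [10, 14], [14, 10]],
        "tech": [[20, 0], [21, 0]],
    }
    nodes, leaf = build_tree(data, centroid_trainer)
    print([label for label, _ in nodes], leaf)
    print(predict(nodes, leaf, [0.5, 0.5]), predict(nodes, leaf, [20, 1]))
```

In a real system `train_node` would fit a polynomial-kernel SVM on `pos` versus `rest`. The tree logic does not change.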
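The abstract says the decision-function computation was optimized but does not say how. One known technique for a degree-2 polynomial kernel $K(s, x) = (s \cdot x + c)^2$ is to expand the kernel sum once after training. With $a_i = \alpha_i y_i$, the decision function $f(x) = \sum_i a_i (s_i \cdot x + c)^2 + b$ equals $x^\top A x + w \cdot x + k$, where $A = \sum_i a_i s_i s_i^\top$, $w = 2c \sum_i a_i s_i$ and $k = c^2 \sum_i a_i + b$. Per-sample cost then drops from O(n_sv · d) to O(d²), which pays off only when the number of support vectors exceeds roughly the feature dimension. The sketch below illustrates that swapped-in technique and is not necessarily the thesis's own optimization. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def naive_decision(svs: Sequence[Sequence[float]], coefs: Sequence[float],
                   b: float, c: float, x: Sequence[float]) -> float:
    """f(x) = sum_i a_i * (s_i . x + c)^2 + b, one kernel evaluation per support vector."""
    return sum(a * (sum(si * xi for si, xi in zip(s, x)) + c) ** 2
               for s, a in zip(svs, coefs)) + b


def expand(svs: Sequence[Sequence[float]], coefs: Sequence[float],
           b: float, c: float) -> Tuple[Matrix, List[float], float]:
    """Precompute A, w and the constant term once, after training."""
    d = len(svs[0])
    A = [[0.0] * d for _ in range(d)]
    w = [0.0] * d
    for s, a in zip(svs, coefs):
        for j in range(d):
            w[j] += 2 * c * a * s[j]          # linear term from 2c (s . x)
            for k in range(d):
                A[j][k] += a * s[j] * s[k]    # quadratic term from (s . x)^2
    const = c * c * sum(coefs) + b           # constant term from c^2, plus the bias
    return A, w, const


def fast_decision(A: Matrix, w: Sequence[float], const: float,
                  x: Sequence[float]) -> float:
    """Evaluate x^T A x + w . x + const; cost no longer depends on the SV count."""
    d = len(x)
    quad = sum(x[j] * A[j][k] * x[k] for j in range(d) for k in range(d))
    return quad + sum(wj * xj for wj, xj in zip(w, x)) + const
```

Both functions return the same value up to floating-point rounding, so the expanded form can replace the kernel sum at classification time.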
Keywords/Search Tags: Web-page Categorization, Web-page Purification, SVM, Binary Tree, Decision Function