Optimizing Web Page Classification Algorithm By Using Hyperlinks

Posted on: 2015-09-17
Degree: Master
Type: Thesis
Country: China
Candidate: F L Shi
Full Text: PDF
GTID: 2298330452953308
Subject: Computer Science and Technology

Abstract/Summary:
With the development and popularization of network technology, we have entered an information-rich era. Against this background, search engines have won people's favor through their efficiency and convenience and have become the main way to obtain information. However, search engine results often contain many irrelevant web pages, which seriously degrades search quality and is an urgent problem to solve. Web page classification not only addresses this problem effectively, but also makes the organization of information resources more reasonable and supports applications such as question answering systems and information filtering. Web page classification has therefore become an important research topic. This thesis studies web page classification algorithms that use hyperlinks and implements a web page classification system based on the improved algorithm.

The main research work of this thesis can be summarized as follows:

(1) An optimization of the original algorithm that uses the similarities between pages is proposed. To reduce interference from noisy neighbor pages, we set different similarity thresholds for different link relations. When the target web page is classified, a neighbor takes part in the calculation only if it meets the threshold for its link relation. This reduces the influence of noisy neighbors.

(2) We use a Support Vector Machine to improve the effectiveness of classification. The text of a web page contains a wealth of information and, if used properly, can further improve classification precision. The Support Vector Machine is a very effective classification algorithm, so the results of text classification by a Support Vector Machine are also used.

(3) Based on the proposed optimization method, a web page classification system was designed and implemented. In the outline design, we describe the principles, the goals, the development environment, and the overall structure of the system. In the detailed design, we explain the functions, the sub-modules, the processes, and the implementation details of each module.

(4) To verify the effectiveness of the proposed optimization method, two reference classifiers were implemented: one based on a Support Vector Machine, the other based on the original hyperlink-based web page classification algorithm. We classified the experimental data with all three methods, then calculated and compared the precision, recall, and F1 values.

The experimental results show that the optimized algorithm is effective and performs much better than the original algorithm.
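The neighbor filtering described in item (1), the use of text classification results in item (2), and the evaluation measures in item (4) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the relation names, the threshold values, the `Neighbor` type, and the weighted-sum combination controlled by `text_weight` are all assumptions made for illustration, since the abstract states neither the actual thresholds nor how the Support Vector Machine results are combined with the neighbor information.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical link relations and thresholds; the abstract does not list them.
THRESHOLDS = {"parent": 0.3, "child": 0.3, "sibling": 0.5}


@dataclass
class Neighbor:
    relation: str      # link relation between the neighbor and the target page
    similarity: float  # similarity to the target page, assumed to lie in [0, 1]
    label: str         # known category of the neighbor page


def filter_neighbors(neighbors, thresholds=THRESHOLDS):
    """Keep only neighbors whose similarity meets the threshold of their relation."""
    return [
        n for n in neighbors
        if n.relation in thresholds and n.similarity >= thresholds[n.relation]
    ]


def classify(neighbors, text_scores, text_weight=0.5, thresholds=THRESHOLDS):
    """Combine similarity-weighted neighbor votes with per-category text scores.

    text_scores stands in for the output of a text classifier such as an SVM.
    The weighted sum is an assumed combination rule, not taken from the thesis.
    """
    votes = defaultdict(float)
    for n in filter_neighbors(neighbors, thresholds):
        votes[n.label] += n.similarity

    combined = defaultdict(float)
    total = sum(votes.values())
    if total > 0:
        for label, vote in votes.items():
            combined[label] += (1 - text_weight) * vote / total
    for label, score in text_scores.items():
        combined[label] += text_weight * score

    return max(combined, key=combined.get) if combined else None


def precision_recall_f1(true_labels, predicted_labels, category):
    """Compute precision, recall, and F1 for a single category."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == category and p == category)
    fp = sum(1 for t, p in pairs if t != category and p == category)
    fn = sum(1 for t, p in pairs if t == category and p != category)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    neighbors = [
        Neighbor("parent", 0.8, "sports"),
        Neighbor("sibling", 0.2, "news"),   # below the sibling threshold, dropped
        Neighbor("child", 0.4, "sports"),
    ]
    print(classify(neighbors, {"sports": 0.4, "news": 0.6}))
```

In this sketch a neighbor below the threshold for its link relation contributes nothing to the vote, which is one simple way to realize the noise reduction the abstract describes; other weighting schemes would fit the description equally well.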
Keywords/Search Tags: web page classification, link relation, Support Vector Machine, similarity