Research And Realization Of Labeling Techniques Of Internet Website

Posted on: 2013-05-04    Degree: Master    Type: Thesis
Country: China    Candidate: D Q Gu
GTID: 2268330398470516    Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of Internet technology and industry, an ever-growing amount of data is generated across the Internet's many applications and fields. Although this growth enriches daily life, it also makes it harder to find specific target information in such a mass of data. Website labeling tags websites accurately and comprehensively according to their themes. It is a valuable aid to analyzing the Internet and helps people find information more quickly and accurately. Building on recent research, this thesis improves the techniques of website labeling. Its main contributions are as follows.

Firstly, a dynamic extraction strategy for a website's key resources is presented. The strategy combines a classification of key resources with a constrained crawler to find the web pages that best represent the website's theme. As a result, most of the key resources can be obtained by downloading only a few pages.

Secondly, an improved algorithm is proposed to meet the practical requirement of assigning multiple labels to a website. The algorithm adapts well to website data that spans multiple fields, some of which may be missing. Experimental results show that it significantly improves the multi-label classification of websites.

Thirdly, a website labeling system is built on the dynamic extraction strategy and the multi-label algorithm described above, and it is evaluated experimentally. The system takes a website's seed links as input and dynamically extracts the site's key resources. It then applies the improved multi-label classification algorithm to identify multiple tags for the site. Experimental results show that the system performs well and achieves improved accuracy.

In summary, this thesis improves existing website labeling technology, lowering resource cost while increasing accuracy.
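The abstract does not spell out the improved multi-label algorithm. The Python sketch below only illustrates the problem it targets: multi-field website data where some fields are missing, with more than one label allowed per site. It computes a score for each label in each field. It then renormalizes the field weights over only the fields a site actually has and keeps every label above a threshold. The field names, weights, label vocabularies, keyword-overlap scorer, the `label_site` function and its threshold are all hypothetical stand-ins, not the author's method.

```python
from typing import Dict, List, Optional, Set

# Hypothetical per-field weights: how much each page field is trusted.
FIELD_WEIGHTS: Dict[str, float] = {
    "title": 0.4,
    "keywords": 0.3,
    "description": 0.2,
    "body": 0.1,
}

# Hypothetical label vocabularies, used by a toy keyword-overlap scorer.
LABEL_TERMS: Dict[str, Set[str]] = {
    "news": {"news", "headline", "report"},
    "sports": {"football", "match", "league", "sports"},
    "shopping": {"shop", "price", "cart", "buy"},
}


def field_score(text: str, terms: Set[str]) -> float:
    """Fraction of a label's terms that occur in one field (toy scorer)."""
    tokens = set(text.lower().split())
    return len(tokens & terms) / len(terms)


def label_site(fields: Dict[str, Optional[str]], threshold: float = 0.2) -> List[str]:
    """Assign zero or more labels to a site described by possibly incomplete fields."""
    # Keep only known fields that are present and non-empty.
    present = {name: text for name, text in fields.items()
               if text and name in FIELD_WEIGHTS}
    if not present:
        return []
    # Renormalize weights over the fields that actually exist, so a missing
    # field does not drag every label's score toward zero.
    total_w = sum(FIELD_WEIGHTS[name] for name in present)
    scores = {
        label: sum(FIELD_WEIGHTS[name] * field_score(text, terms)
                   for name, text in present.items()) / total_w
        for label, terms in LABEL_TERMS.items()
    }
    # Multi-label decision: every label at or above the threshold, best first.
    chosen = sorted((lab for lab, s in scores.items() if s >= threshold),
                    key=lambda lab: -scores[lab])
    # Fall back to the single best label if none clears the threshold.
    if not chosen:
        best = max(scores, key=scores.get)
        if scores[best] > 0:
            chosen = [best]
    return chosen
```

Consider a site whose only non-empty fields are the title "Football news and match report" and the body "league headline". It is tagged `["news", "sports"]`. The missing keywords and description fields are left out of the weighted average rather than counted as zero, which is one simple way to tolerate incomplete fields.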
Keywords/Search Tags: website labeling, key resource, multiple data fields, multi-labeling