
The Research On Tag Library For Labeling The Internet Website

Posted on: 2014-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: C C Zhang
GTID: 2248330398472146
Subject: Signal and Information Processing

Abstract/Summary:
With the rapid development of the Internet, the amount of information online has grown explosively, and web pages store a great deal of valuable information covering many fields. Website classification must process large amounts of data, its accuracy is often low, and designing the category scheme is a problem in itself. Website information extraction does not scale to large numbers of heterogeneous websites: even when a template is available, it can only process sites that share the same structure. In the Web 2.0 era, tags have been widely adopted by blog systems, forums, and video sites, and have become a new way of classifying and organizing information. This paper studies website labeling and the design of a tag library for storing and organizing information.
The main work of this paper consists of two parts: the research and design of a tag library, and a method for labeling websites. The first part studies traditional classification and tag taxonomies, analyzes and compares categories with tag clouds, and proposes a tag library with a multi-faceted, complex hierarchical structure. Six properties are selected to describe a website: principal nature, form, industry, topic, region, and language. The second part analyzes the structure of websites and proposes a method that combines web page classification with automatic web indexing to extract tags according to the structure of the tag library. A site tree is created by analyzing the site's topological structure, the web pages are classified according to this tree, and tags are then extracted by automatically indexing the content pages. Finally, several experiments were designed, and the results show that the method achieves good performance. Websites are important carriers of information and are also very important for information retrieval and related research.
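The multi-faceted hierarchical tag library described above can be pictured as one independent tag tree per facet. The Python below is a minimal sketch under stated assumptions, not the thesis's implementation. The facet key names, the classes `TagNode` and `TagLibrary`, and the example tags are hypothetical, and the abstract does not say how the hierarchy is actually stored.

```python
from dataclasses import dataclass, field

# The six facets named in the abstract; these key names are our own (hypothetical).
FACETS = ("principal_nature", "form", "industry", "topic", "region", "language")


@dataclass
class TagNode:
    """One tag in a facet hierarchy; children are keyed by tag name."""
    name: str
    children: dict = field(default_factory=dict)

    def child(self, name):
        # Return the named child, creating it on first use.
        if name not in self.children:
            self.children[name] = TagNode(name)
        return self.children[name]


class TagLibrary:
    """Multi-faceted library: an independent tag tree for each facet."""

    def __init__(self):
        self.roots = {facet: TagNode(facet) for facet in FACETS}

    def add(self, facet, path):
        # Register a tag path such as ["Education", "Higher education"].
        node = self.roots[facet]
        for part in path:
            node = node.child(part)

    def contains(self, facet, path):
        # True only if the facet exists and every step of the path is registered.
        node = self.roots.get(facet)
        if node is None:
            return False
        for part in path:
            node = node.children.get(part)
            if node is None:
                return False
        return True

    def label(self, **tags):
        # Build a site label (facet -> "a/b/c"), rejecting tags not in the library.
        for facet, path in tags.items():
            if not self.contains(facet, path):
                raise ValueError(f"unknown tag {'/'.join(path)!r} in facet {facet!r}")
        return {facet: "/".join(path) for facet, path in tags.items()}
```

For example, `lib.add("industry", ["Education", "Higher education"])` registers a two-level industry tag, after which `lib.label(industry=("Education", "Higher education"))` returns `{"industry": "Education/Higher education"}`. A tag that is missing from the library raises `ValueError`, which keeps every site label inside the controlled vocabulary.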
This paper designs a tag library structure and proposes a method for extracting website labels that achieves good performance. However, some problems remain and require further research, which motivates further study.
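The labeling steps summarized above (build a site tree, classify pages with it, automatically index the content pages, map the results onto the tag library) can be sketched as follows. Several pieces are assumptions swapped in for illustration and are not the thesis's method. The site tree is built from URL paths rather than from the link topology the abstract mentions. A page counts as a content page if it is a leaf of that tree. TF-IDF stands in for the automatic-indexing step. The `vocabulary` mapping from index terms to (facet, tag) pairs, and all function names, are hypothetical. The tokenizer only handles Latin letters; Chinese pages would need word segmentation first.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse


def _segments(url):
    # Non-empty path segments, e.g. "/news/2013/a.html" -> ["news", "2013", "a.html"].
    return [seg for seg in urlparse(url).path.strip("/").split("/") if seg]


def build_site_tree(urls):
    """Assumption: approximate the site topology by nesting URL path segments."""
    tree = {}
    for url in urls:
        node = tree
        for seg in _segments(url):
            node = node.setdefault(seg, {})
    return tree


def is_content_page(url, tree):
    """Assumed heuristic: leaves are content pages; inner nodes are navigation pages."""
    node = tree
    for seg in _segments(url):
        node = node[seg]
    return not node


def tokenize(text):
    # Latin-letter tokenizer for illustration only.
    return re.findall(r"[a-z]+", text.lower())


def index_terms(docs, top_k=3):
    """TF-IDF automatic indexing: return the top_k terms of each document."""
    tokenized = [tokenize(doc) for doc in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    results = []
    for toks in tokenized:
        tf = Counter(toks)
        scores = {t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()}
        # Highest score first; ties broken alphabetically for deterministic output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


def label_site(pages, vocabulary, top_k=3):
    """pages: url -> page text; vocabulary (hypothetical): term -> (facet, tag)."""
    tree = build_site_tree(pages)
    content = [url for url in pages if is_content_page(url, tree)]
    labels = {}
    for terms in index_terms([pages[url] for url in content], top_k):
        for term in terms:
            if term in vocabulary:
                facet, tag = vocabulary[term]
                labels.setdefault(facet, set()).add(tag)
    return labels
```

Terms that occur in every content page get an IDF of zero and are ranked last, which pushes site-wide words such as a shared footer below page-specific terms. Only index terms that the vocabulary maps into the tag library become labels, so the output stays aligned with the library's facets.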
Keywords/Search Tags: website label, tag library, webpage classification, automatic indexing