
Research And Realization Of Web Information Mining Model Based On Topic Features

Posted on: 2014-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: X F Wang
GTID: 2248330398972155
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of the Internet, the web has become a repository of massive, dispersed, and unstructured data. The most urgent problem for web information mining is how to acquire the most valuable information from numerous and complicated web resources. Starting from the present state of web information mining, this thesis studies methods for processing topic features. Building on research into the thematic dimensions of websites, the classification of web page structure, and fine-grained access control for web pages, it also provides a number of effective methods for processing information. The major work and achievements of the thesis are as follows:

First, it proposes a model for vertical mining of theme sites that takes both web structure mining and web content mining into account. The model can handle different kinds of web pages from numerous theme sites, extract information at a finer granularity, and support incremental learning.

Second, it implements the extraction of thematic dimensions and structures and the classification of web page structure. It puts forward a new URL representation, a triple consisting of the URL string, the URL anchor text, and the in-degree of the URL's web page. This representation can effectively analyze thematic dimensions and structures and ultimately accomplish the classification of web page structure.

Third, it designs a new method for extracting web page information. The method extracts text precisely by using web page content extraction based on a title and content Dependency Tree, obtains fine-grained attributes of different kinds of web pages on the basis of semantic rules, and ultimately achieves accurate acquisition of web page information.

Using the above achievements and in accordance with practical requirements, the thesis constructs a website vertical mining system. It is a verified model that can mine information from topic websites intelligently, completely, effectively, and accurately.
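
As a rough illustration of the URL triple mentioned in the abstract (URL string, URL anchor text, in-degree of the URL's web page), the sketch below builds one triple per target URL from a list of crawled links. The abstract does not give the thesis's actual data structures or counting rules, so the names `UrlTriple` and `build_url_triples`, the input format, and the choice to count in-degree as the number of distinct linking pages are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UrlTriple:
    """Hypothetical container for (URL string, anchor text, in-degree)."""
    url: str
    anchor_texts: tuple  # distinct anchor texts seen for this URL, sorted
    in_degree: int       # number of distinct pages linking to this URL


def build_url_triples(links):
    """Build one UrlTriple per target URL.

    `links` is an iterable of (source_url, target_url, anchor_text) tuples.
    Assumption: in-degree counts distinct source pages, not raw link count.
    """
    sources = defaultdict(set)
    anchors = defaultdict(set)
    for source_url, target_url, anchor_text in links:
        sources[target_url].add(source_url)
        text = anchor_text.strip()
        if text:
            anchors[target_url].add(text)
    return {
        url: UrlTriple(url, tuple(sorted(anchors[url])), len(sources[url]))
        for url in sources
    }


if __name__ == "__main__":
    # Made-up links on a placeholder domain, for illustration only.
    sample_links = [
        ("http://a.example/", "http://a.example/news", "News"),
        ("http://a.example/about", "http://a.example/news", "Latest news"),
        ("http://a.example/", "http://a.example/news", "News"),
    ]
    for triple in build_url_triples(sample_links).values():
        print(triple)
```

A downstream classifier of web page structure could take such triples as input features; how the thesis actually does this is not stated in the abstract.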
Keywords/Search Tags: topic features, structural mining, content extraction, granularity theory, web page attributes