The Research On Focused Web Information Extraction

Posted on:2016-07-24

Degree:Master

Type:Thesis

Country:China

Candidate:S Dai

Full Text:PDF

GTID:2308330479995434

Subject:Computer application technology

Abstract/Summary:

The Internet has become one of the largest information carriers and holds enormous value. For example, Google, Baidu, and other services deliver precise and efficient results by exploiting Internet information. How to use web information effectively has therefore become an important research topic. The massive scale, dynamics, and heterogeneity of cross-domain web information pose challenges for web information extraction. To improve expandability, this thesis studies methods for web information acquisition and information extraction. The main contents are as follows: (1) We propose an effective unsupervised focused crawler based on URL structure filtering (UURLSF), which guides the crawler by analyzing URLs and is more efficient than other focused crawlers. Its unsupervised weighting mechanism also improves the portability of the focused crawler. (2) We propose a visual-unit-based extraction method that extracts news content according to visual units. The visual units are identified by a top-down approach based on visual features and text features. Because a visual unit is independent of the HTML, the method is more portable while still achieving good results. (3) We propose a modeless approach, web information extraction based on incremental clustering. It is a modeless, data-driven reasoning mechanism, and it introduces global-based and local-based methods for evaluating clustering stability. Experimental results show that our approach adapts well to the rapid growth of Internet data.
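The abstract does not spell out how UURLSF analyzes URLs, so the following is only a minimal sketch of the general idea of unsupervised URL structure filtering, not the thesis's algorithm. It assumes that each URL can be reduced to a coarse structural pattern, that pattern weights can be estimated from the relative frequency of patterns among seed pages, and that candidate links are kept when their pattern weight passes a threshold. The function names (`url_pattern`, `pattern_weights`, `filter_urls`), the `<num>`/`<mixed>` tokens, the threshold value, and the `example.com` URLs are all hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlparse


def url_pattern(url):
    """Reduce a URL to a coarse structural pattern (hypothetical scheme)."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    tokens = []
    for seg in segments:
        if seg.isdigit():
            tokens.append("<num>")      # purely numeric segment, e.g. a date part
        elif re.search(r"\d", seg):
            tokens.append("<mixed>")    # segment containing digits, e.g. an article id
        else:
            tokens.append(seg.lower())  # plain word segment is kept as-is
    return parsed.netloc.lower() + "/" + "/".join(tokens)


def pattern_weights(seed_urls):
    """Weight each pattern by its relative frequency among the seed URLs.

    No labels are used, which is the sense in which this sketch is
    unsupervised; the thesis's actual weighting mechanism may differ.
    """
    counts = Counter(url_pattern(u) for u in seed_urls)
    total = sum(counts.values())
    return {pattern: count / total for pattern, count in counts.items()}


def filter_urls(candidates, weights, threshold=0.5):
    """Keep candidate URLs whose pattern weight is at least the threshold."""
    return [u for u in candidates if weights.get(url_pattern(u), 0.0) >= threshold]


# Hypothetical seed pages and candidate links.
seeds = [
    "http://news.example.com/2016/07/24/a123.html",
    "http://news.example.com/2016/07/25/b456.html",
    "http://news.example.com/about",
]
candidates = [
    "http://news.example.com/2016/08/01/c789.html",
    "http://news.example.com/about",
    "http://news.example.com/login",
]
weights = pattern_weights(seeds)
kept = filter_urls(candidates, weights)
print(kept)
```

In this sketch the crawler would follow only links whose structure resembles the dominant pattern of the seed pages, which is one plausible way a URL-only filter can avoid downloading off-topic pages before deciding whether to visit them.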

Keywords/Search Tags:

Information extraction, Focused crawler, Visual unit, Incremental clustering

Related items

1	Research On Topic Focused Web Crawler And Related Technologies
2	Research And Implementation Of Focused Crawler Based On URL Patterns
3	The Design And Implementation Of Enterprise Information-Oriented Web Focused Search
4	Research And Application On The Key Technology Of Focused Crawler
5	Design And Implementation Of Topic-focused Crawler For Education News
6	Design And Implementation Of Multi Information Web System Of Automotive Industry Based On Focused Crawler
7	Research And Implementation Of On Semi-automatic Ontology Construction Base On WordNet And Focused Crawler
8	Research On Entity-level Search Crawler And Information Extraction
9	Research On Focused Hidden Web Crawler
10	The Study And Implementation Of Focused Crawler Technology For Android Technical Information