Approach On Vision-Based Deep Web Data Extraction

Posted on:2015-11-20

Degree:Master

Type:Thesis

Country:China

Candidate:F Z Tan

GTID:2298330431964352

Subject:Computer application technology

Abstract/Summary:

In recent years, network technology has become increasingly widespread. As it has developed, the Web has become a huge resource containing massive amounts of valuable data. Many applications, such as market intelligence analysis, urgently need to mine these data to obtain useful information and thereby support decision-making as fully as possible. However, Web data are large-scale, heterogeneous, autonomous, and distributed, which makes Web data mining particularly difficult, so the data must be integrated before high-quality mining and analysis are possible. According to how deeply information is embedded in the Web, the Web can be divided into the Deep Web and the Surface Web. The Deep Web far exceeds the Surface Web in both the quantity and the quality of its data and therefore has higher value. Extracting Deep Web data efficiently for effective analysis is thus of real practical significance and has broad application prospects.

Because the information on different Internet sites is maintained independently, Deep Web data are hard to collect comprehensively, and general-purpose search engines play only a negligible role in mining them. Writing extraction rules by hand is technically simple and highly accurate, but given the diversity of information sources and the risk that pages will be revised, the manual approach cannot meet people's need for access to information. Against this background, automatic Web information extraction is clearly a problem that urgently needs to be solved.
To address this problem, this thesis carries out in-depth, systematic research on automatic Deep Web information extraction, covering vision-based Web page analysis, training of machine learning models, automatic extraction of Deep Web information, and alignment of data items, and develops an automatic Web information extraction system. The specific research work and results are as follows:
(1) Based on visual features, Web pages are segmented into a visual-block tree. The visual attributes needed to locate data regions are then collected from this tree to build a machine learning training set.
(2) An effective machine learning training tool is combined with manual rules that remove duplicate and noisy information, so that Deep Web data regions are located accurately.
(3) Effective alignment rules are proposed to improve the alignment accuracy of data items.
(4) Building on the above research, an automatic Deep Web information extraction system is developed. Its features include: 1) transformation of Web pages into visual trees; 2) automatic location of data regions; 3) complete extraction of data items; 4) wrapper generation; 5) automatic page turning.
The results show that the proposed approach can extract data from content-rich list pages quickly and automatically, with essentially no human intervention.
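Step (1) can be illustrated with a minimal sketch of how a visual-block tree might be flattened into a training set. The abstract does not specify the tree structure, the feature set, or how blocks are labeled, so everything below is an assumption: the hypothetical `VisualBlock` class stands for a node of a visual-block tree of the kind produced by VIPS-style page segmentation, the six features (relative position, relative area, child count, depth, font size) are illustrative choices, and `is_data_region` is a hypothetical labeling callback standing in for manual annotation.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Tuple


@dataclass
class VisualBlock:
    """Hypothetical visual-block tree node: a rendered rectangle on the page."""
    x: float
    y: float
    width: float
    height: float
    font_size: float
    text: str = ""
    children: List["VisualBlock"] = field(default_factory=list)


def iter_blocks(root: VisualBlock, depth: int = 0) -> Iterator[Tuple[VisualBlock, int]]:
    """Pre-order traversal yielding each block together with its tree depth."""
    yield root, depth
    for child in root.children:
        yield from iter_blocks(child, depth + 1)


def block_features(block: VisualBlock, depth: int,
                   page_width: float, page_height: float) -> List[float]:
    """Assumed visual features; positions and area are normalized by page size."""
    return [
        block.x / page_width,                                        # relative left offset
        block.y / page_height,                                       # relative top offset
        (block.width * block.height) / (page_width * page_height),  # relative area
        len(block.children),                                         # number of sub-blocks
        depth,                                                       # depth in the tree
        block.font_size,                                             # dominant font size
    ]


def build_training_set(root: VisualBlock, page_width: float, page_height: float,
                       is_data_region: Callable[[VisualBlock], bool]
                       ) -> Tuple[List[List[float]], List[int]]:
    """Turn every block into one feature row, labeled 1 if it is a data region."""
    X, y = [], []
    for block, depth in iter_blocks(root):
        X.append(block_features(block, depth, page_width, page_height))
        y.append(1 if is_data_region(block) else 0)
    return X, y


if __name__ == "__main__":
    # Toy page: a header block and a result list containing two records.
    page = VisualBlock(0, 0, 1000, 2000, 12, children=[
        VisualBlock(0, 0, 1000, 100, 16, "header"),
        VisualBlock(100, 200, 800, 1500, 12, "results", children=[
            VisualBlock(100, 200, 800, 150, 12, "item 1"),
            VisualBlock(100, 350, 800, 150, 12, "item 2"),
        ]),
    ])
    X, y = build_training_set(page, 1000, 2000, lambda b: b.text == "results")
    for row, label in zip(X, y):
        print(label, row)
```

The resulting `X` and `y` lists could be passed to any standard classifier for step (2); normalizing by page size is one way to keep features comparable across pages rendered at different sizes.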

Keywords/Search Tags:

deep web, data extraction, visual features, machine learning

Related items

1	The Research Of Features Extraction Based On Improved Deep Neural Network
2	The Deep Learning Of Visual Features In Image Search Applications
3	Global Features Learning And Absolute Scale Recovery In Monocular Visual Localization
4	Block Ciphers Identification Scheme Based On Machine Learning
5	Research On No-reference Image Quality Assessment Algorithm Based On Visual Perception Features
6	Research Of Semi-structured Data Extraction Based On Feedback Learning
7	Research On Visual Features Extraction Of Industrial Robot Based On Deep Learning
8	Data Category Oriented Research Of End-to-end Machine Reading Comprehension Based On Deep Learning
9	Research On Collaborative Filtering Recommendation Algorithm For Fusion Item Visual Features
10	Model Design And Optimization Of Deep Learning For Visual Tracking