Algorithms For Extracting The Web Hidden Database And Skyline

Posted on:2018-04-03

Degree:Master

Type:Thesis

Country:China

Candidate:X Shang

Full Text:PDF

GTID:2428330542497617

Subject:Computer application technology

Abstract/Summary:

The Internet has become one of the defining symbols of the 21st century. The number of Internet users worldwide has grown dramatically, and the information resources available online have become increasingly abundant. The Internet is a vast, shared information space that is global and distributed in character. More and more information is stored in the backend databases of major websites for Internet users to access. The Web has become the platform through which information is queried, yet huge amounts of information remain hidden in query-limited Web backend databases (also known as hidden Web databases), and users cannot obtain these high-quality data records effectively. This thesis addresses that problem: it aims to help users extract useful information from massive data efficiently and return it to them, providing a convenient service.

Unlike extraction at ordinary scale, large-scale data extraction poses many problems and difficulties. In particular, Deep Web data extraction is constrained by the number of free queries a Web page allows and by the number of results returned per query. Facing these constraints, we need to consider the choice of tools and programs, the allocation of system resources, data mining methods and techniques, and how to store and access the data. The main problem is how to estimate and control the number of queries required while still extracting the entire hidden Web database. This issue has become a research hotspot in Web data mining, and many approaches to Web data extraction exist. In this thesis, we analyze in depth the characteristics of data in hidden Web databases, thoroughly analyze the existing extraction algorithms, and propose several improvements; experiments verify the effectiveness and superiority of the improved algorithms. The main content and work of this thesis are as follows:

(1) Building on previous research, hidden Web databases are divided into three categories according to attribute type: numerical attributes, categorical attributes, and hybrid attributes. The extraction problem is studied in depth for each of the three types. On this basis, we study Skyline extraction for hidden Web databases, so that the Skyline can be computed without first extracting all the data.

(2) For extracting numerical attributes from a hidden Web database: building on methods for partitioning the space of numeric data sets, we propose a distribution-based multidimensional dynamic partitioning algorithm (MDPA).

(3) For extracting categorical attributes: we propose an improved heuristic slice-cover algorithm (AHSCA), which partitions the data space formed by all data points with categorical attributes. The algorithm flexibly chooses the attribute to partition on next, thereby reducing query cost and improving efficiency. Combining AHSCA and MDPA yields a hybrid extraction algorithm for hybrid attributes.

(4) For extracting the Skyline of a hidden Web database: we propose a heuristic query decomposition algorithm based on the definition of the intersection element query tree and the complete intersecting property of the Skyline group. The query tree is built by depth-first or breadth-first traversal, and the Skyline of the hidden Web database D is obtained in the process.

(5) The validity and superiority of the above algorithms are verified in this thesis.
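
To make the setting concrete, the following is a minimal Python sketch of the problem the abstract describes, not of the thesis's algorithms. It assumes a hypothetical interface that accepts a range query over two integer attributes and returns at most k records plus an overflow flag. The names `make_interface`, `crawl`, `dominates`, `skyline`, and the sample data are all invented for illustration. The crawl uses plain midpoint splitting of the widest dimension, which is a generic baseline rather than MDPA, and the Skyline is computed naively from the fully crawled data, which is exactly the extract-everything-first approach that item (4) aims to avoid. Smaller values are assumed to be better in every dimension.

```python
from typing import Callable, Dict, List, Set, Tuple

Point = Tuple[int, ...]
Query = Callable[[Point, Point], Tuple[List[Point], bool]]


def make_interface(db: List[Point], k: int = 3) -> Tuple[Query, Dict[str, int]]:
    """Simulate a query-limited interface (hypothetical): a range query
    returns at most k matching records and a flag telling whether more exist."""
    counter = {"queries": 0}

    def query(lo: Point, hi: Point) -> Tuple[List[Point], bool]:
        counter["queries"] += 1
        hits = [p for p in db
                if all(l <= v <= h for l, v, h in zip(lo, p, hi))]
        return hits[:k], len(hits) > k

    return query, counter


def crawl(query: Query, lo: Point, hi: Point) -> Set[Point]:
    """Recover every record in the box [lo, hi] by recursive range splitting."""
    results, overflow = query(lo, hi)
    if not overflow:
        return set(results)
    # Split the widest dimension at its midpoint (generic baseline, not MDPA).
    dim = max(range(len(lo)), key=lambda d: hi[d] - lo[d])
    if hi[dim] == lo[dim]:
        # The box is a single point and cannot be split further.
        return set(results)
    mid = (lo[dim] + hi[dim]) // 2
    left_hi = hi[:dim] + (mid,) + hi[dim + 1:]
    right_lo = lo[:dim] + (mid + 1,) + lo[dim + 1:]
    return crawl(query, lo, left_hi) | crawl(query, right_lo, hi)


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is no worse everywhere and strictly better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(points: Set[Point]) -> List[Point]:
    """Naive pairwise Skyline: keep the points that no other point dominates."""
    return sorted(p for p in points
                  if not any(dominates(q, p) for q in points))


# Invented sample data: nine distinct records with two integer attributes.
DB = [(1, 9), (2, 7), (3, 8), (4, 4), (5, 6), (6, 2), (7, 5), (8, 1), (9, 3)]

query, counter = make_interface(DB, k=3)
crawled = crawl(query, (0, 0), (10, 10))
print(len(crawled), "records recovered using", counter["queries"], "queries")
print("Skyline:", skyline(crawled))
```

The query counter is the cost that the abstract says must be estimated and controlled: every overflowing box costs one query and forces two more, so the choice of where and along which attribute to split determines the total number of queries issued.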

Keywords/Search Tags:

Web data extraction, Web hidden database, AHSCA, MDPA, Skyline

Related items

1	A modular data pipelining architecture (MDPA) for enabling universal accessibility in P2P grids
2	Research On Skyline Query Processing Techniques
3	Data Distribution Level Skyline. Distributed Database Computing
4	Research On Index-Based Skyline Algorithms
5	Study On Skyline Aggregation Queries
6	Research And Implementation Of G-Skyline Query Algorithm On Massive Data
7	Research And Implementation Of Skyline Query Algorithm In LBSN Environment
8	Study On Skyline Query Processing Techniques In Wireless Sensor Networks
9	Research On Methods Of Multi-k-Dominant Skyline Query
10	The Research Of Skyline Queries Algorithms Based On MapReduce