Efficient K-dominant Skyline Query Based On Dominate Hierarchical Tree In MapReduce Environment

Posted on: 2020-09-22; Degree: Master; Type: Thesis
Country: China; Candidate: S Wang; Full Text: PDF
GTID: 2428330578950920; Subject: Software engineering
Abstract/Summary:
Deeply influenced by the development of the information age, the processing and use of massive data has penetrated business, science and technology, finance, education, and other industries. Emerging information industries such as information forecasting, e-commerce, and financial statistics hold large amounts of user data, but they also face great information complexity, so it is necessary to extract the key information from large volumes of mixed data. Skyline query is a key information-processing technology in the field of big data. It is widely used in many areas, such as friend-relationship prediction (social network big data), expressway vehicle records (traffic big data), and commodity recommendation (e-commerce big data).

Because a skyline query gives no control over which information is selected, the number of results it returns grows as the amount of data increases dramatically, and it is also affected by the data distribution. The k-dominant skyline query remedies this shortcoming: it controls the selection of attributes through the parameter k, and thereby controls the size of the result set. However, because the parameter k is variable, a general k-dominant algorithm can only select results according to the user's requirements in one particular respect; the outcome depends heavily on the user and offers little flexibility. Meanwhile, when some attribute values are missing from the acquired information, this paper handles the incomplete records with a dedicated strategy so that incomplete information can also be recommended. In addition, as the amount of data keeps increasing, a centralized operating environment has difficulty processing large-scale data. A distributed parallel computing framework solves this problem well. Through continued research, the improved recommendation algorithm has been adapted to the parallel computing framework, so that processing efficiency is significantly improved.

Based on the above research, this paper proposes an efficient k-dominant skyline query method based on a dominant hierarchical tree in the MapReduce environment, and a k-dominant skyline query method on incomplete data in the MapReduce environment. The main contents of this paper are summarized as follows:

(1) This paper proposes a dominant hierarchical tree structure, DBH-Tree (Dominant based Hierarchical Tree), a tree index built on the number of dimensions in which data objects are dominated. The data is divided into different leaf-node subspaces and queried on the sub-nodes to improve query efficiency.

(2) This paper proposes a MapReduce-based query algorithm, MR-DBHA (MapReduce-Dominant based Hierarchical Algorithm). The Map function divides the data into parallel subspaces according to dominance. The Reduce function further analyzes dominance within each subspace and returns the k-dominant skyline query results directly.

(3) This paper proposes an index structure for incomplete data, ID-DBH-Tree (Incomplete data Dominant based Hierarchical Tree). Applying a "bucket" strategy, the incomplete data are divided into different "buckets" according to the dimensions of their missing attributes, the dominance relationships inside each "bucket" are analyzed, and the k-dominant result is obtained.

(4) This paper proposes a query algorithm for incomplete data in the MapReduce environment, MR-ID-DBHA (MapReduce-Incomplete data-Dominant based Hierarchical Algorithm). First, a MapReduce pass divides the data into buckets. Second, the Map function allocates the data in each bucket to subspaces according to dominance. The Reduce function then evaluates dominance among the data grouped by key value and returns the k-dominant skyline query result.
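
The k-dominance relation behind these methods can be illustrated with a minimal Python sketch. It assumes the commonly used definition (a point p k-dominates q if p is no worse than q in at least k dimensions and strictly better in at least one of them), with smaller values being better, and it uses a naive quadratic scan rather than the DBH-Tree or the MapReduce algorithms summarized above. The function names `k_dominates`, `k_dominant_skyline`, and `bucket_by_missing` are hypothetical, and the bucket key (the tuple of missing dimension indices, with `None` marking a missing value) is only one plausible reading of the "bucket" strategy, not the paper's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Point = Sequence[float]
IncompletePoint = Sequence[Optional[float]]


def k_dominates(p: Point, q: Point, k: int) -> bool:
    """Return True if p k-dominates q (assumption: smaller is better).

    p must be no worse than q in at least k dimensions and strictly
    better in at least one dimension. Any strictly better dimension is
    also a "no worse" one, so it can always be included among the k.
    """
    no_worse = sum(1 for a, b in zip(p, q) if a <= b)
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse >= k and strictly_better


def k_dominant_skyline(points: List[Point], k: int) -> List[Point]:
    """Naive O(n^2) scan: keep every point that no other point k-dominates.

    k-dominance is not transitive, so each point is checked against the
    full input rather than against surviving candidates only.
    """
    return [
        q for q in points
        if not any(k_dominates(p, q, k) for p in points)
    ]


def bucket_by_missing(
    points: List[IncompletePoint],
) -> Dict[Tuple[int, ...], List[IncompletePoint]]:
    """Group incomplete points by which dimensions are missing (None)."""
    buckets: Dict[Tuple[int, ...], List[IncompletePoint]] = defaultdict(list)
    for p in points:
        key = tuple(i for i, v in enumerate(p) if v is None)
        buckets[key].append(p)
    return dict(buckets)
```

For the four points `(1, 1, 3)`, `(3, 3, 1)`, `(2, 2, 2)`, and `(4, 4, 4)`, calling `k_dominant_skyline` with k = 3 (the full dimensionality, i.e. the conventional skyline) keeps the first three points, while k = 2 keeps only `(1, 1, 3)`, which shows how a smaller k shrinks the result set.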
Keywords/Search Tags: Big data, k-dominant, skyline query, MapReduce