
Research on Skyline Preference Query Based on Incomplete Datasets

Posted on: 2019-08-11
Degree: Master
Type: Thesis
Country: China
Candidate: Z Shi
Full Text: PDF
GTID: 2428330545454774
Subject: Computer software and theory
Abstract/Summary:
In recent years, with the rapid development of information technologies such as the Internet and the Internet of Things, the ways in which data are produced have become increasingly diverse. An important aspect of data availability is completeness. Unfortunately, incomplete datasets are common because of machine faults, privacy protection, human error, and the widespread use of automated information extraction and aggregation. How to efficiently obtain the information users need from incomplete data has therefore become an important problem. Skyline queries provide effective decision analysis and preference query results that meet users' needs, so they are widely applied in fields such as multi-objective decision making, environmental monitoring, market analysis, and data mining. A common way to handle incomplete data is to pre-process it through cleaning and repair and then run queries on the cleaned and repaired data. These methods not only incur a large time cost but may also introduce new 'noise', which biases the results so that they no longer meet users' needs. At present, there is no efficient and accurate strategy for obtaining personalized information from incomplete datasets.

In this thesis, a Skyline preference query algorithm for incomplete datasets, SPQ-I, is proposed. It extracts personalized information from incomplete datasets according to user preferences and improves Skyline query efficiency. First, the dataset is partitioned into sub-datasets of different importance, and these sub-datasets are clustered using different strategies. During clustering, the Skyline query space is shrunk by pruning tuples that are dominated by others. Then, two algorithms are run on the two query subspaces obtained from clustering: a Skyline query algorithm based on tuple sorting, which ensures accuracy, and a Skyline query algorithm based on domination degree, which simplifies processing and is very efficient. These two algorithms yield two local Skyline results. Finally, the global Skyline result is selected according to whether the intersection of the two local results is empty. If the intersection is not empty, it is returned to the user as the global optimal solution. If the intersection is empty, generalization center classification is applied to the union of the two results to obtain a suboptimal solution.

Extensive experiments show that SPQ-I returns results that meet users' needs under different user preferences. Its accuracy is high, and its efficiency on high-dimensional incomplete data compares favorably with that of SIDS and CDSkyline.
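The pruning and final selection steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not define dominance, so the sketch assumes the definition commonly used for incomplete data: two tuples are compared only on dimensions where both have a value, and smaller values are assumed to be better. The function names `dominates`, `naive_skyline`, and `merge_local_skylines` are hypothetical, the brute-force `naive_skyline` merely stands in for the tuple-sorting and domination-degree algorithms, and the `fallback` callback stands in for generalization center classification, whose details the abstract does not give.

```python
from typing import Callable, List, Optional, Tuple

# A tuple with possibly missing values; None marks a missing dimension.
Row = Tuple[Optional[float], ...]


def dominates(p: Row, q: Row) -> bool:
    """Assumed dominance test for incomplete data (smaller is better).

    p dominates q if, on every dimension where both have a value, p is no
    worse than q, and p is strictly better on at least one such dimension.
    """
    strictly_better = False
    for a, b in zip(p, q):
        if a is None or b is None:
            continue  # skip dimensions that are missing in either tuple
        if a > b:
            return False
        if a < b:
            strictly_better = True
    return strictly_better


def naive_skyline(data: List[Row]) -> List[Row]:
    """Brute-force stand-in for a local Skyline algorithm.

    Every tuple is checked against all others, because dominance over
    incomplete data is not transitive in general.
    """
    return [
        p
        for i, p in enumerate(data)
        if not any(dominates(q, p) for j, q in enumerate(data) if j != i)
    ]


def merge_local_skylines(
    first: List[Row],
    second: List[Row],
    fallback: Callable[[List[Row]], List[Row]],
) -> List[Row]:
    """Select the global result from two local Skyline results.

    A non-empty intersection is returned directly. Otherwise the union is
    passed to `fallback`, a placeholder for the classification step.
    """
    common = [p for p in first if p in second]
    if common:
        return common
    union = list(first) + [p for p in second if p not in first]
    return fallback(union)


# Illustrative data only; these values do not come from the thesis.
data: List[Row] = [(1, None, 3), (2, 2, None), (None, 1, 4), (3, 3, 5)]
print(naive_skyline(data))  # [(1, None, 3)]
```

In this toy dataset, `(1, None, 3)` dominates each of the other tuples on the dimensions they share, so it is the only Skyline tuple. Passing the fallback as a callback keeps the intersection rule separate from the unspecified classification step.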
Keywords/Search Tags:incomplete data, Skyline query, user preference, clustering, dataset partition