Research On The Query Semantic Of Personalized Top-k And Entity-oriented U-Topk And Its Processing

Posted on: 2016-06-16  Degree: Master  Type: Thesis
Country: China  Candidate: Z B Ma  Full Text: PDF
GTID: 2428330542457266  Subject: Computer application technology
Abstract/Summary:
Top-k query technology is widely applied in many fields. It finds the k highest-scoring tuples according to a scoring function that the user defines in advance. Often, however, users cannot specify the scoring function precisely, which means that the scoring function itself is uncertain. At the same time, a Top-k query returns the same results to many different users, which makes it hard to apply to personalized queries.

Meanwhile, with the development of data acquisition and management technology, uncertain data is being found in more and more fields, such as wireless sensor networks (WSN), RFID systems, and data integration. Uncertain data is gradually attracting attention from academia and becoming a hot research topic. On traditional deterministic datasets, a Top-k query has definite semantics: the final result is determined by the scoring function alone. On uncertain datasets, by contrast, the result of a Top-k query must take into account both the value of the scoring function and the corresponding probability. Therefore, Top-k query technology for deterministic datasets cannot be directly migrated to uncertain datasets. In response, researchers have defined a variety of Top-k query semantics on uncertain databases for different application scenarios, and these have worked well in practical applications.

However, the semantics of Top-k queries on uncertain databases still have some shortcomings, mainly the following: (1) As the number of mutually exclusive tuples in the probabilistic database increases, the number of possible-world instances grows exponentially, which leads to a huge amount of computation and low query-processing efficiency, and in some cases the query cannot be completed at all. (2) The result cannot fully meet the semantics of the user's query. The target of the user's query is a set of entities, rather than the set of tuples returned by a U-Topk query. A tuple-oriented result can partly represent the entity set, but not comprehensively. (3) The aggregated probability of a tuple-oriented result is very small, so it is hard to convince users to accept a result with such low credibility.

The above analysis shows that traditional Top-k query technology is difficult to apply to personalized queries and that the results of Top-k queries on uncertain data have clear shortcomings. The main work of this thesis is summarized as follows. First, we propose the semantics of a personalized Top-k query and its processing algorithm. The proposed personalized Top-k query customizes the result for users according to their attributes. The basic idea of the algorithm is to analyze the effect of the users' attribute set on the dataset, then to determine the degree of influence of each individual user attribute on a tuple using the chi-square test, and finally to customize the result for a user according to his or her attributes. Second, to address the shortcomings of the tuple-oriented semantics of the U-Topk query, the thesis proposes an entity-oriented U-Topk query together with a query processing algorithm. The basic idea of the algorithm is to convert the tuple-oriented probabilistic database into an entity-oriented probabilistic database. In this process, exclusive tuples that meet pre-defined rules are merged. The entity-oriented U-Topk algorithm has two advantages: first, it can greatly reduce the size of the probabilistic database; second, it can faithfully reflect the overall state of entities and avoid the one-sidedness of the tuple-oriented U-Topk query. The thesis gives detailed illustrations of the processing and descriptions of the algorithms, and verifies their efficiency and quality through experiments on real data.
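The difference between a tuple-oriented and an entity-oriented answer can be illustrated with a minimal Python sketch of U-Topk under the possible world model. It is not the thesis's algorithm: the toy data in `XTUPLES`, the helper names `possible_worlds` and `u_topk`, and the choice to treat each group of mutually exclusive tuples as one entity are all assumptions made for illustration, and the thesis's pre-defined merging rules are not reproduced. The sketch enumerates every possible world by brute force, which is exactly the exponential cost described in shortcoming (1).

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy data: each entity holds mutually exclusive alternatives
# given as (tuple_id, score, probability). Probabilities within an entity
# sum to at most 1; any remainder means the entity may be absent.
XTUPLES = {
    "e1": [("t1", 90, 0.6), ("t2", 80, 0.4)],
    "e2": [("t3", 85, 0.6), ("t4", 70, 0.4)],
    "e3": [("t5", 75, 0.5)],
}


def possible_worlds(xtuples):
    """Yield (world, probability); a world is a list of (entity, tuple_id, score)."""
    choices = []
    for entity, alternatives in xtuples.items():
        options = [((entity, tid, score), p) for tid, score, p in alternatives]
        absent = 1.0 - sum(p for _, _, p in alternatives)
        if absent > 1e-12:
            options.append((None, absent))  # world in which the entity is absent
        choices.append(options)
    # One choice per entity; the number of worlds grows exponentially.
    for combo in product(*choices):
        probability = 1.0
        world = []
        for item, p in combo:
            probability *= p
            if item is not None:
                world.append(item)
        yield world, probability


def u_topk(xtuples, k, key_index):
    """Return (answer, probability) for the most probable top-k vector.

    key_index=1 identifies answers by tuple id (tuple-oriented);
    key_index=0 identifies answers by entity id (entity-oriented, assumed).
    """
    totals = defaultdict(float)
    for world, probability in possible_worlds(xtuples):
        ranked = sorted(world, key=lambda item: item[2], reverse=True)[:k]
        if len(ranked) == k:  # skip worlds with fewer than k tuples
            totals[tuple(item[key_index] for item in ranked)] += probability
    return max(totals.items(), key=lambda pair: pair[1])


if __name__ == "__main__":
    print("tuple-oriented :", u_topk(XTUPLES, 2, key_index=1))
    print("entity-oriented:", u_topk(XTUPLES, 2, key_index=0))
```

On this toy data the most probable tuple-oriented answer is (t1, t3) with aggregated probability of about 0.36, while the most probable entity-oriented answer is (e1, e2) with about 0.56, because several tuple-level answers collapse onto the same pair of entities. This mirrors shortcoming (3) above, though only as an illustration on made-up data.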
Keywords/Search Tags:personalization, Top-k query, U-Topk query, possible world model, entity query