
Scalable Query Technology Over Probabilistic Databases

Posted on: 2016-10-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y Li
GTID: 2348330479953412
Subject: Computer software and theory
Abstract/Summary:
Many applications that must manage large, imprecise data sets have emerged recently, including information retrieval, fuzzy object matching, data integration, sensor networks, and social networks. Traditional relational databases handle only precise data and therefore cannot answer queries over uncertain data. This creates a demand for processing probabilistic data efficiently and exactly, which directly motivates the study of probabilistic databases.
Data representation models and query algorithms for uncertain data have been an active research topic in the database field in recent years. The BIDL (Block Independent Disjoint with Lineage) model extends the BID (Block Independent Disjoint) model with lineage. Lineage information not only records the source of the data but also provides feedback to the user. Basic query algorithms generally fall into two categories: semantics-based algorithms and extension-based algorithms. Semantics-based algorithms guarantee accurate query results, but their computation is very expensive. Extension-based algorithms run faster but do not guarantee accurate results. The ST (Split Tuple) algorithm performs probabilistic reasoning on the lineage and analyzes the relationships between tuples. It splits the corresponding tuples so that these relationships become explicit and easy to process, then applies an efficient probability algorithm to evaluate the query, and finally returns the results to the user. The algorithm scales for lineage-based probabilistic reasoning.
Advanced query algorithms include Top-k, Skyline, KNN (K Nearest Neighbors), path queries, threshold contour queries, join queries, and so on.
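The trade-off between exact but expensive evaluation and fast but possibly inaccurate evaluation can be illustrated with a minimal sketch over a BID relation. This is not the thesis's ST algorithm. The relation `BLOCKS`, the predicate `any_reading`, and all function names are hypothetical, chosen only for illustration. The sketch assumes that tuples within a block are mutually exclusive, that blocks are independent, and that leftover probability in a block means no tuple is present.

```python
from itertools import product

# Hypothetical BID relation (illustrative data, not from the thesis):
# each block lists mutually exclusive alternatives as (value, probability).
BLOCKS = {
    "sensor1": [("hot", 0.6), ("cold", 0.3)],
    "sensor2": [("hot", 0.5)],
}

def possible_worlds(blocks):
    """Enumerate every possible world together with its probability."""
    options = []
    for key, alts in blocks.items():
        choices = [(key, value, p) for value, p in alts]
        absent = 1.0 - sum(p for _, p in alts)
        if absent > 1e-12:
            # Leftover mass: the block contributes no tuple in this world.
            choices.append((key, None, absent))
        options.append(choices)
    for combo in product(*options):
        prob = 1.0
        world = []
        for key, value, p in combo:
            prob *= p  # blocks are independent, so probabilities multiply
            if value is not None:
                world.append((key, value))
        yield world, prob

def exact_exists(blocks, pred):
    """Exact answer by summing over all worlds (exponential in the block count)."""
    return sum(p for world, p in possible_worlds(blocks)
               if any(pred(k, v) for k, v in world))

def block_aware_exists(blocks, pred):
    """Closed form that respects disjointness within each block."""
    miss = 1.0
    for key, alts in blocks.items():
        miss *= 1.0 - sum(p for v, p in alts if pred(key, v))
    return 1.0 - miss

def naive_exists(blocks, pred):
    """Shortcut that wrongly treats every tuple as independent."""
    miss = 1.0
    for key, alts in blocks.items():
        for v, p in alts:
            if pred(key, v):
                miss *= 1.0 - p
    return 1.0 - miss

def any_reading(key, value):
    """Hypothetical query predicate: does any tuple exist at all?"""
    return True

if __name__ == "__main__":
    print(exact_exists(BLOCKS, any_reading))
    print(block_aware_exists(BLOCKS, any_reading))
    print(naive_exists(BLOCKS, any_reading))
```

On this data the world enumeration and the block-aware formula both give about 0.95, while the naive shortcut gives about 0.86, because it ignores the fact that the two alternatives of `sensor1` cannot occur together. Knowing how tuples relate to each other is what allows a cheap formula to stay correct.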
KNN is an important query type in spatio-temporal databases, with applications such as weather forecasting and sensor networks. The PKNN (Probabilistic K Nearest Neighbors) algorithm analyzes the properties of the data and speeds up the computation by recording intermediate results. Experimental results show that the algorithm is scalable.
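As a rough illustration of a probabilistic KNN query that reuses recorded intermediate results, the sketch below computes, for each uncertain object, the probability that it is among the k nearest neighbors of a query point. It is not the thesis's PKNN algorithm. The data in `OBJECTS` and the names `pknn` and `prob_closer` are hypothetical. The sketch assumes that each object has a finite set of alternative locations whose probabilities sum to 1, that objects are independent, and that ties in distance are not counted as "closer".

```python
from bisect import bisect_left
from itertools import accumulate
from math import dist

# Hypothetical uncertain objects: alternative 2-D locations with probabilities.
OBJECTS = {
    "a": [((1.0, 0.0), 0.5), ((4.0, 0.0), 0.5)],
    "b": [((2.0, 0.0), 1.0)],
    "c": [((3.0, 0.0), 0.5), ((5.0, 0.0), 0.5)],
}

def pknn(objects, query, k):
    """Return {name: P(object is among the k nearest neighbors of query)}."""
    # Intermediate results recorded once per object: sorted distances to the
    # query and the running sum of their probabilities.
    table = {}
    for name, alts in objects.items():
        pairs = sorted((dist(loc, query), p) for loc, p in alts)
        dists = [d for d, _ in pairs]
        cumulative = list(accumulate(p for _, p in pairs))
        table[name] = (pairs, dists, cumulative)

    def prob_closer(name, d):
        """P(object `name` lies strictly closer to the query than distance d)."""
        _, dists, cumulative = table[name]
        i = bisect_left(dists, d)  # number of alternatives with distance < d
        return cumulative[i - 1] if i else 0.0

    result = {}
    for name, (pairs, _, _) in table.items():
        total = 0.0
        for d, p in pairs:
            # counts[c] = P(exactly c other objects are closer than d),
            # built one object at a time (Poisson-binomial recurrence).
            counts = [1.0]
            for other in table:
                if other == name:
                    continue
                q = prob_closer(other, d)
                nxt = [0.0] * (len(counts) + 1)
                for c, pc in enumerate(counts):
                    nxt[c] += pc * (1.0 - q)
                    nxt[c + 1] += pc * q
                counts = nxt
            # The object is in the k-NN set if at most k-1 others are closer.
            total += p * sum(counts[:k])
        result[name] = total
    return result

if __name__ == "__main__":
    print(pknn(OBJECTS, (0.0, 0.0), 1))
    print(pknn(OBJECTS, (0.0, 0.0), 2))
```

Because the sorted distances and cumulative probabilities are stored once, each "how likely is this object closer than d" lookup becomes a binary search instead of a fresh scan over the object's alternatives.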
Keywords/Search Tags:probabilistic database, query algorithm, scalable query, the relationship between tuples, KNN