Font Size: a A A

Uncertain Graphs Cleaning For Reachability Queries Via Crowdsourcing

Posted on:2019-07-26Degree:MasterType:Thesis
Country:ChinaCandidate:Y C WuFull Text:PDF
GTID:2428330566960653Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
In the era of Big Data,one of important and nuclear roles is played by graph databases,among which is Uncertain Graph with wide applications such as co-authorship network,biological network and social network.The subject studied by this thesis is Uncertain Graph whose uncertainty is embodied only in existences of its edges,and the goal is to improve the reliability of the reachability query on the uncertain graph.However,current works on uncertain graphs cleaning focus only on distance-constraint reachability queries and are based on the assumption that the crowdsourcing is accurate to handle the cleaning tasks.Therefore,this thesis first studies the problem of uncertain graphs cleaning for distance and label-constraint reachability queries via the accurate crowd.Based on that,considering the influence of the crowd noise on cleaning results,study how to clean uncertain graphs for the reachability queries.To this end,survey related works on uncertain graphs cleaning,basic concepts and reachability calculation,and then present the preliminary theories for cleaning uncertain graphs.Secondly,for the situation where the crowd does not have noise,design a frame about uncertain graphs cleaning via accurate crowdsourcing.A fast algorithm to accurately compute the reachability of an uncertain graph is proposed.To reduce the range for searching for the optimal edges to clean,an improved path-searching algorithm to extract effective edges is designed.The single-edge cleaning and multi-edges cleaning algorithms for uncertain graphs are designed in succession,and the efficiency and effectivity of the algorithms are verified via experiments.Then,for uncertain graphs cleaning via inaccurate crowdsourcing,establish an uncertain-graph-cleaning model which combines the crowd noise with the query result.Investigate the numerical influence of the crowd noise on cleaning results,and then find that if the accuracy of the crowd is kept over 50%,the result of the reachability query can still be improved after cleaning the graph.Next,to investigate the different improvement of reliability of the reachability query result after cleaning the different edges,the objective function about quality improvement of the query result(?Q)is deduced.By the analysis on the effect of inaccurate answers from the crowd on the query result,the fact is presented and proved that the edge correlatorP_e~*can repalce query result quality improvement?Q as the new standard of investigating effects of different edges'being cleaned,which forms the theoretical basis of edges cleaning algorithms.After that,single-edge cleaning algorithms and the multi-edge cleaning algorithm are separately designed based on P_e~*.Optimizing techniques for reducing searching space for objective edges are proposed to further improve the efficiency of the algorithms,and then the algorithm aiming for avoiding massive calculation of reachability is designed.Experiments results demonstrate that the proposed algorithms can accurately and efficiently select optimal edges to clean.At last,the extensive application of uncertain graphs cleaing in this thesis on other uncertain graphs queries is simply illustrated via two case studies.
Keywords/Search Tags:Data Cleaning, Uncertain Graphs Cleaning, Reachability Query, Reachability Calculation, Crowdsourcing and Crowd Noise, Social Network
PDF Full Text Request
Related items