
Research On Evaluation Of Data Currency Based On User Confidence

Posted on: 2016-06-11
Degree: Master
Type: Thesis
Country: China
Candidate: L L An
Full Text: PDF
GTID: 2308330461468122
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of network and information technology, data warehouses accumulate large amounts of historical data. If too much outdated data remains in a warehouse, the information that users obtain from it may be wrong. Data with poor currency can badly affect people's daily lives and business decisions, which highlights the need for data currency evaluation as a means of improving data quality.

Existing methods for data currency evaluation fall into three categories: methods based on timestamps, methods based on uncertain rules, and methods based on certain rules. Timestamp-based methods require timestamps, which are often incomplete or invalid. Methods based on uncertain rules express the uncertainty of domain knowledge and do not rely on redundant records; this can improve the recall rate but loses accuracy. Methods based on certain rules depend on redundant tuples and time constraint relationships, and consider only the subjective attribute weights supplied by users.

This thesis studies rule-based data currency evaluation in depth. We define the User Confidence, which considers both the subjective and the objective weights of attributes, and put forward a corresponding improved method based on the user confidence in order to improve the quality of data currency query results. Because incomplete data also arise in data currency evaluation, we further put forward a method, also based on the user confidence, to evaluate the data currency of incomplete data.

(1) Existing data currency models and evaluation methods do not take the dependency relationships among attributes into consideration. To solve this problem, an improved method based on user confidence is proposed. The proposed method takes four factors into consideration: the users' subjective weights, the dependency relationships among attributes, the redundant records, and the currency constraints. We define the user confidence factor to improve the quality of query results. The user confidence is determined by each user's subjective weights and by the dependency relationships among the attributes of the redundant records in the dataset. Experiments on real and synthetic datasets show that the proposed algorithm improves accuracy, recall rate, and currency value, and is superior to traditional data currency evaluation on complete data. This indicates that the user confidence is a reasonable and effective way of handling attribute weights for complete datasets.

(2) To overcome the shortcomings of the traditional data currency model in processing incomplete datasets, we further propose an improved data currency evaluation algorithm for incomplete datasets based on the user confidence. Firstly, the incomplete datasets are preprocessed using a method based on the currency constraint relationships of the redundant records. After preprocessing, the incomplete datasets are closer to complete datasets, which helps in building the currency graph and evaluating the data currency. Secondly, the data currency model for incomplete data is built on the currency constraints. For the current value query, a null value cannot be the current value. If erroneous data are current, they all belong to the set of latest values; if they are not the latest values, they cannot influence the data currency values. For the currency sequence query, they are treated as nodes of the currency graph. Thirdly, we compute the user confidence from the users' subjective weights and the objective weights of attributes in the redundant records. Finally, the proposed algorithm is demonstrated on real and synthetic datasets. The experiments show that it is effective and feasible, and that its time complexity is polynomial.
Keywords/Search Tags:Data Currency, User Confidence, Data Currency Model, Evaluation Algorithm
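
The abstract does not give formulas, so the following is only a minimal illustrative sketch of the ideas it describes, not the thesis's algorithm. It assumes that the user confidence can be approximated as a linear blend of normalized subjective and objective attribute weights (the blending parameter `alpha` is hypothetical), that currency constraints are available as explicit (older, newer) value pairs per attribute, and that a record's currency is the confidence-weighted share of its attributes holding a current value. All function names are made up for illustration.

```python
def user_confidence(subjective, objective, alpha=0.5):
    """Blend normalized subjective and objective attribute weights.

    `alpha` is a hypothetical mixing parameter; the thesis may combine
    the two kinds of weight differently.
    """
    def normalize(weights):
        total = sum(weights.values())
        return {attr: w / total for attr, w in weights.items()}

    s, o = normalize(subjective), normalize(objective)
    return {attr: alpha * s[attr] + (1 - alpha) * o[attr] for attr in s}


def current_values(values, older_than):
    """Return the candidate current values of one attribute.

    `older_than` is a set of (older, newer) pairs derived from currency
    constraints. Null (None) values are never current, as the abstract states.
    """
    candidates = {v for v in values if v is not None}
    outdated = {old for old, new in older_than
                if old in candidates and new in candidates}
    return candidates - outdated


def record_currency(record, current_by_attr, confidence):
    """Sum the confidence of every attribute whose value is current."""
    return sum(conf for attr, conf in confidence.items()
               if record.get(attr) in current_by_attr[attr])


# Redundant records describing the same entity (made-up example data).
records = [
    {"status": "single", "city": "Beijing"},
    {"status": "married", "city": None},
    {"status": "married", "city": "Shanghai"},
]
constraints = {
    "status": {("single", "married")},
    "city": {("Beijing", "Shanghai")},
}

confidence = user_confidence(
    subjective={"status": 3, "city": 1},   # weights a user might supply
    objective={"status": 1, "city": 1},    # weights derived from the data
)
current_by_attr = {
    attr: current_values([r[attr] for r in records], pairs)
    for attr, pairs in constraints.items()
}
scores = [record_currency(r, current_by_attr, confidence) for r in records]
```

In this toy run the first record holds only outdated values, the second holds one current value and one null, and the third holds current values for both attributes, so the scores increase from the first record to the third.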