
Research And Application On Robust Algorithm Of Outlier For Truth Discovery

Posted on: 2020-12-05 | Degree: Master | Type: Thesis
Country: China | Candidate: Y C Hu | Full Text: PDF
GTID: 2370330578460301 | Subject: Software engineering
Abstract/Summary:
In the era of information explosion, information about a particular entity is often provided by multiple different data sources. Examples include descriptions of meteorological elements in the same area by different meteorological observation stations, and the image thresholds used to segment radiological medical images. However, sensor observation errors and deviations in measuring radioactive drug injections cause the information that different sources report about the same entity to carry different biases. Truth discovery therefore aims to obtain the most reliable information from multiple biased and conflicting sources, and to assign a reliability score to each source. Truth discovery has recently received wide attention from both academia and industry.

Traditional truth discovery methods, however, usually rest on the assumption that highly reliable data sources are unlikely to produce observations with large errors, and that even the observation errors of low-reliability sources fall within a bounded interval. In many real datasets this assumption does not hold: even a highly reliable meteorological observatory in an urban center may occasionally report abnormal values with large errors because of physical damage, and an image segmentation threshold derived empirically from a batch of experimental rats may not apply to individual animals that are intolerant of the drug. Data of this kind, which violate the assumption of traditional truth discovery methods, are called outliers in the existing truth discovery literature and currently have to be preprocessed manually. This thesis therefore studies the robustness of truth discovery algorithms to outliers and its applications. The main contributions are as follows.

First, in real-world truth discovery an entity may have more than one reliable truth. The kernel density estimation (KDE) method summarizes the multiple candidate observations that may be regarded as truth and estimates the reliability of each source by analyzing the probability distribution of its observations. Using multiple observations for source reliability evaluation and truth estimation has achieved some success. However, outliers produce a boundary effect in the probability model and thus strongly affect the estimated truth. Building on this line of work, this thesis fits the anomalous data with local linear regression and repairs the outliers using missing-value interpolation. Experimental results show that the proposed method analyzes source reliability reasonably and effectively improves the accuracy of truth estimation.

Second, this thesis adopts an iterative dynamic threshold filter that excludes outliers in order to improve the efficiency of truth discovery. In big-data settings, traditional truth discovery generally regards the mean as an unsuitable estimate of the truth and accepts the median instead. In practice, outliers in small samples distort the data distribution and produce a gap between the median and the mean, and this gap is the main reason for the low quality of the sample data. This thesis therefore proposes an iterative dynamic threshold filtering method that removes outliers quantitatively to improve data quality. The experimental results demonstrate the feasibility of the method. Moreover, compared with existing truth discovery methods, the proposed method not only reduces the error of truth estimation but also clearly improves efficiency.
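The abstract does not give the details of the iterative dynamic threshold filter, so the following Python sketch is only one plausible reading of the idea, not the thesis's algorithm. It assumes that the threshold is k times the median absolute deviation, recomputed from the surviving observations in every round, and that iteration stops once the gap between mean and median falls to a tolerance. The function name `iterative_threshold_filter` and the parameters `k`, `tol`, and `max_iter` are hypothetical.

```python
from statistics import mean, median


def iterative_threshold_filter(values, k=3.0, tol=0.0, max_iter=100):
    """Repeatedly drop observations far from the median.

    Assumptions (not taken from the thesis): the threshold is k times the
    median absolute deviation (MAD), and filtering stops when the
    mean-median gap is at most `tol` or when no observation is removed.
    """
    kept = list(values)
    for _ in range(max_iter):
        if len(kept) < 3:
            break  # too few observations to judge outliers
        med = median(kept)
        # The mean-median gap is used here as the data-quality signal.
        if abs(mean(kept) - med) <= tol:
            break
        # Dynamic threshold: recomputed from the surviving data each round.
        mad = median([abs(v - med) for v in kept])
        threshold = k * mad
        survivors = [v for v in kept if abs(v - med) <= threshold]
        if len(survivors) == len(kept):
            break  # nothing was removed, so further rounds change nothing
        kept = survivors
    return kept


# Hypothetical temperature readings for one entity; 35.0 is the outlier.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 35.0]
cleaned = iterative_threshold_filter(readings, k=3.0, tol=0.05)
print(cleaned, median(cleaned))
```

With these made-up readings, the first round removes 35.0 and the second round stops because the remaining mean and median differ by less than the chosen tolerance. The values of `k` and `tol` depend on the scale of the data and would need tuning for any real dataset.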
Keywords/Search Tags: Truth discovery, Data conflict, Outliers, Local linear regression, Dynamic threshold