Research on Data Collection and Data Selection in Crowdsensing

Posted on: 2021-03-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Du
GTID: 1368330602494427
Subject: Computer software and theory
Abstract/Summary:
With advances in micro-sensor, WiFi, and 4G/5G mobile communication technologies, the penetration of mobile devices such as smartphones and wearable devices has grown rapidly. Besides bringing convenience and efficiency to users, this growth enables a novel sensing paradigm, crowdsensing, which monitors the physical world by outsourcing sensing tasks over the Internet to smartphone users in different locations. Unlike traditional data collection methods, crowdsensing lets users act as sensing units and report either what they perceive or what their sensors read. It can therefore accomplish large-scale sensing projects effectively and cheaply. In recent years, both academia and industry have paid extensive attention to crowdsensing and have proposed many successful applications.

However, crowdsensing users are not experts, and many of them are low-quality or even malicious. The low-quality, random, or intentionally wrong data these users provide conflicts with the data from high-quality users, which introduces noise into the system and inevitably degrades the quality of the services that the crowdsensing system offers. Solving this problem requires two things. During task assignment, tasks must be assigned dynamically based on user reliability, so that low-quality and malicious users are avoided and the quality of the collected data improves. After collection, the sensing data must be filtered and the reliable data selected to improve the quality of the final service. How to evaluate user reliability and use it in data collection and data selection is therefore a crucial challenge for quality control in crowdsensing systems.

This thesis studies the data collection and data selection problems in crowdsensing systems and proposes task allocation algorithms and truth discovery methods with the goal of optimizing data quality. We analyze the limitations of existing user reliability models, introduce a general fine-grained reliability model based on the cluster structure in users and tasks, and design a truth discovery method based on this model. We also introduce cluster membership to address the problem of insufficient data and to improve the quality of the selected data. Finally, for quality-aware online task allocation, we forecast the improvement in data quality when users complete tasks and dynamically assign each user the set of tasks with the highest quality gains, which improves the overall data quality. Together, these methods improve the data quality of the crowdsensing system and safeguard the quality of the service it provides. The main contributions of this thesis are summarized as follows:

(1) Fine-grained truth discovery method based on co-clustering reliability. Existing fine-grained user reliability models in crowdsensing all restrict the form of the tasks and can only handle the data that users provide for specific task types. A more general fine-grained user reliability model is therefore needed. We propose a fine-grained user reliability model based on the co-clustering structure in crowdsensing tasks and users. On the one hand, we design the kl-means algorithm to capture the co-clustering structure and estimate each user's fine-grained reliability on different task clusters. On the other hand, based on the captured fine-grained reliability, we design a truth discovery method that estimates each task's truth while inferring the task's cluster label. Experimental results on real datasets show that this method effectively learns users' fine-grained reliability and is more accurate than existing general truth discovery methods.

(2) Fine-grained truth discovery method based on fuzzy cluster membership. Since most crowdsensing users have completed only a limited number of tasks, evaluating user reliability from such limited historical data causes a significant estimation error. A fine-grained reliability model that works with limited user data is therefore needed. We propose a Bayesian co-cluster reliability model based on cluster membership that also considers the co-cluster structure in users and tasks. On the one hand, we use cluster membership to describe the fuzzy membership of users and tasks in multiple user and task clusters, which characterizes users and tasks better and avoids the data reduction caused by assigning each one a unique cluster label. On the other hand, we propose a probabilistic graphical model to describe the co-cluster structure based on mixed membership, and then use the Gibbs-EM method to learn the co-clustering reliability and the truths simultaneously. Comparison experiments on real datasets show that this method learns users' fine-grained reliability better and improves the quality of the estimated truths.

(3) Online task allocation mechanism based on the latent topic model. Given the existence of low-quality or even malicious users in crowdsensing, indiscriminately assigning tasks to users not only lowers data quality but also wastes system resources. A quality-aware online task allocation mechanism is therefore needed. We use the underlying cluster structure to describe users' fine-grained reliability, capture the latent topic structure in tasks based on users' behavior patterns, and estimate users' topic-level fine-grained reliability. We propose two quality-aware online task assignment mechanisms that dynamically forecast the data quality gains when users complete tasks. Each online user is then assigned the set of tasks with either the maximum expected gains or the maximum expected and potential gains, which improves the quality of the collected data. Experimental results on real datasets show that the online task allocation mechanism improves the quality of both the collected data and the estimated truths while significantly reducing the number of task assignments.

In summary, we have analyzed the limitations of existing crowdsensing data collection and selection mechanisms, designed several fine-grained reliability models and truth discovery algorithms to solve the task-form limitation problem and the insufficient-data problem, and further applied them to quality-aware online task assignment. We evaluated the performance of these mechanisms both theoretically and experimentally. The results can provide substantial theoretical and technical support for data collection and data selection in crowdsensing.
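To make the idea of fine-grained reliability concrete, the following is a minimal illustrative sketch in Python, not the thesis's kl-means or Gibbs-EM method. It assumes that task clusters are already known (the thesis infers them), that answers are categorical labels, and that reliability is a smoothed agreement rate. The function name `truth_discovery`, its parameters, and the toy data are all hypothetical.

```python
from collections import defaultdict

def truth_discovery(claims, task_cluster, n_iters=10, prior=0.7, smooth=1.0):
    """Iterative weighted voting with per-cluster user reliability.

    claims: list of (user, task, label) tuples.
    task_cluster: dict mapping task -> cluster id (assumed known here).
    Returns (truths, reliability), with reliability keyed by (user, cluster).
    """
    # Every (user, cluster) pair starts at the same prior reliability.
    reliability = defaultdict(lambda: prior)
    truths = {}
    for _ in range(n_iters):
        # Step 1: weighted vote per task; each claim is weighted by the
        # claimant's current reliability on that task's cluster.
        votes = defaultdict(lambda: defaultdict(float))
        for user, task, label in claims:
            votes[task][label] += reliability[(user, task_cluster[task])]
        # sorted() makes tie-breaking deterministic.
        truths = {t: max(sorted(lv), key=lv.get) for t, lv in votes.items()}

        # Step 2: re-estimate reliability as the smoothed rate of agreement
        # with the current truth estimates, separately per task cluster.
        hits = defaultdict(float)
        total = defaultdict(float)
        for user, task, label in claims:
            key = (user, task_cluster[task])
            total[key] += 1.0
            if truths[task] == label:
                hits[key] += 1.0
        reliability = defaultdict(lambda: prior)
        for key in total:
            reliability[key] = (hits[key] + smooth * prior) / (total[key] + smooth)
    return truths, dict(reliability)

# Toy data (hypothetical): u1 is accurate on noise tasks only, u3 on
# traffic tasks only, and u2 on both.
task_cluster = {"n1": "noise", "n2": "noise", "n3": "noise",
                "t1": "traffic", "t2": "traffic", "t3": "traffic"}
claims = [
    ("u1", "n1", "high"), ("u1", "n2", "low"), ("u1", "n3", "high"),
    ("u1", "t1", "free"), ("u1", "t2", "jam"), ("u1", "t3", "free"),
    ("u2", "n1", "high"), ("u2", "n2", "low"), ("u2", "n3", "high"),
    ("u2", "t1", "jam"), ("u2", "t2", "free"), ("u2", "t3", "jam"),
    ("u3", "n1", "low"), ("u3", "n2", "high"), ("u3", "n3", "low"),
    ("u3", "t1", "jam"), ("u3", "t2", "free"), ("u3", "t3", "jam"),
]
truths, reliability = truth_discovery(claims, task_cluster)
```

In this toy run a single global reliability score would rate u1 and u3 as mediocre everywhere, whereas the per-cluster scores separate the clusters where each user is dependable from those where they are not. That separation is the motivation for fine-grained models described above.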
Keywords/Search Tags: Crowdsensing, Reliability Model, Truth Discovery, Online Task Assignment