
Research On Incentive Based Data Labeling Technologies And Their Applications

Posted on: 2019-11-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J J Sun
Full Text: PDF
GTID: 1368330590466702
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, advances in artificial intelligence have made machine learning, especially supervised learning, widely used in both academia and industry. However, with the emergence of massive data from sources such as the mobile Internet and the Internet of Things, the data supplied to traditional active learning algorithms has new characteristics: wider coverage, larger volume, more types, and greater heterogeneity. The knowledge needed to label such data correctly far exceeds the breadth and depth of what experts possess. If active learning algorithms still rely on hired experts to label these data, the machine learning system will probably receive labels that are not the ground truth. This is no longer consistent with the noise-free (ground-truth) labeling that traditional active learning algorithms expect from experts. Moreover, hiring experts for labeling becomes infeasible in terms of both cost and operability.

More recently, the ubiquity of mobile devices carried or worn by ordinary users has made it possible to obtain massive labeled data by applying the theory of mobile crowdsensing. However, accomplishing the task faces a series of issues: heavy consumption of the devices' resources (communication, computing capability, energy, etc.), leakage of the privacy of the users of these devices, and the security and credibility of payment. Together these lower users' willingness to participate. Hence, there is an urgent need for a mechanism that secures broad user participation in high-quality labeling and eliminates selfish behaviors, including those driven by anxiety about privacy leakage, so that human experience, knowledge, and intelligence can be transferred to the machine learning system as fully as possible. Although a few incentive-based data labeling technologies have been proposed, they are still at an early stage of development.

Building on this existing work and on the demands of developing artificial intelligence applications, this thesis focuses on data labeling technologies based on incentive mechanisms. Combining statistical theory with the characteristics of sampled data, such as noise, redundancy, heterogeneity, and privacy, we design a series of efficient data labeling technologies and apply them to visual object classification in augmented reality applications. The main contributions of this thesis are as follows:

(1) Considering the risk of leaking users' privacy attached to labeled data and the credibility of payment from the platform that publishes the data labeling task, we first present a general verifiable privacy-preserving data labeling technology for scenarios with offline homogeneous and heterogeneous sensing job models. We then propose a more complex verifiable privacy-preserving data labeling technology for scenarios with the offline submodular sensing job model.

(2) Considering users' heterogeneity, differing preferences, selfishness, and so on, and aiming to incentivize broad user participation, we derive a closed-form expression of marginal quality in the light of the monopoly convergence. Based on this expression, we design marginal-quality-based long-term data labeling technologies that achieve high-quality data labeling under an average redundancy constraint.

(3) We explore the periodic data labeling problem for a given set of tasks and design semi-online and online periodic data labeling technologies based on frugal incentives, so as to minimize the payment needed to complete the given tasks under a per-sample redundancy constraint.

(4) Considering the uncertainty and diversity of multi-label instances, we explore multi-label labeling under a budget limit and design online posted-pricing-based and bidding-model-based multi-label data labeling technologies, respectively, without redundancy constraints. We then apply an integrated framework of incentives and deep learning to visual object classification tasks in the field of augmented reality.

In summary, this thesis not only designs a series of data labeling techniques based on incentive mechanisms, but also presents theoretical analysis and extensive experiments that verify their effectiveness, thereby providing theoretical and technical support for classification and recognition applications in the field of artificial intelligence.
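As a rough illustration of the kind of mechanism the abstract refers to, the following is a minimal sketch of a generic posted-price labeling round under a budget limit, followed by majority-vote aggregation of the redundant labels. It is not the thesis's mechanism: the names `Worker`, `posted_price_labeling`, and `majority_vote`, the fixed price, and the rule that a worker accepts whenever their private cost is at most the posted price are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Worker:
    worker_id: str
    cost: float  # hypothetical private cost of labeling one sample
    label: int   # label this worker would submit if recruited


def posted_price_labeling(workers, price, budget, redundancy):
    """Offer a fixed price to workers in arrival order.

    A worker accepts when their cost does not exceed the posted price.
    Recruitment stops once `redundancy` labels are collected or the
    next payment would exceed the remaining budget.
    """
    accepted = []
    spent = 0.0
    for worker in workers:
        if len(accepted) >= redundancy or spent + price > budget:
            break
        if worker.cost <= price:
            accepted.append(worker)
            spent += price
    return accepted, spent


def majority_vote(labels):
    """Aggregate redundant noisy labels by taking the most common one."""
    return Counter(labels).most_common(1)[0][0]


# Illustrative arrival sequence (made-up values).
arrivals = [
    Worker("a", 0.5, 1),
    Worker("b", 2.0, 0),
    Worker("c", 0.8, 1),
    Worker("d", 0.9, 0),
    Worker("e", 0.3, 1),
]
recruited, total_paid = posted_price_labeling(
    arrivals, price=1.0, budget=3.0, redundancy=3
)
final_label = majority_vote([w.label for w in recruited])
```

Because every recruited worker is paid the same posted price regardless of their own cost, a worker in this sketch gains nothing by misreporting that cost, which is the usual reason posted pricing is attractive in online settings. The redundancy parameter trades payment for label quality.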
Keywords/Search Tags: machine learning, active learning, users' incentive, multi-label labeling, deep learning