Text Data Truth Discovery Based On Self-confidence Of Sources

Posted on:2022-05-02

Degree:Master

Type:Thesis

Country:China

Candidate:F X Yang

GTID:2518306509984979

Subject:Software engineering

Abstract/Summary:

In the era of big data, crowdsourcing platforms make it convenient to collect data: the same question can receive many answers from different sources, and these answers may conflict with each other. How to obtain the true information (i.e., the truths) from many conflicting answers has therefore become a hot research topic. Many scholars have proposed truth discovery methods, but existing methods can only handle categorical or numerical data and perform poorly on text data. To address this problem, this paper studies how to handle Chinese text data. Chinese words have special semantic characteristics: different words may express the same or similar meanings, so it is unreasonable to treat Chinese text as categorical data. In this paper, we map Chinese words to a word vector space and characterize the similarity between words by their distance in that space. In addition, Chinese text data contains not only the answer to the question but also implicit information. Although the words carrying this information have nothing to do with the answer itself, they can reflect the source's degree of self-confidence. We make full use of this implicit information and propose a truth discovery method for Chinese text data based on the self-confidence of sources. The method consists of two stages: preprocessing the Chinese text data and discovering the truths. Preprocessing first segments the Chinese text; then builds a self-confidence enhancement dictionary and a self-confidence weakening dictionary, uses them to extract each source's self-confidence information from the answer descriptions, and constructs a self-confidence matrix; and finally uses the Word2vec model to build Chinese word vectors and extracts the answers to the questions from the text descriptions. For truth discovery, we propose a three-step iterative optimization algorithm that uses the self-confidence matrix and the Chinese word vectors to further improve performance. We perform experiments on two real-world Chinese text data sets. Compared with state-of-the-art methods, our method performs better, which demonstrates the superiority of the proposed framework. In addition, a comparison across experimental parameters shows that considering the implicit information in Chinese text can further improve the accuracy of the truth discovery results.
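
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the thesis's three-step algorithm, whose details are not given here; it is a generic confidence-weighted truth discovery loop under several assumptions: the answers are already segmented into tokens, the answer word is taken to be the last token, the two cue-word dictionaries and the 0.5 scoring step are hypothetical, and the small hand-written vectors stand in for Word2vec embeddings.

```python
import math

# Hypothetical cue-word dictionaries (the thesis builds its own; entries are assumed).
ENHANCE = {"肯定", "一定"}   # words that raise self-confidence
WEAKEN = {"可能", "大概"}    # words that lower self-confidence

def self_confidence(tokens):
    """Score one segmented answer: baseline 1.0, +/-0.5 per cue word (assumed rule)."""
    score = 1.0
    for tok in tokens:
        if tok in ENHANCE:
            score += 0.5
        elif tok in WEAKEN:
            score -= 0.5
    return max(score, 0.1)  # keep the score positive

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def discover_truths(answers, confidence, vectors, iterations=10):
    """answers[source][question] is an answer word; confidence has the same shape."""
    weights = {s: 1.0 for s in answers}
    questions = sorted({q for per_source in answers.values() for q in per_source})
    truths = {}
    for _ in range(iterations):
        # Truth update: choose the candidate with the largest weighted, similarity-aware support.
        for q in questions:
            claims = [(s, a[q]) for s, a in answers.items() if q in a]
            candidates = sorted({w for _, w in claims})

            def support(c):
                return sum(weights[s] * confidence[s][q] * cosine(vectors[c], vectors[w])
                           for s, w in claims)

            truths[q] = max(candidates, key=support)
        # Weight update: a source's weight is its mean similarity to the current truths.
        for s, a in answers.items():
            sims = [cosine(vectors[w], vectors[truths[q]]) for q, w in a.items()]
            weights[s] = max(sum(sims) / len(sims), 1e-6)
    return truths, weights

# Toy stand-in for Word2vec: two synonyms for "tomato" lie close together, "potato" far away.
vectors = {"番茄": (1.0, 0.1), "西红柿": (0.9, 0.2), "土豆": (0.1, 1.0)}

# Pre-segmented answer descriptions from three sources to one question.
segmented = {
    "s1": {"q1": ["肯定", "是", "番茄"]},
    "s2": {"q1": ["可能", "是", "土豆"]},
    "s3": {"q1": ["是", "西红柿"]},
}

# Self-confidence matrix and extracted answers (last token assumed to be the answer).
confidence = {s: {q: self_confidence(t) for q, t in qs.items()} for s, qs in segmented.items()}
answers = {s: {q: t[-1] for q, t in qs.items()} for s, qs in segmented.items()}

truths, weights = discover_truths(answers, confidence, vectors)
print(truths, weights)
```

Because the two synonyms support each other through vector similarity, the hesitant outlier answer loses even though a purely categorical vote would see three distinct answers with one vote each. This is the motivation the abstract gives for not treating Chinese text as categorical data.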

Keywords/Search Tags:

Truth Discovery, Self-confidence, Chinese Text Data

Related items

1	Research On Algorithm Of Truth Discovery Based On Estimation Of Statistical Feature Parameters
2	Research On Truth Discovery Method Based On Information Relation And Label Confidence Clustering
3	Research On Truth Discovery Based On Bayes Model In Web Data Integration
4	Research On Key Technologies Of Truth Discovery On Dirty Data
5	Research And Implementation Of Truth Discovery Under Local Differential Privacy
6	Research On Truth Discovery Algorithm Based On Open Source Information
7	Hidden Markov Model Based Multi-truth Discovery
8	Privacy Protection And Truth Discovery Protocol In Data Collection
9	Algorithms Of Copy Detection And Truth Discovery For Multi-relational Data
10	Research On Truth Discovery Methods Based On Reliability In The Multi-source Environment