
Text Data Truth Discovery Based On Self-confidence Of Sources

Posted on: 2022-05-02
Degree: Master
Type: Thesis
Country: China
Candidate: F X Yang
Full Text: PDF
GTID: 2518306509984979
Subject: Software engineering
Abstract/Summary:
In the era of big data, crowdsourcing platforms make it easy to collect data: the same question can receive many answers from different sources, and these answers may conflict with each other. How to recover the true information (i.e., the truths) from many conflicting answers has therefore become a hot research topic. Scholars have proposed various truth discovery methods, but existing methods handle only categorical or numerical data and perform poorly on text data. Motivated by this problem, this thesis studies truth discovery for Chinese text data.

Chinese words have special semantic characteristics: different words may express the same or similar meanings, so it is unreasonable to treat Chinese text as categorical data. This thesis maps Chinese words into a word vector space and characterizes the similarity between words by their distance in that space. In addition, Chinese text data contains not only the answer to the question but also implicit information. Although such words have nothing to do with the answer itself, they can reflect the source's degree of self-confidence. This thesis makes full use of this implicit information and proposes a truth discovery method for Chinese text data based on the self-confidence of sources.

The method consists of two stages: preprocessing the Chinese text data and discovering the truths. Preprocessing first segments the Chinese text; then builds a self-confidence enhancement dictionary and a self-confidence weakening dictionary, uses them to extract each source's self-confidence information from the answer descriptions, and constructs a self-confidence matrix; and finally uses the Word2vec model to build Chinese word vectors and extracts the answers to the questions from the text descriptions. For truth discovery, we propose a three-step iterative optimization algorithm that uses the self-confidence matrix and the Chinese word vectors to further improve performance.

We perform experiments on two real-world Chinese text data sets. Compared with state-of-the-art methods, our method performs better, which demonstrates the superiority of the proposed framework. In addition, a comparison across experimental parameters shows that considering the implicit information in Chinese text further improves the accuracy of the truth discovery results.
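
The pipeline described in the abstract can be illustrated with a small sketch. It is not the thesis's algorithm: the abstract does not give the dictionary entries, the scoring rule, or the three steps of the optimization, so everything below is an assumption made for illustration. The dictionary words, the `self_confidence` scoring rule, the `extract_answer` heuristic, the hand-written toy vectors (standing in for trained Word2vec vectors), and the simplified two-step loop in `discover_truths` are all hypothetical. The descriptions are assumed to be already segmented into tokens.

```python
import math

# Hypothetical dictionary entries; the abstract does not list the real ones.
BOOST_WORDS = {"肯定", "一定", "确定"}   # e.g. "surely", "definitely", "certain"
WEAKEN_WORDS = {"可能", "大概", "好像"}  # e.g. "maybe", "probably", "seems"

def self_confidence(tokens, base=1.0, step=0.5):
    """Score one segmented description: boosters add `step`, hedges subtract it."""
    score = base
    for tok in tokens:
        if tok in BOOST_WORDS:
            score += step
        elif tok in WEAKEN_WORDS:
            score -= step
    return max(score, 0.1)  # floor keeps every claim's weight positive

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def extract_answer(tokens, vectors):
    """Toy heuristic (an assumption): the first token that has a word vector."""
    return next(tok for tok in tokens if tok in vectors)

def discover_truths(descriptions, vectors, n_iter=10):
    """Simplified two-step loop (assumed, not the thesis's three-step algorithm)."""
    sources = sorted(descriptions)
    answers = {s: {q: extract_answer(t, vectors) for q, t in descriptions[s].items()}
               for s in sources}
    # Self-confidence "matrix": one score per (source, question) pair.
    confidence = {s: {q: self_confidence(t) for q, t in descriptions[s].items()}
                  for s in sources}
    questions = sorted({q for s in sources for q in answers[s]})
    weights = {s: 1.0 for s in sources}
    truths = {}
    for _ in range(n_iter):
        # Step A: per question, pick the candidate with the largest support, where
        # each claim votes with source weight * self-confidence * similarity.
        for q in questions:
            claims = [(s, answers[s][q]) for s in sources if q in answers[s]]

            def support(candidate):
                return sum(weights[s] * confidence[s][q]
                           * cosine(vectors[candidate], vectors[a])
                           for s, a in claims)

            truths[q] = max(sorted({a for _, a in claims}), key=support)
        # Step B: a source's weight is its mean similarity to the current truths.
        for s in sources:
            sims = [cosine(vectors[a], vectors[truths[q]])
                    for q, a in answers[s].items()]
            weights[s] = max(sum(sims) / len(sims), 1e-6)
    return truths, weights

# Toy vectors; only words answering the same question are ever compared.
vectors = {
    "北京": (1.0, 0.0, 0.0, 0.0),
    "京城": (0.8, 0.6, 0.0, 0.0),  # near-synonym of 北京 in this toy space
    "上海": (0.0, 0.0, 1.0, 0.0),
    "长江": (0.0, 0.0, 0.0, 1.0),
    "黄河": (0.0, 1.0, 0.0, 0.0),
}
descriptions = {
    "s1": {"q1": ["肯定", "是", "北京"], "q2": ["一定", "是", "长江"]},
    "s2": {"q1": ["大概", "是", "京城"]},
    "s3": {"q1": ["可能", "是", "上海"], "q2": ["好像", "是", "黄河"]},
}
truths, weights = discover_truths(descriptions, vectors)
```

In this toy run the near-synonym 京城 lends partial support to 北京 instead of counting as a competing category, and the hedged answers from `s3` carry less weight than the confident ones from `s1`. With real data the token lists would come from a Chinese word segmenter and the vectors from a trained Word2vec model.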
Keywords/Search Tags:Truth Discovery, Self-confidence, Chinese Text Data