Based On Text Feature And Non-text Feature Question Retrieval

Posted on: 2018-06-06
Degree: Master
Type: Thesis
Country: China
Candidate: R L Tao
GTID: 2428330590477675
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet technology, the Internet has become part of everyday life. Thanks to Web 2.0, Community Question Answering (CQA) systems attract a large number of users. As a result, CQA sites have accumulated a great deal of information and have gradually become one of the most popular kinds of social networking application for accessing knowledge and information. A search engine returns a large number of related documents and then either generates an answer automatically or leaves users to choose answers manually. In CQA, by contrast, users ask questions directly, and the quality and relevance of the answers are higher than those of search-engine results. CQA provides a quick and easy way to access information, but problems remain: users wait a long time for answers, many questions are near-duplicates, and some questions receive no answers at all. To reduce redundancy, shorten the time people wait for answers, and improve question-and-answer retrieval in CQA, we combine textual and non-textual features of questions to offer askers similar questions that have already received good-quality answers.

This thesis focuses on two aspects: question similarity and the quality of Q&A pairs. Question similarity is usually assessed from a lexical, grammatical, or semantic perspective, and semantic approaches usually give better results. We therefore use Word Mover's Distance (WMD), which builds on word2vec embeddings; word2vec learns semantically meaningful word representations from local co-occurrences in sentences. Compared with traditional methods, WMD achieved better P@N and MAP. Second, we study three factors related to Q&A quality: the additional information attached to a Q&A pair, the quality of the answer, and the author, taking into account both their intrinsic characteristics and the importance of each feature. We then combine these three features into a CQA quality score, which performed well in experiments as measured by ROC and F1. Finally, we combine similarity-based text retrieval of questions with the quality of the retrieved questions. Experiments on real data sets from Zhihu showed that the proposed model improves performance, with WMD achieving the best results.
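The abstract relies on Word Mover's Distance without spelling it out. WMD treats each question as a normalized bag of words and uses the Euclidean distance between two words' embeddings as the cost of moving weight between them. It then takes the minimum total cost of turning one question's word distribution into the other's. The sketch below is illustrative only and is not the thesis's implementation. It assumes tokenized questions and a plain dict `vectors` mapping words to embedding tuples (in the thesis these would come from word2vec; the toy 2-D vectors in the usage lines are made up). It skips out-of-vocabulary words and solves the transport problem exactly with a small min-cost-flow routine, which is only practical for short questions.

```python
import math
from collections import Counter


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def wmd(doc1, doc2, vectors):
    """Exact Word Mover's Distance between two tokenized questions.

    Words missing from `vectors` are ignored. Returns math.inf if either
    question has no in-vocabulary words.
    """
    c1 = Counter(w for w in doc1 if w in vectors)
    c2 = Counter(w for w in doc2 if w in vectors)
    if not c1 or not c2:
        return math.inf
    n1, n2 = sum(c1.values()), sum(c2.values())
    words1, words2 = list(c1), list(c2)
    k1 = len(words1)
    source, sink = 0, k1 + len(words2) + 1
    # Residual graph: each edge is [target, capacity, cost, index of reverse edge].
    graph = [[] for _ in range(sink + 1)]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    # nBOW weights are c/n1 and c/n2; multiplying both by n1*n2 makes
    # every supply and demand an integer, with total flow n1*n2.
    total = n1 * n2
    for i, w1 in enumerate(words1, start=1):
        add_edge(source, i, c1[w1] * n2, 0.0)
        for j, w2 in enumerate(words2, start=k1 + 1):
            add_edge(i, j, total, euclidean(vectors[w1], vectors[w2]))
    for j, w2 in enumerate(words2, start=k1 + 1):
        add_edge(j, sink, c2[w2] * n1, 0.0)

    cost, remaining = 0.0, total
    while remaining > 0:
        # Bellman-Ford shortest path; reverse edges can carry negative cost.
        dist = [math.inf] * len(graph)
        prev = [None] * len(graph)  # (node, edge index) used to reach each node
        dist[source] = 0.0
        for _ in range(len(graph) - 1):
            updated = False
            for u in range(len(graph)):
                if dist[u] == math.inf:
                    continue
                for idx, (v, cap, c, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + c < dist[v] - 1e-12:
                        dist[v] = dist[u] + c
                        prev[v] = (u, idx)
                        updated = True
            if not updated:
                break
        # Push as much flow as the path's tightest edge allows.
        push, v = remaining, sink
        while v != source:
            u, idx = prev[v]
            push = min(push, graph[u][idx][1])
            v = u
        v = sink
        while v != source:
            u, idx = prev[v]
            edge = graph[u][idx]
            edge[1] -= push
            graph[v][edge[3]][1] += push
            v = u
        cost += push * dist[sink]
        remaining -= push
    return cost / total
```

For example, with `vectors = {"a": (0.0, 0.0), "b": (3.0, 4.0)}`, `wmd(["a", "a"], ["b"], vectors)` moves all of the weight a distance of 5, so it returns 5.0. For real corpora, a library routine such as gensim's `KeyedVectors.wmdistance` applied to a trained word2vec model is the usual choice.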
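P@N and MAP, the retrieval metrics the abstract reports for WMD, can be computed as sketched below. The abstract does not define them, so this assumes the common definitions. P@N is the share of the top N retrieved questions that are relevant. Average precision sums the precision at each rank where a relevant question appears and divides by the total number of relevant questions. MAP averages that value over all queries. The question ids are hypothetical.

```python
def precision_at_n(ranked, relevant, n):
    """Fraction of the top-n retrieved question ids that are relevant."""
    return sum(1 for q in ranked[:n] if q in relevant) / n


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank holding a relevant id."""
    hits, total = 0, 0.0
    for rank, q in enumerate(ranked, start=1):
        if q in relevant:
            hits += 1
            total += hits / rank
    # Divide by all relevant ids, so relevant questions never retrieved count as misses.
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_id_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For a ranking `["q3", "q1", "q7", "q2"]` with relevant set `{"q1", "q2"}`, P@2 is 0.5 and average precision is (1/2 + 2/4) / 2 = 0.5.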
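The abstract combines three kinds of non-text evidence (additional Q&A information, answer quality, and the author) into a quality score, and then joins that score with text similarity, but it does not give either formula. The sketch below assumes the simplest scheme, a weighted linear combination for both steps. The feature names, the weights, the `1 / (1 + distance)` similarity mapping, `alpha`, and `min_quality` are all hypothetical choices, not values from the thesis, and each feature is assumed to be pre-scaled to [0, 1].

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    question_id: str
    wmd_distance: float  # WMD between the new question and this archived one
    answer_score: float  # hypothetical answer-quality feature in [0, 1]
    author_score: float  # hypothetical author feature in [0, 1]
    extra_score: float   # hypothetical additional-information feature in [0, 1]


# Hypothetical feature-importance weights (sum to 1); the thesis gives no values.
WEIGHTS = {"answer": 0.5, "author": 0.3, "extra": 0.2}


def quality_score(c):
    """Weighted linear combination of the three non-text features."""
    return (WEIGHTS["answer"] * c.answer_score
            + WEIGHTS["author"] * c.author_score
            + WEIGHTS["extra"] * c.extra_score)


def similarity(distance):
    """Map a WMD distance in [0, inf) to a similarity in (0, 1]."""
    return 1.0 / (1.0 + distance)


def rank(candidates, alpha=0.7, min_quality=0.0):
    """Order archived questions by alpha*similarity + (1-alpha)*quality."""
    scored = []
    for c in candidates:
        q = quality_score(c)
        if q < min_quality:
            continue  # skip questions whose answers fall below the threshold
        scored.append((alpha * similarity(c.wmd_distance) + (1 - alpha) * q, c.question_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [qid for _, qid in scored]
```

A larger `alpha` favours closeness in meaning and a smaller one favours well-answered questions. `min_quality` filters out questions whose answers score below a threshold, in line with the abstract's goal of offering askers questions that were answered well.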
Keywords/Search Tags: Community Question Answering, word2vec, non-text feature, Quality of Q&A pairs