
A Study of Ranking Methods for Searching in Community Question Answering

Posted on: 2018-12-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H C Wu
GTID: 1318330512485619
Subject: Computer application technology
Abstract/Summary:
In the last decade, many community question answering (CQA) sites have emerged and accumulated a large number of questions and answers. Search in CQA has become an important branch of information retrieval. Research in this area can be divided into two directions: relevance ranking in question search, and quality estimation of question-answer pairs. In the former, ranking depends on the query, so it can be called dynamic ranking; in the latter, ranking is independent of the query, so it can be called static ranking. Both dynamic and static ranking face several challenges. In dynamic ranking, the major challenges arise in question search: verbose queries may cause key phrases to be mismatched, while short queries may cause the user's search intent to be misunderstood. In static ranking, existing methods mainly focus on mining high-quality answers and experts in CQA, but overlook the harm done by low-quality answers and spammers, and usually neglect the relationship between user authority and answer quality. This study therefore addresses these two problems from four aspects to improve the overall performance of CQA systems.

First, we propose a user-intent based language model for query search to handle the short-query challenge in dynamic ranking. Existing methods mainly focus on long, syntactically structured queries; when an input query is short, the task becomes challenging because information about user intent is lacking. In this chapter, we mine different types of user intent information from various sources to enrich short queries. Using these intent signals, we propose a new intent-based language model that takes advantage of both state-of-the-art relevance models and the extra intent information mined from multiple sources. We further employ a state-of-the-art learning-to-rank approach to estimate the model's parameters from training data. Experiments show that by leveraging user intent prediction, our model significantly outperforms state-of-the-art relevance models in question search.

Second, we propose a new approach to query segmentation for relevance ranking to handle the long-query problem in dynamic ranking. In this chapter, we investigate how best to improve state-of-the-art relevance ranking methods through query segmentation, which separates the input query into segments, typically natural language phrases. We propose applying a re-ranking approach to query segmentation: a generative model first produces the top k candidate segmentations, and a discriminative model then re-ranks these candidates to obtain the final segmentation. This approach has been widely used for structured prediction in natural language processing but, as far as we know, has not previously been applied to query segmentation. Furthermore, we propose a new method for using the results of query segmentation in relevance ranking, which takes both the original query words and the segmented query phrases as units of query representation. Experimental results on large-scale web search and query search datasets show that our method significantly improves relevance ranking performance.

Third, we propose an unsupervised approach to low-quality answer detection to handle answer quality prediction in static ranking. CQA sites such as Yahoo! Answers provide rich knowledge for people to access. However, the quality of answers posted to CQA sites often varies greatly, from precise and useful answers to irrelevant and useless ones. Automatic detection of low-quality answers would therefore help site managers organize the accumulated knowledge efficiently and provide high-quality content to users. In this chapter, we propose a novel unsupervised approach to detecting low-quality answers at a CQA site. The key ideas of our model are: (1) most answers are normal; (2) low-quality answers can be found by comparing them with their "peer" answers under the same question; and (3) different questions have different answer quality criteria. Based on these ideas, we devise an unsupervised learning algorithm that assigns soft labels to answers as quality scores. Experiments show that our model significantly outperforms other state-of-the-art models on answer quality prediction.

Fourth, we propose a reinforcement model for user authority estimation in static ranking. Intuitively, user authority is positively correlated with answer quality. Meanwhile, the highest-quality answer is selected as the best answer after comparison by the asker or viewers, which establishes competition relationships between the best answerer and the asker, and between the best answerer and the other answerers. In this chapter, we propose an iterative reinforcement model built on three components: a user authority estimation model, an answer quality prediction model, and a competition model. These three models iteratively reinforce each other and jointly produce the final user authority estimates and answer quality predictions. Experiments show that our reinforcement model significantly improves performance in both user authority estimation and answer quality prediction.
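To illustrate the idea behind the first contribution, enriching a short query with intent signals, the sketch below interpolates the query's maximum-likelihood word distribution with an intent distribution and scores candidate questions by query likelihood with Dirichlet smoothing. It is a minimal sketch under stated assumptions, not the dissertation's model: the intent weights, the interpolation weight `lam` (which the dissertation estimates with learning to rank rather than fixing by hand), the smoothing parameter `mu`, and all function names are hypothetical.

```python
import math
from collections import Counter

def collection_model(docs):
    """Maximum-likelihood unigram model over the whole question collection."""
    counts = Counter(w for d in docs for w in d)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def intent_query_model(query_terms, intent_weights, lam=0.7):
    """P(w | Q') = lam * P_ML(w | Q) + (1 - lam) * P(w | I).

    intent_weights is a hypothetical word -> weight map standing in for
    intent signals mined from external sources."""
    q = Counter(query_terms)
    q_total = sum(q.values())
    i_total = sum(intent_weights.values()) or 1.0
    vocab = set(q) | set(intent_weights)
    return {w: lam * q[w] / q_total + (1 - lam) * intent_weights.get(w, 0.0) / i_total
            for w in vocab}

def score(query_model, doc_terms, coll, mu=10.0, floor=1e-6):
    """Rank by sum_w P(w | Q') * log P(w | D), with Dirichlet-smoothed P(w | D)."""
    counts = Counter(doc_terms)
    n = len(doc_terms)
    total = 0.0
    for w, p in query_model.items():
        if p > 0:
            # Unseen collection words fall back to a small floor to avoid log(0).
            p_doc = (counts[w] + mu * coll.get(w, floor)) / (n + mu)
            total += p * math.log(p_doc)
    return total

# Toy usage: the short query "install python" plus one hypothetical intent term.
docs = ["how to install python with pip".split(),
        "install python snake care tips".split()]
coll = collection_model(docs)
with_intent = intent_query_model(["install", "python"], {"pip": 1.0})
plain = intent_query_model(["install", "python"], {}, lam=1.0)
for d in docs:
    print(" ".join(d), score(with_intent, d, coll), score(plain, d, coll))
```

In this toy run the question about installing with pip outranks the one about snakes only once the hypothetical intent term is mixed in; with `lam=1.0` both questions match the two query words once, and the shorter, off-topic question wins because its smoothed word probabilities are higher.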
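The two-stage segmentation in the second contribution (generate the top k candidates, then re-rank them discriminatively) can be sketched as follows. This assumes a toy segment-unigram model as the generative stage and a hand-weighted linear scorer as the discriminative stage; the phrase probabilities, the phrase dictionary, the three features, and the weights are all hypothetical, and in practice the discriminative weights would be learned from labeled segmentations. Enumerating all 2^(n-1) segmentations is only practical for short queries; a real generative stage would use dynamic programming or beam search to produce the top k.

```python
import heapq
import itertools
import math

def segmentations(words):
    """Yield every split of a non-empty word list into contiguous segments."""
    for cuts in itertools.product([False, True], repeat=len(words) - 1):
        segs, start = [], 0
        for i, cut in enumerate(cuts, start=1):
            if cut:  # a cut between words[i-1] and words[i]
                segs.append(tuple(words[start:i]))
                start = i
        segs.append(tuple(words[start:]))
        yield segs

def generative_score(segs, phrase_probs, floor=1e-7):
    """Hypothetical segment-unigram model: log P(S) = sum of log P(s)."""
    return sum(math.log(phrase_probs.get(s, floor)) for s in segs)

def top_k(words, phrase_probs, k=5):
    """Stage 1: keep the k best candidates under the generative model."""
    return heapq.nlargest(k, segmentations(words),
                          key=lambda s: generative_score(s, phrase_probs))

def rerank(candidates, phrase_probs, dictionary, weights):
    """Stage 2: pick the candidate with the best linear (discriminative) score."""
    def features(segs):
        return [generative_score(segs, phrase_probs),
                len(segs),
                sum(1 for s in segs if len(s) > 1 and s in dictionary)]
    return max(candidates,
               key=lambda s: sum(w * f for w, f in zip(weights, features(s))))

def query_units(words, segs):
    """Represent the query by its original words plus its multi-word segments."""
    return list(words) + [" ".join(s) for s in segs if len(s) > 1]

# Toy usage with hypothetical statistics and a hypothetical phrase dictionary.
phrase_probs = {("new",): 0.01, ("york",): 0.05, ("hotels",): 0.01,
                ("new", "york"): 0.0004}
words = ["new", "york", "hotels"]
cands = top_k(words, phrase_probs, k=2)
best = rerank(cands, phrase_probs, {("new", "york")}, weights=[1.0, -0.5, 2.0])
print(cands[0], best, query_units(words, best))
```

Here the generative stage alone prefers splitting every word, while the dictionary feature lets the re-ranker recover "new york" as a phrase; `query_units` then keeps both the original words and the phrase as query units.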
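The third contribution rests on three ideas: most answers are normal, an answer can be judged against its peers, and each question has its own criteria. The sketch below is a deliberately simple stand-in, not the dissertation's algorithm. It measures how well each answer's words agree with the pooled words of its peers under the same question, standardizes that agreement within the question so that every question gets its own scale, and squashes the result into a soft score in (0, 1) with a sigmoid. The bag-of-words cosine feature, the function names, and the `temperature` parameter are assumptions made for illustration.

```python
import math
from collections import Counter
from statistics import mean, pstdev

def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def peer_agreement(answers):
    """For each answer, similarity to the pooled words of its peers (leave-one-out)."""
    bags = [Counter(a.lower().split()) for a in answers]
    scores = []
    for i, bag in enumerate(bags):
        peers = Counter()
        for j, other in enumerate(bags):
            if j != i:
                peers.update(other)
        scores.append(cosine(bag, peers))
    return scores

def soft_quality_labels(answers, temperature=1.0):
    """Soft quality in (0, 1) for the answers to ONE question.

    Standardizing within the question gives each question its own criterion;
    relying on the peer pool assumes most answers are normal."""
    agree = peer_agreement(answers)
    mu, sigma = mean(agree), pstdev(agree)
    if sigma == 0:
        return [0.5] * len(answers)  # no evidence either way
    return [1 / (1 + math.exp(-(s - mu) / (sigma * temperature))) for s in agree]

# Toy usage: three on-topic answers and one that shares no words with its peers.
answers = ["use pip install requests to install the package",
           "run pip install requests in a terminal",
           "you can install requests using pip",
           "lol first"]
print(soft_quality_labels(answers))
```

Because every score is relative to the other answers to the same question, this heuristic only works when idea (1) holds: if most answers to a question were poor, the peer pool would no longer describe a normal answer.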
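The fourth contribution couples an authority model, an answer quality model, and a competition model that reinforce each other iteratively. The sketch below shows one simple way to wire such a loop; it is not the dissertation's formulation. It assumes linear blending with hypothetical weights `alpha` and `beta`, a per-answer `content_score` in [0, 1] supplied by some external quality predictor, and a competition score in which the best answerer "beats" the asker and the other answerers on that question, with each win worth the beaten user's current authority.

```python
from collections import defaultdict

def reinforce(questions, answers, alpha=0.5, beta=0.7, tol=1e-9, max_iter=200):
    """Jointly estimate user authority and answer quality (illustrative sketch).

    questions: qid -> asker
    answers:   list of (aid, qid, author, content_score, is_best), where
               content_score in [0, 1] comes from a hypothetical content model.
    """
    users = set(questions.values()) | {a[2] for a in answers}
    content = {aid: c for aid, _, _, c, _ in answers}
    author_of = {aid: u for aid, _, u, _, _ in answers}
    by_author, answerers = defaultdict(list), defaultdict(list)
    for aid, qid, author, _, _ in answers:
        by_author[author].append(aid)
        answerers[qid].append(author)

    # Competition model: the best answerer beats the asker and the other answerers.
    beaten, lost_to = defaultdict(list), defaultdict(list)
    for aid, qid, author, _, is_best in answers:
        if is_best:
            for loser in [questions[qid]] + answerers[qid]:
                if loser != author:
                    beaten[author].append(loser)
                    lost_to[loser].append(author)

    def answer_quality(auth):
        # Answer quality model: content evidence blended with the author's authority.
        return {aid: alpha * content[aid] + (1 - alpha) * auth[author_of[aid]]
                for aid in content}

    auth = {u: 0.5 for u in users}
    for _ in range(max_iter):
        quality = answer_quality(auth)
        new_auth = {}
        for u in users:
            # Authority model: mean quality of u's answers plus weighted competition wins.
            own = by_author[u]
            q_mean = sum(quality[a] for a in own) / len(own) if own else 0.0
            games = len(beaten[u]) + len(lost_to[u])
            comp = sum(auth[v] for v in beaten[u]) / games if games else 0.0
            new_auth[u] = beta * q_mean + (1 - beta) * comp
        delta = max(abs(new_auth[u] - auth[u]) for u in users)
        auth = new_auth
        if delta < tol:
            break
    return auth, answer_quality(auth)

# Toy usage: carol gives the best answer to both questions.
questions = {"q1": "alice", "q2": "bob"}
answers = [("a1", "q1", "carol", 0.9, True), ("a2", "q1", "dave", 0.4, False),
           ("a3", "q2", "carol", 0.8, True), ("a4", "q2", "erin", 0.3, False)]
auth, quality = reinforce(questions, answers)
print(auth, quality)
```

With `alpha` and `beta` both positive, each update shrinks the largest change in authority by at least a factor of `beta * (1 - alpha) + (1 - beta)`, which is below 1, so the loop settles to a fixed point rather than oscillating.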
Keywords/Search Tags: Community question answering, question search, user intent, query segmentation, answer quality, user authority