
Research On Candidate Answers Ranking For Temporal Question

Posted on: 2016-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: C Tan
Full Text: PDF
GTID: 2308330503951119
Subject: Computer Science and Technology
Abstract/Summary:
Information retrieval is the process of providing users with the most relevant documents from a large body of data. In recent years, time has gained increasing importance in the search process, giving rise to a new research field known as temporal information retrieval (TIR), which poses many distinct challenges. Generally speaking, the goal of TIR is to satisfy search needs by combining traditional document content relevance with temporal relevance.

The volume of resources on the Internet, however, makes TIR a difficult task. First, because the web is constantly changing, maintaining up-to-date indexes is increasingly difficult. Second, because of semantic complexity and the diversity of temporal expressions, the temporal intent of a query is hard to identify. Third, it is difficult to retrieve documents whose temporal dimension matches the temporal intent underlying the user's query.

Time has been applied successfully not only in information retrieval but also in other domains. This thesis introduces temporal information to build a time-sensitive question answering system. The system is studied as the following modules: question temporal intent classification, time entity recognition, candidate answer ranking that combines the temporal information of questions and answers, and answer quality determination at the semantic level.

First, the analysis of question temporal intent and the recognition of time entities. For temporal intent analysis, questions are divided into four categories: Past, Recency, Future, and Atemporal. By combining temporal rules with machine learning methods, the approach achieves 75% accuracy on the four-category classification task. For time entity recognition, a mixed-strategy method is designed that merges the results of different existing entity labeling methods to improve recognition performance.

Second, candidate answer ranking. To retrieve and rank the answers to a question, this thesis uses Solr to obtain content-relevant answers and then applies learning-to-rank algorithms to order them. It analyzes the importance of the time factor in ranking answers to time-sensitive questions. The result, an nDCG@10 of 0.5062, indicates that introducing the time factor can improve the quality of answer ranking for time-sensitive questions.

Third, answer quality determination. In a community question answering (CQA) system, each question has several different answers. This thesis extracts different semantic features and designs a hierarchical method that combines different machine learning algorithms; the quality of each answer is determined by this hierarchical method. The method reaches a Macro F1 of 58.47% on the three-category classification problem. The thesis also analyzes the importance of different features in the answer quality determination task.
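
For readers unfamiliar with the two evaluation measures reported above, the following is a minimal Python sketch of nDCG@k (used for answer ranking) and Macro F1 (used for answer quality classification). It is an illustration only, not the thesis's evaluation code. The abstract does not say which gain function was used, so the linear gain with a log2 rank discount is an assumption, and the function names `dcg_at_k`, `ndcg_at_k`, and `macro_f1` and all sample labels are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k graded relevance labels.

    Assumption: linear gain (the label itself) and a log2(rank + 1) discount,
    where rank starts at 1 for the top result.
    """
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG of the given ordering divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); a class with no support and no predictions scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical relevance grades of ranked answers for one question (higher is better).
    ranked_grades = [2, 0, 3, 1, 0]
    print("nDCG@10:", round(ndcg_at_k(ranked_grades, k=10), 4))

    # Hypothetical three-class answer quality labels.
    gold = ["good", "fair", "bad", "good", "bad"]
    pred = ["good", "bad", "bad", "fair", "bad"]
    print("Macro F1:", round(macro_f1(gold, pred), 4))
```

Macro F1 averages the per-class scores without weighting by class size, so a rare quality class counts as much as a frequent one; that is one reason it is a common choice for multi-class problems with uneven class distributions.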
Keywords/Search Tags: temporal intent, time entity, answer ranking, answer quality determination, question answering system