
A Study On The Transfer Learning Method Of Chinese Q & A

Posted on: 2016-11-27
Degree: Master
Type: Thesis
Country: China
Candidate: B Yang
GTID: 2208330470968021
Subject: Computer application technology
Abstract/Summary:
The Internet has become an indispensable part of daily life. Emerging technologies for exchanging and accessing information have greatly eased communication between people, and the Internet has evolved from simple one-way information dissemination into a medium in which everyone participates. Community question answering (cQA) sites such as Yahoo! Answers, Baidu Know, Tencent Soso and Sina Ask are a relatively new way of sharing information. Community users do not only obtain information: incentive mechanisms encourage them to share their knowledge and experience with one another, so users are also information providers, which promotes information exchange to some extent. Understanding queries accurately and recommending related questions have therefore become important research topics. In this context, question classification assigns a user's query to one of a set of predefined categories. Classifying questions into cQA categories differs both from traditional text classification and from the question classification task in TREC, and applying existing classification methods directly to cQA leads to markedly lower accuracy. The main difficulties are as follows: (1) Traditional methods classify texts or questions into a small number of categories, whereas cQA sites usually have many more. Baidu Know, for example, organizes its categories into roughly three levels, with 13 top-level categories, 141 second-level categories and 289 third-level categories.
Experimental results show that when the number of categories increases substantially, existing methods applied without modification give unsatisfactory classification results. (2) Unlike plain texts or documents, questions in cQA are usually short and contain little useful information: each individual question provides scarce information, and little information is shared between questions, which makes similarity measurement in classification algorithms difficult. Because of this data sparsity, traditional classification methods based on the bag-of-words model do not work well on cQA. To address these problems, this thesis studies several key issues: optimizing word feature extraction, applying transfer learning to unlabeled data, and obtaining samples with similar probability distributions in a kernel space. The main work is as follows: (1) Semantic extension of questions. When training the model, a wiki knowledge base is built from Wikipedia, and several semantic relations are extracted from it (synonymy, polysemy, hypernymy and associative relations); these relations are used to alleviate data sparsity. (2) Refined feature selection. Because questions are generally shorter than ordinary texts and provide little useful information, the feature selection method is refined on the basis of the wiki knowledge: proper nouns and category-exclusive features are used, and single-character words and words with a frequency below 10 are removed, so that the selected features are more closely tied to the appropriate category. Experimental results show that this method brings a marked improvement. (3) Kernel-function mapping for question classification in cQA. When labeled data are scarce or difficult to obtain, transfer learning should be used to make full use of unlabeled data.
When the probability distributions of the source-domain and target-domain sample spaces differ, little knowledge can be transferred effectively between them, and the transfer can even be negative. Data from both domains are therefore mapped into a common kernel space, and samples are selected in that space to build the model, so that knowledge can be transferred. (4) Selection of samples with the same conditional probability distribution. Kernel-function mapping places the samples in a common space where they share a common probability distribution; however, only samples whose conditional probability distributions are also the same can be used to build the model and transfer knowledge. The K-means clustering algorithm is used to select these samples.
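The abstract does not show how the extracted wiki relations are applied to a question. A minimal sketch, assuming a small hand-written relation table in place of the Wikipedia-derived knowledge base (the table `RELATIONS`, its English entries and the function `expand_question` are hypothetical), appends related terms to a tokenized question so that two short questions can share features they would otherwise lack:

```python
from typing import Dict, List, Tuple

# Hypothetical relation table standing in for the Wikipedia-derived knowledge:
# each term maps a relation name to a list of related terms.
RELATIONS: Dict[str, Dict[str, List[str]]] = {
    "laptop": {"synonym": ["notebook"], "hypernym": ["computer"]},
    "notebook": {"synonym": ["laptop"], "hypernym": ["computer"],
                 "polysemy": ["exercise book"]},
    "battery": {"associative": ["charger"]},
}


def expand_question(tokens: List[str],
                    relations: Dict[str, Dict[str, List[str]]],
                    use: Tuple[str, ...] = ("synonym", "hypernym", "associative"),
                    ) -> List[str]:
    """Return the original tokens followed by related terms, without duplicates."""
    expanded = list(tokens)
    seen = set(tokens)
    for tok in tokens:
        for rel in use:
            for related in relations.get(tok, {}).get(rel, []):
                if related not in seen:
                    seen.add(related)
                    expanded.append(related)
    return expanded


q1 = ["laptop", "battery", "drains"]
q2 = ["notebook", "battery", "life"]
print(set(q1) & set(q2))  # raw overlap between the two questions
print(sorted(set(expand_question(q1, RELATIONS)) & set(expand_question(q2, RELATIONS))))
```

Before expansion the two questions share only "battery"; after expansion they also share "laptop", "notebook", "computer" and "charger". In this sketch polysemous senses are left out of the default `use` tuple, since blindly adding every sense of an ambiguous word can add noise; the abstract does not say how the thesis itself uses the polysemy relation.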
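The vocabulary filter in item (2) of the main work can be sketched as follows. This is a minimal illustration, assuming questions are already word-segmented; the threshold of 10 comes from the abstract, while counting corpus term frequency (rather than, say, document frequency), reading "single" words as single-character words, the helper names `filter_vocabulary` and `apply_vocabulary`, and the toy corpus are all assumptions:

```python
from collections import Counter
from typing import List, Set


def filter_vocabulary(questions: List[List[str]], min_freq: int = 10) -> Set[str]:
    """Keep words longer than one character that occur at least min_freq
    times across all segmented questions (corpus term frequency)."""
    counts = Counter(word for q in questions for word in q)
    return {w for w, c in counts.items() if len(w) > 1 and c >= min_freq}


def apply_vocabulary(question: List[str], vocab: Set[str]) -> List[str]:
    """Drop tokens that were not selected as features."""
    return [w for w in question if w in vocab]


# Toy corpus of segmented Chinese questions (illustrative, not from the thesis).
corpus = [["手机", "电池", "的"]] * 12 + [["电脑", "蓝屏"]] * 3
vocab = filter_vocabulary(corpus)
print(sorted(vocab))
print(apply_vocabulary(["手机", "的", "蓝屏"], vocab))
```

Here the single-character function word "的" is dropped despite being frequent, and "电脑" and "蓝屏" are dropped because they occur only 3 times. The abstract's other rule, favouring proper nouns and category-exclusive features from the wiki knowledge, is not modelled in this sketch.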
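The abstract does not name the kernel or the mapping used in item (3). As one hedged illustration of what "different distributions" means in a kernel space, the sketch below computes the biased empirical squared maximum mean discrepancy (MMD) between a source sample and a target sample under a Gaussian RBF kernel. MMD is a standard kernel-space distance between distributions and is swapped in here for illustration; it is not necessarily the measure used in the thesis. The bandwidth `gamma` and the toy 2-D points are assumptions:

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def rbf(x: Vector, y: Vector, gamma: float = 0.5) -> float:
    """Gaussian RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd2(source: List[Vector], target: List[Vector], gamma: float = 0.5) -> float:
    """Biased squared MMD: mean k(s, s') + mean k(t, t') - 2 * mean k(s, t)."""
    def mean_k(a: List[Vector], b: List[Vector]) -> float:
        return sum(rbf(x, y, gamma) for x in a for y in b) / (len(a) * len(b))
    return mean_k(source, source) + mean_k(target, target) - 2 * mean_k(source, target)


source = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1]]
near_target = [[0.1, 0.1], [0.0, 0.2], [0.2, 0.0]]
far_target = [[3.0, 3.0], [3.1, 2.9], [2.9, 3.1]]
print(mmd2(source, near_target))  # small: the two samples overlap
print(mmd2(source, far_target))   # large: the two samples are far apart
```

The sketch only measures the gap between domains. Methods such as transfer component analysis go further and learn a shared projection that reduces this discrepancy, which is one way to obtain a common space of the kind the abstract describes.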
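Item (4) states only that K-means is used to pick samples whose conditional distributions match. One plausible reading, sketched below under explicit assumptions, is to cluster labeled source samples and unlabeled target samples together in the (already mapped) space and keep only the source samples that fall into clusters also populated by target samples, on the premise that points in the same cluster tend to share a label. The selection rule, the helper names `kmeans` and `select_source_samples`, the fixed initial centroids and the toy data are all hypothetical:

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def kmeans(points: List[Point], init: List[Point], iters: int = 20) -> List[int]:
    """Plain Lloyd's K-means; returns the cluster index of every point.
    Initial centroids are passed in explicitly to keep the example deterministic."""
    centroids = [tuple(c) for c in init]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        assign = [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, c in enumerate(centroids):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(c)  # leave an empty cluster's centroid unchanged
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return assign


def select_source_samples(source: List[Point], target: List[Point],
                          init: List[Point]) -> List[int]:
    """Return indices of source samples whose cluster also contains target samples."""
    assign = kmeans(source + target, init)
    src_assign, tgt_assign = assign[:len(source)], assign[len(source):]
    target_clusters = set(tgt_assign)
    return [i for i, a in enumerate(src_assign) if a in target_clusters]


source = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (10.0, 0.0)]
target = [(0.1, 0.2), (5.2, 5.1)]
init = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
print(select_source_samples(source, target, init))
```

The isolated source point at (10, 0) ends up in a cluster with no target samples and is dropped, while the four source points near the target samples are kept. In practice the number of clusters and the initialization (for example k-means++) would have to be chosen, and the clustering would run on the kernel-mapped features rather than raw coordinates.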
Keywords/Search Tags: community-based QA, Transfer Learning, Wikipedia, Question Classification