
The Research Of BaiduZhidao Content Extraction

Posted on: 2015-01-03
Degree: Master
Type: Thesis
Country: China
Candidate: S H Yu
GTID: 2268330428968419
Subject: Education Technology
Abstract/Summary:
Besides search engines, social question-and-answer (Q&A) communities are another common way to obtain information. Such a community is an online service in which users can ask questions, answer questions and interact with one another. BaiduZhidao is known as the world's largest Chinese-language interactive Q&A community; it is widely used and contains a great amount of data, so this study chooses BaiduZhidao as its research object. With the rapid development of BaiduZhidao, there is growing concern about the quality of its answers. This thesis studies the quality of questions and answers, taking BaiduZhidao as an example.

First, manual annotation is used to determine the quality of the questions and answers on BaiduZhidao and to check whether the "questioner adoption" and "user adoption" answers are in fact the best answers. Then an SVM classifier is used to assess quality from five groups of features: text features of the Q&A, statistics-based features of the Q&A, timing-based features of the Q&A, user-based features of the Q&A, and question-answer correlation features.

From the results, we conclude that the quality of Q&A on BaiduZhidao is high, so the answers recommended by BaiduZhidao can be regarded as the best answers. The annotation results show that most questions on BaiduZhidao are of high quality and only a minority are of low quality. The proposed classification performs well. As the five feature groups are added in turn, the overall accuracy, recall and AUC for the answers increase. AUC is not less than 0.05 except for the first one, and it grows over time, so the classification performance is good. The biggest growth occurs when the statistics-based and timing-based features of the Q&A are added. The user-based features, however, raise accuracy only slightly, and at the same time they reduce recall and AUC.

This is plausible. As time passes, more and more of the answers that users recommend as the best answer are anonymous answers, and more and more people choose to answer anonymously. Taking this factor into account, it is reasonable that the accuracy obtained from user-based features is low. Overall, BaiduZhidao, as the largest social Q&A community in China, has high quality, and the five proposed feature groups predict fairly well which answer is the best one.
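The three measures the abstract reports (accuracy, recall and AUC) can be illustrated with a minimal sketch. This is not the thesis's code. The labels and scores below are invented for illustration, the function names are hypothetical, and the scores are assumed to be signed margins of the kind an SVM decision function would produce, with 1 meaning "best answer" and 0 meaning "not best answer".

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of answers whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true best answers (label 1) that were predicted as best."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(1 for t in y_true if t == 1)
    return true_pos / actual_pos if actual_pos else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random best answer outscores a random non-best
    answer, counting ties as one half (pairwise form of ROC AUC)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical data: six answers, three of them true best answers.
    y_true = [1, 0, 1, 0, 1, 0]
    # Assumed classifier margins; a positive margin means "predicted best".
    scores = [0.9, 0.4, 0.3, -0.2, 0.7, 0.5]
    y_pred = [1 if s > 0 else 0 for s in scores]

    print("accuracy:", round(accuracy(y_true, y_pred), 3))
    print("recall:  ", round(recall(y_true, y_pred), 3))
    print("AUC:     ", round(auc(y_true, scores), 3))
```

In this toy data the zero threshold labels five of the six answers as best, so recall is perfect while accuracy is not. AUC ignores the threshold and looks only at how the scores rank the two classes, which is why a feature group can move the three measures in different directions.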
Keywords/Search Tags: Social Question & Answer Community, BaiduZhidao, Crawler, SVM, Q&A quality