Research Of Quality Assessment On Internet Encyclopedia Articles

Posted on: 2015-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Li
GTID: 2308330479479486
Subject: Software engineering
Abstract/Summary:
With the internet gaining popularity in the twenty-first century, the amount of available information has grown enormously. While people benefit from easy access to this information, they are confused when faced with content of varying quality. Wikipedia is a case in point. A great many articles are created and modified every day, but quality assessment lags behind. Because Wikipedia relies on manual quality assessment, both the efficiency and the speed of the process are seriously limited by human capacity. As a result, a large proportion of Wikipedia articles remain unassessed, leaving users unaware of their quality. Wikipedia introduced a user rating mechanism, but it did not achieve satisfactory results because the ratings are subjective. To address the efficiency and subjectivity problems of quality assessment on Wikipedia, this thesis studies automatic quality assessment and designs several machine-based techniques.

To lay the groundwork for quality assessment, this thesis first studies the features that are related to article quality. The features fall into two categories. First, a good Wikipedia article should possess the general features of an encyclopedia article, such as completeness and accuracy. Second, Wikipedia differs from traditional encyclopedias in the way articles are created: it uses crowdsourcing, which involves far more people in the editing work than a traditional encyclopedia does. Consequently, we can derive features from the editing history of the articles. After a thorough analysis of features derived from both the content and the history of articles, we select those that can be assessed by machine as the features for quality assessment.

Quality assessment of Wikipedia articles is studied through classification and through ranking. For classification, we have developed an SVM-based classifier that can distinguish between featured article candidates and ordinary articles.
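As a rough illustration of such a classifier, the following is a minimal sketch and not the thesis implementation. It assumes a linear SVM trained by subgradient descent on the regularized hinge loss; the abstract does not state the kernel, the solver, or the exact feature set. The three features (length, reference count, edit count) and all toy values are hypothetical.

```python
# Minimal linear SVM sketch: subgradient descent on the L2-regularized hinge loss.
# The kernel choice, feature names, and toy values are assumptions for illustration.

def train_linear_svm(samples, labels, lam=0.01, lr=0.1, epochs=200):
    """Return (weights, bias) for labels in {-1, +1}."""
    dim = len(samples[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # L2 regularization shrinks the weights slightly on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # Sample lies inside the margin or is misclassified:
                # move the hyperplane along the hinge-loss subgradient.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Return +1 (featured article candidate) or -1 (ordinary article)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical features scaled to [0, 1]: (length, reference count, edit count).
train_x = [
    (0.9, 0.8, 0.7), (0.8, 0.9, 0.9), (0.7, 0.7, 0.8),  # candidate-like articles
    (0.2, 0.1, 0.3), (0.1, 0.2, 0.1), (0.3, 0.2, 0.2),  # ordinary-like articles
]
train_y = [1, 1, 1, -1, -1, -1]

weights, bias = train_linear_svm(train_x, train_y)
label_for_rich_article = predict(weights, bias, (0.95, 0.95, 0.95))
label_for_stub_article = predict(weights, bias, (0.05, 0.05, 0.05))
```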
We can apply this method to select high-quality articles as featured article candidates, thereby reducing the amount of manual work required. A second classification problem is whether a featured article candidate will be promoted during review. We find that machine-based classification does not work well enough for this task, so it cannot replace human effort during review.

We also study quality assessment through ranking. We use PageRank to model the bipartite graph of editors and articles. After iterating the PageRank computation until the article scores converge, we find that featured articles do not rank at the top, which indicates that this ranking is ineffective. We therefore rank in another way: we use featured articles as evidence of editor quality and then rank articles according to the quality levels of their editors. We also develop a Weighted PageRank based on features derived from the article history, which significantly improves the ranking results.

This thesis studies Wikipedia as a typical example of an internet encyclopedia; its results and conclusions may also help quality assessment on other encyclopedias and crowdsourcing websites.
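The baseline ranking described above, PageRank iterated to convergence on the editor-article bipartite graph, can be sketched as follows. This is only an illustrative sketch under stated assumptions: each edit is treated as an undirected link between an editor and an article, the damping factor is the conventional 0.85, and the node names and toy edit list are hypothetical. It does not reproduce the thesis's Weighted PageRank or its editor-level ranking, whose details the abstract does not give.

```python
# Plain PageRank by power iteration on an editor-article bipartite graph.
# Damping factor, tolerance, and the toy edit list are illustrative assumptions.

def pagerank(edges, damping=0.85, tol=1e-9, max_iter=1000):
    """edges: iterable of (editor, article) pairs. Returns {node: score}."""
    # Each edit links editor and article in both directions.
    neighbors = {}
    for editor, article in edges:
        neighbors.setdefault(editor, set()).add(article)
        neighbors.setdefault(article, set()).add(editor)
    nodes = sorted(neighbors)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Every node receives the uniform "teleport" share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            # A node splits its damped score evenly among its neighbors.
            share = damping * rank[node] / len(neighbors[node])
            for nb in neighbors[node]:
                new_rank[nb] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:  # scores have converged
            break
    return rank

# Hypothetical edit history: article A has three editors, article B has one.
toy_edits = [
    ("editor:e1", "article:A"),
    ("editor:e2", "article:A"),
    ("editor:e3", "article:A"),
    ("editor:e3", "article:B"),
]

ranks = pagerank(toy_edits)
article_ranking = sorted(
    (node for node in ranks if node.startswith("article:")),
    key=lambda node: ranks[node],
    reverse=True,
)
```

In this unweighted form an article's score is driven largely by how many editors touch it, which is one plausible reason a plain PageRank ranking need not place featured articles first.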
Keywords/Search Tags: Wikipedia, quality feature, quality assessment, SVM, PageRank