
Mining The Quality Of The Content In Wikipedia

Posted on: 2014-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: T S Chang
GTID: 2248330398450251
Subject: Computer application technology
Abstract/Summary:
Wikipedia is a multilingual online encyclopedia built on wiki technology. With the rapid development of Web 2.0, the numbers of Wikipedia articles and contributors have grown very quickly. Most Wikipedia articles appear to be reliable, which has made Wikipedia an increasingly convenient service for its users, and it has also become one of the most attractive knowledge bases for academia and industry. This explosive growth, however, raises the problem of how to ensure the reliability and accuracy of the information in Wikipedia. From this perspective, we aim to identify controversial articles and detect vandalism in Wikipedia in order to assess the quality of its content.

Regarding controversy identification, many Wikipedia articles are written by up to thousands of authors who hold conflicting opinions. Identifying these controversial articles and then resolving the conflicts is important for maintaining the high quality of Wikipedia content, because the extreme behaviors caused by conflicts can affect the accuracy of the content. Our approach draws clues from the edit history page, which is more efficient than the traditional approach of mining the article text directly. Our model takes the contributors of each article into account when computing controversy scores. Experiments on 16,745 Wikipedia articles, using metadata from their edit histories, show that our methods perform considerably better than the baseline models.

Vandalism detection in Wikipedia has attracted much attention in recent years. In the past, vandalism was detected manually, which was inefficient and wasted resources. To improve information quality on Wikipedia and to free maintainers from such repetitive tasks, machine learning methods have been proposed to detect vandalism automatically.
However, most of these methods focus on mining new features, of which there seems to be an inexhaustible supply. The question of how to make the best use of the features already available therefore needs to be addressed. In this work, we apply feature transformation techniques to analyze the features and propose a framework that uses these techniques to improve detection. Experimental results on the public PAN-WVC-10 dataset show that our method is effective.
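The abstract does not say which feature transformation techniques the framework uses. As a generic illustration only, the sketch below assumes one common choice: a log1p transform on heavy-tailed count features followed by z-score standardization, with the statistics fitted on training edits. The feature columns and the functions apply_log, fit_transform_params, and transform are hypothetical and are not taken from the thesis or from the PAN-WVC-10 corpus.

```python
import math
from statistics import mean, pstdev

# Hypothetical per-edit features (NOT the thesis's feature set):
#   column 0: characters added, column 1: characters removed, column 2: uppercase ratio
# Count-like columns are heavy-tailed, so they are log-compressed before scaling.


def apply_log(row, log_cols):
    """Apply log1p to the columns whose indices are in log_cols."""
    return [math.log1p(v) if i in log_cols else v for i, v in enumerate(row)]


def fit_transform_params(rows, log_cols):
    """Fit per-column mean and population std on log-transformed training rows."""
    logged = [apply_log(row, log_cols) for row in rows]
    columns = list(zip(*logged))
    means = [mean(col) for col in columns]
    # A constant column has std 0; fall back to 1.0 to avoid division by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(row, log_cols, means, stds):
    """Map one raw feature row to log-compressed z-scores using fitted params."""
    logged = apply_log(row, log_cols)
    return [(v - m) / s for v, m, s in zip(logged, means, stds)]


if __name__ == "__main__":
    log_cols = {0, 1}
    train = [[10, 0, 0.1], [2000, 5, 0.9], [50, 20, 0.2]]  # toy rows, not real data
    means, stds = fit_transform_params(train, log_cols)
    for row in train:
        print([round(v, 3) for v in transform(row, log_cols, means, stds)])
```

Fitting the means and standard deviations on training edits only, then reusing them on evaluation edits, keeps statistics from the evaluation data out of the transformation; a classifier would then be trained on the transformed rows.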
Keywords/Search Tags: Wikipedia, Controversy Rank, Vandalism, Social Network Analysis