
Research On Knowledge Extraction And Management Of The Big Data In Wikipedia

Posted on: 2014-01-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K Xiao
Full Text: PDF
GTID: 1268330425468249
Subject: Computer software and theory

Abstract/Summary:
At present, we have entered the age of big data. Manufacturing, daily life, research, and services are all being changed by big data. At the same time, the process of knowledge creation and the model of decision making, "data → information → knowledge → wisdom → decision", are facing adverse conditions: big data is enormous in volume, comes in too many forms, is hard to verify as genuine or fake, and changes frequently. Only by transforming such large and complex data sets into information and knowledge can we make the right decisions. Practice shows that methods based on collective intelligence, such as mass collaboration, nonlinear methods, and decentralized methods, can help people mine valuable knowledge.

Wikipedia is a typical platform that creates knowledge through mass collaboration, as well as a typical example of big data. As with any mass collaboration platform, the quality of its knowledge is uneven. The main goals of this dissertation are to extract high-quality domain knowledge from Wikipedia and to manage that knowledge. The contributions are as follows:

(1) The characteristics of Wikipedia's mass collaboration environment are summarized, including article editing tasks, the article quality rating system, and the voting process for high-quality articles.

(2) The impact of mass collaboration behaviors on article quality is analyzed. Editor networks are built from the User Talk Pages, and the effects of attributes such as the ratio of conversational editors and the clustering coefficient of the editor network on the speed of quality promotion are clarified (a sketch of these computations appears after this abstract). This lays the groundwork for the knowledge quality management task.

(3) A new method for knowledge quality management in Wikipedia is proposed. The method employs both article attributes and editor attributes and can assess articles at every quality level (a classifier sketch also follows below). Because all of the attribute values can be extracted from the Wikipedia database, the method can be used to detect article quality in any language edition.

(4) High-quality articles in a specific domain are extracted from Wikipedia using the quality detection method. The degree of domain relevance of each article is then analyzed, and the closely related articles serve as ontology concepts. Relations between the concepts are also extracted to build domain ontologies. These domain ontologies are applied in a domain modeling tool, O-RGPS, to annotate the Role, Goal, Process, and Service domain models, and in the S2R2 platform, which supports semantic annotation and semantic search of web services.
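To make contribution (2) concrete, the following is a minimal sketch of how an editor network might be built from User Talk Page exchanges and how the two attributes named above could be computed. The dissertation does not specify its implementation; the data shape, the function names, and the choice of the networkx library here are all assumptions for illustration.

```python
import networkx as nx

def build_editor_network(talk_messages):
    """Build an undirected editor network from User Talk Page exchanges.
    talk_messages: iterable of (sender, receiver) editor-name pairs."""
    g = nx.Graph()
    g.add_edges_from(talk_messages)
    return g

def conversational_ratio(article_editors, network):
    """Fraction of an article's editors who communicate via User Talk Pages."""
    if not article_editors:
        return 0.0
    return sum(e in network for e in article_editors) / len(article_editors)

# Toy usage with hypothetical editor names.
messages = [("alice", "bob"), ("bob", "carol"), ("alice", "carol"), ("dave", "alice")]
net = build_editor_network(messages)
print(nx.average_clustering(net))                          # clustering coefficient of the editor network
print(conversational_ratio(["alice", "bob", "eve"], net))  # ratio of conversational editors
```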
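Contribution (3) frames quality assessment as a prediction over combined article and editor attributes. A minimal sketch of that idea as a supervised classifier is shown below; the specific feature set, the quality labels, and the use of scikit-learn's RandomForestClassifier are illustrative assumptions, not the dissertation's actual model.

```python
from sklearn.ensemble import RandomForestClassifier

# Each feature row combines article attributes (length, references, images)
# with editor attributes (editor count, conversational ratio, clustering
# coefficient of the editor network). All values below are made up.
X_train = [
    [5200, 48, 6, 35, 0.62, 0.41],  # high-quality article
    [4700, 40, 5, 28, 0.55, 0.38],
    [900,   3, 0,  4, 0.10, 0.05],  # stub-level article
    [1200,  5, 1,  6, 0.12, 0.08],
]
y_train = ["Featured", "Featured", "Stub", "Stub"]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Predict the quality level of an unseen article from its attributes.
print(clf.predict([[3100, 22, 3, 18, 0.40, 0.25]]))
```

Because every such feature can be computed from a Wikipedia database dump alone, the same pipeline would carry over across language editions, which is the language-independence claim the abstract makes.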
Keywords/Search Tags: Wikipedia, big data, mass collaboration, knowledge extraction, knowledge management