
Research On Knowledge Extraction And Management Of The Big Data In Wikipedia

Posted on: 2014-01-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K Xiao
Full Text: PDF
GTID: 1268330425468249
Subject: Computer software and theory

Abstract/Summary:
At present, we have entered the age of big data. Manufacturing, daily life, research, and services are all being changed by big data. At the same time, the process of knowledge creation and the model of decision making, "data → information → knowledge → wisdom → decision", are facing adverse conditions: big data is enormous in volume, comes in too many forms, is hard to verify as genuine or fake, and changes frequently. Only by transforming such large and complex data sets into information and knowledge can we make the right decisions. Practice shows that methods based on collective intelligence, such as mass collaboration, nonlinear methods, and decentralized methods, can help people mine valuable knowledge.

Wikipedia is a typical platform that creates knowledge through mass collaboration, as well as a typical example of big data. As with any mass collaboration platform, the quality of its knowledge is uneven. The main goals of this dissertation are to extract high-quality domain knowledge from Wikipedia and to manage that knowledge. The contributions are as follows:

(1) The characteristics of Wikipedia's mass collaboration environment are summarized, including article editing tasks, the article quality rating system, and the voting process for high-quality articles.

(2) The impact of mass collaboration behaviors on article quality is analyzed. Editor networks are built from the User Talk Pages, and the effects of attributes such as the ratio of conversational editors and the clustering coefficient of the editor network on the speed of quality promotion are clarified (a sketch of these computations appears after this abstract). This lays the groundwork for the knowledge quality management task.

(3) A new method for knowledge quality management in Wikipedia is proposed. The method employs both article attributes and editor attributes and can assess articles at every quality level (a classifier sketch also follows below). Because all of the attribute values can be extracted from the Wikipedia database, the method can be used to detect article quality in any language edition.

(4) High-quality articles in a specific domain are extracted from Wikipedia using the quality detection method. The degree of domain relevance of each article is then analyzed, and the closely related articles serve as ontology concepts. Relations between the concepts are also extracted to build domain ontologies. These domain ontologies are applied in a domain modeling tool, O-RGPS, to annotate the Role, Goal, Process, and Service domain models, and in the S2R2 platform, which supports semantic annotation and semantic search of web services.
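To make contribution (2) concrete, the following is a minimal sketch of how an editor network might be built from User Talk Page exchanges and how the two attributes named above could be computed. The dissertation does not specify its implementation; the data shape, the function names, and the choice of the networkx library here are all assumptions for illustration.

```python
import networkx as nx

def build_editor_network(talk_messages):
    """Build an undirected editor network from User Talk Page exchanges.
    talk_messages: iterable of (sender, receiver) editor-name pairs."""
    g = nx.Graph()
    g.add_edges_from(talk_messages)
    return g

def conversational_ratio(article_editors, network):
    """Fraction of an article's editors who communicate via User Talk Pages."""
    if not article_editors:
        return 0.0
    return sum(e in network for e in article_editors) / len(article_editors)

# Toy usage with hypothetical editor names.
messages = [("alice", "bob"), ("bob", "carol"), ("alice", "carol"), ("dave", "alice")]
net = build_editor_network(messages)
print(nx.average_clustering(net))                          # clustering coefficient of the editor network
print(conversational_ratio(["alice", "bob", "eve"], net))  # ratio of conversational editors
```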
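Contribution (3) frames quality assessment as a prediction over combined article and editor attributes. A minimal sketch of that idea as a supervised classifier is shown below; the specific feature set, the quality labels, and the use of scikit-learn's RandomForestClassifier are illustrative assumptions, not the dissertation's actual model.

```python
from sklearn.ensemble import RandomForestClassifier

# Each feature row combines article attributes (length, references, images)
# with editor attributes (editor count, conversational ratio, clustering
# coefficient of the editor network). All values below are made up.
X_train = [
    [5200, 48, 6, 35, 0.62, 0.41],  # high-quality article
    [4700, 40, 5, 28, 0.55, 0.38],
    [900,   3, 0,  4, 0.10, 0.05],  # stub-level article
    [1200,  5, 1,  6, 0.12, 0.08],
]
y_train = ["Featured", "Featured", "Stub", "Stub"]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Predict the quality level of an unseen article from its attributes.
print(clf.predict([[3100, 22, 3, 18, 0.40, 0.25]]))
```

Because every such feature can be computed from a Wikipedia database dump alone, the same pipeline would carry over across language editions, which is the language-independence claim the abstract makes.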
Keywords/Search Tags: Wikipedia, big data, mass collaboration, knowledge extraction, knowledge management