
Research On Semi-automatic Construction Of Application Ontology Based On Chinese UGC Information Source

Posted on: 2015-03-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Hu
GTID: 1318330428474802
Subject: Information Science
Abstract/Summary:
With the growing popularity of social media in Internet activities, UGC (user-generated content) information has grown into a flood. The gap between the explosive growth of UGC information and its practical value has become a problem that urgently needs to be solved in social media. Managing and using UGC information sources is both an opportunity and a challenge for the development of information science, so the effective organization of the knowledge in UGC information sources is highly important. As a method and technology of knowledge organization, ontology can not only organize the knowledge in UGC information sources effectively but also support semantic retrieval in social media. Traditional domain ontologies are built around domain knowledge and therefore cannot effectively support social media, which is based on user knowledge. Meanwhile, because of the unique characteristics of UGC information, traditional ontology construction methods cannot be applied directly to knowledge organization for UGC information sources. Consequently, the need for semantic knowledge organization in social media should be met by constructing application ontologies from UGC information sources. Based on a comparative analysis of current ontology construction methods and of the classification and features of UGC information sources, this paper improves and develops parts of the ontology construction process. It also proposes and implements a semi-automatic ontology construction methodology that combines Wikipedia information with the unstructured textual resources in UGC information sources. This methodology and its implementation not only offer a reference for the general method of application ontology construction for semantic retrieval in social media, but also provide guidance for future research on ontology.

This paper identifies the UGC information sources suitable for application ontology construction according to the features of UGC information sources, and comparatively analyzes how various ontology construction methodologies apply to UGC information sources in order to find a suitable method of application ontology construction. On this basis, the methodologies are partly improved so as to realize application ontology construction based on UGC information sources. The main content of this paper includes the analysis and selection of UGC information source features, the acquisition of ontology concepts from UGC information sources, the acquisition of ontology relations from UGC information sources, and ontology maintenance based on UGC information sources. This paper contains 56 figures and 33 diagrams and runs to sixty thousand words in 7 parts, as follows:

The first part discusses the methodological basis of ontology and ontology construction. It begins by introducing ontology in the information science domain: it describes and defines the concept of ontology, illustrates the features of ontology in knowledge description and knowledge sharing, and introduces the basic elements and types of ontology. The standards of the three ontology description languages, XML, RDF and OWL, are discussed in detail. The principles, methodologies and implementation of ontology are also discussed. Then, following the method of ontology construction, the part presents the basic knowledge of linguistics, logic and dissipative structure that relates to the concept extraction, relation acquisition and maintenance of ontology.

The second part mainly discusses the concept and definition of UGC information sources, analyzing their content forms and release forms and classifying them on the basis of communication science and psychology. The information sources suitable for application ontology construction are selected according to the distribution of the different UGC information sources.

The third part consists of two sections. First, the method of ontology concept extraction based on Wikipedia is discussed, including an analysis of the entity concept model in Wikipedia and the extraction of ontology concepts. Then, the method of ontology concept extraction based on the text in UGC information sources is discussed. On the basis of this method, an acquisition model for ontology concepts is proposed, the acquisition and pre-processing of the UGC corpus are discussed, and the Chinese word segmentation technology chosen in this paper is described. Furthermore, the ontology concepts of UGC information sources are extracted using the ontology concepts extracted from Wikipedia and a rule base of part-of-speech combinations derived from the part-of-speech patterns of UGC text. Candidate concepts are filtered for independence and completeness using mutual information and information entropy, and this filtering method is improved by refining the extracted concepts through headword-based concept supplementation. Finally, the core concepts among the ontology concepts are obtained by filtering on domain relevance and consistency.

The fourth part falls into two sections. The classification (taxonomic) relations in Wikipedia are first analyzed and are extracted by headword matching, mutual indexing and catalog listing. The classification relations in the UGC text library are extracted by methods based on the inclusion principle, template matching and hierarchical clustering. For the non-classification relations in UGC information sources, concept pairs are extracted from Wikipedia and, by means of association rules, from UGC texts, and the verbs associated with these concept pairs are extracted. These verbs are then filtered by the CVF*IVF method to find verbs suitable to serve as the predicates of the concept pairs, and the log-likelihood ratio is used to identify suitable ontology triples.

Based on the research above, the fifth part of this paper formalizes the ontology concepts and ontology relations, and proposes a capture model of ontology changes based on Chinese UGC information sources according to the general process of ontology maintenance. An algorithm for ontology maintenance cost based on UGC information sources is proposed according to the ontology maintenance operations and an analysis of the consistency constraints on ontology maintenance. Then, the change requirements in UGC information sources are analyzed and illustrated with application examples.

The sixth part of this paper builds a prototype system for application ontology construction based on UGC information sources. The specific requirements of the system for word segmentation, concept acquisition and relation acquisition are set out. In addition, the system is designed, and its functions and interface are presented. This paper divides the prototype system into three parts and ten functional modules, and describes in detail the interface and function of each module.

The final part concludes the paper, points out the weaknesses of the research, and outlines the basis and direction of the author's future research.
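
The abstract states that, in the third part, candidate concepts are filtered with mutual information and information entropy, but it does not give the formulas. The following Python sketch is therefore only an illustration under stated assumptions, not the dissertation's implementation. It assumes that mutual information measures how strongly the two words of a candidate cohere, and that the entropy of the neighbouring words measures how freely the candidate occurs as a complete unit. The function names, the thresholds `min_pmi` and `min_entropy`, and the toy English corpus (standing in for segmented Chinese UGC text) are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def pmi(p_xy, p_x, p_y):
    """Pointwise mutual information (in bits) of two adjacent words."""
    return math.log2(p_xy / (p_x * p_y))


def entropy(neighbor_counts):
    """Shannon entropy (in bits) of a neighbour-word frequency table."""
    total = sum(neighbor_counts.values())
    return sum(-(c / total) * math.log2(c / total)
               for c in neighbor_counts.values())


def score_candidates(sentences):
    """Return {pair: (pmi, left_entropy, right_entropy)} for adjacent word pairs.

    `sentences` is a list of already segmented sentences (lists of words).
    """
    unigrams, bigrams = Counter(), Counter()
    left_ctx, right_ctx = defaultdict(Counter), defaultdict(Counter)
    for words in sentences:
        unigrams.update(words)
        for i in range(len(words) - 1):
            pair = (words[i], words[i + 1])
            bigrams[pair] += 1
            if i > 0:                      # word just before the pair
                left_ctx[pair][words[i - 1]] += 1
            if i + 2 < len(words):         # word just after the pair
                right_ctx[pair][words[i + 2]] += 1
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    scores = {}
    for (x, y), count in bigrams.items():
        cohesion = pmi(count / n_bi, unigrams[x] / n_uni, unigrams[y] / n_uni)
        scores[(x, y)] = (cohesion,
                          entropy(left_ctx[(x, y)]),
                          entropy(right_ctx[(x, y)]))
    return scores


def filter_concepts(scores, min_pmi=1.0, min_entropy=1.0):
    """Keep pairs that are cohesive (PMI) and have varied neighbours (entropy).

    The thresholds are hypothetical and would need tuning on a real corpus.
    """
    return sorted(pair for pair, (cohesion, left_h, right_h) in scores.items()
                  if cohesion >= min_pmi and min(left_h, right_h) >= min_entropy)


# Toy corpus: pre-segmented sentences standing in for UGC text.
sentences = [
    ["users", "share", "social", "media", "posts"],
    ["brands", "use", "social", "media", "daily"],
    ["people", "like", "social", "media", "apps"],
]
scores = score_candidates(sentences)
selected = filter_concepts(scores)
print(selected)
```

In this toy corpus, rare pairs such as ("users", "share") score a high mutual information simply because they are infrequent, which is why the sketch also requires varied left and right neighbours before accepting a candidate.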
Keywords/Search Tags: UGC, the extraction of ontology concepts, the extraction of ontology relations, ontology maintenance