Wikipedia Based Conceptual Graph Model And Its Application

Posted on: 2015-02-10    Degree: Master    Type: Thesis
Country: China    Candidate: Y Wan    Full Text: PDF
GTID: 2268330428967684    Subject: Computer application technology
Abstract/Summary:
Text representation and semantic comparison are fundamental research areas in natural language processing. They affect the performance of many intelligent systems, such as automatic text classification, information retrieval, machine translation, and question answering (QA) systems. Conventional machine learning methods use only the information provided by the text itself for modeling and computation. These methods can hardly capture text semantics when the texts are too short or when polysemy appears. This is because traditional text representation methods are based on the bag-of-words (BOW) model, which relies on matching between words or phrases. Moreover, short texts contain very few words and usually cannot provide enough information to build a reliable model.

The most popular solution to this problem is to use knowledge beyond the text itself. However, current modeling methods do not make full use of the information that such knowledge provides. Taking Wikipedia as an example, most models ignore the semantic relatedness between articles and the user-annotated information. These attributes can sometimes greatly help in understanding text semantics; even when they do not, they can still provide an information gain for better semantic comprehension. Therefore, designing a model that makes full use of the rich information in a knowledge base is an important research subject.

In this thesis, we propose a novel knowledge representation model. This model overcomes the shortcomings of previous models by considering both the semantic relatedness between knowledge entries and user-annotated information.

The main work of this thesis includes the following three aspects:

First, this thesis presents a graph-structured knowledge representation model. In this model, pieces of knowledge are no longer viewed as separate semantic entities. They are linked by their semantic relatedness and form a graph-like structure.
Taking Wikipedia as the knowledge base, we call the model a conceptual graph in the rest of the thesis. Each entry is regarded as a concept and treated as a node of the conceptual graph. Semantic relatedness between concepts constitutes the edges between them. Edge weights represent the degree of semantic relatedness between concepts; their values are measured by jointly considering multiple sources of information, such as titles, anchor text, content text, hyperlinks, and category labels. It should be noted that although the thesis uses Wikipedia as the external knowledge to build the model, the method is not limited to it; the model is equally applicable to other qualified external knowledge bases.

Second, this thesis presents a novel text representation approach based on the conceptual graph. It converts text from the word-frequency vector space to the concept vector space and addresses the synonym problem. We first map the text to a set of related concepts by comparing the lexical similarity between each concept and the text. Then we adjust these mapped nodes according to the semantic relatedness between them in the conceptual graph, deriving a set of nodes that best represents the semantic meaning of the original text. Finally, the semantic similarity between texts is calculated by comparing their concept vectors in the conceptual graph.

Third, in order to apply the proposed model in practice, we devise a flexible way to build a small conceptual graph from corpus features. First, we randomly sample the target corpus; then we use multiple feature selection techniques to extract features from the sample; finally, we pick the concepts that are semantically related to these extracted features and use only these concepts to construct the graph model. In this way, we can keep the conceptual graph at a reasonable size, which improves computational efficiency and makes the conceptual graph more convenient to use.
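
The text representation steps of the second aspect (map text to concepts lexically, adjust the mapped nodes by their relatedness in the graph, then compare concept vectors) might look roughly like the sketch below. The toy graph, the term sets, the Jaccard scoring, and all function names are hypothetical. Using Personalized PageRank for the adjustment step is an assumption suggested by the keyword list of this record; the abstract does not say how it is applied.

```python
import math
import re

# Hypothetical toy conceptual graph: concept -> {neighbour: relatedness weight}.
GRAPH = {
    "Car": {"Automobile": 0.9, "Engine": 0.6},
    "Automobile": {"Car": 0.9, "Engine": 0.6},
    "Engine": {"Car": 0.6, "Automobile": 0.6},
    "Banana": {"Fruit": 0.8},
    "Fruit": {"Banana": 0.8},
}

# Hypothetical description words per concept (stand-in for title, anchor, content text).
CONCEPT_TERMS = {
    "Car": {"car", "vehicle", "drive"},
    "Automobile": {"automobile", "vehicle", "motor"},
    "Engine": {"engine", "motor", "fuel"},
    "Banana": {"banana", "yellow"},
    "Fruit": {"fruit", "sweet"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def lexical_mapping(text):
    """Step 1: score each concept by Jaccard overlap between its terms and the text."""
    words = tokenize(text)
    scores = {}
    for concept, terms in CONCEPT_TERMS.items():
        union = words | terms
        scores[concept] = len(words & terms) / len(union) if union else 0.0
    return scores


def personalized_pagerank(seed, damping=0.85, iterations=50):
    """Step 2 (assumed): spread seed scores along weighted edges, restarting at the seed."""
    total = sum(seed.values())
    if total == 0:
        return {c: 0.0 for c in GRAPH}
    restart = {c: seed.get(c, 0.0) / total for c in GRAPH}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {c: (1 - damping) * restart[c] for c in GRAPH}
        for node, edges in GRAPH.items():
            out = sum(edges.values())
            for neighbour, weight in edges.items():
                # Each node passes its rank on in proportion to edge weight.
                nxt[neighbour] += damping * rank[node] * weight / out
        rank = nxt
    return rank


def cosine(u, v):
    """Cosine similarity between two concept vectors keyed by concept name."""
    dot = sum(u[c] * v[c] for c in GRAPH)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def concept_vector(text):
    """Map a text to its adjusted concept vector."""
    return personalized_pagerank(lexical_mapping(text))


def semantic_similarity(a, b):
    """Step 3: compare two texts in concept space."""
    return cosine(concept_vector(a), concept_vector(b))
```

In this toy setup, "I drive my car" and "an automobile with a motor" share no concept after the purely lexical step, yet their adjusted concept vectors overlap because Car, Automobile, and Engine are linked in the graph. A text about bananas stays orthogonal to both, since its concepts lie in a disconnected part of the graph.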
We apply the above model and semantic relatedness method to text categorization. The international standard corpus 20 Newsgroups is used in our experiments, and similar methods are used as baselines for comparison. Experimental results show the effectiveness of the proposed method.
Keywords/Search Tags: Knowledge representation, Wikipedia, Conceptual graph, Personalized PageRank, Semantic relatedness