Clustering Concepts Based On Encyclopedia Entries

Posted on: 2017-01-17  Degree: Master  Type: Thesis
Country: China  Candidate: X Cao  Full Text: PDF
GTID: 2348330512975274  Subject: Information management and information systems
Abstract/Summary:
Ontology has been studied in a wide range of applied areas, and there is an increasing need for domain ontologies in fields such as knowledge management, artificial intelligence and the Semantic Web. However, building an ontology is a labor-intensive and time-consuming task, especially when it comes to obtaining ontology relations. Ontology learning is the process of building an ontology (semi-)automatically using machine learning methods. According to the objects studied, ontology learning mainly covers learning the set of concepts and learning the set of relations between concepts. Relation learning aims to find relations between concepts (semi-)automatically and quickly by computer. In the current era of rapid information growth, new ideas emerge endlessly and the relations between concepts keep changing. In this thesis, we present a preliminary study of a concept clustering method based on encyclopedia entries for obtaining ontology relations automatically. The main content consists of the following three parts.

(1) Research on a concept vector model based on encyclopedia entries. First, the set of domain concepts determined by domain experts is used to obtain the text of the corresponding encyclopedia entries, and the resulting text corpus requires substantial preprocessing. For each domain concept, word frequencies are counted and stored in a database table, which provides the data needed to calculate distances between concept vectors during clustering. Then the information entropy of each word of each domain concept is calculated, and words that are not independent are filtered out. Finally, the vector model of the domain concepts is constructed from the key words of the field, with each vector component being the frequency of a key word. Treating the whole corpus as the co-occurrence window can improve the accuracy of the clustering algorithm.

(2) Research on concept clustering based on the distance discrimination method. The distance discrimination method is applied to concept clustering: the Mahalanobis distance is used to calculate the distance between concepts, and the gravity distance is used to calculate the distance between a concept and a class center. The process iterates until the clustering results no longer change. Concepts are clustered using the concept vector model, and each resulting cluster can be read as a set of concepts that share semantic relations. For test data from three areas (e-commerce, knowledge management and management information systems), concepts are clustered with both the distance discrimination clustering method proposed in this thesis and the k-means clustering method, and the clustering results are output. A comparison of the experimental results shows that the proposed method scores higher than the k-means algorithm on the matching degree of clustering, accuracy, F-Score and the similarity of clustering results (the RI indicator).

(3) An applied experimental system for the concept clustering method based on encyclopedia entries, which verifies that the proposed distance discrimination clustering method is feasible and valid.

In general, experiments on clustering three sets of domain concepts demonstrate that the proposed concept clustering method gives better and more stable results than classical clustering methods.
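
The pipeline described in parts (1) and (2) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the toy entry texts, the function names (`build_vectors`, `cluster`), the entropy threshold and the choice of seed concepts are all hypothetical. Two simplifications are assumed and should be read as such: the entropy filter drops words whose frequency is spread almost evenly over all concepts, and the Mahalanobis distance is computed with a diagonal covariance matrix (per-keyword variances only) so that the example needs nothing beyond the standard library.

```python
import math
from collections import Counter


def build_vectors(docs, max_entropy_ratio=0.9):
    """Build word-frequency vectors for each concept from its entry text."""
    counts = {c: Counter(text.lower().split()) for c, text in docs.items()}
    vocab = sorted(set().union(*counts.values()))
    max_h = math.log2(len(docs))  # entropy of a perfectly even spread
    keywords = []
    for word in vocab:
        freqs = [counts[c][word] for c in docs]
        total = sum(freqs)
        h = -sum(f / total * math.log2(f / total) for f in freqs if f)
        # Assumption: a word spread evenly over all concepts is uninformative.
        if h <= max_entropy_ratio * max_h:
            keywords.append(word)
    vectors = {c: [counts[c][w] for w in keywords] for c in docs}
    return keywords, vectors


def distance(x, center, variances):
    """Mahalanobis distance under a diagonal covariance (a simplification)."""
    return math.sqrt(sum((a - b) ** 2 / v
                         for a, b, v in zip(x, center, variances)))


def cluster(vectors, seeds, max_iter=100):
    """Assign each concept to its nearest class center until labels settle."""
    names = list(vectors)
    dims = len(vectors[names[0]])
    variances = []
    for j in range(dims):
        col = [vectors[n][j] for n in names]
        mean = sum(col) / len(col)
        # Floor the variance to avoid division by zero.
        variances.append(max(sum((v - mean) ** 2 for v in col) / len(col), 1e-9))
    centers = [list(vectors[s]) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = {
            n: min(range(len(centers)),
                   key=lambda k: distance(vectors[n], centers[k], variances))
            for n in names
        }
        if new_labels == labels:  # stop once the result no longer changes
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [vectors[n] for n in names if labels[n] == k]
            if members:  # class center = mean vector of its members
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical stand-ins for encyclopedia entry texts.
docs = {
    "online shopping": "the customer buys product from online store and pays online",
    "electronic payment": "the customer pays merchant online with bank card payment",
    "knowledge sharing": "employees share knowledge and experience within the organization",
    "knowledge base": "the organization stores knowledge documents and experience",
}
keywords, vectors = build_vectors(docs)
labels = cluster(vectors, seeds=["online shopping", "knowledge sharing"])
print(labels)
```

On this toy input the two e-commerce concepts end up in one class and the two knowledge management concepts in the other. A full Mahalanobis distance would use the inverse of the complete covariance matrix, which typically requires a numerical library and more concepts than keywords to be well conditioned.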
Keywords/Search Tags: knowledge management, ontology, ontology relation learning, concept clustering, distance discrimination method