
Research And Implementation Of Conceptual Network Construction System Based On Wikipedia And Co-occurrence Analysis On Web

Posted on: 2012-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: M M Xu
GTID: 2178330332467450
Subject: Computer software and theory
Abstract/Summary:
With the explosive growth of knowledge, finding useful information has become a difficult problem for users. Information retrieval systems and recommendation systems address this problem from different angles. An information retrieval system analyzes document content and offers every user the same general access interface. A recommendation system, by contrast, analyzes the relations between users and content in order to push personalized information to each user. Neither approach, however, gives a global view of the knowledge: it identifies neither the component parts of the knowledge nor the relations between them.

A conceptual network is an important tool for describing the structure of knowledge, including its components and their relations. A well-constructed conceptual network shows users these internal relations visually. It also helps to discover implicit knowledge, which is commonly used to improve the performance of information management systems.

After a detailed comparison of current knowledge base systems, including structured semantic knowledge bases and semi-structured knowledge sets, we propose a complete solution for both the initial construction and the ongoing maintenance of a conceptual network, based on the features of Wikipedia and on co-occurrence analysis techniques. The main research content of this paper is as follows.

Combining Wikipedia processing techniques with co-occurrence analysis, this paper proposes the conceptual network construction architecture CACN-WCA (Construction Architecture of Conceptual Network Based on Wikipedia and Co-occurrence Analysis). CACN-WCA consists of two parts: an initial processor, which builds a conceptual network from scratch using Wikipedia documents, and a maintenance processor, which updates the network based on web documents. In the initial stage, we exploit the rich semantic information in Wikipedia to identify concepts and to recognize related couples, that is, pairs of related concepts.
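The two-processor split can be illustrated with a small Python sketch. This is a minimal illustration under stated assumptions, not the thesis's implementation: articles are given as a hypothetical mapping from an article title to the titles it links to, the star model is read as pairing each article's title with every concept it links to, and each candidate couple is weighted by the importance score of the linked concept, supplied as a precomputed dict that stands in for the Concept-Importance Algorithm. The maintenance side only counts how many web documents mention each concept pair. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]  # unordered concept pair, stored in sorted order


def make_pair(a: str, b: str) -> Pair:
    """Normalize a couple so (a, b) and (b, a) map to the same key."""
    return (a, b) if a <= b else (b, a)


def build_initial_network(articles: Dict[str, List[str]],
                          importance: Dict[str, float]) -> Dict[Pair, float]:
    """Hypothetical initial processor using an assumed star model.

    Each article title is the centre concept; every concept it links to forms a
    candidate related couple, weighted by the linked concept's importance.
    If a couple is found from both ends, the larger weight is kept (assumption).
    """
    network: Dict[Pair, float] = {}
    for centre, linked in articles.items():
        for target in linked:
            if target == centre:
                continue  # ignore self-links
            pair = make_pair(centre, target)
            weight = importance.get(target, 0.0)  # unknown concepts weigh 0
            network[pair] = max(network.get(pair, 0.0), weight)
    return network


def cooccurrence_counts(web_docs: Iterable[List[str]]) -> Counter:
    """Hypothetical maintenance input: number of web documents mentioning each pair."""
    counts: Counter = Counter()
    for doc in web_docs:
        concepts = sorted(set(doc))  # dedupe so each document counts once
        for a, b in combinations(concepts, 2):  # yields pairs already sorted
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    net = build_initial_network(
        {"Python": ["Guido van Rossum", "CPython"], "CPython": ["Python"]},
        {"Guido van Rossum": 0.6, "CPython": 0.8, "Python": 0.9},
    )
    print(net)  # {('Guido van Rossum', 'Python'): 0.6, ('CPython', 'Python'): 0.9}
```

Keeping the initial network and the co-occurrence counts keyed by the same sorted pairs is a design choice that lets the maintenance step look up each existing relation directly.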
In the maintenance stage, a real-time set of web documents is updated regularly and used to track the co-occurrence information of the elements of the large existing network, from which the trends of those elements are derived. Guided by CACN-WCA, this paper describes four algorithms in detail: the Concept-Importance Algorithm based on Wikipedia, the Related Concepts Recognition Algorithm based on Wikipedia, the New Element Discovery Algorithm, and the Concept-Relation Feature Adjustment Algorithm.

During the initial stage, Wikipedia database files are used as the analysis set. First, a refined London rule is applied to remove stub documents, which are considered incomplete. In the remaining documents, each concept is assigned an importance degree, measured by its completeness and reliability. Then, the Related Concepts Recognition Algorithm based on Wikipedia processes each document to recognize related couples; a Wikipedia document is usually represented with a star model. For each candidate couple, the relation weight is measured under the assumption that the importance of a concept reflects the relation weight. The relation weighting issue is thus converted into an ordinary concept-importance problem.

During the maintenance stage, in contrast, co-occurrence analysis on the web set is employed. Two algorithms, the New Element Discovery Algorithm and the Concept-Relation Feature Adjustment Algorithm, handle new element discovery and the adjustment of existing relation features, respectively. For new element discovery, rather than relying only on occurrence frequency, an accumulation gain is introduced to identify important elements from their trends. To adjust relation weights, a relation attenuation model and a relation impulse model are proposed. Together they simulate the trend of an element's relations: a relation weakens naturally over time and, at the same time, is reinforced by new occurrence information.
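The abstract names the relation attenuation model, the relation impulse model and the accumulation gain but gives no formulas, so the following Python sketch fills them in with assumptions chosen only for illustration: attenuation as exponential decay `w * exp(-decay_rate * elapsed)`, impulse as an additive boost of `gain` per observed co-occurrence, and accumulation gain as the sum of period-to-period increases in an element's frequency. The parameter values and all function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # unordered concept pair, stored in sorted order


def attenuate(weight: float, elapsed: float, decay_rate: float) -> float:
    """Assumed attenuation model: a relation fades exponentially without evidence."""
    return weight * math.exp(-decay_rate * elapsed)


def impulse(weight: float, cooccurrences: int, gain: float) -> float:
    """Assumed impulse model: each new co-occurrence adds a fixed boost."""
    return weight + gain * cooccurrences


def adjust_relations(network: Dict[Pair, float], counts: Dict[Pair, int],
                     elapsed: float, decay_rate: float = 0.1,
                     gain: float = 0.05) -> Dict[Pair, float]:
    """Apply decay, then the impulse, to every existing relation for one period."""
    return {
        pair: impulse(attenuate(w, elapsed, decay_rate), counts.get(pair, 0), gain)
        for pair, w in network.items()
    }


def accumulation_gain(history: List[int]) -> int:
    """Hypothetical accumulation gain: total of the period-to-period increases.

    A term with a high but flat frequency scores 0; a term whose frequency
    keeps rising scores the sum of its rises.
    """
    return sum(max(0, cur - prev) for prev, cur in zip(history, history[1:]))


def discover_new_elements(histories: Dict[str, List[int]],
                          threshold: int) -> List[str]:
    """Candidate new elements: those whose accumulation gain reaches the threshold."""
    return sorted(name for name, h in histories.items()
                  if accumulation_gain(h) >= threshold)


if __name__ == "__main__":
    net = {("a", "b"): 1.0, ("a", "c"): 1.0}
    print(adjust_relations(net, {("a", "b"): 2}, elapsed=1.0))
    print(discover_new_elements({"x": [1, 3, 6], "y": [50, 50, 50]}, threshold=3))
```

Under these assumptions, a relation that stops co-occurring decays toward zero, while one that keeps co-occurring settles where decay and impulse balance. An element with a high but flat frequency history has zero accumulation gain, which illustrates how a trend-based criterion can differ from raw frequency.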
These two influences, natural decay and occurrence-driven reinforcement, together keep the conceptual network up to date.

Since this architecture has been implemented in our project, the last section of this paper briefly describes the main parts of the system. Experiments were also conducted to evaluate the algorithms described above. The results show that CACN-WCA, which is based on Wikipedia and on co-occurrence analysis of a web document set, can generate a conceptual network of good quality.
Keywords/Search Tags: conceptual network, Wikipedia, co-occurrence analysis, concept relation recognition, relation adjustment