
Research And Implementation Of The Knowledge Search System Based On Wikipedia

Posted on: 2013-11-25
Degree: Master
Type: Thesis
Country: China
Candidate: C Z Wu
GTID: 2248330374475325
Subject: Computer software and theory
Abstract/Summary:
In recent years, with the rapid development of Internet technology, Wikipedia has become one of the largest open-content knowledge platforms. The knowledge it contains is updated and expanded almost continuously, which allows Wikipedia to be applied in more and more fields. Natural language processing research that uses Wikipedia as a large-scale natural corpus has already produced many results.
As the number of contributors keeps increasing, the size and number of Wikipedia articles grow steadily, and more and more people use Wikipedia to find what they are looking for. However, Wikipedia's built-in search engine still relies on traditional full-text matching. Although every document contains many internal links to other documents, most of them have no semantic relationship with the current document. This paper argues that search should be based on semantics, so adding semantic functionality to the search process in Wikipedia is the focus of this research.
A straightforward way to add semantic functionality is to search the other documents at query time and compute the relatedness between each of them and the current document. However, because of Wikipedia's huge volume of data and the time complexity of semantic relatedness algorithms, this process would take a long time, harming both retrieval efficiency and user experience. To solve this problem, this paper proposes using Wikipedia's corpus resources to build a semantic knowledge base, which improves query efficiency.
First, we studied the characteristics of Wikipedia in detail, including its category structure, page structure and page link structure, as well as its various data storage formats.
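The page structure and link structure mentioned above show up directly in Wikipedia's XML database dumps. The abstract does not describe how the thesis parses them. The code below is therefore only a minimal Python sketch. It assumes the standard MediaWiki export layout, where `<page>` elements carry `<title>`, `<ns>` and a revision `<text>`.

The sketch streams a dump with `xml.etree.ElementTree.iterparse` and keeps main-namespace articles. It then pulls internal links and category links out of the wikitext with a simplified regular expression. The helper names (`extract_pages`, `normalize`, `SKIP_PREFIXES`) and the filtering rules are hypothetical. Real wikitext, with templates, nested links and redirects, needs a more robust parser.

```python
import io
import re
import xml.etree.ElementTree as ET

# Simplified pattern for [[Target]], [[Target|label]] and [[Target#Section|label]];
# group 1 captures Target. Nested links and templates are not handled.
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")

# Hypothetical filter: namespaced links that are not article-to-article links.
SKIP_PREFIXES = ("file:", "image:", "template:", "help:", "wikipedia:")


def local(tag):
    """Drop the XML namespace: '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def normalize(title):
    """Rough MediaWiki-style normalization: underscores to spaces, first letter upper-case."""
    title = title.strip().replace("_", " ")
    return title[:1].upper() + title[1:]


def extract_pages(source):
    """Stream a MediaWiki XML dump and yield (title, links, categories) per article.

    `source` is a filename or a binary file object.
    """
    for _, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "page":
            continue
        fields = {}
        for child in elem.iter():
            name = local(child.tag)
            if name in ("title", "ns", "text"):
                fields[name] = child.text or ""
        elem.clear()  # release the finished page; full dumps do not fit in memory
        if fields.get("ns") != "0":
            continue  # keep only main-namespace articles
        links, categories = [], []
        for match in LINK_RE.finditer(fields.get("text", "")):
            target = match.group(1).strip()
            lowered = target.lower()
            if lowered.startswith("category:"):
                categories.append(normalize(target.split(":", 1)[1]))
            elif not lowered.startswith(SKIP_PREFIXES):
                links.append(normalize(target))
        # dict.fromkeys removes duplicates while keeping first-seen order
        yield fields.get("title", ""), list(dict.fromkeys(links)), list(dict.fromkeys(categories))


# Tiny hand-written fragment in the MediaWiki export layout.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Semantic similarity</title>
    <ns>0</ns>
    <id>1</id>
    <revision><id>10</id><text>See [[Wikipedia|the encyclopedia]] and
      [[natural language processing#Tasks|NLP]]. [[Category:Semantics]]</text></revision>
  </page>
  <page>
    <title>Talk:Semantic similarity</title>
    <ns>1</ns>
    <id>2</id>
    <revision><id>11</id><text>[[Foo]]</text></revision>
  </page>
</mediawiki>"""

pages = list(extract_pages(io.BytesIO(SAMPLE)))
print(pages)
# [('Semantic similarity', ['Wikipedia', 'Natural language processing'], ['Semantics'])]
```

Streaming with `iterparse` and clearing each `<page>` after use keeps memory roughly constant. This matters because full dumps are far larger than available RAM.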
We then developed a set of procedures that effectively extract structured information from Wikipedia's backup (dump) data. This produces the basic corpus resources on which this research and the semantic relatedness algorithm proposed in this paper are based. Next, we studied in depth the semantic features of traditional semantic knowledge bases and how semantic knowledge is represented, and then built a knowledge base. Finally, on top of this knowledge base, we implemented a simple knowledge search system that lets users conveniently find knowledge and the knowledge semantically related to it.
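The abstract's core argument is that relatedness should be computed once, offline, into a knowledge base, rather than per query. The abstract does not say which relatedness measure the thesis proposes. This sketch therefore substitutes a well-known link-based one: Milne and Witten's Wikipedia Link-based Measure. With A and B the sets of articles linking to a and b, and W the set of all articles, it computes:

relatedness(a, b) = 1 - (log max(|A|,|B|) - log |A∩B|) / (log |W| - log min(|A|,|B|)), clamped to [0, 1].

The toy graph and the names `build_knowledge_base` and `search` are hypothetical. This is an illustration of the precompute-then-look-up idea, not the thesis's implementation.

```python
import math
from collections import defaultdict


def invert(out_links):
    """{article: articles it links to} -> {article: articles that link to it}."""
    in_links = defaultdict(set)
    for source, targets in out_links.items():
        for target in targets:
            in_links[target].add(source)
    return in_links


def wlm_relatedness(a, b, in_links, total_articles):
    """Milne-Witten link-based relatedness from shared in-links, clamped to [0, 1]."""
    links_a, links_b = in_links.get(a, set()), in_links.get(b, set())
    common = len(links_a & links_b)
    if common == 0:
        return 0.0
    big, small = max(len(links_a), len(links_b)), min(len(links_a), len(links_b))
    denominator = math.log(total_articles) - math.log(small)
    if denominator <= 0:
        return 1.0  # both articles are linked from every article
    distance = (math.log(big) - math.log(common)) / denominator
    return max(0.0, 1.0 - distance)


def build_knowledge_base(out_links, top_k=3):
    """Offline step (hypothetical): store each article's top_k most related articles."""
    in_links = invert(out_links)
    total = len(out_links)
    kb = {}
    for a in in_links:
        # Only articles co-cited with `a` by some page can have a nonzero score.
        candidates = {b for src in in_links[a] for b in out_links[src] if b != a}
        scored = [(wlm_relatedness(a, b, in_links, total), b) for b in candidates]
        scored = sorted((s for s in scored if s[0] > 0), key=lambda p: (-p[0], p[1]))
        kb[a] = scored[:top_k]
    return kb


def search(kb, query):
    """Online step (hypothetical): a dictionary lookup, no relatedness computed per query."""
    return [title for _, title in kb.get(query, [])]


# Hypothetical toy link graph: article -> set of articles it links to.
LINKS = {
    "Cat": {"Dog", "Mammal"},
    "Dog": {"Cat", "Mammal"},
    "Lion": {"Cat", "Mammal"},
    "Pet": {"Cat", "Dog"},
    "Zoo": {"Lion", "Mammal"},
    "Mammal": set(),
    "Car": {"Engine"},
    "Engine": {"Car"},
}

kb = build_knowledge_base(LINKS)
print(search(kb, "Cat"))  # ['Mammal', 'Dog']
```

All relatedness cost is paid once in `build_knowledge_base`, so each query is a constant-time lookup. This is the trade-off the abstract describes. Restricting candidates to co-cited articles avoids scoring every pair, which would be infeasible at Wikipedia's scale.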
Keywords/Search Tags: Wikipedia, semantic relatedness, knowledge base, knowledge search