Semantic Web Search Technology Based On Wikipedia

Posted on: 2016-05-26
Degree: Master
Type: Thesis
Country: China
Candidate: P J Liu
Full Text: PDF
GTID: 2308330461983629
Subject: Computer application technology
Abstract/Summary:
Because current Web search technology is based on keywords, it often fails to meet users' needs: its recall and accuracy fall far below expectations. Keyword-based search also lacks machine-readable semantic information, which limits a computer's ability to analyze results automatically and process them intelligently. To improve the accuracy and intelligence of search engines, we therefore aim to move from traditional keyword-based search to semantic search. Semantic search must be supported by a semantic network of concepts. Wikipedia, an open online encyclopedia, is the largest knowledge resource in the world and contains a great amount of human knowledge and semantic relationships. This thesis studies how to take full advantage of Wikipedia to add semantic processing ability to present search technology and to optimize the information retrieval process. The main work of this thesis is as follows.

Firstly, we extract semantic information according to the way Wikipedia organizes its information and its structural characteristics. Because processing Wikipedia's data is a big-data problem, we build a cloud platform based on Hadoop. Through a set of application programming interfaces based on an object model, we fetch the semantic information of interest from Wikipedia topic pages, including concepts, categories, links, and abstract paragraphs (the first paragraph of each topic page). This provides the structural and content information needed for the subsequent calculation of semantic relatedness. It may also give researchers some guidance on processing Wikipedia's large-scale data in the future.

Secondly, we propose a method, named the WLA algorithm (Wikipedia Link and Abstract), to calculate semantic similarity. Building on the information extracted from Wikipedia, we focus on the link relations and the content of the abstract paragraph. The features shared between concepts, namely their link relationships (both link-in and link-out) and the common words in their abstract paragraphs, reflect well how the concepts are connected. By assigning different weights to these features, the method achieves a satisfactory result: the Spearman correlation coefficient reaches 0.68.

Finally, we develop a prototype semantic search system. Integrating the proposed WLA algorithm into the system allows the platform to serve both ordinary users and semantic researchers. Using Wikipedia's semantic explanation of words as its background, the system consists of three parts: semantic computing, semantic concept query, and text annotation. Semantic computing calculates the semantic relevance of terms. Semantic concept query works like a dictionary built on Wikipedia: it can explain partial words, polysemous words, and ambiguous words to help people broaden their knowledge, and it can also improve the search engine's ability to process queries. Text annotation marks the proper nouns that appear in a short text: whenever a term in the text has a corresponding topic page in Wikipedia, the system annotates the term and adds a link to that page. This prototype can be used as a test platform for related semantic search research.
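The weighted combination of link and abstract features described for the WLA algorithm can be illustrated with a minimal Python sketch. The abstract does not give the actual formula, so everything here is an assumption made for illustration: the use of Jaccard overlap as the per-feature measure, the weight values, the function names (`jaccard`, `tokenize`, `wla_relatedness`), and the dictionary layout of a concept. It is not the thesis's implementation.

```python
import re


def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def wla_relatedness(c1, c2, w_in=0.4, w_out=0.3, w_abs=0.3):
    """Weighted sum of link-in, link-out, and abstract-word overlap.

    The weights are hypothetical placeholders; the thesis only states
    that the features receive different weights.
    """
    link_in = jaccard(c1["links_in"], c2["links_in"])
    link_out = jaccard(c1["links_out"], c2["links_out"])
    abstract = jaccard(tokenize(c1["abstract"]), tokenize(c2["abstract"]))
    return w_in * link_in + w_out * link_out + w_abs * abstract


if __name__ == "__main__":
    # Toy, hand-written data; not extracted from Wikipedia.
    car = {
        "links_in": {"Vehicle", "Transport"},
        "links_out": {"Engine", "Wheel"},
        "abstract": "A car is a wheeled motor vehicle used for transport.",
    }
    truck = {
        "links_in": {"Vehicle", "Freight"},
        "links_out": {"Engine", "Cargo"},
        "abstract": "A truck is a motor vehicle designed to transport cargo.",
    }
    print(wla_relatedness(car, truck))
```

With weights that sum to 1, the score stays in the range 0 to 1, which makes it straightforward to rank term pairs and compare that ranking with human judgments using a rank correlation such as Spearman's.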
Keywords/Search Tags: Wikipedia, semantic computing, search engine