The Full-Text Semantic Annotation System Based on Chinese Wikipedia

Posted on: 2013-11-02
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhang
Full Text: PDF
GTID: 2248330392957828
Subject: Computer software and theory
Abstract/Summary:
While the Internet presents all kinds of information to people in many forms, the sheer volume of that information makes it difficult for people to find the knowledge they most need in the vast ocean of Web pages. The Internet as currently used has two obvious deficiencies. First, machines cannot understand the content of documents. Second, information is disordered, and the relationships among different kinds of information inherently lack organization. Semantic Web technology was proposed to resolve these problems. It aims to transform existing documents into machine-understandable information by organizing them with a well-defined semantic knowledge base. Semantic annotation technology is what adds machine-understandable semantic information to a variety of resources.

Wikipedia is a free, wiki-based, collaborative, multilingual encyclopedia edited by people all over the world, and it is regarded as a corpus with rich semantic relations and fixed templates. A well-defined Chinese semantic knowledge base can be extracted from Chinese Wikipedia by combining TF-IDF with the Google distance method, using the weight and occurrence of semantic links between terms. Semantic relations in a document do not exist only between adjacent words; words in different paragraphs may also be related. Thus, ignoring the fixed structure of the original document, a document can be treated as a set of interrelated words. Based on the semantic knowledge base extracted from Chinese Wikipedia, we design vertex and edge feature functions that follow full-text semantic annotation logic within the framework of conditional random fields, so that the semantic annotation system can accomplish the full-text annotation task for a Chinese document.

Experiments show that the accuracy of computing the semantic relation between words with the combined TF-IDF and Google distance method reaches 85% or more, and can reach 95% for word pairs whose semantic relation is general or low. The semantic annotation system based on Chinese Wikipedia can accurately label words that carry semantics in more than one field.
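
The abstract does not say how TF-IDF and Google distance are combined, so the following is only a minimal sketch under stated assumptions. The Normalized Google Distance formula is the standard one; the function names, the use of article counts as occurrence counts, and the weighted sum controlled by `alpha` are hypothetical and are not taken from the thesis.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Standard NGD from occurrence counts.

    fx, fy: number of articles containing term x / term y
    fxy:    number of articles containing both terms
    n:      total number of articles (assumed larger than fx and fy)
    """
    if fx == 0 or fy == 0 or fxy == 0:
        return float("inf")  # the terms never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    denominator = math.log(n) - min(lx, ly)
    if denominator == 0:
        return 0.0  # degenerate case: a term appears in every article
    return (max(lx, ly) - lxy) / denominator


def tf_idf(link_count, total_links, articles_with_term, n):
    """TF-IDF weight of a link to a term inside one article."""
    tf = link_count / total_links
    idf = math.log(n / articles_with_term)
    return tf * idf


def relatedness(ngd, weight, alpha=0.5):
    """Hypothetical combination: weighted sum of a TF-IDF weight and an
    NGD-based similarity. The weight is assumed to be pre-scaled to [0, 1]."""
    similarity = 0.0 if math.isinf(ngd) else max(0.0, 1.0 - ngd)
    return alpha * weight + (1 - alpha) * similarity
```

With this sketch, two terms that always appear together get a distance of 0, and terms that never co-occur get an infinite distance, which `relatedness` maps to a similarity of 0.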
Keywords/Search Tags: Wikipedia, Semantic Annotation, Conditional Random Fields