Study And Implementation Of The Web Semantic Data Extraction And Knowledge Fusion

Posted on: 2018-09-16  Degree: Master  Type: Thesis
Country: China  Candidate: Q Di
GTID: 2348330518998974  Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the network, the semantic web has spread widely as a building block of Web 3.0. By adding "metadata" to documents on the World Wide Web, the semantic web gives the Internet and its data semantics and knowledge, so that computers can understand a variety of data. A growing number of researchers in natural language processing and computer science study semantics and semantic data in order to construct intelligent networks. Recent years have witnessed a proliferation of large-scale knowledge bases, including Wikipedia, Freebase, YAGO, Microsoft's Satori, and Google's Knowledge Graph. However, these cannot keep up with the rapidly growing data needs of intelligent systems and the semantic web. To obtain more comprehensive and accurate knowledge from the semantic web, it is essential to design an efficient method for extracting semantic data from web pages and fusing the resulting knowledge.

A close analysis of web pages shows that they contain two kinds of data: semi-structured structural and form information, and unstructured text information. Existing semantic data extraction methods can handle only one of the two. With the help of natural language processing and machine learning techniques, they achieve good results on a number of standard datasets. However, on open-domain web pages with complex structure and complex sentences, extraction performance and the usability of the generated knowledge remain low.

In this paper, we propose a new semantic data extraction pattern for web pages. First, it obtains and processes the web data to separate the structure information and text information from noise. On the one hand, an initial core ontology is obtained by analyzing the mapping between web structure information and a domain ontology. The core ontology is then expanded by studying the structure information and semantic information, so that semantic data can be extracted from the structure information. On the other hand, the paper proposes a method named Multiple Order Semantic Relation Extraction (MOSRE), which applies multiple order, a concept borrowed from formal logic, to build a semantic dependency structure pattern for extracting information from hybrid unstructured open-domain text with deep semantic analysis. MOSRE splits sentences and reconstructs them into a strict hierarchical binary structure, called the Multiple Order Semantic Tree, so that each piece of semantic data is converted into a binary structure. After a semantic refinement process, these semantic data become unified, normalized entities.

In addition, the paper implements sorting-based entity linking and concept-hierarchy-based knowledge expansion to improve the usability of the knowledge generated from these semantic data. An entity is linked to a concept in Wikipedia by computing the similarity between the characteristic words of the disambiguation pages and the entity. The knowledge is expanded using the concept hierarchy of conceptual dependency networks. Statistics-based knowledge fusion is then applied to this knowledge to build a knowledge base.

Finally, the paper applies these methods in a knowledge extraction system that performs semantic data extraction, entity linking, semantic expansion, and knowledge fusion. The function of each module of the proposed system is verified by experiments on Wikipedia pages. Furthermore, MOSRE has been tested on the SENT500 and KBP datasets, achieving F1 values of 83.8% on SENT500 and 35.5% on KBP and outperforming existing methods.
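
The sorting-based entity linking step described above can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the data layout, so the following are assumptions made only for illustration: cosine similarity over bags of words, and the hypothetical names `cosine_similarity`, `link_entity`, and `candidates`. It is not the thesis implementation.

```python
import math
from collections import Counter


def cosine_similarity(words_a, words_b):
    """Cosine similarity between two bags of words (assumed measure)."""
    counts_a, counts_b = Counter(words_a), Counter(words_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_entity(context_words, candidates):
    """Rank candidate concepts by similarity to the entity's context words.

    `candidates` maps a candidate concept title to its characteristic
    words (hypothetical layout). Returns (title, score) pairs, best first;
    ties are broken alphabetically by title.
    """
    scored = [
        (title, cosine_similarity(context_words, words))
        for title, words in candidates.items()
    ]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Toy data, invented for this example only.
candidates = {
    "Apple Inc.": ["company", "iphone", "technology", "cupertino"],
    "Apple (fruit)": ["fruit", "tree", "orchard", "sweet"],
}
ranking = link_entity(["new", "iphone", "released", "company"], candidates)
best_title = ranking[0][0]
```

Here the entity mention is linked to whichever candidate ranks first; a real system would likely weight characteristic words (for example by TF-IDF) and apply a minimum score before accepting a link.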
Keywords/Search Tags:Semantic web, Semantic data, Entity relation, Knowledge base