Web Information Integration Based On Synonymous Entities Recognition

Posted on: 2016-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Xu
GTID: 2308330473460233
Subject: Computer technology
Abstract/Summary:
The abundance of Web information makes information gathering easier, and building and organizing business data from Web information has become increasingly important for enterprises. Accurate and efficient integration of massive Web information is also the foundation of analysis applications such as dynamic information aggregation, market intelligence analysis, public opinion analysis, and business intelligence. However, Web information is multi-source, massive, and heterogeneous, which makes integration difficult. In addition, because the data come from different sources, several entity names may refer to the same real-world entity; this is called the synonymous entity problem. It introduces heavy redundancy into system data, which degrades both the quality of the final service data and the user experience. Reducing synonymous entities during data integration is therefore a major challenge for Web information integration.

(1) This thesis reviews related work and techniques in data integration, in particular data gathering, data extraction, and data fusion. For data fusion, it introduces the research background and the current state of research on identifying synonymous entities.

(2) This thesis proposes a search engine-based similarity calculation algorithm. It uses the snippets returned by a search engine to calculate the similarity between named entities, and then uses that similarity to build a search engine-based synonymous entity recognition algorithm (FSE). We evaluate FSE on named entity data collected from the real world and compare it with other search engine-based similarity algorithms. The F value of FSE reaches 93.59%, which is 1.8% higher than the second-best algorithm, VarientDice, and 3.15% higher than the lowest-scoring algorithm, VarientJaccard.

(3) This thesis also designs a framework for Web information integration based on synonymous entity recognition and applies the search engine-based synonymous entity recognition algorithm to it. Based on this framework, we develop an agricultural information integration system, the Huinong Agricultural Information System.
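
The abstract does not give the FSE formula, so the following is only a minimal sketch of the general idea of snippet-based similarity, not the thesis's algorithm. It assumes that the snippets for each entity name have already been fetched from a search engine, and it uses the plain set-based Jaccard and Dice coefficients over snippet tokens, which may differ from the VarientJaccard and VarientDice baselines named above. The function names, the tokenizer, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import re


def tokenize(snippets):
    """Lowercase the snippets and collect their word tokens into one set."""
    tokens = set()
    for text in snippets:
        tokens.update(re.findall(r"\w+", text.lower()))
    return tokens


def jaccard(a, b):
    """Plain Jaccard coefficient: |A & B| / |A | B| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """Plain Dice coefficient: 2|A & B| / (|A| + |B|) (0.0 if both are empty)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def are_synonymous(snippets_a, snippets_b, measure=jaccard, threshold=0.5):
    """Treat two entity names as synonymous when the similarity of their
    snippet token sets reaches the threshold (an assumed, untuned value)."""
    return measure(tokenize(snippets_a), tokenize(snippets_b)) >= threshold


def f_measure(precision, recall):
    """Balanced F value: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up snippets standing in for real search engine results.
    a = ["IBM is a technology company", "IBM makes computers"]
    b = ["International Business Machines is a technology company"]
    print(jaccard(tokenize(a), tokenize(b)), dice(tokenize(a), tokenize(b)))
    print(are_synonymous(a, b, measure=dice))
```

A real system would also need to remove stop words and weight terms, since common words such as "is" and "a" inflate the overlap in this sketch; how the thesis handles this is not stated in the abstract.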
Keywords/Search Tags: Web information integration, synonymous entity recognition, similarity calculation, search engine