Research On Discovering Relationship Between Web Entities

Posted on: 2017-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: K Zhao
Full Text: PDF
GTID: 2308330485480015
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet technology, the number of sites and pages on the Web is growing exponentially, and the Web has become a huge data source. Web Data Integration Systems (WIS) aim to integrate data from various Web sources efficiently and to provide high-quality data for applications such as market intelligence, business intelligence and public opinion analysis. How to integrate the valuable information on the Web efficiently, comprehensively and accurately has gradually become a research hotspot and a difficult problem in areas such as data integration, information retrieval and natural language understanding. Such integration supplies data to market intelligence analysis, search engines, intelligent question answering and other systems: it enriches their knowledge bases, helps reasoning produce better results, and lets search engines return accurate data to users.

WIS collect data mainly from large, high-quality Deep Web sites and integrate it into structured data under a global schema. Consequently, given the velocity and volume of Web data, the data provided by WIS has the following limitations.
1. Things in WIS are inter-related, for example through COMPETE and COOPERATE relations between two companies, and these relations are valuable for follow-up analysis and decision-making. However, because the data in WIS comes from a limited number of high-quality Deep Web sources, it is hard to obtain such relations from the structured data residing in the WIS.
2. It remains a problem how to label the relationships between Web entities in WIS accurately and quickly so that the data is as useful as possible to users.

This thesis studies how to mine and label the relationships between Web entities in Web Data Integration Systems, that is, the entity relationship discovery and labeling problems. The contributions are as follows:
1. This thesis presents a semantic relationship discovery algorithm based on clustering and vector feature abatement. The method ensures the accuracy of both single-relationship and multi-relationship discovery. It uses a search engine to obtain external documents together with the entity information in the WIS, and constructs a feature vector for each relation. Relationships are discovered through clustering and vector feature abatement, and multiple relationships are discovered during the vector feature abatement process.
2. This thesis presents a semantic relationship labeling algorithm based on ensemble learning. The method improves the accuracy of entity relationship labeling by improving the accuracy with which relationship similarity is determined, and at the same time it reduces the computational cost. The method integrates four different methods. For the SVM, it optimizes the computation: rather than training a dedicated SVM, it selects several SVMs as candidates, which speeds the method up. A relationship is labeled by determining whether two relationships are similar. Experimental results show that the method improves the accuracy of both relationship similarity determination and relationship labeling, and reduces the computational cost.
3. This thesis presents a method for mining latent semantic relations between entities based on two-phase clustering. The method uses a search engine to obtain external documents together with related entity information in the WIS. From a large number of external documents it extracts the related contexts and entities and constructs a feature vector for each relation. The first clustering phase yields clusters that have the same relation to the target entity, and the second clustering phase yields clusters that share the same relation. Experimental results show that the method achieves higher recall and F-measure.
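The general idea behind contribution 1 (build a context feature vector for each entity pair, then group pairs whose vectors are similar) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the abstract does not specify the features, the clustering method or the vector feature abatement step, so the sketch assumes bag-of-words vectors, cosine similarity and a greedy single-pass clustering with an arbitrary threshold of 0.5. The company names and snippets are invented stand-ins for documents that would be retrieved by a search engine.

```python
from collections import Counter
from math import sqrt


def context_vector(pair, snippets):
    """Bag-of-words counts over the snippets of one entity pair.

    The entity names themselves are dropped so that only the
    surrounding context words describe the relation.
    """
    names = {name.lower() for name in pair}
    counts = Counter()
    for text in snippets:
        counts.update(tok for tok in text.lower().split() if tok not in names)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_pairs(vectors, threshold=0.5):
    """Greedy single-pass clustering (an assumption, not the thesis's method).

    Each pair joins the first cluster whose seed vector is at least
    `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []  # list of (seed_vector, [pair, ...])
    for pair, vec in vectors.items():
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(pair)
                break
        else:
            clusters.append((vec, [pair]))
    return [members for _, members in clusters]


# Hypothetical snippets standing in for search-engine results.
snippets = {
    ("Acme", "Bolt"): ["Acme competes with Bolt in the phone market",
                       "Bolt is a rival of Acme"],
    ("Cora", "Dyne"): ["Cora competes with Dyne in the phone market"],
    ("Echo", "Flux"): ["Echo signed a partnership agreement with Flux"],
}
vectors = {pair: context_vector(pair, texts) for pair, texts in snippets.items()}
clusters = cluster_pairs(vectors)
print(clusters)
```

With these invented inputs, the two pairs described in competition contexts fall into one cluster and the partnership pair into another. Each resulting cluster corresponds to one candidate relation, which a labeling step such as the one in contribution 2 could then name.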
Keywords/Search Tags:entity relationship, relationship similarity, latent relationship