
Study On Key Techniques Of Web Focused Entity Relation Query Processing And Analysing

Posted on: 2014-03-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Chen
Full Text: PDF
GTID: 1108330482455752
Subject: Computer system architecture

Abstract/Summary:
In recent years, with the rapid development of Internet technology and the explosive growth of Web information, supporting interaction among users has become increasingly important. Much research focuses on making Web information retrieval more personalized and intelligent. Against this background, entities and entity relations in the Web environment have become a hot research topic. Entities represent specific real-world objects, such as persons, addresses and organizations; entity relations represent the latent relationships among entities, such as friendship, employment and cooperation. Traditional methods extract relations by constructing domain knowledge for a specific relation type. In the Web environment, however, relations take the form of unstructured or semi-structured data, so it is difficult to maintain general and accurate domain knowledge. In addition, because relations are dynamic and diverse, novel approaches to relation analysis are needed.

For this reason, this dissertation studies the query and analysis of open relations in the Web environment. It focuses on four problems: the extraction of informative sub-graphs, the modeling of dynamic graphs, the measurement of node similarity, and the discovery of roles in dynamic graphs. The contributions are summarized as follows.

The SSORE method is proposed to extract open relations in the Web environment. SSORE is a self-supervised learning method. First, sentence patterns are used to generate candidate relation tuples. Second, constraint conditions are used to label the candidate tuples automatically. Third, a Maximum Entropy model is trained on the features to produce a relation classifier. Applying the classifier greatly improves the quality of the extracted relations.
Additionally, a co-occurrence-based disambiguation algorithm is used to distinguish entities that share the same name. Finally, all the relations are organized in a graph model, called the relation graph.

For a given set of focused entities, the SISP framework is proposed to search for the informative sub-graph that best explains the relations among them. In the SISP framework, an evaluation function is defined from structural information, and the extraction process is then converted into a multi-objective optimization problem. Following the theory of Particle Swarm Optimization, three steps (initialization, fitness calculation and update) are used to obtain the target informative sub-graph. The experimental results show that the proposed methods achieve higher precision and efficiency than the compared methods.

In many real applications, relations change over time, so the relation graph is dynamic. To trace the evolution of relations and solve related retrieval and mining problems, the dynamic graph is represented by a sequence of snapshots. Each snapshot is a static graph expressing the relations at a certain time. Based on this dynamic graph model, holistic and evolutionary factors are considered when measuring the similarity between two nodes. For the different measurements, top-k search in the dynamic graph model is further proposed and solved. The experiments indicate that the dynamic graph model is reasonable and that the top-k results meet users' requirements.

In the dynamic graph model, nodes may exhibit different behaviors at different time points, and these changing behaviors reflect the possible roles of each node. According to these behaviors, the BOM framework is proposed to solve the role mining problem in the dynamic graph model.
First, a Markov Random Field is used to model the evolution of behaviors by considering the dependency between behavior and latent state. Second, the EM algorithm is used to estimate the latent state of each node in each snapshot. Finally, the nodes are divided into clusters by a clustering algorithm, and each cluster represents a unique role. The experiments indicate that the proposed BOM framework performs well in both precision and efficiency.
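
As an illustration of the snapshot-based dynamic graph model and the top-k similarity search summarized in this abstract, the following is a minimal Python sketch. The abstract does not define the actual similarity measures, so everything concrete here is an assumption: the holistic factor is taken as the mean Jaccard similarity of neighbor sets across snapshots, the evolutionary factor as the mean Jaccard similarity of the neighbors gained or lost between consecutive snapshots, and the two are blended with a hypothetical weight `alpha`. All function names are illustrative and are not taken from the dissertation.

```python
import heapq


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def neighbors(snapshot, node):
    """Neighbor set of a node in one snapshot (empty if the node is absent)."""
    return snapshot.get(node, set())


def holistic_similarity(snapshots, u, v):
    """Assumed holistic factor: mean neighbor-set Jaccard over all snapshots."""
    scores = [jaccard(neighbors(s, u), neighbors(s, v)) for s in snapshots]
    return sum(scores) / len(scores)


def evolutionary_similarity(snapshots, u, v):
    """Assumed evolutionary factor: how alike the neighbor changes of u and v are."""
    if len(snapshots) < 2:
        return 1.0  # a single snapshot carries no evolution to compare
    scores = []
    for prev, curr in zip(snapshots, snapshots[1:]):
        # Symmetric difference = neighbors gained or lost between snapshots.
        change_u = neighbors(prev, u) ^ neighbors(curr, u)
        change_v = neighbors(prev, v) ^ neighbors(curr, v)
        scores.append(jaccard(change_u, change_v))
    return sum(scores) / len(scores)


def similarity(snapshots, u, v, alpha=0.5):
    """Weighted blend of the two assumed factors (alpha is hypothetical)."""
    return (alpha * holistic_similarity(snapshots, u, v)
            + (1 - alpha) * evolutionary_similarity(snapshots, u, v))


def top_k_similar(snapshots, query, k, alpha=0.5):
    """Return the k (score, node) pairs most similar to the query node."""
    nodes = set()
    for snapshot in snapshots:
        nodes.update(snapshot)
    nodes.discard(query)
    scored = [(similarity(snapshots, query, n, alpha), n) for n in nodes]
    return heapq.nlargest(k, scored)


# Two snapshots of a toy relation graph, stored as adjacency sets.
snapshots = [
    {"a": {"x", "y"}, "b": {"x", "y"}, "c": {"z"},
     "x": {"a", "b"}, "y": {"a", "b"}, "z": {"c"}},
    {"a": {"x", "y", "z"}, "b": {"x", "y", "z"}, "c": {"z"},
     "x": {"a", "b"}, "y": {"a", "b"}, "z": {"a", "b", "c"}},
]

print(top_k_similar(snapshots, "a", 2))
```

In this toy data, nodes `a` and `b` have the same neighbors in both snapshots and gain the same neighbor over time, so `b` ranks first for the query `a`. A real system would likely need an index or pruning strategy rather than this exhaustive scan over all nodes.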
Keywords/Search Tags: open relation extraction, informative sub-graph search, dynamic relation graph, similarity measurement, behavior role mining