
Key Technology Research On Entity Relation Mining In Big Data Environment

Posted on: 2016-03-25
Degree: Master
Type: Thesis
Country: China
Candidate: H L Fu
GTID: 2298330467472567
Subject: Communication and Information System

Abstract/Summary:
Entity relation mining is a research topic of great significance. Its fundamental task is to find the relations between entities in text. With the continuous development of computer technology and the Internet, the age of big data has arrived. The growing volume of text provides more entity relation data, but it also places higher demands on entity relation mining methods.

This thesis surveys research on entity relation mining and focuses on supervised entity relation mining, the dominant method in this area. It analyzes the technologies underlying the supervised method and studies their defects and the problems they face in a big data environment. The main work is as follows.

1. The thesis proposes a new semantic word sequence kernel function based on part-of-speech (POS) weighting for computing relation similarity. The new kernel extends the existing semantic word sequence kernel by adding POS information as an influence factor while retaining the advantages of the original algorithm, such as its use of syntactic and semantic knowledge. This brings the similarity measure more in line with linguistic features, so the similarity results are more accurate and the correctness of relation mining improves.

2. The classification algorithm is an essential part of supervised entity relation mining. To address the low computational efficiency of KNN on very large training sets and the problem of unbalanced training data, the thesis proposes a training set reduction algorithm based on the Fisher criterion and hierarchical clustering. It eliminates redundant training samples and makes the distribution of the training set more balanced. The algorithm improves computational efficiency while maintaining good precision and recall.

3. Based on the characteristics of the data processed by the relation extraction system, the thesis designs a big data processing scheme on MapReduce, a current big data processing framework. By applying MapReduce design patterns, the scheme moves relation mining from a stand-alone environment to a parallel one, which can significantly improve computational efficiency on very large volumes of text.

Finally, a relation mining experiment is designed and implemented in Java. It verifies the work described above and achieves good results.
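The abstract does not give the key/value design of the MapReduce scheme, so the following is only a minimal in-memory sketch of one common way to parallelize KNN relation classification in that style. It is not the thesis's implementation, which is written in Java. All names here (`similarity`, `map_phase`, `shuffle`, `reduce_phase`, `classify`) are hypothetical. The Jaccard word overlap is a stand-in assumption for the POS-weighted semantic word sequence kernel. Each mapper scores the queries against one split of the training set, and each reducer keeps the k most similar examples for a query and takes a majority vote.

```python
from collections import Counter, defaultdict
from heapq import nlargest


def similarity(a, b):
    """Stand-in similarity (assumption): Jaccard overlap of word sets.
    The thesis uses a POS-weighted semantic word sequence kernel instead."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def map_phase(train_split, queries):
    """Mapper: score every query against every example in one training split.
    Emits (query_id, (similarity, label)) pairs."""
    for qid, qtext in queries.items():
        for text, label in train_split:
            yield qid, (similarity(qtext, text), label)


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(qid, scored, k):
    """Reducer: keep the k most similar examples and take a majority vote."""
    top = nlargest(k, scored, key=lambda s: s[0])
    votes = Counter(label for _, label in top)
    return qid, votes.most_common(1)[0][0]


def classify(train_splits, queries, k=3):
    emitted = []
    for split in train_splits:  # on a cluster, each split runs on its own node
        emitted.extend(map_phase(split, queries))
    grouped = shuffle(emitted)
    return dict(reduce_phase(qid, vals, k) for qid, vals in grouped.items())


# Toy data (invented for illustration only).
train_splits = [
    [("Alice works for Acme", "EMPLOYMENT"),
     ("Bob works for Initech", "EMPLOYMENT")],
    [("Paris is located in France", "LOCATED_IN"),
     ("Berlin is located in Germany", "LOCATED_IN")],
]
queries = {"q1": "Carol works for Globex", "q2": "Rome is located in Italy"}
predictions = classify(train_splits, queries, k=3)
print(predictions)
```

In this layout the mappers never need to see the whole training set, which is why the work can be spread across nodes. The cost is that every (query, training example) pair is emitted, so a real job would likely keep only a local top-k in each mapper before the shuffle.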
Keywords/Search Tags:Text mining, Entity relation mining, Kernel function, KNN, MapReduce