Key Technology Research On Entity Relation Mining In Big Data Environment

Posted on:2016-03-25

Degree:Master

Type:Thesis

Country:China

Candidate:H L Fu

Full Text:PDF

GTID:2298330467472567

Subject:Communication and Information System

Abstract/Summary:


Entity relation mining is a research subject of great significance. Its fundamental task is to identify the relations between entities in text. With the continuous development of computer technology and the Internet, the era of big data has arrived. The growing volume of text supplies more entity relation data, but it also places higher demands on entity relation mining methods. This thesis surveys research on entity relation mining and focuses on supervised entity relation mining, the dominant approach in this area. It analyzes the technologies underlying the supervised method and examines both their shortcomings and the problems they face in a big data environment. The main work of this thesis is as follows.

1. A new semantic word sequence kernel function based on POS weighting is proposed for computing relation similarity. The new kernel extends the existing semantic word sequence kernel by adding part-of-speech (POS) information as a weighting factor, while retaining the advantages of the original algorithm, such as its analysis of syntactic and semantic knowledge. This makes the similarity measure more consistent with linguistic features, so similarity is measured more accurately and the correctness of relation mining improves.

2. The classification algorithm is an essential part of supervised entity relation mining. KNN has low computational efficiency when the training dataset is very large, and its results also suffer when the training data are unbalanced. To address both problems, this thesis proposes a training dataset reduction algorithm based on the Fisher criterion and hierarchical clustering. The algorithm eliminates redundant training samples and makes the distribution of the training dataset more balanced, improving computational efficiency while maintaining good precision and recall.

3. Based on the characteristics of the data processed by a relation extraction system, this thesis designs a big data processing scheme built on MapReduce, a mainstream big data processing framework. By following the MapReduce design pattern, the scheme moves relation mining from a stand-alone environment to a parallel one, which significantly improves computational efficiency on massive text data.

Finally, relation mining experiments were designed and implemented in Java. The experiments verify the methods above and achieve good results.
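
The POS-weighted kernel of item 1 can be pictured with a small Java sketch. The abstract does not give the kernel's formulas, so this is an illustration under stated assumptions, not the author's kernel: it uses a gap-weighted subsequence kernel over word sequences (in the style of the string subsequence kernel of Lodhi et al.) and replaces exact token matching with a soft match score equal to a semantic similarity multiplied by the average weight of the two POS tags. The POS weights (`POS_WEIGHT`), the synonym table (`SYNONYM_GROUP`), the 0.5 synonym score, and all class and method names are hypothetical placeholders; a real system would draw semantic similarity from a lexical resource. Records require Java 16 or later.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: gap-weighted word-sequence kernel with POS-weighted soft matching. */
final class PosWeightedSequenceKernel {

    /** A token: surface word plus its POS tag. */
    record Token(String word, String pos) {}

    // Hypothetical POS weights: content words count more than function words.
    static final Map<String, Double> POS_WEIGHT = Map.of("VB", 1.0, "NN", 0.8, "IN", 0.5);
    static final double DEFAULT_POS_WEIGHT = 0.3;

    // Hypothetical stand-in for a semantic lexicon: words sharing a group id are treated as synonyms.
    static final Map<String, Integer> SYNONYM_GROUP = Map.of("acquired", 1, "bought", 1, "purchased", 1);

    static double posWeight(String pos) {
        return POS_WEIGHT.getOrDefault(pos, DEFAULT_POS_WEIGHT);
    }

    /** Soft match score: semantic similarity of the two words times the mean of their POS weights. */
    static double match(Token x, Token y) {
        double sem;
        if (x.word().equals(y.word())) {
            sem = 1.0;
        } else {
            Integer gx = SYNONYM_GROUP.get(x.word());
            Integer gy = SYNONYM_GROUP.get(y.word());
            sem = (gx != null && gx.equals(gy)) ? 0.5 : 0.0;
        }
        return sem * (posWeight(x.pos()) + posWeight(y.pos())) / 2.0;
    }

    /** Kernel over common, possibly gapped, subsequences of length n (n >= 1) with gap decay lambda. */
    static double kernel(List<Token> s, List<Token> t, int n, double lambda) {
        int m = s.size(), p = t.size();
        // kp[i][a][b] holds the auxiliary K'_i on prefixes s[0..a) and t[0..b); zero when min(a, b) < i.
        double[][][] kp = new double[n][m + 1][p + 1];
        for (double[] row : kp[0]) Arrays.fill(row, 1.0); // K'_0 = 1 everywhere
        for (int i = 1; i < n; i++) {
            for (int a = i; a <= m; a++) {
                for (int b = i; b <= p; b++) {
                    double sum = lambda * kp[i][a - 1][b];
                    for (int j = 1; j <= b; j++) {
                        double w = match(s.get(a - 1), t.get(j - 1));
                        if (w != 0.0) sum += w * kp[i - 1][a - 1][j - 1] * Math.pow(lambda, b - j + 2);
                    }
                    kp[i][a][b] = sum;
                }
            }
        }
        // K_n: every soft-matched final token closes a length-n subsequence.
        double k = 0.0;
        for (int a = 1; a <= m; a++)
            for (int j = 1; j <= p; j++)
                k += match(s.get(a - 1), t.get(j - 1)) * kp[n - 1][a - 1][j - 1] * lambda * lambda;
        return k;
    }
}
```

With `lambda = 0.5` and `n = 1`, an exact match of the verb "acquired" scores 0.25, its synonym "bought" scores half of that, and a determiner match scores less because of its lower POS weight. The inner loop over `j` keeps the recursion close to its definition; a production version would use the faster auxiliary-table recursion.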
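
Item 2 names its two ingredients, the Fisher criterion and hierarchical clustering, but not how they are combined. The Java sketch below assumes one plausible combination and should not be read as the thesis algorithm: per-feature Fisher scores (between-class variance divided by within-class variance) weight a squared Euclidean distance, and each class is then reduced by agglomerative, centroid-linkage hierarchical clustering to at most `target` clusters, each replaced by its centroid. Capping every class at the same size is what balances large and small classes. The class name, method names, and the common cap `target` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of KNN training-set reduction: Fisher-score feature weights + per-class clustering. */
final class TrainingSetReducer {

    /** Per-feature Fisher score: between-class scatter divided by within-class scatter. */
    static double[] fisherScores(Map<String, List<double[]>> byClass) {
        int d = byClass.values().iterator().next().get(0).length;
        int total = 0;
        double[] globalMean = new double[d];
        for (List<double[]> xs : byClass.values())
            for (double[] x : xs) { total++; for (int f = 0; f < d; f++) globalMean[f] += x[f]; }
        for (int f = 0; f < d; f++) globalMean[f] /= total;

        double[] between = new double[d], within = new double[d];
        for (List<double[]> xs : byClass.values()) {
            double[] mu = mean(xs);
            for (int f = 0; f < d; f++) {
                between[f] += xs.size() * Math.pow(mu[f] - globalMean[f], 2);
                for (double[] x : xs) within[f] += Math.pow(x[f] - mu[f], 2);
            }
        }
        double[] score = new double[d];
        for (int f = 0; f < d; f++) score[f] = between[f] / (within[f] + 1e-9); // epsilon avoids division by zero
        return score;
    }

    static double[] mean(List<double[]> xs) {
        double[] mu = new double[xs.get(0).length];
        for (double[] x : xs) for (int f = 0; f < mu.length; f++) mu[f] += x[f] / xs.size();
        return mu;
    }

    static double weightedDist2(double[] a, double[] b, double[] w) {
        double s = 0;
        for (int f = 0; f < a.length; f++) s += w[f] * (a[f] - b[f]) * (a[f] - b[f]);
        return s;
    }

    /** Reduces every class to at most target (>= 1) centroids by repeatedly merging the two closest clusters. */
    static Map<String, List<double[]>> reduce(Map<String, List<double[]>> byClass, int target) {
        double[] w = fisherScores(byClass);
        Map<String, List<double[]>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<double[]>> e : byClass.entrySet()) {
            List<List<double[]>> clusters = new ArrayList<>();
            for (double[] x : e.getValue()) clusters.add(new ArrayList<>(List.of(x))); // start with singletons
            while (clusters.size() > target) {
                int bi = 0, bj = 1;
                double best = Double.MAX_VALUE;
                for (int i = 0; i < clusters.size(); i++)
                    for (int j = i + 1; j < clusters.size(); j++) {
                        double dd = weightedDist2(mean(clusters.get(i)), mean(clusters.get(j)), w);
                        if (dd < best) { best = dd; bi = i; bj = j; }
                    }
                clusters.get(bi).addAll(clusters.remove(bj)); // bj > bi, so index bi is unaffected
            }
            List<double[]> reps = new ArrayList<>();
            for (List<double[]> c : clusters) reps.add(mean(c)); // each cluster becomes one representative
            out.put(e.getKey(), reps);
        }
        return out;
    }
}
```

The reduced map then serves as the KNN training set. Because KNN compares each query with every stored sample, shrinking each class to at most `target` representatives cuts the per-query cost accordingly. Features whose Fisher score is near zero barely affect the merge order, so clusters form along the features that separate classes. The exhaustive pair search is cubic overall and is only meant for small examples.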
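
Item 3 moves relation mining onto MapReduce. The Java sketch below reproduces the map, shuffle, and reduce phases in memory with Java streams, so it runs without a Hadoop cluster. In Hadoop itself, `map` and `reduce` would become the `map` and `reduce` methods of `Mapper` and `Reducer` subclasses from `org.apache.hadoop.mapreduce`. The tab-separated input format, the keyword-based `classify` stub (a placeholder for the trained relation classifier), the majority-vote reducer, and all names are assumptions made for illustration, not the thesis design.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical in-memory sketch of the map -> shuffle -> reduce pattern for relation mining. */
final class RelationMapReduceSketch {

    record KeyValue(String key, String value) {}

    /** Map phase: "entity1<TAB>entity2<TAB>sentence" -> (entity pair, predicted relation label). */
    static KeyValue map(String line) {
        String[] parts = line.split("\t", 3);
        return new KeyValue(parts[0] + "|" + parts[1], classify(parts[2]));
    }

    /** Hypothetical stand-in for the trained classifier (e.g. KNN with a sequence kernel). */
    static String classify(String sentence) {
        if (sentence.contains("acquired") || sentence.contains("bought")) return "ACQUISITION";
        if (sentence.contains("founded")) return "FOUNDER";
        return "NONE";
    }

    /** Reduce phase: all labels for one entity pair -> most frequent label, ties broken alphabetically. */
    static String reduce(List<String> labels) {
        Map<String, Long> counts = labels.stream()
                .collect(Collectors.groupingBy(l -> l, TreeMap::new, Collectors.counting()));
        String best = null;
        long bestCount = -1;
        for (Map.Entry<String, Long> e : counts.entrySet())
            if (e.getValue() > bestCount) { best = e.getKey(); bestCount = e.getValue(); }
        return best;
    }

    static Map<String, String> run(List<String> lines) {
        // Map records in parallel, then "shuffle": group emitted values by key, as the framework does.
        Map<String, List<String>> shuffled = lines.parallelStream()
                .map(RelationMapReduceSketch::map)
                .collect(Collectors.groupingBy(KeyValue::key, TreeMap::new,
                        Collectors.mapping(KeyValue::value, Collectors.toList())));
        Map<String, String> result = new TreeMap<>();
        shuffled.forEach((pair, labels) -> result.put(pair, reduce(labels)));
        return result;
    }
}
```

Because each input line is classified independently and the reducer only needs the labels for one key, both phases can be split across machines, which is the property the scheme in item 3 relies on.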

Keywords/Search Tags:

Text mining, Entity relation mining, Kernel function, KNN, MapReduce


Related items

1	Research On Text Mining In Biomedical Literature
2	Research And Implementation Of Web Text Mining System Based Mapreduce
3	Research On Named Entity Relation Extraction Based On Web Text Mining
4	Research On Entity Relation Extraction Of Aluminum-silicon Alloy Based On Text Mining
5	Feature Coupling Generalization And Its Application In Text Mining
6	Research Of Chinese International Cooperation Elements And Relation Mining Based On Web Diplomatic News
7	Research On Associated Issues In Biomedical Text Mining Based On Discriminative Models
8	Research On Deep Learning Based Biomedical Entity Relation Extraction Algorithm
9	Research On The Extraction Of Entity Relationship In Chinese Field Based On Multi - Core Fusion
10	Research On Kernel Based Entity Relation Extraction