People Relation Extraction Method Based On Feature Vector

Posted on: 2016-07-14
Degree: Master
Type: Thesis
Country: China
Candidate: S S Fan
GTID: 2308330452468987
Subject: Computer application technology
Abstract/Summary:
With the rapid development and wide application of the Internet, the network contains a large variety of information, such as relations between person and place entities and between pairs of person entities. However, this information has not been effectively utilized. How to mine the relations between people from the network is a matter of growing concern. At present, extraction technology based on feature vectors is relatively mature, and it is one of the most commonly used methods.

The feature-vector approach converts entity relation extraction into a classification problem. Because SVM (Support Vector Machine) classification accuracy is very high, SVM is generally combined with feature-vector methods. To address the defects of the general methods for relation extraction, the main work of this thesis is as follows:

1. General multi-class SVM methods have unclassifiable regions. When they are applied to person relation extraction, some relations are left unclassified, which degrades the extraction results. To address this, the DAG-SVM multi-class method is introduced. Because DAG-SVM suffers from "error accumulation", this thesis divides person relations into two types, kinship relations and other social relations, and uses these two categories as the root to alleviate the error accumulation. Comparison experiments use the general multi-class method, the FMSVM multi-class method, and the DAG-SVM multi-class method. The results show that the proposed method improves the accuracy of person relation extraction to some extent.

2. In person relation extraction, the dimension of the feature space is often very high, which leads to sparse vectors and reduces extraction efficiency. To address this, person relations are first divided into six categories. Four feature selection algorithms (document frequency, information gain, mutual information, and the χ2 statistic) are then introduced to reduce the dimension of the feature space. Finally, an SVM classifier is used to extract the person entity relations. Experimental results show that the four feature selection algorithms not only preserve extraction performance but also effectively reduce the dimension of the vector space and dramatically improve relation extraction efficiency. Among them, the χ2 statistic works best, followed by information gain.
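
The χ2 feature selection step mentioned above can be illustrated with a minimal sketch. It uses the standard textbook χ2 statistic for a term and a category and is not the thesis implementation; the function names, the toy token lists, and the category labels are all assumptions made for illustration.

```python
from collections import Counter


def chi_square_scores(docs, labels, target):
    """Score every term for one category with the chi-square statistic.

    docs   : list of token lists (hypothetical, already tokenized contexts)
    labels : list of category labels, one per document
    target : the category the terms are scored against
    """
    n = len(docs)
    in_class = sum(1 for y in labels if y == target)
    out_class = n - in_class

    # Document frequency of each term inside and outside the target category.
    df_in, df_out = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        bucket = df_in if y == target else df_out
        for term in set(tokens):
            bucket[term] += 1

    scores = {}
    for term in set(df_in) | set(df_out):
        a = df_in[term]        # in category, contains term
        b = df_out[term]       # not in category, contains term
        c = in_class - a       # in category, lacks term
        d = out_class - b      # not in category, lacks term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def top_k_features(docs, labels, target, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels, target)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy data, invented for illustration only.
docs = [
    ["father", "son"],
    ["mother", "daughter"],
    ["colleague", "company"],
    ["friend", "company"],
]
labels = ["kinship", "kinship", "social", "social"]
selected = top_k_features(docs, labels, "kinship", 1)
```

Only the selected terms would then be kept as dimensions of the feature vectors passed to the SVM classifier, which is how the dimension reduction described in item 2 shortens the sparse vectors.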
Keywords/Search Tags: Relation extraction, Support Vector Machine, Feature Selection, Multi-classification, DAG-SVM, FMSVM