The Learning Of Single Class Center And The Application In Binary Relation Extraction

Posted on:2009-12-29

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Z S Li

Full Text:PDF

GTID:1118360272485625

Subject:Computer application technology

Abstract/Summary:

Extracting binary relations from the web is an important research direction in information extraction. Many studies have proposed learning methods based on a self-training mechanism: an initial system is trained on a small labeled data set, and the system then labels reliable candidate data and retrains itself on them to improve performance. These studies show that such methods are effective for binary relation extraction, but none of them analyzes the methods rigorously.

This paper transforms pattern learning in binary relation extraction into the learning of the centre of a single text class. In the text vector space, the vectors in a small neighborhood of the initial centre can be labeled as reliable data. The paper aims to answer the key question: what properties must the data set have so that the self-labeled data are guaranteed to improve the learning of the single class centre?

The paper answers this question by studying the properties of the text vector space. To overcome the shortcomings of the Gaussian mixture model in describing the distribution of "hard" data sets, it presents a new model, the TGMK model, based on the partitions produced by the k-means algorithm, and sets out the relations among the TGMK model, the k-means algorithm, and the Gaussian mixture model. Experimental results show that the TGMK model is suitable for describing multi-class text data sets.

Building on the k-means algorithm, the paper presents a new algorithm, the single-mean algorithm. It proves that if a multi-class data set is suitably described by the 1-TGMR model, a generalization of the TGMK model, then the centre output by the single-mean algorithm, starting from the initial centre, is guaranteed to converge to the actual centre of the data set. Together, these results fully answer the key question. Experiments show that the single-mean algorithm is effective on multi-class text data sets, which in turn indicates that learning methods based on the self-training mechanism are effective for binary relation extraction.

Based on the single-mean algorithm, the paper builds a formal learning model for binary relation extraction, and presents a new scoring method for candidates and a new self-labeling method tailored to the particular characteristics of extracting binary relations from the web. The formal model is applied to acquire Chinese Q-A patterns and Chinese-English terminology pairs. Unlike previous work, the paper learns Chinese Q-A patterns and Chinese-English terminology patterns through self-training, which minimizes the dependence on labeled data. It also uses heuristic rules to improve the scoring of patterns and candidates. Experimental results show that, compared with systems of the same kind, the resulting systems perform better with smaller labeled data sets.
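The abstract does not spell out the single-mean algorithm, so the following is only an illustrative sketch of the general idea it describes: start from an initial centre, treat the vectors in a small neighborhood of the current centre as reliable self-labeled data, and re-estimate the centre from them. The function name `single_centre_self_training`, the fixed `radius`, the Euclidean distance, and the synthetic two-class data are all assumptions made for illustration, not details taken from the dissertation.

```python
import math
import random


def single_centre_self_training(vectors, initial_centre, radius,
                                max_iter=100, tol=1e-9):
    """Hypothetical one-centre, k-means-style loop (not the author's exact method).

    Each round self-labels the vectors within `radius` of the current centre
    as reliable and moves the centre to their mean.
    """
    centre = list(initial_centre)
    for _ in range(max_iter):
        # Self-labeling step: keep only vectors near the current centre.
        reliable = [v for v in vectors if math.dist(v, centre) <= radius]
        if not reliable:
            break  # nothing reliable to learn from; keep the current centre
        # Re-training step: the new centre is the mean of the reliable vectors.
        new_centre = [sum(col) / len(reliable) for col in zip(*reliable)]
        shift = math.dist(new_centre, centre)
        centre = new_centre
        if shift < tol:
            break  # the centre has stopped moving
    return centre


# Synthetic data (assumed for the demo): a target class near (0, 0)
# and a second class near (5, 5).
rng = random.Random(0)
target = [[rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)] for _ in range(200)]
other = [[rng.gauss(5.0, 0.3), rng.gauss(5.0, 0.3)] for _ in range(200)]

# Start from a deliberately offset initial centre.
learned = single_centre_self_training(target + other, [0.5, 0.5], radius=1.5)
print(learned)
```

In this toy setting the two classes are well separated, so the neighborhood never reaches the second class. The conditions under which such a loop is guaranteed to converge to the actual centre are what the dissertation analyzes through the TGMK and 1-TGMR models.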

Keywords/Search Tags:

self-training, the learning of centre of single class, the extraction of binary relation, Gaussian mixture model, learning of Q-A patterns, learning of terminology patterns

Related items

1	Study On Texture Representation Based On Local Binary Patterns
2	Research On Entity Relation Extraction Method Based On Deep Learning
3	Research On Relation Extraction Based On Deep Learning
4	Causal Relation Extraction Based On Reinforcement Learning And Large Language Representation Models
5	Research On Zero-shot Learning Methods Based On Attribute
6	Research On Transfer Learning Based Relation Extraction Methods
7	Face Descriptor Based On LBP Sampling Learning
8	Research On Nonlinear Process Fault Diagnosis Based On Data-driven
9	Gaussian Mixture Model Guided Dictionary Learning Method For Single Image Deraining
10	Low-resource Speech Representation Learning And Its Applications