Using Word Embedding And Text Feature For Event Extraction

Posted on: 2016-09-02
Degree: Master
Type: Thesis
Country: China
Candidate: R Q Song
GTID: 2348330488973873
Subject: Circuits and Systems
Abstract/Summary:
With the rapid development of science and technology, new media are constantly emerging. The number of articles, as one form of data, keeps increasing, and text mining has become an active research field. With the rapid development of computer technology, how to make computers learn and understand natural language has become a new research direction within text mining. The difficulty lies in two questions: how to convert text into a numerical form, and how to make the computer understand the meaning of the text. Converting text into numerical form is a feature extraction problem. In this thesis, words are represented by word embeddings, part-of-speech tags, and similar features. Understanding the meaning of text amounts to semantic analysis of sentences, for which this thesis uses dependency relations. Text mining has recently been applied successfully in the biomedical field, where it is used to extract events. This thesis makes three contributions to event extraction:

(1) A new method for classifying imbalanced data, based on the G-measure and ensemble learning, is proposed. Imbalanced data is common in practice. In the ensemble algorithm, the G-measure is used to optimize the error rate of each weak classifier on the training set. That error rate is then used to update the weights of the training samples and the weights of the weak classifiers. The proposed method improves prediction of the minority class, making the ensemble algorithm better suited to imbalanced classification.

(2) An event extraction method based on text features is proposed. Event extraction is divided into two phases: trigger extraction and relation extraction. For trigger extraction, we propose a sample selection method. For relation extraction, we propose a method for resolving cyclic references between events. The proposed method performs well on the BioNLP 2013 GE (Genia Event Extraction) task corpora.

(3) An event extraction method based on word embeddings and text features is proposed. The bag-of-words representation has the drawbacks of being high-dimensional, sparse, and discrete. Word embeddings, by contrast, are continuous and low-dimensional relative to the vocabulary size, and they can capture the distributional characteristics of words. We present experiments that use word embeddings as token features to extract complete events, including triggers and their arguments. The results show that introducing word embeddings improves performance and that the method is comparable to the state-of-the-art solution. A binding event may have more than one theme, so the extraction of binding event themes consists of two steps. First, the themes associated with each trigger are predicted. Then, candidate theme sets are built from the possible combinations of those arguments, each combination is scored by an SVM classifier, and the combination with the highest confidence score is kept.
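The two-step selection of binding event themes described in (3) can be illustrated with a minimal sketch. It is not the thesis implementation: the function name best_theme_combination, the max_themes limit, and the toy_score function are all hypothetical, and toy_score merely stands in for the confidence score that a trained SVM classifier would provide.

```python
from itertools import combinations
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical scorer type: takes a trigger and a tuple of themes and
# returns a confidence score (in the thesis this role is played by an SVM).
Scorer = Callable[[str, Tuple[str, ...]], float]


def best_theme_combination(
    trigger: str,
    candidate_themes: Sequence[str],
    score: Scorer,
    max_themes: int = 2,  # assumption: an upper bound on themes per event
) -> Tuple[Optional[Tuple[str, ...]], float]:
    """Enumerate theme combinations and keep the highest-scoring one."""
    best: Optional[Tuple[str, ...]] = None
    best_score = float("-inf")
    largest = min(max_themes, len(candidate_themes))
    for size in range(1, largest + 1):
        for combo in combinations(candidate_themes, size):
            s = score(trigger, combo)
            if s > best_score:
                best, best_score = combo, s
    return best, best_score


def toy_score(trigger: str, themes: Tuple[str, ...]) -> float:
    """Made-up scores for illustration only; not real classifier output."""
    known = {("A",): 0.2, ("B",): 0.1, ("A", "B"): 0.9}
    return known.get(tuple(themes), 0.0)


if __name__ == "__main__":
    # Themes A, B, C were predicted for the trigger in the first step.
    print(best_theme_combination("binds", ["A", "B", "C"], toy_score))
```

In a real pipeline the scorer would wrap the decision value of the trained classifier, and the candidate list would come from the first step, in which themes are predicted for each trigger.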
Keywords/Search Tags: Text Mining, Classification of Imbalance Problems, Event Extraction, Word Embedding