
Research On Cross-language Information Extraction Based On Deep Learning

Posted on: 2017-07-07    Degree: Master    Type: Thesis
Country: China    Candidate: L Chen    Full Text: PDF
GTID: 2348330488458695    Subject: Computer application technology

Abstract/Summary:
The performance of a machine-learning-based information extraction system depends on the quality and quantity of its training corpora. However, labeled data are very unevenly distributed across languages, and this scarcity limits research progress in Chinese and other resource-poor languages. To address this imbalance, cross-language information extraction (CLIE) has been proposed: it leverages resources in one language (the source language) to improve information extraction performance in another (the target language). However, the gap between the source and target languages limits CLIE performance, and the errors introduced by machine translation systems inevitably degrade it further. This paper studies CLIE based on deep learning techniques; its contributions consist of the following three aspects.

(1) Two-view cross-language information extraction based on denoising autoencoders (Two-view DAE). This paper adopts denoising autoencoders (DAE) for the CLIE task. Noise is deliberately added to the training examples during DAE reconstruction to enhance robustness to translation errors. Meanwhile, classifiers are trained in the English view and the Chinese view separately, and the outputs of the two views are combined to obtain the final classification result. The two-view approach makes full use of the complementary strengths of English and Chinese, bridging the language gap between them. Experiments are conducted on cross-language sentiment classification (CLSC) and cross-language hedge cue detection (CLHCD). The results show that both the DAE and the two-view approach are effective and improve CLIE performance.

(2) Cross-language information extraction based on bilingual word representations (BWR). This paper proposes an approach to learning bilingual word representations for CLIE.
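The noise-injection idea at the heart of the Two-view DAE approach in (1) can be illustrated with a toy numpy sketch: a one-hidden-layer denoising autoencoder with tied weights that corrupts its input and learns to reconstruct the clean version. This is an illustrative sketch only, not the thesis implementation; the architecture, masking noise, and hyperparameters here are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_masking_noise(x, corruption=0.3):
    # Randomly zero a fraction of input features; reconstructing the clean
    # input from this corrupted version is what makes the autoencoder "denoising".
    return x * (rng.random(x.shape) >= corruption)

def train_dae(X, hidden_dim=16, epochs=200, lr=0.5, corruption=0.3):
    # One-hidden-layer denoising autoencoder with tied weights, trained by
    # full-batch gradient descent on squared reconstruction error.
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(d, hidden_dim))
    b_h = np.zeros(hidden_dim)   # hidden (encoder) bias
    b_o = np.zeros(d)            # output (reconstruction) bias
    for _ in range(epochs):
        X_noisy = add_masking_noise(X, corruption)
        H = sigmoid(X_noisy @ W + b_h)      # encode the corrupted input
        X_hat = sigmoid(H @ W.T + b_o)      # decode with tied weights
        # Backprop of 0.5 * ||X_hat - X||^2 (the target is the *clean* input)
        d_out = (X_hat - X) * X_hat * (1.0 - X_hat)
        d_hid = (d_out @ W) * H * (1.0 - H)
        W -= lr * (X_noisy.T @ d_hid + d_out.T @ H) / n
        b_o -= lr * d_out.mean(axis=0)
        b_h -= lr * d_hid.mean(axis=0)
    return W, b_h, b_o
```

In the two-view setting, one could train one such model per language view (English and machine-translated Chinese, or vice versa) and combine the two downstream classifiers' outputs, e.g. by averaging their class probabilities.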
The learning process consists of two phases: an unsupervised phase and a supervised phase. In the unsupervised phase, a DAE learns word representations for both languages simultaneously, capturing the bilingual semantic information shared between them. In the supervised phase, label information is integrated into the bilingual word representations to further improve CLIE performance. Experimental results on the CLSC and CLHCD tasks show that the learned bilingual word representations effectively capture both bilingual semantic information and label information, overcoming the difficulty the two-view approach has in deeply combining English and Chinese semantic information.

(3) Cross-language information extraction based on joint representation learning (JRL). This paper adopts a long short-term memory network (LSTM) to jointly learn word semantic representations and context information representations for CLIE. During learning, word semantic representations and context sentiment (hedge) representations are used to capture the semantic and sentiment (hedge) information of sentiment words (hedge cues) in their specific contexts. The experimental results show that the LSTM can effectively learn bilingual word semantic representations for CLIE, and that joint representation learning further improves CLIE performance.

In summary, this paper studies cross-language information extraction based on deep learning. The Two-view DAE approach is proposed to enhance robustness to translation errors and bridge the language gap between English and Chinese. The BWR approach is proposed to capture both bilingual semantic information and label information. The JRL approach is proposed to alleviate data sparseness and to learn latent semantic information.
These deep-learning-based approaches effectively improve CLIE performance and provide valuable references for future work on deep-learning-based CLIE research.
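As a minimal illustration of the sequence model behind the JRL approach, the sketch below implements one step of a standard LSTM cell from the textbook gate equations and runs it over a sequence of word vectors, using the final hidden state as a fixed-size context representation. This is a generic sketch, not the thesis code; the weight shapes, gate ordering, and the use of the final state as the representation are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, W_rec, b):
    # One step of a standard LSTM cell. The four gates (input, forget,
    # output, candidate) are computed from the current input and the
    # previous hidden state, stacked into a single matrix multiply.
    # Shapes: W (4H, D), W_rec (4H, H), b (4H,)
    H = h_prev.shape[0]
    z = W @ x + W_rec @ h_prev + b
    i = sigmoid(z[0 * H:1 * H])   # input gate
    f = sigmoid(z[1 * H:2 * H])   # forget gate
    o = sigmoid(z[2 * H:3 * H])   # output gate
    g = np.tanh(z[3 * H:4 * H])   # candidate cell state
    c = f * c_prev + i * g        # update the cell memory
    h = o * np.tanh(c)            # expose the gated hidden state
    return h, c

def encode_sequence(xs, W, W_rec, b, hidden_dim):
    # Run the cell over a sequence of word vectors; the final hidden state
    # serves as a fixed-size representation of the word in its context.
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    for x in xs:
        h, c = lstm_step(x, h, c, W, W_rec, b)
    return h
```

In a joint-representation setting, such a context vector could be trained together with the word embeddings it consumes, so that the semantic and sentiment (hedge) signals are learned in one model rather than combined afterwards.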
Keywords/Search Tags: Cross-language Information Extraction, Two-view Approach, Deep Learning, Bilingual Word Representation, Joint Representation Learning