
Algorithmic Studies On Relation Extraction From Chinese Short Texts

Posted on: 2021-03-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Y Wang
GTID: 1368330629480816
Subject: Software engineering
Abstract/Summary:
Because massive Web data are heterogeneous, multi-source, and of varying quality, harvesting knowledge from them efficiently and accurately is highly challenging. Relation extraction, a basic task in Natural Language Processing (NLP), aims to automatically obtain structured relational facts from unstructured text, providing technical support for large-scale knowledge graph construction and intelligent Web knowledge services. With the widespread application of deep learning, the accuracy of neural relation extraction models has improved greatly. However, existing research generally focuses on sentence-level relation extraction in English. Chinese expressions are more flexible than English ones, with relatively loose grammatical structures and word-formation rules. Hence, much of the semantic knowledge expressed in Chinese short texts is difficult for existing algorithms to extract effectively.

This thesis studies the problem of relation extraction from Chinese short texts. Owing to the distinctive linguistic characteristics of Chinese short texts, this task poses several challenges that traditional work does not face. First, the grammatical structures and semantics of short texts are generally incomplete. Second, some of the semantic relations expressed in short texts belong to commonsense knowledge, so their contextual expressions are highly sparse. Third, Chinese language analysis is less accurate than English analysis, and annotated datasets for short-text relation extraction are lacking, which further increases the difficulty. We conduct in-depth research from three aspects: i) hypernymy extraction based on word embeddings, ii) knowledge-enhanced semantic relation extraction, and iii) non-hypernymy relation extraction and semantic understanding. We also present an overall framework for relation extraction from Chinese short texts that is designed to address these challenges. The major contributions of this thesis are summarized as follows.

(1) Hypernymy Extraction Based on Word Embeddings. A taxonomy is a hierarchical representation of concepts and an important organizational form in knowledge graphs; it consists of a large number of hypernymy relations. Because Chinese expressions are much more flexible than English ones, Chinese hypernymy relations cannot be extracted by simple text-matching algorithms. This thesis addresses the issue by integrating neural language models with Chinese linguistic characteristics, employing word embeddings as the representations of Chinese terms. The proposed algorithms model Chinese hypernymy relations by learning projections from Chinese hyponyms to their corresponding hypernyms in the word embedding space. We first propose a semi-supervised hypernymy extension model, which iteratively discovers new hypernymy relations from Web data and thereby alleviates the limited size of Chinese hypernymy datasets. To model the decision boundary between Chinese hypernymy and non-hypernymy relations accurately, we further present two hypernymy classification models, based on transductive learning and fuzzy orthogonal projection learning, respectively. Experimental results show that the proposed models outperform state-of-the-art methods and extract Chinese hypernymy relations accurately.

(2) Knowledge-enhanced Semantic Relation Extraction. The hypernymy extraction models above rely on training sets from specific domains and do not leverage other types of data sources or related tasks. Building on word embedding projection models, we explore the design of knowledge-enhanced relation extraction algorithms that harvest semantic relations from three perspectives: i) multiple knowledge sources, ii) multiple languages, and iii) multiple types of lexical relations. We first propose the Taxonomy-Enhanced Adversarial Learning framework, which exploits the numerous hypernymy relations in large-scale taxonomies and injects this knowledge into projection models trained on specific datasets through deep coupled adversarial learning. Next, we extend the Fuzzy Orthogonal Projection Model into the Transfer Fuzzy Orthogonal Projection Model and its semi-supervised version, the Iterative Transfer Fuzzy Orthogonal Projection Model. These models combine deep transfer learning with bilingual lexicon induction for few-shot cross-lingual hypernymy extraction, especially for lower-resourced languages. Finally, because ontologies contain multiple types of lexical relations, we present a learning process for Hyperspherical Relation Embeddings, which represents different lexical relations in a hyperspherical embedding space. This extends the projection models to multi-way classification of lexical relations. Experimental results on the corresponding NLP tasks demonstrate the effectiveness of these models.

(3) Non-hypernymy Relation Extraction and Semantic Understanding. Chinese short texts express a wide variety of non-hypernymy relations. Previous models can only deal with a finite set of predefined relation types; they are difficult to extend to open domains and cannot extract commonsense relations through deep text understanding. In this part, we first propose the Pattern-based Non-hypernymy Relation Extraction model, which employs graph mining techniques to acquire frequent textual patterns that express rich semantic relations in Chinese short texts. The relations associated with these patterns can then be extracted by unsupervised learning. Because this algorithm can only deal with frequent patterns, we further present the Data-driven Non-hypernymy Relation Extraction model, whose three-stage data-driven architecture, running from Chinese short text segmentation to relation generation, improves the coverage of relation extraction. Finally, we observe that semantic understanding based on idiomaticity analysis allows more relations to be extracted from Chinese short texts through deep knowledge reasoning. We therefore propose the Relational and Compositional Representation Learning framework, which classifies the idiomaticity degrees of Chinese noun compounds and improves machines' natural language understanding. Experimental results show that these algorithms extract relations accurately and are not restricted to manually defined relation types.

In summary, this thesis addresses relation extraction from Chinese short texts from three aspects. Experiments on public datasets for several NLP tasks demonstrate the effectiveness of the proposed algorithms. Our research also lays a technical foundation for building automatic relation extraction and semantic understanding systems for massive volumes of Chinese short texts on the Web. With minimal human intervention, the knowledge in short texts can be fully extracted, which benefits the expansion and completion of large-scale Chinese knowledge graphs.
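Contribution (1) models hypernymy as a projection from a hyponym's embedding to its hypernym's embedding. The sketch below shows the classic single-matrix version of that idea, fitted by ridge-regularized least squares and followed by a distance threshold. It is a minimal illustration under stated assumptions, not the thesis's transductive or fuzzy orthogonal models: the helper names (solve, fit_projection, project, is_hypernym), the ridge weight, the 0.5 threshold, and the 2-D toy vectors are all hypothetical.

```python
import math

Vector = list[float]
Matrix = list[list[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ z = b with Gauss-Jordan elimination and partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_projection(pairs: list[tuple[Vector, Vector]], ridge: float = 1e-3) -> Matrix:
    """Fit M minimizing sum ||M x - y||^2 + ridge * ||M||^2 over (hyponym, hypernym) pairs.

    Row k of M solves (X^T X + ridge * I) m_k = X^T y_k.
    """
    d = len(pairs[0][0])
    gram = [[sum(x[i] * x[j] for x, _ in pairs) + (ridge if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    return [solve(gram, [sum(x[i] * y[k] for x, y in pairs) for i in range(d)])
            for k in range(d)]


def project(m: Matrix, x: Vector) -> Vector:
    """Apply the learned projection: returns M x."""
    return [sum(w * v for w, v in zip(row, x)) for row in m]


def is_hypernym(m: Matrix, hypo: Vector, hyper: Vector, threshold: float = 0.5) -> bool:
    """Accept the pair if the projected hyponym lands close to the candidate hypernym."""
    return math.dist(project(m, hypo), hyper) < threshold


# Toy 2-D "embeddings" (illustrative only): hypernym = [[1, 0.5], [0, 1]] @ hyponym.
train = [([1.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [0.5, 1.0]), ([1.0, 1.0], [1.5, 1.0])]
M = fit_projection(train)
print(is_hypernym(M, [2.0, 0.0], [2.0, 0.0]))  # True
print(is_hypernym(M, [2.0, 0.0], [0.0, 2.0]))  # False
```

Real word embeddings have hundreds of dimensions, and a linear-algebra library would replace the hand-written solver; it is written out here only to keep the example dependency-free.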
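The semi-supervised hypernymy extension model in contribution (1) grows its training data by iteratively discovering new hypernymy pairs. A generic self-training loop captures that control flow: retrain, score unlabeled candidates, absorb the confident ones, and repeat until nothing new qualifies. This is a sketch under explicit assumptions. It uses a diagonal projection (a deliberate simplification) and a fixed distance threshold as the confidence test, because the abstract does not state the thesis's actual selection criterion. The names fit_diagonal, distance, and self_train and the toy pairs are hypothetical.

```python
import math

Vec = tuple[float, ...]
Pair = tuple[Vec, Vec]  # (hyponym embedding, candidate hypernym embedding)


def fit_diagonal(pairs: list[Pair], ridge: float = 1e-3) -> list[float]:
    """Least-squares fit of a diagonal projection y_i ~ w_i * x_i (simplifying assumption)."""
    dim = len(pairs[0][0])
    return [sum(x[i] * y[i] for x, y in pairs) / (sum(x[i] ** 2 for x, _ in pairs) + ridge)
            for i in range(dim)]


def distance(weights: list[float], pair: Pair) -> float:
    """Euclidean distance between the projected hyponym and the candidate hypernym."""
    x, y = pair
    return math.dist([w * v for w, v in zip(weights, x)], y)


def self_train(seed: list[Pair], pool: list[Pair], threshold: float = 0.3, max_rounds: int = 5):
    """Iteratively retrain on the growing labeled set and absorb confident candidates."""
    labeled, remaining = list(seed), list(pool)
    for _ in range(max_rounds):
        weights = fit_diagonal(labeled)
        accepted = [p for p in remaining if distance(weights, p) < threshold]
        if not accepted:  # nothing new is confident enough: stop early
            break
        labeled += accepted
        remaining = [p for p in remaining if p not in accepted]
    return labeled, remaining


# Toy data: true hypernyms satisfy y = (2 * x0, x1); the last pool pair is a negative.
seed = [((1.0, 0.0), (2.0, 0.0)), ((0.0, 1.0), (0.0, 1.0))]
pool = [((1.0, 1.0), (2.0, 1.0)), ((0.5, 0.5), (1.0, 0.5)), ((1.0, 0.0), (0.0, 3.0))]
labeled, rejected = self_train(seed, pool)
print(len(labeled), rejected)  # 4 [((1.0, 0.0), (0.0, 3.0))]
```

A fixed threshold is the simplest confidence rule; in practice the cut-off, or a cap on how many pairs enter per round, controls how fast errors can propagate through later iterations.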
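For the fuzzy orthogonal projection models in contribution (1), one plausible reading is that each input receives soft cluster memberships and each cluster contributes its own projection. This is an assumption, since the abstract gives no formulas. The sketch below uses the standard fuzzy c-means membership formula, u_k = 1 / sum_j (d_k / d_j)^(2/(m-1)), to weight per-cluster linear maps. It leaves out the orthogonality constraint on each map and the learning of the centers and matrices. The function names, centers, and matrices are hypothetical.

```python
import math


def memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of `point` to each center (fuzzifier m > 1); sums to 1."""
    dists = [math.dist(point, c) for c in centers]
    if 0.0 in dists:  # point coincides with a center: use a crisp assignment
        k = dists.index(0.0)
        return [1.0 if j == k else 0.0 for j in range(len(centers))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** power for dj in dists) for dk in dists]


def fuzzy_project(x, centers, matrices, m=2.0):
    """Membership-weighted mixture of per-cluster linear projections M_k x."""
    out = [0.0] * len(x)
    for weight, mat in zip(memberships(x, centers, m), matrices):
        for i, row in enumerate(mat):
            out[i] += weight * sum(a * b for a, b in zip(row, x))
    return out


# Hypothetical clusters and per-cluster maps (not orthogonality-constrained here).
centers = [(0.0, 0.0), (10.0, 0.0)]
identity = [[1.0, 0.0], [0.0, 1.0]]
double = [[2.0, 0.0], [0.0, 2.0]]
print(memberships((5.0, 0.0), centers))                        # [0.5, 0.5]
print(fuzzy_project((5.0, 0.0), centers, [identity, double]))  # [7.5, 0.0]
```

Soft memberships let a pair that sits between two clusters borrow from both projections instead of being forced into one. A hard (piecewise) assignment is the limit as m approaches 1.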
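Contribution (2) combines transfer learning with bilingual lexicon induction. A widely used building block for lexicon induction is an orthogonal map between two embedding spaces. The map is fitted on a seed dictionary, and words are then translated by nearest-neighbour search. The sketch restricts the map to 2-D rotations, which gives it a closed form: maximizing sum y . R(theta) x yields theta = atan2(sum(x1*y2 - x2*y1), sum(x1*y1 + x2*y2)). In higher dimensions, the orthogonal Procrustes solution comes from an SVD instead. This is not the thesis's Transfer Fuzzy Orthogonal Projection Model. The seed dictionary, target vocabulary, and helper names are hypothetical.

```python
import math


def fit_rotation(pairs):
    """Closed-form 2-D orthogonal Procrustes restricted to rotations (no reflections)."""
    a = sum(x1 * y1 + x2 * y2 for (x1, x2), (y1, y2) in pairs)
    b = sum(x1 * y2 - x2 * y1 for (x1, x2), (y1, y2) in pairs)
    return math.atan2(b, a)


def rotate(theta, v):
    """Apply the 2-D rotation R(theta) to vector v."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def translate(theta, source_vec, target_vocab):
    """Lexicon-induction step: nearest target word to the mapped source vector."""
    mapped = rotate(theta, source_vec)
    return min(target_vocab, key=lambda w: math.dist(mapped, target_vocab[w]))


# Toy seed dictionary (source -> target embeddings): target space = source rotated 90 degrees.
seed = [((1.0, 0.0), (0.0, 1.0)), ((0.0, 1.0), (-1.0, 0.0))]
theta = fit_rotation(seed)
target_vocab = {"animal": (0.1, 0.9), "plant": (-0.9, 0.1)}
print(round(math.degrees(theta)))                  # 90
print(translate(theta, (0.9, 0.0), target_vocab))  # animal
```

Constraining the map to be orthogonal preserves distances and angles within the source space, which is the usual argument for preferring it over an unconstrained linear map when the seed dictionary is small.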
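Contribution (2) also places lexical relations in a hyperspherical embedding space for multi-way classification. The abstract does not describe how Hyperspherical Relation Embeddings are trained, so this sketch is only a generic illustration. It normalizes each pair's offset vector onto the unit sphere, averages one prototype per relation, and labels a new pair by the highest cosine. The relation names ("hypernym", "meronym"), the toy vectors, and all function names are assumptions.

```python
import math


def normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [c / norm for c in v]


def offset(x, y):
    """Difference vector y - x for a (term, related term) pair."""
    return [b - a for a, b in zip(x, y)]


def fit_prototypes(labeled):
    """One unit-norm prototype per relation: the renormalized mean of unit offsets."""
    sums = {}
    for (x, y), relation in labeled:
        u = normalize(offset(x, y))
        acc = sums.setdefault(relation, [0.0] * len(u))
        for i, c in enumerate(u):
            acc[i] += c
    return {relation: normalize(acc) for relation, acc in sums.items()}


def classify(prototypes, x, y):
    """Pick the relation whose prototype has the highest cosine with the pair's unit offset."""
    u = normalize(offset(x, y))
    return max(prototypes, key=lambda r: sum(a * b for a, b in zip(u, prototypes[r])))


# Toy labeled pairs (illustrative 2-D vectors, not trained embeddings).
labeled = [
    (((0.0, 0.0), (1.0, 0.1)), "hypernym"),
    (((1.0, 1.0), (2.0, 1.2)), "hypernym"),
    (((0.0, 0.0), (0.1, 1.0)), "meronym"),
    (((1.0, 1.0), (1.2, 2.0)), "meronym"),
]
protos = fit_prototypes(labeled)
print(classify(protos, (3.0, 3.0), (4.0, 3.1)))  # hypernym
print(classify(protos, (3.0, 3.0), (3.0, 4.0)))  # meronym
```

Because every vector lives on the unit sphere, the dot product equals the cosine, so classification depends only on the direction of a pair's offset and not on its length.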
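Contribution (3) mines frequent textual patterns from Chinese short texts and extracts the relations they express. The sketch below replaces the graph mining step with plain frequency counting over slot patterns such as [X]的[Y]. It then reuses a frequent pattern to pull argument pairs out of new texts. This is a simplified stand-in under stated assumptions. The unsupervised step that assigns a relation to each pattern is not shown, and the sample texts, the min_support value, and the function names are hypothetical.

```python
import re
from collections import Counter


def to_pattern(text, head, tail):
    """Turn a short text into a pattern by replacing the argument mentions with slots."""
    if head not in text or tail not in text:
        return None
    # Simplification: assumes the head and tail mentions do not overlap in the text.
    return text.replace(head, "[X]", 1).replace(tail, "[Y]", 1)


def frequent_patterns(samples, min_support=2):
    """Count patterns over (text, head, tail) samples and keep those with enough support."""
    counts = Counter(p for p in (to_pattern(*s) for s in samples) if p)
    return {p: c for p, c in counts.items() if c >= min_support}


def extract(pattern, text):
    """Match a slot pattern against a new short text and return the (X, Y) arguments."""
    regex = re.escape(pattern)
    regex = regex.replace(re.escape("[X]"), "(.+?)").replace(re.escape("[Y]"), "(.+)")
    match = re.fullmatch(regex, text)
    return match.groups() if match else None


samples = [("北京的故宫", "北京", "故宫"), ("上海的外滩", "上海", "外滩"), ("苹果手机", "苹果", "手机")]
patterns = frequent_patterns(samples)
print(patterns)                          # {'[X]的[Y]': 2}
print(extract("[X]的[Y]", "杭州的西湖"))  # ('杭州', '西湖')
```

The support threshold is what limits such a method to frequent patterns. Rare constructions never reach min_support, which is the coverage gap the data-driven model is described as addressing.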
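The Relational and Compositional Representation Learning framework in contribution (3) classifies how idiomatic a Chinese noun compound is. A common unsupervised baseline for the same question compares a compound's own vector with the sum of its constituents' vectors, where a low cosine suggests non-compositional meaning. It is shown here as a hedged sketch, not the thesis's model. The 3-D vectors for 木桌 (wooden table) and 白领 (white-collar worker) are invented for illustration, and the binary 0.5 threshold is an assumption (the thesis classifies degrees of idiomaticity).

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def compositionality(compound, modifier, head):
    """Cosine between the compound's own vector and the additive composition of its parts."""
    composed = [a + b for a, b in zip(modifier, head)]
    return cosine(compound, composed)


def label(score, threshold=0.5):
    """Binarize the score; a low score suggests idiomatic (non-compositional) meaning."""
    return "compositional" if score >= threshold else "idiomatic"


# Toy 3-D vectors (illustrative only, not trained embeddings).
vec = {
    "木": (1.0, 0.0, 0.0), "桌": (0.0, 1.0, 0.0), "木桌": (0.7, 0.7, 0.1),
    "白": (0.0, 0.0, 1.0), "领": (0.0, 1.0, 0.0), "白领": (1.0, 0.0, 0.1),
}
print(label(compositionality(vec["木桌"], vec["木"], vec["桌"])))  # compositional
print(label(compositionality(vec["白领"], vec["白"], vec["领"])))  # idiomatic
```

Additive composition is only one choice. Multiplicative or learned composition functions can be substituted in compositionality without changing the rest of the pipeline.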
Keywords/Search Tags: Relation Extraction, Chinese Short Texts, Hypernymy Relations, Word Embeddings, Semantic Understanding