Chinese Entity Relation Extraction Based on Syntactic and Semantic Analysis

Posted on: 2018-03-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L X Gan
GTID: 1318330515490895
Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of Internet technology and the explosive growth in the number of users, a large amount of text data has emerged on the Web. There is an urgent need to extract useful knowledge from this massive text data automatically, quickly, and effectively. Computer applications and systems also rely on specific "knowledge" to perform specific functions. For example, Internet search, automatic navigation, automatic question answering, machine translation, speech recognition, and other systems all depend on the support of a knowledge base. Entity relation extraction is one of the key technologies of knowledge base construction; it aims to extract semantic relations between two named entities from natural language text. Entity relation extraction has become a popular research topic in natural language processing, data mining, machine learning, and artificial intelligence. It also has significant application value and broad application prospects, and has received widespread attention in industry.

Previous research on entity relation extraction has mainly focused on English corpora and less on Chinese corpora. Although research on Chinese relation extraction has achieved some encouraging results, there is still room for improvement in precision and recall. In particular, there is little research on Chinese implicit relation extraction. Therefore, we mainly focus on Chinese entity relation extraction.

On the one hand, Chinese texts on the Web contain a large number of long sentences with complex structure. These sentences often contain many entities, which results in many entity pairs. In addition, the distribution of entity types is uneven.
The data characteristics of these texts pose great challenges for Chinese explicit relation extraction. For complicated Chinese sentences, previous feature-based methods cannot effectively extract features that represent entity relation types, which leads to low performance in explicit relation extraction. Taking the tourism domain as an example, several problems in Chinese explicit relation extraction remain:
(1) Previous feature-based methods are often used to extract explicit relations. Usually, these methods use the dependency syntactic features of the two entities separately, without considering their order, and so cannot really represent the syntactic structure of the entities in the sentence. As a result, the dependency syntactic feature has no obvious effect on relation extraction, resulting in poor performance of explicit relation extraction.
(2) Classic studies on verb features generally select the verb closest to the latter entity as the verb feature. Because a sentence in which the two entities are far apart usually contains multiple verbs, many of these methods cannot effectively extract the verb that really represents the relation type. As a result, they exhibit low precision in explicit relation extraction. In addition, many existing methods help neither to detect relations nor to distinguish relation types between entities. Sometimes they even introduce a lot of noise, especially in the relation detection task.

On the other hand, Chinese text contains a large variety of implicit relations. Compared with explicit relations, implicit relations lack direct evidence to support specific relation types.
Therefore, extracting implicit relations usually requires integrating the semantic associations of sentence content with relevant linguistic information, specific contextual semantic information, and related domain knowledge for indirect inference. However, because of the ambiguity of semantic relations, the complexity of sentence structures, the uncertainty of context information, and the imbalance of data, implicit relation extraction is more complicated and more difficult, and it cannot be implemented using a general model. Many difficulties in Chinese implicit relation extraction research remain:
(1) Because of the great differences between Chinese and English, existing methods for English implicit relation extraction cannot be applied directly to Chinese. Chinese implicit relation extraction therefore requires special consideration.
(2) Chinese sentence patterns are complex and diverse, and different sentence structures express many different entity relation types. External knowledge therefore has to be used in different ways for different structures, and implicit relations cannot be extracted with a general model. It is necessary to analyze and understand sentence structures and context in order to construct more refined implicit relation extraction models.

To address the problems above, we mainly focus on Chinese explicit and implicit relation extraction. The main research contents are summarized as follows:

(1) Chinese explicit relation extraction based on syntactic and semantic features
We obtain three features that can effectively express entity relation types from the perspective of syntax and semantics, and propose a method of Chinese explicit relation extraction based on these features to improve the performance of explicit relation extraction. The three features are as follows:
i) The feature of dependency relation composition.
According to the order of the two entities in a sentence, we obtain the feature of dependency relation composition by combining their respective dependency relations. This feature is discriminative to a certain degree and better reflects the characteristics of the relation type between the corresponding entities.
ii) The verb feature with the nearest syntactic dependency. Taking the characteristics of sentence structure into account, we propose the verb feature with the nearest syntactic dependency from a syntactic perspective. It is captured for the two entities in a sentence by means of dependency relations and parts of speech. Because two entities with a direct or an indirect semantic association are connected by different paths, we propose an algorithm for extracting this feature based on an analysis of the dependency path.
iii) The feature of directional core verb. For sentences containing directional verbs such as "arriving", "coming", and "going", the verb feature with the nearest syntactic dependency cannot effectively represent the real relation type between entities, which affects the performance of explicit relation extraction. We therefore propose the feature of directional core verb, building on the verb feature with the nearest syntactic dependency.

(2) Chinese implicit relation extraction based on company verbs
In many text domains, such as tourism and news, there exist many implicit entity relations triggered by company verbs. Therefore, based on an understanding of sentence structures and context, we take company verbs as the core and construct inference rules for implicit relation extraction based on them. We integrate explicit relation extraction with implicit relation extraction.
To take full advantage of both explicit and implicit relation extraction, we use explicit relations to infer implicit relations, and apply this to Chinese implicit relation extraction based on company verbs in the fields of tourism and news. The specific contents are as follows:
i) Selection of company candidate sentences. A company verb vocabulary is constructed using a variety of methods and is used to select candidate sentences, that is, sentences containing company verbs.
ii) Classification of company candidate sentence patterns. According to the different roles of company verbs in the sentence, we employ dependency parsing to determine and classify the patterns of company candidate sentences.
iii) Recognition of company components. Because company verbs play different roles in different sentence patterns, the methods for recognizing components among the entities involved in company actions also differ. Using dependency parsing, we design corresponding component recognition algorithms for five kinds of company candidate sentence patterns.
iv) Construction of inference rules for implicit relations. According to whether the additional knowledge and the company verb are in the same sentence, we propose two kinds of inference methods for implicit relations based on company verbs: one for implicit in-sentence relations and the other for implicit between-sentences relations, where an in-sentence relation is inferred from a single sentence and a between-sentences relation is inferred from multiple sentences.
According to the characteristics of company semantic components and the scope of company verbs, we design three rules for implicit in-sentence relation extraction based on company verbs. By exploiting the antecedent of the zero anaphora in a company sentence, we establish associations between subject components and object components in different sentences, which are then used to extract implicit between-sentences relations based on company verbs. We integrate a machine learning method with rules and use explicit relations to infer implicit relations. The proposed strategy can effectively solve the problem of Chinese implicit relation extraction based on company verbs, finding more entity relations more accurately and improving the overall performance of Chinese relation extraction.

The main innovations of this thesis are as follows:
(1) From a semantic perspective, the verb feature with the nearest syntactic dependency and the feature of directional core verb are proposed. The verb feature with the nearest syntactic dependency better represents relation types between entities, which benefits the identification of specific relation types. Furthermore, it can effectively solve the problem of uneven data distribution and significantly improve the performance of Chinese explicit relation extraction. The feature of directional core verb further improves the effect of verb features on Chinese explicit relation extraction.
(2) An implicit in-sentence relation inference method based on company verbs is proposed. We design a sentence pattern classification algorithm and the corresponding component recognition algorithms for company candidate sentences.
In addition, we design three rules for implicit in-sentence relation extraction based on company verbs, which effectively solve the problem of Chinese implicit in-sentence relation extraction based on company verbs.
(3) An implicit between-sentences relation inference method based on company verbs is proposed. From the perspective of zero anaphora, we design inference rules for implicit between-sentences relation extraction based on company verbs. The proposed method effectively solves the problem of Chinese implicit between-sentences relation extraction based on company verbs.
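
The abstract does not give the feature extraction algorithms themselves, so the following is only a minimal sketch of how two of the explicit-relation features might be computed, under stated assumptions. The `Token` structure, the Universal Dependencies style labels, the toy English sentence, and the rule "take the first verb met when walking the dependency path back from the latter entity" are all hypothetical choices made for illustration, not the thesis's method. The sketch contrasts a path-based verb with the position-based verb that the abstract attributes to classic studies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    text: str
    pos: str      # coarse part-of-speech tag (assumed tag set)
    head: int     # index of the syntactic head, -1 for the root
    deprel: str   # dependency label linking this token to its head


def ancestors(tokens: List[Token], i: int) -> List[int]:
    """Indices from token i up to the root, inclusive."""
    chain = [i]
    while tokens[chain[-1]].head != -1:
        chain.append(tokens[chain[-1]].head)
    return chain


def dependency_path(tokens: List[Token], a: int, b: int) -> List[int]:
    """Indices on the tree path from a to b via their lowest common ancestor."""
    up_a, up_b = ancestors(tokens, a), ancestors(tokens, b)
    common = next(i for i in up_a if i in up_b)
    left = up_a[: up_a.index(common) + 1]
    right = up_b[: up_b.index(common)]
    return left + right[::-1]


def dependency_composition(tokens: List[Token], e1: int, e2: int) -> str:
    """Join the two entities' dependency labels in sentence order."""
    first, second = sorted((e1, e2))
    return f"{tokens[first].deprel}+{tokens[second].deprel}"


def nearest_dependency_verb(tokens: List[Token], e1: int, e2: int) -> Optional[str]:
    """Assumed rule: first verb on the dependency path, walking from the latter entity."""
    first, second = sorted((e1, e2))
    for idx in reversed(dependency_path(tokens, first, second)):
        if tokens[idx].pos == "VERB":
            return tokens[idx].text
    return None


def linear_nearest_verb(tokens: List[Token], e1: int, e2: int) -> Optional[str]:
    """Baseline: the verb closest in word position to the latter entity."""
    second = max(e1, e2)
    verbs = [i for i, t in enumerate(tokens) if t.pos == "VERB"]
    if not verbs:
        return None
    return tokens[min(verbs, key=lambda i: abs(i - second))].text


# Hand-built parse of a toy sentence: "Lijiang borders the widely visited Dali"
SENTENCE = [
    Token("Lijiang", "PROPN", 1, "nsubj"),
    Token("borders", "VERB", -1, "root"),
    Token("the", "DET", 5, "det"),
    Token("widely", "ADV", 4, "advmod"),
    Token("visited", "VERB", 5, "amod"),
    Token("Dali", "PROPN", 1, "obj"),
]

if __name__ == "__main__":
    print(dependency_composition(SENTENCE, 0, 5))
    print(nearest_dependency_verb(SENTENCE, 0, 5))
    print(linear_nearest_verb(SENTENCE, 0, 5))
```

On this toy parse the composition feature is `nsubj+obj` regardless of the order in which the two entity indices are passed, the path-based function returns "borders", and the position-based baseline returns "visited", the modifier that happens to sit next to the latter entity. A real pipeline would obtain the parse from a Chinese dependency parser rather than a hand-built list.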
Keywords/Search Tags: relation extraction, relation detection, implicit relation, syntactic feature, semantic feature, company verb