
Research On Sentence-level Entity Relationship Extraction With Thai Features

Posted on: 2018-06-25
Degree: Master
Type: Thesis
Country: China
Candidate: Q Shen
GTID: 2358330518460479
Subject: Software engineering
Abstract/Summary:
Entity relation extraction for Thai is an important part of natural language processing. Its performance directly affects downstream applications such as event extraction, knowledge base construction, and search engines. However, several features of Thai, including complex morphology, frequent use of modal particles, and very sparse punctuation in writing, make sentence boundaries unclear and increase the difficulty of intelligent information processing. This thesis combines Thai language features with statistical machine learning models to study Thai sentence segmentation, Thai named entity recognition, and the extraction of affiliation relations between Thai entities. It reports results in three areas.

(1) In Thai text, the end of a sentence is marked only by a space character, yet many spaces also occur at positions that are not sentence ends, which makes Thai sentence boundaries ambiguous. This thesis first collects the major grammatical rules related to Thai sentence boundaries. It then uses a Maximum Entropy classifier to recast sentence segmentation as the problem of classifying space characters: the model is trained on the contextual features of each space and then classifies the spaces in Thai text. Finally, a base of related grammatical rules is used to correct the output of the Maximum Entropy model. Compared with a purely rule-based method, this approach avoids building a large set of complicated grammatical rules; only rules for recognizing sentence boundaries are needed, while the Maximum Entropy model exploits optimized Thai contextual features. The method achieves good and stable results on Thai sentence segmentation, which provides a foundation for Thai named entity recognition.

(2) This thesis converts Thai named entity recognition into a sequence labeling task over the words of a Thai sentence. Using lexical contextual features of Thai, it builds a Hidden Markov Model and a Conditional Random Fields model on a training corpus for Thai entity recognition, and then evaluates both models on a test corpus. The experimental results verify that the methods are effective for Thai named entity recognition, and they lay the basis for entity relation extraction from Thai sentences.

(3) Building on Thai named entity recognition, this thesis recasts Thai entity relation extraction as the classification of entity relation triples. First, because no Thai entity relation corpus is available, it builds one from Thai-Chinese parallel sentences and a Chinese-Thai dictionary. Then, using contextual features around the Thai entity words, it applies a Maximum Entropy classifier to identify affiliation relations among the candidate entity relation triples, and thereby extracts affiliation relations from Thai text. Experiments show that these methods are effective for sentence-level affiliation relation extraction in Thai.
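The space-classification framing described in part (1) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the thesis's actual implementation: the function names, the two-character context window, the toy Latin-letter text standing in for Thai, and the hand-assigned boundary labels are all hypothetical. A binary Maximum Entropy model is equivalent to logistic regression, so the sketch trains one with plain gradient updates over indicator features. The rule-based correction step is omitted.

```python
import math
from typing import Dict, List, Tuple


def space_candidates(text: str) -> List[int]:
    """Every space is a candidate sentence boundary."""
    return [i for i, ch in enumerate(text) if ch == " "]


def context_features(text: str, pos: int, window: int = 2) -> List[str]:
    """Indicator features from the characters on each side of the space.
    The window size and feature names are illustrative assumptions."""
    left = text[max(0, pos - window):pos]
    right = text[pos + 1:pos + 1 + window]
    return ["bias", f"L={left}", f"R={right}"]


def probability(weights: Dict[str, float], feats: List[str]) -> float:
    """P(boundary | features) under a binary maximum entropy model."""
    z = sum(weights.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + math.exp(-z))


def train(examples: List[Tuple[List[str], int]],
          epochs: int = 50, lr: float = 0.5) -> Dict[str, float]:
    """Fit the weights with simple stochastic gradient ascent on the
    log-likelihood (no regularization, for brevity)."""
    weights: Dict[str, float] = {}
    for _ in range(epochs):
        for feats, label in examples:
            error = label - probability(weights, feats)
            for f in feats:
                weights[f] = weights.get(f, 0.0) + lr * error
    return weights


# Toy stand-in text: uppercase chunks mark where a new "sentence" starts.
# Real Thai text and real labels would come from an annotated corpus.
text = "ab cd EF gh IJ kl"
gold_boundaries = {5, 11}  # hand-assigned labels for this toy example

candidates = space_candidates(text)
examples = [(context_features(text, pos), int(pos in gold_boundaries))
            for pos in candidates]
weights = train(examples)

# Classify each space; here we only check the fit on the training text.
predicted = [pos for pos in candidates
             if probability(weights, context_features(text, pos)) > 0.5]
print(predicted)
```

The sketch only shows that each space becomes one classification instance described by its surrounding context. A realistic system would use word-level and part-of-speech features, held-out evaluation data, and the grammatical rule base to post-correct the classifier's output.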
Keywords/Search Tags: Thai Language Sentence Segmentation, Named Entity Recognition, Entity Relation Extraction