Research on Methods for Recognizing Named Entities in Lao

Posted on: 2017-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: M J Yang
GTID: 2358330488464842
Subject: Computer technology
Abstract/Summary:
Named Entity Recognition (NER), also known as "proper noun recognition", refers to identifying entities that have special meaning in a text, here Lao text, including personal names, place names, and organization names. NER is an important basic tool for Information Extraction (IE), Question Answering Systems (QAS), Machine Translation (MT), and other application areas. In addition, NER plays an important role in bringing Natural Language Processing (NLP) technology into practical use. In recent years, relations between China and ASEAN have developed increasingly rapidly. Yunnan is an important bridge for the opening-up of southwest China, and mutual communication in language is a prerequisite for political, cultural, and economic exchange. NER has been studied extensively for many languages, such as English, Chinese, and Thai, but research on Lao remains sparse. Therefore, to promote exchanges between China and ASEAN, studying Lao NER is very meaningful. Based on the characteristics of Lao named entities, the thesis studies methods for recognizing Lao personal names, location names, and organization names. The main results are as follows:
(1) Lao Personal Name and Location Name Recognition Based on Conditional Random Fields with Heuristic Information. Based on the characteristics of Lao personal names and location names, candidate named entities are first recognized with Conditional Random Fields (CRFs). Heuristic information is then used to decide which candidates are named entities. Finally, named entities that the CRF model did not discover are recognized using a named entity word list, yielding the final set of named entities. The experimental results show that the proposed method is effective and that combining a machine learning method with heuristic information improves NER performance.
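The three stages described in (1) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis implementation: the CRF output is stubbed as a list of candidate spans, the heuristic (`passes_heuristic`, a hypothetical rule that requires a clue word directly before the candidate), the clue words, and the word list are all invented placeholders, and romanized tokens stand in for Lao script.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)

# Hypothetical placeholders: the abstract does not list clue words or word-list entries.
CLUE_WORDS: Dict[str, Set[str]] = {
    "PER": {"thao", "nang"},
    "LOC": {"muang", "khoueng"},
}
WORD_LIST: Dict[Tuple[str, ...], str] = {
    ("vientiane",): "LOC",
    ("luang", "prabang"): "LOC",
}


def passes_heuristic(tokens: List[str], span: Span) -> bool:
    """Assumed heuristic: keep a candidate only if a clue word for its label precedes it."""
    start, _, label = span
    return start > 0 and tokens[start - 1] in CLUE_WORDS.get(label, set())


def word_list_matches(tokens: List[str], taken: Set[int]) -> List[Span]:
    """Longest-match lookup in the word list, skipping token positions already labeled."""
    found: List[Span] = []
    max_len = max(len(key) for key in WORD_LIST)
    i = 0
    while i < len(tokens):
        match = None
        for n in range(max_len, 0, -1):
            key = tuple(tokens[i:i + n])
            free = not any(j in taken for j in range(i, i + n))
            if len(key) == n and key in WORD_LIST and free:
                match = (i, i + n, WORD_LIST[key])
                break
        if match:
            found.append(match)
            i = match[1]
        else:
            i += 1
    return found


def recognize(tokens: List[str], crf_candidates: List[Span]) -> List[Span]:
    """Stage 2 filters the (stubbed) CRF candidates; stage 3 adds word-list entities."""
    kept = [s for s in crf_candidates if passes_heuristic(tokens, s)]
    taken = {j for start, end, _ in kept for j in range(start, end)}
    return sorted(kept + word_list_matches(tokens, taken))


tokens = ["thao", "somchai", "pai", "luang", "prabang"]
crf_candidates = [(1, 2, "PER"), (2, 3, "LOC")]  # stand-in for real CRF output
print(recognize(tokens, crf_candidates))
```

In this toy run the second candidate is rejected because no location clue word precedes it, and the word list recovers a two-token location that the stubbed CRF output missed.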
(2) Lao Personal Name and Location Name Recognition Based on Semi-supervised Cascaded Conditional Random Fields with Generalized Expectation Criteria. Because there are very few experts in the Lao domain and labeled corpora are difficult to obtain, a new semi-supervised method is proposed based on Cascaded Conditional Random Fields (CCRFs) and Generalized Expectation Criteria (GE Criteria). First, some representative Lao personal names and location names are selected as labeled features whose expectations can be obtained. Second, the GE Criteria are used to score the expectations and return vectors as constraints. Next, a first-layer CRF model is built to extract simple Lao personal names and location names. Finally, a second-layer model is built on the results of the first-layer CRF model to identify complicated and nested Lao entities. The effectiveness of the proposed method is demonstrated through experiments with different training data and comparisons with other experiments.
(3) Lao Organization Name Recognition Based on Dictionary and Conditional Random Fields. The characteristics and contextual relationships of Lao organization names are more complex, and some of their features differ from those of personal names and location names. The thesis therefore proposes an approach to Lao organization name recognition based on a Lao dictionary and CRFs. By combining dictionary features of Lao organization names, built from the Lao dictionary, with other features such as clue word features and "and" features, the thesis recognizes Lao organization names using CRFs.
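The feature combination described in (3) might look something like the sketch below, which builds one feature dictionary per token in the shape that many CRF toolkits accept. Everything concrete here is an assumption for illustration: the function names, the feature names, the dictionary and clue-word entries, and the romanized conjunction are hypothetical, since the abstract does not specify them.

```python
from typing import Dict, List, Set, Union

Feature = Union[bool, str]

# Hypothetical placeholders standing in for resources built from a Lao dictionary.
ORG_DICTIONARY: Set[str] = {"kaswang", "thanakhan"}
ORG_CLUE_WORDS: Set[str] = {"kaswang", "borisat"}
CONJUNCTION = "lae"  # assumed romanization of the Lao word for "and"


def token_features(tokens: List[str], i: int) -> Dict[str, Feature]:
    """Combine dictionary, clue word, and "and" features with simple context features."""
    tok = tokens[i]
    return {
        "token": tok,
        "in_org_dict": tok in ORG_DICTIONARY,    # dictionary feature
        "is_clue_word": tok in ORG_CLUE_WORDS,   # clue word feature
        "is_and": tok == CONJUNCTION,            # "and" feature
        "prev_is_and": i > 0 and tokens[i - 1] == CONJUNCTION,
        "prev_token": tokens[i - 1] if i > 0 else "<BOS>",
        "next_token": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
    }


def sentence_features(tokens: List[str]) -> List[Dict[str, Feature]]:
    """One feature dictionary per token, ready to pass to a sequence labeler."""
    return [token_features(tokens, i) for i in range(len(tokens))]


for feats in sentence_features(["kaswang", "sueksa", "lae", "kila"]):
    print(feats)
```

The "and" feature is kept separate from the plain token feature so that a model can learn that a conjunction inside an organization name does not necessarily end the entity.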
Keywords/Search Tags:Named Entity Recognition, Lao, Conditional Random Fields, Generalized Expectation Criteria, Semi-supervised Learning, Heuristic Information, Entity Feature