Research On The Recognition Of The Names Of Persons, Places And Institutions In Lao Language

Posted on: 2019-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Huang
GTID: 2438330563457654
Subject: Computer technology
Abstract/Summary:
In recent years, with the rapid development of China's economy and society, economic cooperation and social exchange with Southeast Asian countries have deepened. Language, as a bridge between countries, plays a key role in this process. The ASEAN countries speak different languages, which creates obstacles to strengthening cooperation and communication among them. How to bridge this linguistic gap, so that countries can continue to cooperate and exchange without being limited by language barriers, has become a topic worthy of study. Relying on modern computer science and natural language processing techniques to support information exchange between countries is a feasible and efficient approach.

At present, natural language processing research has concentrated on languages with large numbers of speakers, such as Chinese and English, while few studies have addressed the less widely used languages of the ASEAN countries. As political, economic, and cultural cooperation between China and Laos deepens, research on natural language processing for Lao is both necessary and significant. After studying the grammar and characteristics of Lao, this thesis investigates named entity recognition for person names, place names, and organization names. The following results have been achieved:

(1) A Tri-training-based method for Lao named entity recognition. Using an enhanced Tri-training algorithm, three base classifiers (support vector machine, conditional random field, and maximum entropy) are combined to form a more effective classification model. Because the available Lao corpus is insufficient, the existing annotated corpus is used to select samples according to the best sample selection strategy, so that Lao named entities are recognized on the basis of a small amount of annotated data.

(2) A transfer-learning-based method for Lao named entity recognition. Examples that have high commonality with the target data are selected from the source dataset and transferred to the target domain, where they assist in training the target-domain model. This addresses the case in which the target domain has too few labeled samples or none at all. The method alleviates the scarcity of corpus for Lao named entities and makes it possible to obtain more corpus by training on a small amount of corpus.

(3) A prototype system for Lao named entity recognition, designed and implemented on the basis of these algorithm models, which supports further research on Lao named entity recognition.
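
The Tri-training idea in result (1) can be illustrated with a minimal sketch. This is not the thesis's implementation. The three toy lookup classifiers below stand in for the support vector machine, conditional random field, and maximum entropy models. The error-rate check of the standard Tri-training algorithm and the thesis's sample selection strategy are both omitted. The feature tuples (previous token, final letter, length bucket) and the names `FeatureLookupClassifier`, `tri_train`, and `vote` are hypothetical, and the data are romanized placeholders rather than Lao text.

```python
from collections import Counter, defaultdict


class FeatureLookupClassifier:
    """Toy stand-in for a real model: looks at a single feature position
    and predicts the label most often seen with that feature value."""

    def __init__(self, feature_index):
        self.feature_index = feature_index
        self.counts = defaultdict(Counter)
        self.default = None

    def fit(self, X, y):
        self.counts = defaultdict(Counter)
        for x, label in zip(X, y):
            self.counts[x[self.feature_index]][label] += 1
        # Fall back to the most frequent training label for unseen values.
        self.default = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, x):
        seen = self.counts.get(x[self.feature_index])
        return seen.most_common(1)[0][0] if seen else self.default


def tri_train(classifiers, labeled_X, labeled_y, unlabeled_X, rounds=3):
    """Simplified Tri-training: in each round, an unlabeled example is added
    to classifier i's training set when the other two classifiers agree on it."""
    for clf in classifiers:
        clf.fit(labeled_X, labeled_y)
    for _ in range(rounds):
        pseudo = []
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            new_X, new_y = [], []
            for x in unlabeled_X:
                label_j = classifiers[j].predict(x)
                label_k = classifiers[k].predict(x)
                if label_j == label_k:
                    new_X.append(x)
                    new_y.append(label_j)
            pseudo.append((new_X, new_y))
        # Retrain only after all pseudo-labels for this round are collected.
        for clf, (new_X, new_y) in zip(classifiers, pseudo):
            clf.fit(labeled_X + new_X, labeled_y + new_y)
    return classifiers


def vote(classifiers, x):
    """Final prediction by majority vote of the three classifiers."""
    return Counter(clf.predict(x) for clf in classifiers).most_common(1)[0][0]


# Hypothetical features: (previous token, final letter, length bucket).
labeled_X = [("mr", "a", "short"), ("mr", "y", "short"),
             ("in", "e", "long"), ("in", "g", "long"),
             ("at", "k", "mid"), ("at", "o", "mid")]
labeled_y = ["PER", "PER", "LOC", "LOC", "ORG", "ORG"]
unlabeled_X = [("mr", "z", "short"), ("in", "q", "long"),
               ("at", "k", "mid"), ("dr", "a", "short")]

models = tri_train([FeatureLookupClassifier(i) for i in range(3)],
                   labeled_X, labeled_y, unlabeled_X)
print(vote(models, ("dr", "q", "long")))
```

Each toy classifier sees a different feature, which plays the role that three different model families play in the thesis: the classifiers disagree often enough that agreement between two of them carries information for the third.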
Keywords/Search Tags: Laotian, named entity recognition, Tri-training, Transfer learning, feature selection