Research on Automatic Name Recognition Based on a Korean Corpus

Posted on: 2019-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: D Jin
Full Text: PDF
GTID: 2405330545958881
Subject: Asian and African Language and Literature
Abstract/Summary:
Automatic recognition of Korean personal names is one of the subtasks of named entity recognition. Over half a century of development, Chinese and English information processing have made great progress in areas such as the construction of basic resources. Compared with Chinese and English information processing, Korean information processing started relatively late, but it has produced distinctive scientific results within minority language information processing. Korean information processing has completed the character and word processing stages and has entered the stage of sentence processing. Having finished the shallow lexical analysis tasks of identifying phrase structure relations and defining phrase boundaries, it is now moving toward deep lexical analysis. At the same time, research on Korean information retrieval, automatic summarization, text categorization and machine translation continues to grow.

This paper analyzes the difficulties of personal name recognition, introduces the existing approaches, and compares them. We then build several linguistic resources: a personal name sample set, a surname set and a personal name corpus. After statistical analysis of these resources, we also build a personal name word list, a surname probability list, a segmentation lexicon, a context information list for personal names, a context information list for surnames occurring as single words, and prefix and suffix lists for surnames, all of which are needed to recognize personal names in text. Personal name identification matters in many fields, for example information retrieval, machine translation and text proofreading.

This paper presents a hierarchy weighting model for Korean personal name identification. The model is based on the surname and on context boundary information, and it uses a large amount of statistical data extracted from a real name library and a real text corpus. An algorithm based on this model, together with a strategy for resolving conflicts, carries out the identification of personal names. A test was carried out on sample sentences containing personal names, randomly extracted from the Yanbian Daily News Corpus for May 2016 to May 2017.
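
The abstract does not give the details of the hierarchy weighting model, so the following is only a minimal Python sketch of the general idea it describes: score each candidate that starts with a known surname by combining a surname probability with left and right boundary context, then resolve overlapping candidates by keeping the higher score. Every table, weight, cue word, threshold and function name here (`SURNAME_PROB`, `LEFT_CUES`, `RIGHT_CUES`, `WEIGHTS`, `THRESHOLD`, `find_names`) is a hypothetical placeholder and not taken from the thesis.

```python
# Illustrative sketch only: every table, weight, and threshold below is a
# made-up placeholder, not data or parameters from the thesis.

SURNAME_PROB = {"김": 0.21, "이": 0.15, "박": 0.08}  # hypothetical surname probabilities
LEFT_CUES = ("기자",)        # hypothetical words that may precede a name
RIGHT_CUES = ("교수", "씨")  # hypothetical titles that may follow a name
WEIGHTS = {"surname": 0.5, "left": 0.2, "right": 0.3}  # hypothetical layer weights
THRESHOLD = 0.1              # hypothetical acceptance threshold


def is_hangul(ch):
    """True for a precomposed Hangul syllable (U+AC00..U+D7A3)."""
    return "\uac00" <= ch <= "\ud7a3"


def score_candidate(text, start, end):
    """Weighted sum of surname probability and boundary-context evidence."""
    score = WEIGHTS["surname"] * SURNAME_PROB[text[start]]
    if text[:start].rstrip().endswith(LEFT_CUES):
        score += WEIGHTS["left"]
    if text[end:].lstrip().startswith(RIGHT_CUES):
        score += WEIGHTS["right"]
    return score


def find_names(text):
    """Return non-overlapping (start, end, name, score) tuples in text order."""
    candidates = []
    for start, ch in enumerate(text):
        if ch not in SURNAME_PROB:
            continue
        for given_len in (1, 2):  # surname plus 1 or 2 given-name syllables
            end = start + 1 + given_len
            span = text[start:end]
            if len(span) == 1 + given_len and all(is_hangul(c) for c in span):
                s = score_candidate(text, start, end)
                if s >= THRESHOLD:
                    candidates.append((start, end, span, s))

    # Conflict resolution: prefer higher scores, drop anything that overlaps.
    chosen = []
    for cand in sorted(candidates, key=lambda c: c[3], reverse=True):
        if all(cand[1] <= c[0] or cand[0] >= c[1] for c in chosen):
            chosen.append(cand)
    return sorted(chosen)
```

With these placeholder tables, `find_names("연변대학 김철수 교수가 발표했다")` keeps the candidate "김철수", because the following title "교수" adds right-context weight, and discards the shorter overlapping candidate "김철". A real system would replace the toy tables with statistics drawn from a name library and a text corpus, as the abstract describes.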
Keywords/Search Tags: Korean Corpus, Names, Identification Methods