Research Into Names Automatic Recognition Based On Korean Corpus

Posted on:2019-01-16

Degree:Master

Type:Thesis

Country:China

Candidate:D Jin

Full Text:PDF

GTID:2405330545958881

Subject:Asian and African Language and Literature

Abstract/Summary:

Automatic recognition of Korean names is one of the subtasks of named entity recognition. Over half a century of development in Chinese and English information processing, great progress has been achieved in areas such as the construction of basic resources. Compared with Chinese and English information processing, Korean information processing started relatively late, but it has produced distinctive scientific results within minority language information processing. Korean information processing has completed the processing of characters and words and has entered the stage of sentence processing. Having finished the shallow lexical analysis tasks of identifying phrase-structure relations and defining phrase boundaries, it is now moving toward deep lexical analysis. At the same time, research on Korean information retrieval, automatic summarization, text categorization and machine translation continues to grow.

This paper analyzes the difficulties of personal name recognition, introduces existing approaches, and compares them. We then build several linguistic resources, such as a personal name sample set, a surname set and a personal name corpus. After statistical analysis of these resources, we also build a personal name word list, a probability list of surnames, a segmentation lexicon, a context information list for personal names, a context information list for surnames that occur as single words, and prefix and suffix lists for surnames, among others, all of which are necessary for recognizing personal names in text.

Person name identification has an important effect in many fields, for example information retrieval, machine translation and text proofreading. This paper presents a hierarchy weighting model for Korean person name identification. The model is based on surname and context boundary information and makes use of a large amount of statistical data extracted from a real name library and a real text corpus. Person names are identified using an algorithm based on this model together with a strategy for resolving contradictions. In the test, the samples, sentences containing person names, were randomly extracted from the Yanbian Daily News Corpus for May 2016 to May 2017.
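The abstract does not give the model's formulas, so the following is only a minimal sketch of the general idea of combining a surname probability with left and right context boundary evidence. It uses a flat weighted sum, which is a simplification and not the thesis's hierarchy weighting model. All tables, weights, the threshold, the three-syllable name assumption and the function names (context_weight, score_candidate, find_names) are hypothetical and chosen purely for illustration.

```python
import unicodedata

# Hypothetical toy resources. The thesis derives its lists from a real name
# library and a real corpus; none of the values below come from it.
SURNAME_PROB = {"김": 0.21, "이": 0.15, "박": 0.08}
RIGHT_CONTEXT_WEIGHT = {"씨": 0.9, "교수": 0.8, "기자": 0.7}
LEFT_CONTEXT_WEIGHT = {"교수": 0.6, "기자": 0.6}


def context_weight(text, table, at_start):
    """Largest weight of any table key found at the start or end of text."""
    match = text.startswith if at_start else text.endswith
    return max((w for key, w in table.items() if match(key)), default=0.0)


def score_candidate(tokens, i, w_surname=0.4, w_left=0.2, w_right=0.4):
    """Score tokens[i] as a name, assuming one surname syllable plus two more."""
    token = tokens[i]
    if len(token) < 3 or token[0] not in SURNAME_PROB:
        return None
    name, rest = token[:3], token[3:]
    # Right boundary: text attached to the name, otherwise the next token.
    right = rest if rest else (tokens[i + 1] if i + 1 < len(tokens) else "")
    # Left boundary: the previous token, if any.
    left = tokens[i - 1] if i > 0 else ""
    score = (
        w_surname * SURNAME_PROB[token[0]]
        + w_left * context_weight(left, LEFT_CONTEXT_WEIGHT, at_start=False)
        + w_right * context_weight(right, RIGHT_CONTEXT_WEIGHT, at_start=True)
    )
    return name, score


def find_names(sentence, threshold=0.3):
    """Return (name, score) pairs whose score reaches the threshold."""
    # NFC keeps each Hangul syllable as a single code point for indexing.
    tokens = unicodedata.normalize("NFC", sentence).split()
    results = []
    for i in range(len(tokens)):
        candidate = score_candidate(tokens, i)
        if candidate is not None and candidate[1] >= threshold:
            results.append(candidate)
    return results


if __name__ == "__main__":
    print(find_names("김철수 교수는 학교에 갔다"))
    print(find_names("이것은 책이다"))
```

In this sketch, a token that merely begins with a surname syllable, such as 이것은, scores only on its surname probability and falls below the threshold, while a candidate followed by a title such as 교수 gains enough boundary evidence to be accepted. A real system would also need the contradiction-resolution step mentioned in the abstract, which is not modeled here.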

Keywords/Search Tags:

Korean Corpus, Names, Identification Methods

Related items

1	Based On The Names Of Mongolian Corpus Automatic Identification
2	Investigating Study On The Chinese Characters Of Korean Origin
3	Personal Names Identification In An Unknown Language
4	A Study On Chinese Organization Names Based On Dynamic Circulating Corpus
5	Identification Research On The Corpus Of Four Extant Name-borrowing Novel In Han Dynasty
6	Linguistic Analysis On The Names Of The Restaurants In Nanchang City
7	The Analysis Of Korean-Chinese Spoken Corpus Based On The Corresponding Forms Of Korean ’(?)’ In Chinese
8	Research On Korean Big Data Text Mining Based On Statistical Methods
9	The Influence Of Korean Chinese Characters On Korean Students' Learning Of Chinese
10	Research On The Methods Of Chinese Noun Compounds Identification And Classification