A Method Of Proper Nouns Identification Based On Double-level Model Of NSP And CRFs

Posted on: 2011-07-21
Degree: Master
Type: Thesis
Country: China
Candidate: J F Shi
GTID: 2178360308954089
Subject: Computer software and theory
Abstract/Summary:
Proper noun identification is an essential part of Chinese word segmentation and a crux of Chinese information processing. It is therefore of great significance to research in network information retrieval, text classification, speech recognition, machine translation, and other important fields.

This paper introduces a double-level model based on NSP (N-Shortest Path) and CRFs (Conditional Random Fields) to identify proper nouns. At the low level, a rough-segmentation method based on the N-shortest path produces a set of candidate segmentations of the character string, so that the correct segmentation is covered with maximum probability. At the high level, a conditional random fields model tags the text using the features submitted by the low-level model together with single and compound features of proper nouns. Adding the compound features helps mine the context information of proper nouns and improves the accuracy of the experimental system. The paper also introduces the storage structures of several proper noun dictionaries, which effectively improve the system's lookup and matching speed. The experiments use the Peking University People's Daily corpus of 1998 as training and testing data. For place names, the recall and precision are 87.42% and 83.99%, with an F-value of 85.67%; for organization names, the recall and precision are 72.13% and 70.38%, with an F-value of 71.24%.
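The low-level rough segmentation described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `n_shortest_segmentations`, the toy dictionary, and the unit cost per word are assumptions made for illustration (the abstract does not state how edges are weighted, and a real system would more plausibly use probability-based costs). The sketch treats character positions as graph nodes, treats every dictionary word and every single character as an edge, and keeps the N lowest-cost paths to each position.

```python
import heapq


def n_shortest_segmentations(text, dictionary, n=3):
    """Return up to n lowest-cost segmentations of text as (cost, words) pairs.

    Assumption: every word costs 1, so a "shorter" path means fewer words.
    """
    length = len(text)
    max_len = max((len(w) for w in dictionary), default=1)
    # best[j] holds up to n (cost, words) candidates that cover text[:j].
    best = [[] for _ in range(length + 1)]
    best[0] = [(0, ())]
    for j in range(1, length + 1):
        candidates = []
        for i in range(max(0, j - max_len), j):
            word = text[i:j]
            # Single characters are always allowed, so some path always exists.
            if len(word) == 1 or word in dictionary:
                for cost, words in best[i]:
                    candidates.append((cost + 1, words + (word,)))
        # Keep only the n cheapest partial paths ending at position j.
        best[j] = heapq.nsmallest(n, candidates)
    return [(cost, list(words)) for cost, words in best[length]]


# Hypothetical toy dictionary; a real one would hold Chinese words.
toy_dictionary = {"ab", "cd", "abc", "bcd"}
for cost, words in n_shortest_segmentations("abcd", toy_dictionary, n=3):
    print(cost, words)
```

Keeping several candidate segmentations instead of a single best one is what lets the high-level CRF model still see the correct segmentation when the cheapest path is wrong.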
Keywords/Search Tags: Proper nouns identification, N-shortest path, Conditional random fields, Recall rate