A Method Of Proper Nouns Identification Based On Double-level Model Of NSP And CRFs

Posted on:2011-07-21

Degree:Master

Type:Thesis

Country:China

Candidate:J F Shi

GTID:2178360308954089

Subject:Computer software and theory

Abstract/Summary:

Proper noun identification is an essential part of Chinese word segmentation and a crux of Chinese information processing. It is therefore of great significance for research in network information retrieval, text classification, speech recognition, machine translation, and other important fields. This paper introduces a double-level model based on NSP (N-Shortest Path) and CRFs (Conditional Random Fields) to identify proper nouns. At the low level, a rough segmentation method based on the N-shortest path produces a set of candidate segmentations of the character string, so that the correct segmentation is covered with maximum probability. At the high level, a conditional random fields model tags the text using the features submitted by the low-level model together with single and compound features of proper nouns. Adding the compound features helps mine the context information of proper nouns and improves the accuracy of the experimental system. The paper also introduces the storage structures of several proper noun dictionaries, which effectively improve the system's lookup and matching speed. The experiments use the Peking University corpus of the People's Daily from 1998 as training and testing data. For place names, the recall and precision are 87.42% and 83.99%, and the F-value is 85.67%; for organization names, the recall and precision are 72.13% and 70.38%, and the F-value is 71.24%.
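The low-level rough segmentation step can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function name `n_shortest_segmentations`, the toy dictionary, and the unit cost per word are all assumptions made for illustration, and the paths are enumerated with a simple best-first search rather than whatever data structures the thesis uses. The sketch only shows the idea of keeping the N segmentations with the fewest words as candidates for a later tagging stage.

```python
import heapq


def n_shortest_segmentations(text, dictionary, n=2):
    """Return up to n segmentations of `text` that use the fewest words.

    Assumptions (not from the thesis): every dictionary word and every
    single character is an edge of cost 1 in the word lattice.
    """
    # Heap entries are (cost so far, position in text, words chosen so far).
    heap = [(0, 0, ())]
    results = []
    while heap and len(results) < n:
        cost, pos, words = heapq.heappop(heap)
        if pos == len(text):
            # Complete paths are popped in non-decreasing cost order.
            results.append(list(words))
            continue
        for end in range(pos + 1, len(text) + 1):
            piece = text[pos:end]
            # Single characters are always allowed so a path always exists.
            if end == pos + 1 or piece in dictionary:
                heapq.heappush(heap, (cost + 1, end, words + (piece,)))
    return results


# Hypothetical toy dictionary, chosen to create a segmentation ambiguity.
toy_dictionary = {"北京", "大学", "北京大学", "大学生", "学生"}
candidates = n_shortest_segmentations("北京大学生", toy_dictionary, n=2)
for words in candidates:
    print("/".join(words))
```

With N = 2 the sketch keeps both two-word candidates for the ambiguous string, which mirrors the stated goal of covering the correct segmentation before the higher-level model makes the final tagging decision. A real system would weight edges with corpus statistics instead of a unit cost.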

Keywords/Search Tags:

Proper nouns identification, N-shortest path, Conditional random fields, Recall rate

Related items

1	Research Of Chinese Phrase Identification Based On Conditional Random Fields
2	The Research On Short Text Mining With Conditional Random Fields And Improved LSTM
3	Research On Chinese Prepositional Phrase Identification Based On Multi-layer Conditional Random Fields
4	SAR Image Change Detection Based On Conditional Random Fields
5	Research On Online Detection Method Of Reputation Fraud Campaign Based On Conditional Random Fields
6	A Study On Chinese Personal Name Recognition Based On Conditional Random Fields
7	An Self-adaptive BLP Optimal Model Employing Conditional Random Fields
8	Recognition Of Named Entity In Electronic Medical Records Based On Cascaded Conditional Random Fields
9	A Study On Chinese Location Names Recognition Based On Conditional Random Fields
10	Research On Short Utterance Semantic Recognition Method Based On Cascaded Conditional Random Fields