Chinese Organization Name Recognition Based On Latent Semantic Analysis And Multiple Features Fusion

Posted on: 2017-10-02
Degree: Master
Type: Thesis
Country: China
Candidate: L Liu
Full Text: PDF
GTID: 2348330512969373
Subject: Communication and Information System
Abstract/Summary:
Named Entity Recognition plays a key role in the development of natural language processing technology and has been applied in many areas of Internet research, such as Information Retrieval, Data Mining, Machine Translation and Information Extraction. It is both a key point and a difficulty in Chinese information processing. Scholars at home and abroad have proposed many excellent solutions, but recognition accuracy still cannot satisfy practical requirements. For this reason, the thesis proposes a novel method for Chinese organization name recognition that combines Latent Semantic Analysis with multiple features fusion. The contributions of this thesis are listed below.

(1) Based on a study of current sequence labeling methods, the thesis proposes a sequence labeling method based on LDA (Latent Dirichlet Allocation) and CRF (Conditional Random Fields). The topic probability obtained from LDA model training is added to CRF model training, so that one more feature, reflecting the topic of the text, is extracted alongside basic features such as characters or words, and the accuracy of sequence labeling is increased. In addition, the method depends only weakly on the corpus, so it extends well and needs less manual intervention.

(2) Based on a study of the characteristics and types of organization names, the thesis classifies them into simple, normal and complex organization names. Two retrieval algorithms are designed for the different types, and accuracy is then enhanced with an error correction model.

In experiments on the People's Daily, CCL and BBC corpora, the method obtains a higher average recognition level across the three corpora.
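The idea in contribution (1), adding an LDA-derived topic feature to the basic character features fed to a CRF, might look roughly like the sketch below. This is only an illustration under stated assumptions, not the thesis's implementation: the function names (`dominant_topic`, `char_features`, `sentence_features`), the feature keys, the one-character context window and the example topic distribution are all hypothetical, and the topic probabilities are assumed to have been produced beforehand by a trained LDA model.

```python
from typing import Dict, List, Sequence


def dominant_topic(topic_probs: Sequence[float]) -> int:
    """Return the index of the most probable topic."""
    return max(range(len(topic_probs)), key=lambda k: topic_probs[k])


def char_features(chars: List[str], i: int,
                  topic_probs: Sequence[float]) -> Dict[str, object]:
    """Build a feature dict for the character at position i.

    Basic features: the character and its immediate neighbours.
    Extra feature: the dominant topic of the surrounding text and its
    probability (assumed to come from a separately trained LDA model).
    """
    feats: Dict[str, object] = {
        "bias": 1.0,
        "char": chars[i],
        "prev_char": chars[i - 1] if i > 0 else "<BOS>",
        "next_char": chars[i + 1] if i < len(chars) - 1 else "<EOS>",
    }
    k = dominant_topic(topic_probs)
    feats["topic"] = f"T{k}"  # categorical topic feature
    feats["topic_prob"] = round(float(topic_probs[k]), 3)  # numeric weight
    return feats


def sentence_features(sentence: str,
                      topic_probs: Sequence[float]) -> List[Dict[str, object]]:
    """Turn a sentence into one feature dict per character."""
    chars = list(sentence)
    return [char_features(chars, i, topic_probs) for i in range(len(chars))]


if __name__ == "__main__":
    # Hypothetical topic distribution for the document containing the sentence.
    doc_topics = [0.1, 0.7, 0.2]
    for feats in sentence_features("北京大学", doc_topics):
        print(feats)
```

A list of per-character dicts like this is the input shape accepted by common CRF toolkits. The topic feature is attached to every character of the sentence, so a labeler could in principle learn that certain label transitions are more likely under certain topics.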
Keywords/Search Tags:Named Entity Recognition, Chinese organization name recognition, Conditional random field, Latent Dirichlet Allocation, Model training