The Chinese Organization Name Recognition Based On SVM And HMM Algorithm

Posted on: 2018-07-22
Degree: Master
Type: Thesis
Country: China
Candidate: J F Zhu
GTID: 2348330515978437
Subject: Computer software and theory
Abstract/Summary:
Named Entity Recognition (NER) is an integral part of many Natural Language Processing (NLP) technologies, such as information extraction, information retrieval, machine translation and online question-answering (Q&A) systems. The primary task of Chinese Named Entity Recognition is to recognize Chinese person names, location names and organization names, as well as expressions of time, quantity, monetary value, percentage, etc. Compared with other Chinese named entities, Chinese organization names are the most difficult to recognize because of their complex structure and varied forms.

This thesis proposes a machine-learning approach that combines statistics-based methods (an SVM model and an HMM model) with rule-based methods to recognize Chinese organization names. Based on the word-formation characteristics of Chinese organization names, each name is divided into two parts: a prefix and a suffix. A characteristic-word dictionary is built by extracting the suffixes of organization names from the training set. The posterior (end) boundary of an organization name can then be determined by judging whether a characteristic word from this dictionary that appears in a Chinese text is actually the suffix of an organization name. This judgment is a binary classification problem, which the Support Vector Machine (SVM) model is well suited to solve because of its significant advantages in binary classification. The prefix of a Chinese organization name is harder to recognize because of its complex structures and varied forms. However, once the suffix has been recognized, determining the front boundary of the name can be modeled with a Hidden Markov Model (HMM). After these two steps are executed, the Chinese organization name can be recognized correctly.

The results show that Chinese organization name recognition based on the SVM and HMM algorithms is effective and achieves good recognition performance. In the closed test, the precision, recall and F-value reach up to 96.29%, 88.70% and 92.34%, respectively; in the open test, they reach up to 90.17%, 81.94% and 85.61%, respectively.
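The suffix step (dictionary lookup followed by binary classification) can be illustrated with a minimal sketch. It assumes pre-segmented text, treats the "suffix" as the last word of a gold organization name, and uses a hypothetical four-sentence toy corpus with a simple context-window feature set (current, previous and next word); the abstract does not say which features or SVM implementation the thesis used. To stay dependency-free, the linear SVM here is trained with the Pegasos subgradient method (hinge loss with L2 regularization) rather than a library solver. All function names, `lam` and `epochs` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy corpus: pre-segmented sentences paired with gold
# organization-name spans given as (start, end) token indices, end exclusive.
SUFFIX_TRAIN = [
    (["北京", "大学", "发布", "通知"], [(0, 2)]),
    (["他", "办理", "了", "银行", "卡"], []),
    (["中国", "银行", "宣布", "降息"], [(0, 2)]),
    (["她", "考上", "大学", "了"], []),
]


def build_suffix_dictionary(corpus):
    """Characteristic-word dictionary: the last word of every gold name (assumed)."""
    return {tokens[end - 1] for tokens, spans in corpus for _, end in spans}


def features(tokens, i):
    """Indicator features for the candidate suffix at position i (assumed set)."""
    prev_word = tokens[i - 1] if i > 0 else "<BOS>"
    next_word = tokens[i + 1] if i + 1 < len(tokens) else "<EOS>"
    return [f"w0={tokens[i]}", f"prev={prev_word}", f"next={next_word}"]


def candidates(corpus, suffixes):
    """Yield (features, label) for every dictionary word found in the corpus.

    The label is +1 when the word really ends an organization name, else -1.
    """
    for tokens, spans in corpus:
        name_ends = {end - 1 for _, end in spans}
        for i, word in enumerate(tokens):
            if word in suffixes:
                yield features(tokens, i), (1 if i in name_ends else -1)


def score(w, feats):
    """Linear decision value w . x for binary indicator features."""
    return sum(w.get(f, 0.0) for f in feats)


def train_pegasos(examples, lam=0.1, epochs=50):
    """Linear SVM (hinge loss + L2) trained with Pegasos, cycling the data."""
    w = defaultdict(float)
    t = 0
    for _ in range(epochs):
        for feats, y in examples:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * score(w, feats) < 1.0  # margin check uses the old w
            for f in w:  # shrinkage from the L2 regularizer
                w[f] *= 1.0 - eta * lam
            if violated:  # subgradient step of the hinge loss
                for f in feats:
                    w[f] += eta * y
    return w


suffixes = build_suffix_dictionary(SUFFIX_TRAIN)
model = train_pegasos(list(candidates(SUFFIX_TRAIN, suffixes)))


def is_org_suffix(tokens, i):
    """True if the dictionary word at position i is classified as a name suffix."""
    return score(model, features(tokens, i)) > 0


print(is_org_suffix(["清华", "大学", "发布", "公告"], 1))  # True
print(is_org_suffix(["他", "的", "银行", "卡"], 2))  # False
```

Only words already in the characteristic-word dictionary are ever classified, so the dictionary acts as the candidate generator and the SVM decides which candidates mark a real posterior boundary.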
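The prefix step can be sketched in the same spirit, under assumptions not stated in the abstract: the words to the left of a recognized suffix form a window, each word is tagged O (outside the name), B (first word of the prefix) or I (inside the prefix), and the most likely tag sequence is decoded with the Viterbi algorithm. The B/I/O state set, the add-one smoothed maximum-likelihood estimates, the toy training windows and all function names are illustrative, not the thesis's actual model.

```python
import math
from collections import Counter

STATES = ("O", "B", "I")  # outside, first prefix word, inside prefix
UNK = "<UNK>"

# Hypothetical toy windows: tagged words left of an already-recognized suffix.
PREFIX_TRAIN = [
    [("昨天", "O"), ("北京", "B"), ("师范", "I")],  # suffix: 大学
    [("记者", "O"), ("走访", "O"), ("中国", "B"), ("农业", "I")],  # suffix: 银行
    [("他", "O"), ("加入", "O"), ("北京", "B"), ("科技", "I")],  # suffix: 公司
]


def estimate(train):
    """Add-one smoothed start, transition and emission log-probabilities."""
    vocab = {w for seq in train for w, _ in seq} | {UNK}
    start = Counter(seq[0][1] for seq in train)
    trans = Counter((a, b) for seq in train for (_, a), (_, b) in zip(seq, seq[1:]))
    emit = Counter((t, w) for seq in train for w, t in seq)
    n = len(STATES)
    log_start = {s: math.log((start[s] + 1) / (len(train) + n)) for s in STATES}
    log_trans = {}
    for a in STATES:
        total = sum(trans[(a, b)] for b in STATES)
        for b in STATES:
            log_trans[(a, b)] = math.log((trans[(a, b)] + 1) / (total + n))
    log_emit = {}
    for s in STATES:
        total = sum(c for (t, _), c in emit.items() if t == s)
        for w in vocab:
            log_emit[(s, w)] = math.log((emit[(s, w)] + 1) / (total + len(vocab)))
    return vocab, log_start, log_trans, log_emit


def viterbi(words, hmm):
    """Most likely tag sequence for the window (log-space Viterbi)."""
    if not words:
        return []
    vocab, log_start, log_trans, log_emit = hmm
    obs = [w if w in vocab else UNK for w in words]  # map unseen words to UNK
    delta = [{s: log_start[s] + log_emit[(s, obs[0])] for s in STATES}]
    back = []
    for w in obs[1:]:
        prev, row, ptr = delta[-1], {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + log_trans[(p, s)])
            row[s] = prev[best] + log_trans[(best, s)] + log_emit[(s, w)]
            ptr[s] = best
        delta.append(row)
        back.append(ptr)
    state = max(STATES, key=lambda s: delta[-1][s])
    path = [state]
    for ptr in reversed(back):  # follow back-pointers from the last position
        state = ptr[state]
        path.append(state)
    return path[::-1]


def front_boundary(tags):
    """Index where the name starts: the B opening the trailing B/I run."""
    i = len(tags)
    while i > 0 and tags[i - 1] == "I":
        i -= 1
    if i > 0 and tags[i - 1] == "B":
        i -= 1
    return i  # len(tags) means the suffix alone forms the name


hmm = estimate(PREFIX_TRAIN)
window = ["今天", "走访", "北京", "农业"]  # words left of a recognized suffix
tags = viterbi(window, hmm)
print(tags, window[front_boundary(tags):])  # ['O', 'O', 'B', 'I'] ['北京', '农业']
```

Decoding only a short window to the left of the suffix keeps the HMM small, because the suffix has already fixed the posterior boundary; the window length itself is another assumption of this sketch.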
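The abstract does not define the F-value; assuming it is the balanced F-measure, the harmonic mean of precision P and recall R, F = 2PR / (P + R), the closed-test figures can be checked against each other with a few lines:

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Closed-test precision and recall reported in the abstract (in percent).
closed_f = f_measure(96.29, 88.70)
print(f"{closed_f:.2f}")  # 92.34
```

Because the harmonic mean is pulled toward the smaller of the two values, the relatively lower recall dominates the F-value.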
Keywords/Search Tags: Natural Language Processing (NLP), Named Entity Recognition (NER), Chinese organization name, SVM model, HMM model