
Research and Application of CRF-Based Named Entity and Entity Relationship Recognition

Posted on: 2016-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: Q Kan
Full Text: PDF
GTID: 2298330467972667
Subject: Software engineering
Abstract/Summary:
Named entity recognition (NER) is the task of classifying elements of a text into predefined categories such as person names, places, organization names, times, and currencies. Because named entities are the information-bearing units of natural language, NER is basic research in the field of text information processing. It is an essential component of many natural language processing techniques, including information extraction, information retrieval, machine translation, and question answering. Research institutions outside China have achieved outstanding results in English entity recognition, with accuracy of up to 90%. Chinese presents a large number of semantic difficulties, and in China the problem is still at the research and exploration stage, so Chinese entity recognition is of great significance. The work of this project covers the following aspects.

First, the study analyzes a number of machine learning models used for entity recognition and entity relationship recognition, including the hidden Markov model, the maximum entropy model, and the support vector machine, and examines their strengths and weaknesses for entity recognition. A detailed analysis of the newer conditional random field (CRF) model shows that it structurally inherits the advantages of these classical models while overcoming their defects. Experiments show that CRFs perform much better in natural language processing, particularly in named entity recognition. We therefore chose the CRF model to recognize entities and entity relationships.

Second, because it captures long-distance dependencies, the CRF model can use the properties of the words before and after the current word to make more accurate judgments. However, the model also has a drawback: it depends heavily on feature selection, and the quality of the recognition result depends largely on how well the features are chosen. Research on Chinese entity recognition is still in its infancy; feature selection is still being explored, and there are no clear rules for it. Based on the characteristics of the system's corpus, we build feature templates progressively, from simple to complex. Feature selection and the construction of feature templates are therefore the focus of the system. In the entity recognition stage, we select the word itself and its part of speech as basic features to build templates; select entity features to build templates by converting the corpus; select entity indicator words to build indicator-word templates, adding information from relevant professional dictionaries; and select combinations of features to build combined feature templates. For entity relationship recognition, in addition to the basic part-of-speech features, we also propose a method that uses syntactic features. Through analysis of syntactic structure, we select common features, verb-dependency features, entity-to-entity path features, and entity-to-verb dependency path features as the important features for identifying entity relationships.

Third, the system uses the CRF model as its framework and the 1998 People's Daily corpus as the training set for entity recognition and entity relationship recognition. For parsing web pages, we propose a combined approach based on HTMLParser, which can extract "The Demi-Gods and Semi-Devils" from Baidu. Experimental data show that the proposed feature templates achieve good accuracy and recall. In terms of the model, the study of CRF feature selection provides a reference for feature selection rules. In terms of the system, this is an exploratory attempt to apply machine learning algorithms to literature.
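
The kinds of feature templates described in the abstract (the word itself, its part of speech, neighboring context, indicator words, and combined features) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the feature keys, the one-word context window, the sample indicator-word set, and the part-of-speech tags in the usage example are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)

# Hypothetical indicator words (titles, organization suffixes); the thesis's
# actual indicator-word and dictionary lists are not given in the abstract.
INDICATOR_WORDS: Set[str] = {"先生", "公司", "大学"}


def token_features(sent: List[Token], i: int,
                   indicators: Set[str] = INDICATOR_WORDS) -> Dict[str, str]:
    """Build a feature dict for the token at position i of a tagged sentence."""
    word, pos = sent[i]
    feats = {
        "w0": word,                              # basic feature: the word itself
        "p0": pos,                               # basic feature: part of speech
        "w0|p0": f"{word}|{pos}",                # combined feature
        "is_indicator": str(word in indicators), # indicator-word feature
    }
    # Context features from a +/-1 window (window size is an assumption).
    for offset in (-1, 1):
        j = i + offset
        key = f"{offset:+d}"                     # "-1" or "+1"
        if 0 <= j < len(sent):
            w, p = sent[j]
            feats[f"w{key}"] = w
            feats[f"p{key}"] = p
            feats[f"p{key}|p0"] = f"{p}|{pos}"   # combined POS feature
            feats[f"ind{key}"] = str(w in indicators)
        else:
            # Mark sentence boundaries explicitly.
            feats[f"w{key}"] = "<BOS>" if offset < 0 else "<EOS>"
    return feats


def sentence_features(sent: List[Token]) -> List[Dict[str, str]]:
    """Feature dicts for every token in a sentence, in order."""
    return [token_features(sent, i) for i in range(len(sent))]
```

For example, given the tagged sentence `[("张三", "nr"), ("先生", "n"), ("访问", "v"), ("北京", "ns")]` (tags chosen for illustration), `sentence_features` returns one dict per token. The first token's dict records that the following word is an indicator word, which is the sort of contextual cue a CRF can weight when labeling a person name.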
Keywords/Search Tags: Named entity recognition, conditional random field, entity relationship extraction, feature vector model