Chinese Location Recognition Based On Statistics And CRF

Posted on:2019-12-04

Degree:Master

Type:Thesis

Country:China

Candidate:W Teng

Full Text:PDF

GTID:2428330572955292

Subject:Software engineering

Abstract/Summary:

With the rapid development of the Internet, the network has become one of the most important information carriers in production and daily life, and it contains a great deal of valuable geographical location information. Most of this information, however, exists as web text, so how to extract unstructured geographic information from web text has become a key problem. Chinese location recognition is the task of extracting geospatial entities from Chinese digital text. The characters used in Chinese place names often have strong word-formation ability and diverse features, so it is difficult to locate place names and their boundaries accurately in text. This thesis therefore analyzes the characteristics of Chinese geographical names in depth, casts place-name recognition as a sequence labeling problem, and trains a conditional random field (CRF) model to recognize locations. It also designs an algorithm for recognizing complex geographical names and uses it to correct the output of the trained CRF model. The main contributions are as follows:

(1) Because existing models have low recognition accuracy for complex geographical names, this thesis designs an algorithm based on information entropy and pointwise mutual information (PMI). The algorithm uses a location database to generate a relevance dictionary; on this basis it calculates the correlation between adjacent words in the text to determine the boundaries of complex location names and their contexts, and thereby recognizes complex location names.

(2) A rule-based window detection algorithm for location recognition is proposed. In existing research, rule methods combined with CRF models serve mainly as a supplement to the CRF recognition result, providing correction, disambiguation, and recall. However, because the rules act directly on the recognition results of the upper layer, they cannot recover place names in the original text that went unrecognized, so their impact is limited. Purely rule-based place-name recognition, in turn, must poll the rule set against each sentence during recognition, which is very inefficient. This thesis addresses both shortcomings: it applies rule-based recognition directly to the original text, uses place-name feature words to coarsely locate suspected place names, and then confirms or excludes them with a detection window and rule sets. Judging from the actual results, this method makes effective use of existing place-name rule sets, coordinates better with the CRF model, and improves recall.

(3) By crawling geography article titles from the authoritative website NGAC, this thesis builds a complex geographical-name corpus following the rules of "The Principle and Application of Chinese Information Extraction". The corpus provides reliable training and validation data for recognizing complex geographical names.
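As an illustration of the statistics named in contribution (1), the following is a minimal sketch of pointwise mutual information between adjacent tokens and of the entropy of a token's right-hand neighbors. The abstract does not give the thesis's formulas or data format, so the function names, the input layout (a list of already-segmented place names), and the use of base-2 logarithms are all assumptions made for this example.

```python
import math
from collections import Counter


def pmi_table(entries):
    """PMI (in bits) for every adjacent token pair seen in `entries`.

    `entries` is a list of token lists, e.g. segmented place names
    taken from a location database (hypothetical input format).
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in entries:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    table = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        # PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) )
        table[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return table


def right_entropy(entries, token):
    """Shannon entropy (in bits) of the tokens that follow `token`."""
    followers = Counter(
        nxt
        for tokens in entries
        for cur, nxt in zip(tokens, tokens[1:])
        if cur == token
    )
    total = sum(followers.values())
    if total == 0:
        return 0.0  # token never has a right neighbor in the data
    return -sum((c / total) * math.log2(c / total) for c in followers.values())
```

Under these assumptions, a high PMI suggests that two adjacent tokens belong to the same name, while a high neighbor entropy suggests that a name boundary may follow the token. How the thesis combines the two scores is not stated in the abstract.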
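To make the sequence labeling formulation concrete, the sketch below encodes place-name spans as per-character tags and decodes a tag sequence back into names. The B-LOC/I-LOC/O tag set and the character-level granularity are assumptions chosen for illustration; the abstract does not say which tagging scheme or unit the thesis uses, and the CRF model itself is not shown.

```python
def bio_tags(chars, spans):
    """Label each character B-LOC, I-LOC, or O.

    `spans` holds (start, end) character offsets of place names,
    with `end` exclusive (hypothetical annotation format).
    """
    tags = ["O"] * len(chars)
    for start, end in spans:
        tags[start] = "B-LOC"
        for i in range(start + 1, end):
            tags[i] = "I-LOC"
    return tags


def decode(chars, tags):
    """Recover place-name strings from a BIO tag sequence."""
    names, current = [], []
    for ch, tag in zip(chars, tags):
        if tag == "B-LOC":
            # A new name starts; close any name still open.
            if current:
                names.append("".join(current))
            current = [ch]
        elif tag == "I-LOC" and current:
            current.append(ch)
        else:
            if current:
                names.append("".join(current))
            current = []
    if current:
        names.append("".join(current))
    return names
```

A sequence model such as a CRF would be trained to predict the tag list from the characters; `decode` then turns its predictions into place-name strings, which is also the point at which a correction step could adjust boundaries.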

Keywords/Search Tags:

Chinese Location Recognition, CRF, Information Entropy, PMI

Related items

1	Research On Chinese Recognition Algorithm Based On Maximum Entropy Regularization
2	The Research On Named Entity Recognition In Chinese Information Processing
3	Research On WLAN Indoor Location Algorithm Based On Information Entropy
4	Recognition Of The Chinese Name Based On Maximum Entropy Model
5	Research Of Chinese Text Categorization Algorithms Based On Information Entropy
6	Research Into Chinese Names Entity Recognition Based On The Maximum Entropy Model
7	A Study Of Chinese Hierarchical Syntactic Boundaries Based On Phrase Structure
8	Personal Location Recommendation Based On Information Entropy And Trust Mechanism
9	The Application Of Information Entropy In Machine Learning Algorithm
10	Research On Intelligent Chinese Character-making Without Library Based On Topology And Statistics