
Researches Of Geographical Named Entity Recognition Of Chinese Micro-Blog

Posted on: 2017-09-29
Degree: Master
Type: Thesis
Country: China
Candidate: D S Guan
GTID: 2348330488491669
Subject: Information security
Abstract/Summary:
With computer technology fully integrated into social life, the information explosion has reached the point where it is beginning to trigger fundamental change. The Internet, the mobile Internet, the Internet of Things, the Internet of Vehicles and other networks are generating data continuously. As a result, the world is flooded with more information than ever before, and the amount of data is growing exponentially. Micro-blog, one of the most influential products of the Internet age, has brought great changes to people's daily lives. According to statistics, in September 2015 the number of monthly active users (MAU) of Micro-blog reached a staggering 220 million, and the average number of daily active users (DAU) that month reached 100 million, up 30% year on year. Even so, China's Internet penetration rate is still below 50% and will continue to grow. With the rapid development of the micro-blog platform and its unique characteristics, applications such as micro-blog-based marketing, micro-blog search and micro-blog public-opinion monitoring have emerged. Geographical location named entity recognition (GLNER) is part of named entity recognition (NER), one of the important tasks of natural language processing (NLP) and a basis for building question answering systems, information retrieval and machine translation. Moreover, a geographical location named entity (GLNE) in a micro-blog (Twitter-style) post often refers to the place where an event occurs, which makes it the most important element of information extraction from such posts.
Extracting GLNE effectively from massive volumes of micro-blog text can not only advance NER research but also better serve people's daily lives. For GLNE extraction from micro-blogs, this thesis proposes a splitting method: each GLNE is split into traditional geographic entities (TGLNE) and basic geographic named entities (BGLNE), and an external feature database of GLNE is then built from their grammatical features, boundary features and other characteristics. Finally, these features are combined with a CRF template for GLNER. Experimental results demonstrate that the proposed method improves the performance of named entity recognition on micro-blog text. The detailed steps are as follows:
(1) Analyze the compositional structure of GLNE, split it into two basic components, TGLNE and BGLNE, and give a formal definition of GLNE.
(2) Based on this formal definition, analyze the structural, grammatical and boundary features of GLNE to construct the external feature database of GLNE.
(3) Because the original external feature database is limited in size, expand it in three ways: with the CSC synonym thesaurus, with HIT-CIR Tongyici Cilin (Extended), and through knowledge summarization. An importance degree is defined for each feature to avoid cross effects between different features.
(4) Use the external feature database to build a well-founded CRF template for GLNER.
(5) Evaluate the method experimentally. The results demonstrate that it is feasible for GLNER on micro-blogs: precision reached 82.51%, the F-value reached 82.20%, and the coverage rate C was 82.91%.
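Step (3) combines two ideas: growing a seed feature list with synonym groups, and resolving conflicts when one word matches several feature categories. The Python sketch below is one possible reading of that step, not the thesis's implementation. The synonym groups stand in for entries from a thesaurus such as Tongyici Cilin (its real file format is not modeled), and the category names and `PRIORITY` values are hypothetical stand-ins for the "importance degree" the abstract mentions.

```python
from typing import Dict, List, Set

def expand(seeds: Set[str], groups: List[Set[str]]) -> Set[str]:
    """One pass of synonym expansion: add every member of any group that
    shares at least one word with the original seeds (not transitive)."""
    expanded = set(seeds)
    for group in groups:
        if group & seeds:
            expanded |= group
    return expanded

# Hypothetical importance degrees: when a word appears in several feature
# categories, the category with the highest value is kept.
PRIORITY: Dict[str, int] = {"TGLNE": 3, "BGLNE_SUFFIX": 2, "TRIGGER": 1}

def label(word: str, db: Dict[str, Set[str]]) -> str:
    """Return the highest-priority category containing `word`, or "O"."""
    hits = [cat for cat, words in db.items() if word in words]
    return max(hits, key=lambda cat: PRIORITY[cat]) if hits else "O"

if __name__ == "__main__":
    # Toy data; real synonym groups would be read from the thesaurus files.
    groups = [{"街道", "马路"}, {"城市", "都市"}]
    db = {
        "TGLNE": {"北京"},
        # "北京" is deliberately duplicated here to show conflict resolution.
        "BGLNE_SUFFIX": expand({"街道"}, groups) | {"北京"},
        "TRIGGER": {"在"},
    }
    print(sorted(db["BGLNE_SUFFIX"]))
    print(label("北京", db), label("马路", db), label("好", db))
```

A single-pass expansion keeps the database from drifting too far from the seeds; iterating `expand` to a fixed point would give transitive expansion at the cost of more noise.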
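Step (4) feeds the external feature database into a CRF template. As a rough sketch, assuming a CRF++-style template syntax where `%x[row,col]` refers to a column of the row at a relative offset, the code below tags each character with a dictionary feature and expands a few unigram templates per position. The suffix and trigger character sets, the template lines and the `_B-1`/`_B+1` padding values are all hypothetical; the abstract does not give the thesis's actual templates or feature tags.

```python
import re
from typing import List, Sequence

# CRF++-style macro %x[row,col]: row is relative to the current position,
# col is an absolute column index in the per-character feature rows.
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")

# Hypothetical external-feature lookups standing in for the GLNE database.
SUFFIXES = set("省市县区镇乡村路街")  # assumed location-suffix characters
TRIGGERS = set("在到去从")            # assumed left-boundary cue characters

# Hypothetical unigram templates: character window plus dictionary tags.
TEMPLATES = [
    "U00:%x[-1,0]",
    "U01:%x[0,0]",
    "U02:%x[1,0]",
    "U03:%x[0,0]/%x[1,0]",
    "U04:%x[0,1]",
    "U05:%x[1,1]",
]

def char_rows(sentence: str) -> List[List[str]]:
    """Column 0 is the character, column 1 its external-feature tag."""
    rows = []
    for ch in sentence:
        if ch in SUFFIXES:
            tag = "SUF"
        elif ch in TRIGGERS:
            tag = "TRG"
        else:
            tag = "O"
        rows.append([ch, tag])
    return rows

def expand_template(template: str, rows: Sequence[Sequence[str]], i: int) -> str:
    """Replace every macro in `template` with the value it points to at position i."""
    def sub(m):
        row, col = i + int(m.group(1)), int(m.group(2))
        if row < 0:
            return f"_B{row}"  # hypothetical padding before the start: _B-1, _B-2, ...
        if row >= len(rows):
            return f"_B+{row - len(rows) + 1}"  # hypothetical padding after the end: _B+1, ...
        return rows[row][col]
    return MACRO.sub(sub, template)

def features(sentence: str) -> List[List[str]]:
    """Expand every template at every character position."""
    rows = char_rows(sentence)
    return [[expand_template(t, rows, i) for t in TEMPLATES] for i in range(len(rows))]

if __name__ == "__main__":
    for ch, feats in zip("我在北京市", features("我在北京市")):
        print(ch, " ".join(feats))
```

Each expanded string would become a binary observation feature of a linear-chain CRF, so adding a dictionary column lets the model learn that, for example, a suffix character such as 市 tends to close a location entity.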
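Step (5) reports precision and an F-value. The sketch below shows how such figures are commonly computed at entity level. It assumes a B/I/O character tagging scheme and exact span matching, neither of which the abstract specifies, and it does not attempt the coverage rate C, whose definition is not given here. The helper `implied_recall` solves the balanced F1 formula for recall: if the reported F-value is a balanced F1, the reported precision (82.51%) and F-value (82.20%) imply a recall of roughly 81.89%.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets of one GLNE, end exclusive

def bio_spans(tags: List[str]) -> Set[Span]:
    """Decode a B/I/O tag sequence into entity spans (BIO scheme assumed)."""
    spans: Set[Span] = set()
    start = None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" flushes a final entity
        if tag in ("B", "O") and start is not None:
            spans.add((start, i))
            start = None
        if tag == "B" or (tag == "I" and start is None):
            start = i  # an I with no preceding B is treated as a B
    return spans

def prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and balanced F1 over entity spans."""
    correct = len(gold & pred)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def implied_recall(p: float, f: float) -> float:
    """Solve F = 2PR / (P + R) for R."""
    return f * p / (2 * p - f)

if __name__ == "__main__":
    gold = bio_spans(["O", "B", "I", "O", "B", "I", "I"])
    pred = bio_spans(["O", "B", "I", "O", "O", "B", "I"])
    print(prf(gold, pred))
    print(round(implied_recall(0.8251, 0.8220), 4))
```

Exact matching counts a prediction as correct only when both boundaries agree, which is why boundary features matter so much for GLNE; partial-overlap scoring would give higher numbers for the same tagger.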
Keywords/Search Tags: Named entity recognition, GLNE, Chinese micro-blog, external feature database