Research On Chinese Named Entity Recognition

Posted on:2013-12-05

Degree:Doctor

Type:Dissertation

Country:China

Candidate:H X Jiang

Full Text:PDF

GTID:1228330374999505

Subject:Signal and Information Processing

Abstract/Summary:

Named Entity Recognition (NER) is the task of recognizing proper entities such as person names, location names, and organization names in natural language text. NER is a fundamental research task in Natural Language Processing (NLP). As an extension of the Chinese word segmentation task, Chinese NER has been widely used in information extraction, information retrieval, information recommendation, machine translation, and other NLP applications, and it plays an increasingly important role in improving their performance.

New requirements currently pose three main challenges for NER research: (1) NER is now applied in diverse settings, from internet servers and PCs to mobile devices with limited hardware capabilities, where it must meet performance requirements while keeping model complexity low; (2) with the rapid growth of network data, new named entities (NEs) are created rapidly, so NER needs to make use of large-scale data sets in order to handle new NEs effectively; (3) NEs include not only person names, location names, and organization names, but also publishing entities (film names, book names, music names), mercantile entities (brands, product names, product versions), and so on.

Focusing on these challenges, our work makes the following major contributions to Chinese NER:

(1) To overcome the hardware limitations of mobile devices while meeting performance requirements, we present a knowledge-combined Second-order Hidden Markov Model (So-HMM) and an efficient decoding algorithm for the NER task on mobile devices. We then build a recommendation system for mobile applications based on NER over short messages. The experimental results show that NER performance is significantly improved through language expansion and the use of external knowledge, and that model complexity is significantly decreased by a novel second-order backward A* decoding algorithm. The model achieves satisfactory performance on hardware-limited mobile devices.

(2) We build an NE resource database covering multiple entity types from a large-scale Web data set. Starting from a small labeled corpus, an active learning (AL) strategy is used to train CRF-based NE taggers; the taggers are then used to extract more named entities from real-time Web data to build the NE resource database. Because different entity types have different distributions on the internet, the entity types are divided into two categories, and a separate NER model based on the NE resource database is built for each category. The experimental results show that a high-quality NE resource database can effectively compensate for insufficient NE patterns in statistical model training. At the same time, the improved AL utility function significantly reduces the workload of manual data annotation.

(3) We use the NER system based on the NE resource database to assist the analysis of web intentions in an intention analysis system built on the learning-to-rank method. The experimental results show that NEs have stronger meaning integrity and specificity than keywords, and can therefore better describe the core content of a web page. The NER system we built contributes positively to the intention analysis system.
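
To make the idea of a second-order HMM tagger concrete, the following is a minimal sketch, not the dissertation's method. It assumes a trigram tag model P(t_i | t_{i-2}, t_{i-1}) with per-tag emissions P(w_i | t_i), and it decodes with ordinary Viterbi search over tag pairs rather than the second-order backward A* algorithm described above. The function name, the tag set, the `<s>` start symbol, the probability floor, and every probability in the toy tables are hypothetical and chosen only for illustration.

```python
import math


def viterbi_second_order(words, tags, trans, emit, start="<s>", floor=1e-12):
    """Most likely tag sequence under a second-order (trigram) HMM.

    trans[(t2, t1, t)] holds P(t | t2, t1) and emit[(t, w)] holds P(w | t).
    Missing entries fall back to `floor` (an assumed smoothing constant).
    """
    def logp(table, key):
        return math.log(table.get(key, floor))

    # best[(t1, t)] = best log-probability of any path ending in tags (t1, t)
    best = {(start, start): 0.0}
    backptrs = []  # one dict per word: (t1, t) -> tag that preceded t1
    for w in words:
        new_best, bp = {}, {}
        for (t2, t1), score in best.items():
            for t in tags:
                s = score + logp(trans, (t2, t1, t)) + logp(emit, (t, w))
                if s > new_best.get((t1, t), -math.inf):
                    new_best[(t1, t)] = s
                    bp[(t1, t)] = t2
        best = new_best
        backptrs.append(bp)

    # Follow back-pointers from the best final tag pair.
    state = max(best, key=best.get)
    seq = []
    for bp in reversed(backptrs):
        seq.append(state[1])
        state = (bp[state], state[0])
    seq.reverse()
    return seq


# Toy model (hypothetical numbers): B = entity start, I = inside, O = outside.
TAGS = ["B", "I", "O"]
TRANS = {
    ("<s>", "<s>", "B"): 0.5, ("<s>", "<s>", "O"): 0.5,
    ("<s>", "B", "I"): 0.8, ("<s>", "B", "O"): 0.2,
    ("B", "I", "O"): 0.7, ("B", "I", "I"): 0.3,
}
EMIT = {
    ("B", "Zhang"): 0.9, ("O", "Zhang"): 0.05,
    ("I", "Wei"): 0.8, ("O", "Wei"): 0.1,
    ("O", "arrived"): 0.9,
}
WORDS = ["Zhang", "Wei", "arrived"]

print(viterbi_second_order(WORDS, TAGS, TRANS, EMIT))  # ['B', 'I', 'O']
```

The search state is a pair of tags, so each step costs on the order of |tags|^3 operations. That cubic growth is the kind of cost a more selective decoder, such as the A* variant the abstract mentions, would aim to reduce.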

Keywords/Search Tags:

Named Entities Recognition, Second-order Hidden Markov Model, Conditional Random Field, Active Learning, Named Entities resource database, Intention Analysis

Related items

1	Named Entity Recognition Based On Conditional Random Fields Chinese Research
2	Study On Text Emotion Analysis Based On Supervised Learning
3	Named Entity Recognition Of Middle School Mathematics Knowledge Based On Deep Learning
4	Named Entity Recognition Based On BiLSTM-CRF
5	Research On Chinese Named Entity Recognition And Field Application In Inspection And Quarantine
6	Research On Named Entity Recognition Based On Deep Learning
7	Research Of TCM Literature Knowledge Discovery Method Based On Conditional Random Field Model
8	Chinese Named Entity Recognition Based On Neural Network And Language Model
9	An Action Recognition Method Using Improved Hidden Conditional Random Field Model
10	Application Of Several Types Of Statistics And Deep Learning Methods In Enterprise Named Entity Recognition