Semi-supervised Based Mobile Phone Named Entity Recognition

Posted on:2017-11-27

Degree:Master

Type:Thesis

Country:China

Candidate:J A Lei

Full Text:PDF

GTID:2348330509460269

Subject:Information and Communication Engineering

Abstract/Summary:

With the development of the Internet, more and more consumers make purchase decisions only after gathering information from other people's comments. This is especially common in the mobile phone market, so collecting consumer feedback is of great commercial value to mobile phone manufacturers. However, user-generated mobile phone names are rarely formal: comments contain many abbreviations, misspellings, and nicknames that are hard to recognize. In short, informal product name recognition is an interesting and valuable task, but also a challenging one. To address this problem, this paper explores the following aspects:

(1) After obtaining word vectors with the word2vec tool, this paper proposes a modified k-means algorithm, based on a transliteration mapping model, to cluster words. In this simple way, the modified algorithm groups nearly all the different expressions of the same entity, such as "PLUS" and "puls", while separating them from noisy words that seem syntactically and semantically similar but are actually irrelevant, owing to natural properties of the corpus. The result is four clusters: brand, series, type name, and attribute. A recognition algorithm that uses these four list features handles abbreviations, misspellings, and nicknames well.

(2) Beyond studying the effectiveness of the list features and the word vector features on mixed Chinese and English text, this paper also uses the 1/2k-means clustering algorithm to cluster the word vectors hierarchically and then computes binary codes from the word vectors. The resulting 1/2k-means hierarchical cluster feature further improves performance.

(3) This paper presents a new CRF-based semi-supervised learning method that addresses the lack of labeled data and requires minimal manual labeling effort. By analyzing the characteristics of mobile phone names, we obtain the positive set semi-automatically with simple rules and pick the negative set manually; both sets are then fuzzy-matched against the entire training corpus to produce training samples.

Finally, we run a series of feature-combination experiments on a test corpus of 1000 sentences covering 20 smartphone brands. The results show the effectiveness of the list features and the 1/2k-means hierarchy feature. Our best system achieves 93.39% precision, 89.76% recall, and 91.54% F1 under character-level evaluation, which is slightly better than the semi-supervised method that combines self-learning with active learning and also demonstrates the effectiveness of the semi-automatic annotation idea.
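
The reported F1 can be checked against the reported precision and recall, since F1 is their harmonic mean. The following is a minimal sketch, not the evaluation code used in the thesis. The abstract does not define the character-level metric, so the sketch assumes that a character counts as correct when it carries an entity label and the predicted label equals the gold label. The function names `f1_score` and `char_level_prf` and the `"O"` outside label are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def char_level_prf(gold, pred, outside="O"):
    """Character-level precision, recall and F1 for two aligned label sequences.

    Assumption (not stated in the abstract): a character is correct when it is
    labelled as part of an entity and the predicted label equals the gold label.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    correct = sum(1 for g, p in zip(gold, pred) if g != outside and g == p)
    n_pred = sum(1 for p in pred if p != outside)  # characters predicted as entity
    n_gold = sum(1 for g in gold if g != outside)  # characters that are entity in gold
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    return precision, recall, f1_score(precision, recall)


# Consistency check on the figures reported in the abstract (in percent).
reported_f1 = round(f1_score(93.39, 89.76), 2)
print(reported_f1)  # 91.54
```

The harmonic mean of 93.39 and 89.76 rounds to 91.54, so the three reported figures are mutually consistent.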

Keywords/Search Tags:

Named entity, Semi-supervised, CRF, Automatic annotation, Transliteration mapping

Related items

1	Automatic Extraction Of Chinese-English Named Entity Pairs Based On Bilingual Aligned Corpus
2	Research On The Construction Of A Wikipedia-based Chinese Named Entity Corpus
3	Research On Named Entity Equivalents Automatic Acquisition Method Based On English-Chinese Parallel Corpus
4	Automatic Approaches To Develop Large-scale TCM Electronic Medical Record Corpus For Named Entity Recognition Tasks
5	Research On Japanese-Kana And Chinese Named Entity Equivalents Automatic Acquisition Using Inductive Learning
6	The Methods And Researches Into Construct Chinese-Japanese Named Entity Translation Equivalents
7	Generation Of Smoking Behavior Dataset Based On Semi-automatic Annotation
8	Semi-supervised Named Entity Recognition
9	Research Of Automatic Summarization Based On Named Entity
10	Design And Implementation Of Speech Semi-automatic Annotation System