Semi-supervised Based Mobile Phone Named Entity Recognition

Posted on: 2017-11-27
Degree: Master
Type: Thesis
Country: China
Candidate: J A Lei
Full Text: PDF
GTID: 2348330509460269
Subject: Information and Communication Engineering
Abstract/Summary:
With the development of the Internet, more and more consumers make purchase decisions only after browsing other users' comments to gather information. This behavior is especially common in the mobile phone market, so collecting consumer feedback is of great commercial value to mobile phone manufacturers. However, user-generated mobile phone names are rarely formal: comments contain many abbreviations, misspellings, and nicknames that are hard to recognize. Informal product name recognition is therefore a valuable task and also a challenging one. To address it, this paper explores the following aspects:

(1) After obtaining word vectors with the word2vec tool, this paper proposes a modified k-means algorithm, based on a transliteration mapping model, to cluster words. In this simple way, nearly all the different expressions of the same entity, such as “PLUS” and “puls”, are grouped together, while they are separated from noisy words that appear syntactically and semantically similar but are actually irrelevant, owing to the natural properties of the corpus. The result is four clusters: a brand cluster, a series cluster, a typename cluster, and an attribute cluster. A recognition algorithm that uses these four list features handles abbreviations, misspellings, and nicknames well.

(2) Beyond studying the effectiveness of the list features and word vector features on mixed Chinese and English text, this paper also uses the 1/2k-means clustering algorithm to cluster the word vectors hierarchically and then computes a binary code for each word from its word vector. The resulting 1/2k-means hierarchical cluster feature is found to further improve performance.

(3) This paper presents a new CRF-based semi-supervised learning method that addresses the lack of labeled data and requires minimal manual labeling effort.
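The hierarchical cluster feature described in (2) can be illustrated with a small sketch. It assumes that "1/2k-means" means repeatedly splitting a cluster into two with k-means (k = 2), and that a word's binary code is the sequence of 0/1 branch choices along its path; the abstract does not confirm either detail. The function names, the seeding rule, and the toy vectors are all hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def two_means(vectors: List[Vector], iters: int = 20) -> List[int]:
    """Plain 2-means; returns a 0/1 label for each vector."""
    # Assumed deterministic seeding: the first vector and the one farthest from it.
    c0 = list(vectors[0])
    c1 = list(max(vectors, key=lambda v: sq_dist(v, c0)))
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [0 if sq_dist(v, c0) <= sq_dist(v, c1) else 1 for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        g0 = [v for v, l in zip(vectors, labels) if l == 0]
        g1 = [v for v, l in zip(vectors, labels) if l == 1]
        if not g0 or not g1:
            break  # degenerate split: everything fell into one group
        c0, c1 = mean(g0), mean(g1)
    return labels


def hierarchical_codes(words: Dict[str, Vector], depth: int) -> Dict[str, str]:
    """Assign each word a binary code by recursive two-way splitting."""
    codes = {w: "" for w in words}

    def split(group: List[str], level: int) -> None:
        if level == depth or len(group) < 2:
            return
        labels = two_means([words[w] for w in group])
        if len(set(labels)) < 2:
            return  # could not split this group any further
        for w, l in zip(group, labels):
            codes[w] += str(l)
        for bit in (0, 1):
            split([w for w, l in zip(group, labels) if l == bit], level + 1)

    split(list(words), 0)
    return codes


# Toy 2-d "word vectors" (invented for illustration, not taken from the thesis).
toy_vectors = {
    "plus": (0.90, 0.10),
    "puls": (0.85, 0.15),
    "note": (0.10, 0.90),
    "nt": (0.15, 0.95),
}
codes = hierarchical_codes(toy_vectors, depth=2)
print(codes)
```

Words that sit close together in vector space share a code prefix, so prefixes of different lengths could serve as coarse-to-fine cluster features for a sequence labeler; whether the thesis uses prefixes in this way is an assumption.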
By analyzing the characteristics of mobile phone names, we semi-automatically obtain a positive set using simple rules and manually select a negative set; both sets are then fuzzy-matched against the entire training corpus to produce training samples.

Finally, we conduct a series of feature-combination experiments on a test corpus of 1000 sentences covering 20 smartphone brands. The results show the effectiveness of the list features and the 1/2k-means hierarchical feature. The best performance of our system is 93.39% precision, 89.76% recall, and 91.54% F1 under character-level evaluation, which is slightly better than a semi-supervised method that combines self-learning with active learning and also demonstrates the effectiveness of the semi-automatic annotation idea.
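The three reported scores can be checked against each other with the standard definition of F1 as the harmonic mean of precision and recall. The snippet below is only a consistency check on the published numbers; the function and variable names are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# Character-level scores reported in the abstract, in percent.
precision, recall = 93.39, 89.76
f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2f}%")  # prints "F1 = 91.54%"
```

The computed value rounds to the reported 91.54%, so the precision, recall, and F1 figures are mutually consistent.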
Keywords/Search Tags: Named entity, Semi-supervised, CRF, Automatic annotation, Transliteration mapping