Design And Implementation Of Chinese Microblog Oriented Product Named Entity Recognition And Normalization Algorithm

Posted on: 2016-06-04
Degree: Master
Type: Thesis
Country: China
Candidate: X X Yang
Full Text: PDF
GTID: 2308330476954997
Subject: Computer Science and Technology
Abstract/Summary:
With the development of the Internet, social network platforms such as microblogs have risen steadily. Users are no longer only viewers of information; they have also become publishers of it. The Internet has been transformed from an information publishing platform into an interactive communication platform. Microblog platforms operated by Sina, Tencent, and others carry vast amounts of information, and this information is commercially valuable. Because they spread information very quickly and have a very large number of users, microblog platforms have become an important source of information. In this Internet era, network marketing, public opinion monitoring, and business intelligence have attracted more and more attention. Recognizing product named entities in the vast volume of microblog posts is the foundation and prerequisite for public opinion monitoring and business intelligence.

At present, methods developed for traditional media text are also applied to recognize product named entities in microblog posts. However, those methods ignore the lack of contextual information, the omissions, and the informal expression that characterize microblog text. As a result, their performance is poor and the ambiguity problem is serious. To address these problems, this thesis mainly studies product named entity recognition in microblog posts. The main work and contributions of this thesis are as follows:

1) A method based on cascaded conditional random fields (CCRFs) and word embeddings is proposed to recognize product named entities with complex structure. A product knowledge base with attribute classification is introduced into the model to improve performance. Experiments show that this method improves the precision and recall of complex-structure product named entity recognition by 0.6% and 3.2%, respectively.

2) A word-embedding-based approach that makes good use of global contextual information is proposed for feature selection. Given the lack of contextual information in microblog posts, word-embedding-based and cluster-based approaches are used to select features; the cluster-based method requires less training data. Experiments show that the word-embedding-based and cluster-based methods improve the F1-measure of product named entity recognition by 3.12% and 3.34%, respectively.

3) An approach based on global and local context information and user interactions is proposed to normalize product named entities. Experiments demonstrate that the F1-measure of this approach is 6.92% higher than that of the knowledge-based method.

4) A prototype system is designed and developed to recognize and normalize product named entities in Chinese microblog posts. The system takes into account the precision and recall of recognition and normalization as well as time and space efficiency. It provides both one-by-one and batch processing modes.
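
As a rough illustration of the cluster-based features mentioned in contribution 2), the sketch below groups word embeddings into clusters and attaches each token's cluster ID as a feature that a sequence labeler such as a CRF could consume. The abstract does not say which clustering algorithm or feature template the thesis uses, so the plain k-means routine, the two-dimensional toy embeddings, and the names `kmeans`, `TOY_EMBEDDINGS`, and `token_features` are all assumptions made for this example, not the author's implementation.

```python
from math import dist

# Hypothetical 2-D embeddings; real embeddings would be learned from a corpus.
TOY_EMBEDDINGS = {
    "iphone": (0.9, 0.1),
    "love": (0.1, 0.9),
    "galaxy": (0.8, 0.2),
    "hate": (0.2, 0.8),
}


def kmeans(vectors, k, iterations=10):
    """Plain k-means; centroids start at the first k vectors (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: dist(v, centroids[c])) for v in vectors
        ]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return assignment


words = list(TOY_EMBEDDINGS)
labels = kmeans([TOY_EMBEDDINGS[w] for w in words], k=2)
cluster_of = dict(zip(words, labels))


def token_features(tokens):
    """One feature dict per token; words without an embedding get cluster -1."""
    return [
        {"word": t, "cluster": cluster_of.get(t.lower(), -1)} for t in tokens
    ]


features = token_features(["I", "love", "iPhone"])
```

Because words that appear in similar contexts tend to share a cluster, a product name that is rare in the labeled data can still receive the same cluster feature as a frequent one, which is one plausible reason a cluster-based method would need less training data.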
Keywords/Search Tags:Microblog, Product Named Entity Recognition, Cascaded Conditional Random Fields, Word Embeddings, Entity Normalization