Research On Product Named Entity Recognition And Normalization

Posted on: 2012-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: F Mei
Full Text: PDF
GTID: 2218330362450433
Subject: Computer Science and Technology
Abstract/Summary:
With the growth of e-commerce, more and more researchers are turning their attention to e-commerce search technology. As one of the core search technologies for e-commerce, product named entity recognition has become an important research issue. This thesis studies the definition of product named entities and corpus construction, product named entity recognition, and product name normalization. The dissertation covers the following aspects:

1) To account for the variation of product named entities on the Internet, we propose a new definition of the composition of a product named entity, which makes its components easier to recognize. Based on this definition, we developed a detailed corpus annotation specification and built a high-quality product named entity corpus using semi-supervised methods. In addition, to support research on product named entity normalization, we give a definition of product named entity normalization and build a hierarchical product named entity library containing a total of 21240 product names.

2) Based on the structural characteristics of product named entities, we divide product named entity recognition into two stages. The first stage recognizes brand, series, type and company names; the second stage builds on those results to recognize product named entities. We present recognition methods based on the Hidden Markov Model, the Maximum Entropy model and the Conditional Random Field. In the methods based on the Maximum Entropy model and the Conditional Random Field, we add brand features and series features to the feature templates to trigger the recognition of brand, series and type named entities. Experimental results show that adding the brand and series features to the feature templates improves the F-measure of the product named entity recognition system by 8.42%. Finally, we compare the three recognition methods; the method based on the Conditional Random Field performs best, with an F-measure of 86.45%.

3) To address the ambiguity of product named entities, which is caused by abbreviated product names and by a single product having many names, we propose the concept of product named entity normalization. Based on the structural characteristics of product named entities, we give a product name similarity calculation method based on the Edit Distance algorithm, which reaches 84.72% accuracy in product named entity normalization. We then use a bootstrapping relation extraction method to extract the relations between adjacent entities, and derive the relations among all entities in the text from the transitivity of those relations. Finally, we combine the relations between entities with the product name similarity calculation method to normalize product named entities, which achieves 88.09% accuracy.
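The edit-distance-based normalization step can be illustrated with a small sketch. The abstract does not give the thesis's actual similarity formula or how it uses the structure of product names, so the code below is only an assumption: it uses plain Levenshtein distance scaled by the longer string's length, and the function names (`edit_distance`, `similarity`, `normalize`), the threshold value, and the sample product names are all hypothetical.

```python
from typing import List, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; 1.0 means identical strings."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def normalize(mention: str, library: List[str],
              threshold: float = 0.6) -> Optional[str]:
    """Map a mention to the most similar library name, or None.

    The threshold is an illustrative assumption, not a value from the thesis.
    """
    best = max(library, key=lambda name: similarity(mention, name),
               default=None)
    if best is None or similarity(mention, best) < threshold:
        return None
    return best


# Hypothetical library entries, for illustration only.
library = ["Nokia N97", "Nokia N97 Mini", "Nokia E71"]
print(normalize("nokia n-97", library))
print(normalize("xyz", library))
```

A mention that is not close enough to any library entry is left unnormalized instead of being forced onto the nearest name, which is one simple way to limit false matches.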
Keywords/Search Tags: product named entity corpus construction, product named entity recognition, product named entity normalization, maximum entropy model, conditional random field