Study On Product Named Entity Recognition In The Domain Of Research Reports

Posted on: 2018-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: C Jiang
GTID: 2348330542964529
Subject: Computer system architecture
Abstract/Summary:
Named Entity Recognition is one of the basic research topics in Natural Language Processing, and its quality directly affects the follow-up work that depends on it. In financial research reports (hereafter referred to as research reports), product words are a common kind of named entity. Recognizing these product named entities helps to mine deeper information from research reports and is highly significant for follow-up study. Based on an analysis of a large number of research reports, and following the latent patterns of named entities in them, this thesis proposes a scheme for product named entity recognition. The research method is as follows:

(1) The thesis selects conditional random fields (CRF) as the sequence tagging model. After introducing common features such as word and POS, it proposes a feature extraction and optimization algorithm based on word2vec. The algorithm first extracts the five words whose vectors are closest to that of the current word and uses them as optimized features of the model. Then, combining a seed word dictionary with synonym theory, it introduces seed-product frequency and prefix and suffix collocation features to optimize the model further. This method enriches the features of the model, improves its precision and recall, and also improves product-word recognition under problems such as a sparse corpus and a lack of tagged corpus.

(2) On the basis of the original CRF model, the thesis proposes an improved algorithm, called the backtracking CRF algorithm, to solve problems that arise in application. In the first step, the algorithm combines the model with rules: when computing the sequence tagging probabilities, it sets the product-word label for words that conform to the rules, in order to obtain an initial optimal label sequence. In the second step, the backtracking CRF algorithm selects the optimal label sequence, correcting labels that were wrongly set to the product-word label in the first step.
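The word2vec feature step in (1) can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `top_k_similar` and `token_features`, the feature keys `sim_1` ... `sim_k`, the use of cosine similarity as the "vector distance", and the toy two-dimensional vectors are all assumptions made for illustration. In practice the vectors would come from a word2vec model trained on research reports, and the resulting feature dictionaries would be passed to a CRF toolkit.

```python
import numpy as np


def top_k_similar(word, vectors, k=5):
    """Return the k words whose vectors are most cosine-similar to `word`.

    `vectors` maps each word to a 1-D numpy array (assumed to come from a
    trained word2vec model). Unknown words yield an empty list.
    """
    if word not in vectors:
        return []
    target = vectors[word]
    target_norm = np.linalg.norm(target)
    scores = []
    for other, vec in vectors.items():
        if other == word:
            continue
        denom = target_norm * np.linalg.norm(vec)
        if denom == 0:
            continue  # skip zero vectors, cosine similarity is undefined
        scores.append((float(np.dot(target, vec) / denom), other))
    # Highest similarity first; ties broken alphabetically for determinism.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scores[:k]]


def token_features(tokens, pos_tags, i, vectors, k=5):
    """Build a CRF-style feature dict for token i (hypothetical feature names)."""
    feats = {"word": tokens[i], "pos": pos_tags[i]}
    for rank, neighbour in enumerate(top_k_similar(tokens[i], vectors, k), start=1):
        feats[f"sim_{rank}"] = neighbour
    return feats


# Toy vectors, invented purely for the example.
toy_vectors = {
    "phone": np.array([1.0, 0.0]),
    "handset": np.array([0.9, 0.1]),
    "tablet": np.array([0.7, 0.3]),
    "profit": np.array([0.0, 1.0]),
}

features = token_features(["phone", "profit"], ["NN", "NN"], 0, toy_vectors, k=2)
```

With the toy vectors, the nearest neighbours of "phone" are "handset" and "tablet", so a sparse or unseen product word can share features with better-attested similar words, which is the motivation the abstract gives for this step.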
Keywords/Search Tags:CRF, Backtracking CRF, Word2vec, Product Named Entity, Financial Research Reports