
The Method Study And System Implementation Of Named Entity Recognition In The Financial Field

Posted on: 2016-06-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wu
Full Text: PDF
GTID: 2308330479491529
Subject: Software engineering
Abstract/Summary:
With the continuous development of big data processing technology, internet finance has also entered the era of big data. Given the vast amount of finance-focused literature, parsing it with natural language processing technology has become an inevitable trend. As a foundation of natural language processing, named entity recognition provides basic support for downstream technologies such as information extraction, information filtering, information retrieval and question answering. It is therefore of practical significance and value to develop a system that can recognize named entities in the financial field, such as stock names and stock codes.

Drawing on a survey of the relevant literature, this paper discusses in detail the background of the project and the development and application of the related technologies. Named entity recognition systems are studied thoroughly, and the solutions and technical measures are determined on the basis of a summary of the requirements. For named entity recognition, this paper adopts the Conditional Random Field (CRF) model and combines it with the Co-Training method during model training, which not only improves recognition performance but also greatly reduces the labor and material cost of annotating a corpus. The Viterbi algorithm is used for named entity recognition. In addition, the system is deployed on the Hadoop framework so that parallel processing can shorten the running time, which addresses the problem that model training and entity recognition take too long.

The system is designed as two modules: model training and entity recognition. The model training module obtains a CRF model by training with the Co-Training method on a selected tagged corpus and feature templates. The named entity recognition module recognizes stock names and stock codes in financial texts such as financial news, company annual reports and individual stock reports. Recognition is cast as a sequence labeling problem and decoded with the Viterbi algorithm. Finally, the recognition performance of the models is evaluated and compared, in order to verify the advantages of the CRF model and the feasibility and effectiveness of the Co-Training method.

Testing shows that the system implements the two stated functions, model training and entity recognition, and satisfies the functional and nonfunctional requirements identified in the requirements analysis. The system is currently in operation.
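
As an illustration of how recognition can be treated as sequence labeling, the following is a minimal sketch of Viterbi decoding over a BIO tag set. It is not the thesis implementation: the label names (`O`, `B-STOCK`, `I-STOCK`), the example sentence and all scores are hypothetical, and in a trained CRF the emission and transition scores would be computed from the learned feature weights rather than written by hand.

```python
from typing import Dict, List, Tuple

# Hypothetical BIO label set for stock-name entities.
LABELS = ["O", "B-STOCK", "I-STOCK"]


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            labels: List[str] = LABELS) -> List[str]:
    """Return the highest-scoring label sequence.

    emissions[t][y]      score of label y at position t (missing = 0.0)
    transitions[(p, y)]  score of moving from label p to y (missing = 0.0)
    """
    if not emissions:
        return []
    # best[t][y]: best score of any path that ends in label y at position t.
    best = [{y: emissions[0].get(y, 0.0) for y in labels}]
    back = []  # back[t-1][y]: best previous label for y at position t
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels,
                       key=lambda p: best[-1][p] + transitions.get((p, y), 0.0))
            scores[y] = (best[-1][prev] + transitions.get((prev, y), 0.0)
                         + emissions[t].get(y, 0.0))
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy input: scores are invented for illustration only.
tokens = ["Shares", "of", "Ping", "An", "Bank", "rose"]
emissions = [
    {"O": 2.0},
    {"O": 2.0},
    {"B-STOCK": 1.5, "O": 0.5},
    {"I-STOCK": 1.0, "O": 1.2},  # locally "O" scores slightly higher
    {"I-STOCK": 1.0, "O": 0.8},
    {"O": 2.0},
]
transitions = {
    ("O", "I-STOCK"): -10.0,      # an entity cannot start with I-
    ("B-STOCK", "I-STOCK"): 1.0,
    ("I-STOCK", "I-STOCK"): 0.5,
}

tags = viterbi(emissions, transitions)
print(list(zip(tokens, tags)))
```

In this toy input the token "An" scores higher as `O` when considered alone, but the transition scores make the path through `I-STOCK` the best overall, which is the reason for decoding the whole sequence jointly instead of labeling each token independently.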
Keywords/Search Tags:Named entity recognition, CRF model, Co-Training method, Hadoop framework