| Named entity recognition aims to identify entities with specific meanings in text, including names of people, places, and organizations, as well as other proper nouns. It plays an important role in many downstream natural language processing tasks. Since words are the basic unit of English, the input to English entity recognition is a sequence of words. Chinese entity recognition methods, by contrast, mostly accept characters as input, so the methods for the two languages differ considerably. However, entity recognition methods that take a character sequence as input cannot exploit lexicon information, which can help the model identify entities more accurately. How to make full use of lexicon information in character-based entity recognition has therefore become a research hotspot in Chinese entity recognition in recent years. “Four insurances and One Fund” is a very important social insurance system in China, but there has been little research on natural language processing applications in this field. To meet people's need for knowledge in the field of “Four insurances and One Fund”, it is therefore important to construct a domain knowledge base of policies and regulations and, based on this knowledge base, a question answering system for the field. The main contributions of this paper are as follows: (1) By studying the characteristics of policy and regulation texts in the field of “Four insurances and One Fund”, a part-of-speech combination rule is proposed to quickly extract terminology in this field. The extracted terminology is used as a custom dictionary for the word segmentation tool to segment the policy and regulation texts, and the result is then combined with manual labeling to construct a machine learning entity
recognition dataset in the field of “Four insurances and One Fund”. (2) To address the inability of character-level entity recognition to use lexicon information, and given that terminology in texts in the field of “Four insurances and One Fund” tends to be long, an entity recognition method that combines characters and words is proposed. This method uses word frequency, improved mutual information, and backward forward word frequency as lexicon information, which allows weights to be better assigned to all words matched by each character, and it introduces an attention mechanism to dynamically adjust the weight of each part of the lexicon information according to the semantics. Experimental results on three datasets show that, compared with some existing methods, the proposed entity recognition method achieves a higher F1 value. (3) Using the proposed entity recognition method, domain terminology is identified from the policy and regulation texts in the field of “Four insurances and One Fund”, and a domain dictionary of “Four insurances and One Fund” is constructed with a web crawler. Then, based on the domain dictionary and the domain entity recognition model, a domain question answering system for “Four insurances and One Fund” that can answer conceptual questions is constructed. |
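
The idea in contribution (2) of assigning weights to all lexicon words matched by each character can be illustrated with a minimal sketch. It covers only a plain word-frequency weighting. The improved mutual information, the backward forward word frequency, and the attention mechanism are not reproduced here. The function names `match_words` and `frequency_weights`, the `max_len` parameter, and the toy lexicon counts are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def match_words(sentence, lexicon, max_len=6):
    """For each character index, collect the lexicon words that cover it.

    Hypothetical helper: enumerates every substring of up to max_len
    characters and keeps those found in the lexicon.
    """
    matches = defaultdict(list)
    n = len(sentence)
    for start in range(n):
        for end in range(start + 1, min(n, start + max_len) + 1):
            word = sentence[start:end]
            if word in lexicon:
                for i in range(start, end):
                    # Record each word once per character position.
                    if word not in matches[i]:
                        matches[i].append(word)
    return matches


def frequency_weights(words, freq):
    """Normalize raw word frequencies into weights that sum to 1."""
    total = sum(freq[w] for w in words)
    if total == 0:
        return {}
    return {w: freq[w] / total for w in words}


# Toy frequencies (assumed values, for illustration only).
freq = {"住房": 3, "公积金": 5, "住房公积金": 2}
sentence = "住房公积金"

matches = match_words(sentence, freq)
for i, ch in enumerate(sentence):
    print(ch, frequency_weights(matches[i], freq))
```

In this sketch a character covered by several words receives a distribution over those words, so a frequent long term such as a multi-character policy name can contribute to every character it spans. A learned model would typically combine such statistics with character embeddings rather than use them directly.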