Research On Named Entity Recognition Method Based On Form And Meaning

Posted on: 2022-10-12
Degree: Master
Type: Thesis
Country: China
Candidate: H B Wang
GTID: 2518306554450134
Subject: Communication and Information System
Abstract/Summary:
Named entity recognition is a basic technology in the field of natural language processing. Existing methods have two problems. In Chinese character representation, the semantics of character structure are underused and rare characters are poorly represented. In named entity recognition models, the form and meaning features of Chinese characters are underused, so recognition accuracy is low. The main work of this thesis is as follows:
(1) A Chinese character image data set and a meaning data set were constructed to provide data support for glyph vector representation and named entity recognition. Web crawling and automatic Chinese character image generation were used to obtain Chinese character images, pinyin, basic interpretations, example words, example sentences, related words and other data. The resulting data contain 13,000 Chinese character images and 10,271 basic interpretations of commonly used characters.
(2) A glyph vector representation method based on the structural features of Chinese characters was proposed. It addresses the insufficient use of character structure semantics and improves the representation of rare characters. First, a character structure autoencoder automatically extracts the structural features of Chinese characters to obtain the glyph vector of each character. Then, the validity and completeness of the glyph vector are verified by observation and quantitative analysis. Finally, experiments were carried out. In the Chinese word segmentation experiment, combining the glyph vector with the GloVe or Word2vec vector increased the F1 value by 0.01 and 0.09 respectively. In the short text semantic similarity experiment, the average F1 value was 0.17 and 1.42 higher than that of GloVe and Word2vec respectively. In the Chinese character representation experiment, the glyph vector represented characters better than Word2vec and GloVe and could represent 13% more rare characters.
(3) A named entity recognition method that integrates the form and meaning of characters was proposed, improving recognition accuracy. First, a multi-feature embedding layer is proposed that combines effective representations of form, meaning and context semantics. Then, a named entity recognition model fusing form and meaning is built from the multi-feature embedding layer, BiLSTM and CRF. Finally, coarse-grained and fine-grained named entity recognition experiments are carried out. Compared with the BiLSTM-CRF model, the F1 value of the proposed model increased by 1.8 in the coarse-grained experiment and by 0.43 in the fine-grained experiment.
(4) A platform for the automatic construction of information systems was established to verify and apply the named entity recognition method. First, the proposed method is used to identify user requirements and extract two types of entities: database table names and attributes. Then, the platform automatically creates the database tables and uses a code generator to generate the entity, service, controller and view code. Finally, a test was carried out. The entity recognition accuracy reached 96%, and the average response time was less than 4 seconds.
To sum up, the proposed glyph vector representation method automatically extracts the semantic features of character structure and improves the representation of rare characters. The proposed named entity recognition method makes full use of the form and meaning features of Chinese characters, improves recognition accuracy, and meets practical application requirements.
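As an illustration of the last step described in the abstract (extracting table-name and attribute entities and then creating a database table), the following is a minimal sketch and not the thesis's implementation. It assumes the recognizer emits BIO tags; the label names `TABLE` and `ATTR`, the helper names `decode_bio` and `build_create_table`, the English example tokens, the default `TEXT` column type and the added `id` column are all hypothetical choices made for this example.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity_type, text) spans from a BIO tag sequence.

    Tokens inside one span are joined with "_" so they can serve as identifiers.
    An "I-" tag that does not continue the open span closes it and is skipped.
    """
    entities: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Store the open span, if any.
        if current_type is not None:
            entities.append((current_type, "_".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            flush()
            current_type, current_tokens = None, []
    flush()
    return entities


def build_create_table(entities: List[Tuple[str, str]]) -> str:
    """Turn one TABLE entity and any ATTR entities into a CREATE TABLE statement.

    Hypothetical convention: every attribute becomes a TEXT column and an
    integer primary key named "id" is added.
    """
    tables = [text for kind, text in entities if kind == "TABLE"]
    attrs = [text for kind, text in entities if kind == "ATTR"]
    if len(tables) != 1:
        raise ValueError("expected exactly one TABLE entity")
    columns = ["id INTEGER PRIMARY KEY"] + [f"{name} TEXT" for name in attrs]
    return f"CREATE TABLE {tables[0]} (" + ", ".join(columns) + ");"


# Hypothetical recognizer output for one user request.
example_tokens = ["create", "a", "student", "table", "with",
                  "name", "and", "birth", "date"]
example_tags = ["O", "O", "B-TABLE", "O", "O",
                "B-ATTR", "O", "B-ATTR", "I-ATTR"]

if __name__ == "__main__":
    found = decode_bio(example_tokens, example_tags)
    print(found)
    print(build_create_table(found))
```

In a real platform the column types and constraints would have to come from additional entity types or from user confirmation; the sketch only shows how the two entity types named in the abstract could drive table creation.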
Keywords/Search Tags:Chinese Character Vector, Structural Features of Chinese Characters, Named Entity Recognition, System Automatic Construction