Research On Name Recognition In Resource-scarce Languages Fused With Image Information

Posted on: 2022-02-09  Degree: Master  Type: Thesis
Country: China  Candidate: W G Fang
GTID: 2518306332977499  Subject: Computer Science and Technology
Abstract/Summary:
Compared with high-resource languages, the main difficulty of named entity recognition for low-resource languages is obtaining training data, especially manually annotated data, which is hard and costly to produce. Recognizing named entities in low-resource languages at low cost and with high efficiency is therefore one of the main research problems in this area. This thesis studies how to obtain information about person-name entities from images associated with a text, and how to apply that information to person name recognition in low-resource languages. Taking Tibetan person name recognition as an example, the work proceeds in five steps:
(1) Detect the important people in the images associated with the text.
(2) Obtain the Chinese names of these important people using face recognition.
(3) Generate a candidate list of Tibetan transliterations for each name using machine transliteration.
(4) Use a Tibetan string matching method to search the Tibetan text for the person names most similar to these transliterations.
(5) Map the names found in step (4) onto the Tibetan person names recognized by a conventional supervised CRF model, and use the important-person information from the image to improve the CRF recognition results.

The main contributions lie in the first, fourth and fifth steps:

1. Detecting important people in images based on multiple features. Existing methods for detecting people in images cannot be applied directly to detecting multiple important people in one image. Building on the detection results of ImageAI, the thesis constructs a formula for the importance of each person in an image from three types of features: bounding-box (frame) size, face orientation, and the person's position. The weights of the three features and the importance threshold are determined experimentally on 292 images (197 for training and 95 for testing). With weights of 0.2, 0.2 and 0.6 for the frame size, face orientation and position features, respectively, and an importance threshold of 0.7, the method detects important people with an F1 score of 81.46%.

2. A Tibetan string matching algorithm for the person name recognition task. Because transliteration is not unique, the same person's name can have several transliterations. To map as much of the image-derived name information as possible onto the CRF-based Tibetan person name recognition results, the thesis studies Tibetan string matching algorithms for this task and ultimately selects a Tibetan string similarity matching method based on Tibetan syllable structure. Experiments show that the matching performance of this method on the Tibetan person name recognition task reaches 86%.

3. A method for mapping the information about important people in images onto the results of CRF-based Tibetan person name recognition. With this method, Tibetan person name recognition reaches an F1 score of 88.1%, 16.0% higher than the CRF baseline. By name type, the F1 scores for Han, Tibetan and foreign person names are 90.6%, 80.6% and 76.3%, respectively, which are 3.3%, 5.0% and 18.0% higher than the baseline; foreign person names show the largest improvement.
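The abstract reports the feature weights (0.2, 0.2, 0.6) and the threshold (0.7) but not the exact importance formula. The sketch below assumes the importance is a weighted sum of three features, each normalized to [0, 1], and that a person counts as important when the score reaches the threshold. The feature definitions (box area relative to the largest box in the image, a face-frontalness value from some upstream face-orientation estimator, and distance from the image center) and the `Person` fields are hypothetical stand-ins, not the thesis's own definitions.

```python
from dataclasses import dataclass

# Weights and threshold as reported in the abstract; the weighted-sum form is an assumption.
W_SIZE, W_FACE, W_POS = 0.2, 0.2, 0.6
THRESHOLD = 0.7


@dataclass
class Person:
    """A detected person (hypothetical fields, e.g. derived from an ImageAI bounding box)."""
    x1: float
    y1: float
    x2: float
    y2: float
    face_frontalness: float  # hypothetical: 1.0 = facing the camera, 0.0 = profile or back


def box_area(p):
    return (p.x2 - p.x1) * (p.y2 - p.y1)


def size_feature(p, max_area):
    # Hypothetical: box area relative to the largest box in the same image, in [0, 1].
    return box_area(p) / max_area if max_area > 0 else 0.0


def position_feature(p, img_w, img_h):
    # Hypothetical: 1.0 at the image center, falling to 0.0 at a corner.
    cx, cy = (p.x1 + p.x2) / 2, (p.y1 + p.y2) / 2
    dx = (cx - img_w / 2) / (img_w / 2)
    dy = (cy - img_h / 2) / (img_h / 2)
    return 1.0 - ((dx * dx + dy * dy) ** 0.5) / (2 ** 0.5)


def importance(p, img_w, img_h, max_area):
    # Weighted sum of the three (assumed) features.
    return (W_SIZE * size_feature(p, max_area)
            + W_FACE * p.face_frontalness
            + W_POS * position_feature(p, img_w, img_h))


def important_people(people, img_w, img_h):
    # Keep every person whose importance reaches the threshold.
    if not people:
        return []
    max_area = max(box_area(p) for p in people)
    return [p for p in people if importance(p, img_w, img_h, max_area) >= THRESHOLD]
```

With these definitions, a large, frontal person at the center of the frame scores close to 1.0 and is kept, while a small profile figure near a corner scores far below 0.7. The heavy position weight means a centered person can pass the threshold even with a modest box size.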
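The abstract does not specify the syllable-structure similarity measure that the thesis selects. As a simplified stand-in, the sketch below splits Tibetan strings into syllables at the tsheg (U+0F0B), also treating the shad (U+0F0D) and spaces as separators, scores two strings by a syllable-level edit distance normalized by the longer syllable count, and slides each transliteration candidate over the text to find the most similar window. All function names are hypothetical; reflecting the internal structure of a syllable (prefix, root letter, suffix and so on) would require a finer comparison than whole-syllable equality.

```python
TSHEG = "\u0f0b"  # Tibetan intersyllabic mark
SHAD = "\u0f0d"   # Tibetan clause delimiter, treated here as a separator


def syllables(s):
    # Normalize separators to tsheg, then drop empty pieces (e.g. from a trailing tsheg).
    normalized = s.replace(SHAD, TSHEG).replace(" ", TSHEG)
    return [syl for syl in normalized.split(TSHEG) if syl]


def syllable_edit_distance(a, b):
    # Levenshtein distance in which each whole syllable is one symbol.
    prev = list(range(len(b) + 1))
    for i, sa in enumerate(a, 1):
        cur = [i]
        for j, sb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (sa != sb)))  # substitution or match
        prev = cur
    return prev[-1]


def similarity(a, b):
    # 1.0 for identical syllable sequences, 0.0 when every syllable differs.
    sa, sb = syllables(a), syllables(b)
    longest = max(len(sa), len(sb))
    if longest == 0:
        return 1.0
    return 1.0 - syllable_edit_distance(sa, sb) / longest


def best_match(text, candidates):
    """Return (span, candidate, score) for the most similar syllable window in text."""
    text_syl = syllables(text)
    best = ("", "", 0.0)
    for cand in candidates:
        n = len(syllables(cand))
        for start in range(len(text_syl) - n + 1):
            span = TSHEG.join(text_syl[start:start + n])
            score = similarity(span, cand)
            if score > best[2]:
                best = (span, cand, score)
    return best
```

Restricting windows to the candidate's syllable count keeps the search linear in text length per candidate; a transliteration whose syllable count differs from the name as written in the text would also need windows of neighboring lengths.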
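The abstract does not describe how the matched names are merged into the CRF output. One plausible policy, sketched below under the assumption that the CRF emits BIO tags over syllables, is to overwrite the tags of every matched span whose similarity reaches a hypothetical cutoff (`min_score`) with a `B-PER`/`I-PER` sequence, so that names the CRF missed or truncated are recovered. The function names, the tag set and the cutoff value are illustrative, not taken from the thesis.

```python
def apply_image_names(tokens, crf_tags, matches, min_score=0.8):
    """Overwrite CRF BIO tags with person-name spans matched from image-derived names.

    tokens:   list of syllables (used only for bounds checking here)
    crf_tags: BIO tags from the CRF, parallel to tokens
    matches:  list of (start, end, score) syllable spans, end exclusive
    """
    tags = list(crf_tags)  # leave the caller's list untouched
    for start, end, score in matches:
        # Skip weak matches and out-of-range spans.
        if score < min_score or not (0 <= start < end <= len(tokens)):
            continue
        tags[start] = "B-PER"
        for k in range(start + 1, end):
            tags[k] = "I-PER"
    return tags


def extract_spans(tags):
    """Return (start, end) spans of PER entities in a BIO tag sequence, end exclusive."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B-PER" or (tag == "I-PER" and start is None):
            # A new entity begins (a stray I-PER is treated as a beginning).
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag != "I-PER":
            # Any non-PER tag closes the open entity.
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans
```

Overwriting rather than intersecting favors recall: the image tells us the name is present, so a confident match is trusted over the CRF's tags for those syllables.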
Keywords/Search Tags: Person Name Recognition for Low-Resource Languages, Detecting Important People in Images, Tibetan String Matching