Classification From Local And Global Perspective For Scene Text Script Identification

Posted on: 2022-09-14
Degree: Master
Type: Thesis
Country: China
Candidate: Q H Huang
GTID: 2518306572981799
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of globalization, scene text script identification, a prerequisite for scene text recognition, has attracted increasing attention. The task is challenging because of both the background and the text. Complex backgrounds may involve occlusion, texture, and blur, and the text may be arranged irregularly. Characters from different scripts may resemble each other, while characters from the same script may differ. Scripts that share characters, such as Chinese and Japanese, are especially easy to confuse. All of these factors seriously degrade script prediction for scene text. To address them, this thesis proposes a dual-branch network consisting of a global branch and a local branch, which consider global and local information respectively. The global branch extracts comprehensive features from the global perspective, while the local branch extracts discriminative features. The main contributions of the thesis are as follows.

First, a dual-branch network based on the attention mechanism is proposed for script identification. The global branch is realized with global average pooling, which weights all local parts equally. The local branch is realized with an attention mechanism that outputs a weight for each local feature through a set of learnable parameters; the weighted average of all features is then computed to obtain the final feature. The larger the weight, the more a feature contributes to the final prediction. The model is evaluated on three public script identification datasets: CVSI2015, SIW-13, and RRC-MLT2017. It achieves state-of-the-art results on SIW-13 (0.8% higher than existing methods) and on the RRC-MLT2017 validation set (0.34% higher than existing methods). The model runs at 267 FPS, twice the speed of existing methods. Moreover, combined with a basic text detector, the method achieves a 1.09% F-measure improvement on the end-to-end script identification dataset E2E-RRC-MLT.

Second, a dual-branch network based on a proposed patch aggregator is put forward for script identification. The patch aggregator learns the most discriminative features by applying approximate character-level weak supervision to the intermediate features, and max pooling is then employed to drop redundant features. To achieve the approximate character-level supervision, an improved loss function named softermax loss is proposed. Softermax loss lets a character prediction obtain a high score on every class the character belongs to, rather than an extreme score on only the single class given by the label. The model achieves state-of-the-art results on SIW-13 (1.2% higher than existing methods), CVSI2015, and the RRC-MLT2017 validation set (0.13% higher than existing methods). The model runs at 400 FPS. In addition, integrated with a basic text detector, the method achieves a 1.99% F-measure improvement on the end-to-end script identification dataset E2E-RRC-MLT.

In summary, this thesis proposes a dual-branch network for script identification. The design suits the task because it handles samples both with and without discriminative characters. Two methods, the attention mechanism and the patch aggregator, are employed to mine discriminative features from the local perspective of the dual-branch network. A series of experiments on public benchmarks demonstrates the effectiveness of the proposed models.
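
The pooling operations named in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis implementation. It assumes the backbone feature map has been flattened into N local features of C channels each, that the attention score is a single linear projection with a hypothetical parameter vector `w`, and that the weights are normalized with a softmax; the abstract specifies none of these details, nor how the two branches are fused, so no fusion step is shown. All function names are illustrative.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def global_branch(features):
    """Global average pooling: every local feature gets the same weight.

    features: (N, C) array of N local features. Returns a (C,) vector.
    """
    return features.mean(axis=0)


def attention_branch(features, w):
    """Attention pooling: a learnable vector scores each local feature.

    features: (N, C); w: (C,) hypothetical learnable parameters.
    Returns the (C,) weighted average and the (N,) attention weights.
    The softmax normalization is an assumption of this sketch.
    """
    scores = features @ w          # one scalar score per local feature
    weights = softmax(scores)      # non-negative, sums to 1
    return weights @ features, weights


def patch_max_pool(patch_scores):
    """Max pooling over patches: keep the strongest response per class.

    patch_scores: (N, K) per-patch class scores. Returns a (K,) vector.
    """
    return patch_scores.max(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(6, 4))      # 6 local features, 4 channels
    w = rng.normal(size=4)
    pooled, weights = attention_branch(feats, w)
    print("global feature:", global_branch(feats))
    print("attention weights:", weights)
    print("attended feature:", pooled)
```

With `w` set to all zeros, every local feature receives the same weight and the attention branch reduces to global average pooling, which shows how the two branches differ only in how local features are weighted.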
Keywords/Search Tags: Script identification, End to end script identification, Dual branch, Attention mechanism, Patch Aggregator