Fast Classification Of Web Document Images

Posted on: 2018-04-25

Degree: Master

Type: Thesis

Country: China

Candidate: G S Liu

Full Text: PDF

GTID: 2348330512487393

Subject: Pattern Recognition and Intelligent Systems

Abstract/Summary:

With the rapid development of the Internet, smartphones, and communication technology, the volume of multimedia data on the Internet (text, images, video, and audio) is growing rapidly. This data brings rich information and great convenience to daily life, but it is also becoming increasingly difficult to exploit the information embedded in such heterogeneous data. As visual data accounts for a growing share of network data, the number of document images has also grown rapidly in recent years. Recognizing and understanding the content of document images is therefore of great significance for the effective use of network information.

This thesis focuses on fast classification of web images into five categories: natural scene text images, natural scene non-text images, born-digital images, camera-captured paper documents, and scanned paper documents. To this end, we build two classification modules: a type-based module and a content-based module. The type-based module consists of two stages. The first stage extracts global features reflecting the distributions of color and saturation and classifies them with a support vector machine (SVM). Images to which the first-stage classifier assigns low confidence are passed to the second stage, which extracts local texture features represented in the Bag-of-Words framework and uses another SVM classifier for the final classification. The content-based module distinguishes text images from non-text images among natural scene images. It first detects and localizes FASText corners in the natural scene image and then filters them using a combination of local and global non-maximum suppression. A flood fill operation is then performed from the candidate text points to generate candidate text connected components (CCs). Geometric features and local texture features are extracted from those CCs and fed into a text/non-text region classifier for the final decision.

Experimental results on a large image database demonstrate the effectiveness of the two proposed classification methods. This work makes two major contributions. First, we focus on fast classification of web images, design effective and efficient features, and build a real-time classification system. Second, we build a large image database called NLPR_Web4, which contains over 40,000 web images. This database can be shared for academic research.
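
The two-stage, confidence-gated design of the type-based module can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the 0.8 confidence threshold, the label strings, and the toy classifiers (which stand in for the two SVMs and their feature extractors) are all assumptions made for illustration. The sketch only shows the control flow in which a cheap first stage answers confident cases and defers the rest to a second stage.

```python
from typing import Any, Callable, Dict, Tuple

Image = Dict[str, Any]            # hypothetical stand-in for extracted image features
Prediction = Tuple[str, float]    # (label, confidence in [0, 1])
Classifier = Callable[[Image], Prediction]


def cascade_classify(
    image: Image,
    stage1: Classifier,
    stage2: Classifier,
    threshold: float = 0.8,       # assumed value; the abstract gives no threshold
) -> Tuple[str, int]:
    """Return (label, stage_used) for one image.

    Stage 1 is meant to be cheap (global color/saturation features in the
    abstract). Only images it is unsure about reach the second stage
    (local texture features in a Bag-of-Words representation).
    """
    label, confidence = stage1(image)
    if confidence >= threshold:
        return label, 1
    label, _ = stage2(image)
    return label, 2


def toy_stage1(image: Image) -> Prediction:
    """Hypothetical first stage: decides from a single saturation value."""
    saturation = float(image["saturation"])
    if saturation < 0.2:
        return "scanned_document", 0.95
    if saturation > 0.8:
        return "natural_scene", 0.90
    return "born_digital", 0.50   # ambiguous range, low confidence


def toy_stage2(image: Image) -> Prediction:
    """Hypothetical second stage: reads a precomputed texture-based label."""
    return str(image["texture_label"]), 1.0


if __name__ == "__main__":
    easy = {"saturation": 0.1, "texture_label": "scanned_document"}
    hard = {"saturation": 0.5, "texture_label": "camera_captured_document"}
    print(cascade_classify(easy, toy_stage1, toy_stage2))
    print(cascade_classify(hard, toy_stage1, toy_stage2))
```

In this sketch the threshold trades speed for accuracy: a lower value lets the first stage settle more images on its own, while a higher value sends more of them to the costlier second stage.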

Keywords/Search Tags:

Classification of Web images, Hierarchical classification, Image processing, Feature extraction, Scene image text detection

Related items

1	The Research And Application Of Scene Image Text Extraction Method
2	Based On Feature Description Of Image Scene Classification Algorithm
3	Feature Extraction Algorithm Optimization And Implementation Used For The Scene Image Classification
4	Research On Feature Extraction And Classification Method Of RGB-D Images
5	The Research On Text Identification And Detection Algorithm Of Natural Scene Images
6	A Method For The Scene Classification And Labeling Based On Multi-feature Fusion
7	Research On Scene Text Detection And Image Classification Based On Convolutional Neural Network
8	Scene Image Text Detection Based On Deep Learning Method
9	The Image Classification Algorithm Based On Feature Extraction And Sparse Representation
10	Web Text Classification Method And System Realization