Research On VOCR And HOCR Technology Based On Wavelet Neural Network Theory

Posted on: 2009-05-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T C Huang
GTID: 1118360245999245
Subject: Communication and Information System
Abstract/Summary:
With the development of the modern information society, more and more multimedia information is becoming available, and multimedia processing has become an important task for scientists in the relevant fields. Within image and video media, text in images and video frames carries important information for visual content understanding and video retrieval. Unlike traditional patterns (single characters, human faces, etc.), text lines vary in size, grey level, shape and color. Furthermore, some text lines are embedded in complex backgrounds. These factors make text detection and recognition difficult. The recognition of handwritten numerals is likewise an important problem in optical character recognition (OCR), with applications such as automatic ZIP-code recognition and mail sorting systems. We study VOCR (Video Optical Character Recognition) and HOCR (Handwritten Numeral Optical Character Recognition) using wavelets, multiwavelets, wavelet networks and multiwavelet networks. The main work of this dissertation is as follows:
First, we explore wavelet network and multiwavelet network theory in depth. In particular, theorems on the function approximation ability and convergence of wavelet networks and multiwavelet networks are proposed and proved. We also study and implement nonlinear function approximation with a db2 wavelet network and a GHM multiwavelet network. The results show that the multiwavelet network approximates functions better than the single-wavelet network.
Second, text localization and recognition in images is important for searching information in digital photo archives, video databases and web sites. However, since text is often printed against a complex background, it is often difficult to detect. In this dissertation, a robust text localization approach is presented, which can automatically detect horizontally aligned text with different sizes, fonts, colors and languages.
The approach first applies a wavelet transform to the image and uses the distribution of high-frequency wavelet coefficients to statistically characterize text and non-text areas. Then, the k-means clustering algorithm is used to classify text areas in the image. The detected text areas undergo a projection analysis in order to refine their localization. The detection performance of our approach is demonstrated by experimental results.
Third, a novel approach based on a wavelet neural network is presented for extracting text from images or video. It extracts features of candidate text regions using the discrete wavelet transform. This works because the intensity characteristics of each detail sub-band differ from those of the others, and we use this difference to extract features of candidate text regions. A neural network based on the back-propagation (BP) algorithm is trained on these features. The network output for real text regions differs from that for non-text regions, so an appropriate threshold, together with some dilation operators, can be applied to obtain the real text regions.
Fourth, two kinds of wavelet features are proposed: (a) Kirsch edge enhancement based 2D wavelets and (b) 2D complex wavelets. Two hybrid feature sets are formed by combining these wavelet features with geometrical features for the recognition of handwritten numerals. Experiments on handwritten numeral recognition and verification show that the two hybrid feature sets can achieve high recognition and verification performance. In addition, the merits of the proposed wavelet feature extraction methods are discussed.
Fifth, we present a novel handwritten numeral recognition approach based on contour shell expansion and multiwavelet neural network clusters. We first trace the contour of the numeral, then normalize and resample the contour so that it is translation- and scale-invariant.
We then perform a multiwavelet ortho-normal shell expansion on the contour to obtain several resolution levels and the average. Finally, the shell coefficients are used as input features to a feed-forward neural network that recognizes the handwritten numerals. The main advantage of the ortho-normal shell decomposition is that it decomposes a signal into multiple resolution levels without down-sampling. Our experiments show that it is feasible to use multiwavelet features in handwritten numeral recognition. An experiment to verify the efficiency of the multiwavelet was performed omitting the feature extraction step. Results show that information about the relevant image features is evenly distributed across all sub-band images of the multiwavelet coefficients and that multiwavelet neural network clusters are promising feature extractors and classifiers.
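The text localization steps described in the abstract (wavelet transform, clustering of high-frequency coefficients, projection analysis) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the one-level Haar wavelet, the per-block energy feature, the use of exactly two clusters, the `min_fraction` parameter and all function names are assumptions chosen only to keep the example small and self-contained.

```python
def haar_detail_energy(img):
    """One-level 2D Haar transform on 2x2 blocks (assumed wavelet).

    img is a list of rows of grey values with even height and width.
    Returns, per block, the summed squared high-frequency coefficients.
    """
    energy = []
    for r in range(0, len(img), 2):
        row = []
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            rows_diff = (a + b - d - e) / 2.0  # detail across rows
            cols_diff = (a - b + d - e) / 2.0  # detail across columns
            diag_diff = (a - b - d + e) / 2.0  # diagonal detail
            row.append(rows_diff ** 2 + cols_diff ** 2 + diag_diff ** 2)
        energy.append(row)
    return energy


def two_means_cut(values, iters=20):
    """k-means with k=2 on scalars; returns the boundary between clusters.

    In one dimension, nearest-centroid assignment is a threshold at the
    midpoint of the two centroids, so only that midpoint is tracked.
    """
    lo, hi = min(values), max(values)
    for _ in range(iters):
        cut = (lo + hi) / 2.0
        low = [v for v in values if v <= cut]
        high = [v for v in values if v > cut]
        if not high:  # all values identical: nothing to separate
            break
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2.0


def text_rows(mask, min_fraction=0.3):
    """Horizontal projection: group consecutive block rows that contain
    at least min_fraction text blocks into (first, last) bands."""
    bands, start = [], None
    for r, row in enumerate(mask):
        dense = sum(row) >= min_fraction * len(row)
        if dense and start is None:
            start = r
        elif not dense and start is not None:
            bands.append((start, r - 1))
            start = None
    if start is not None:
        bands.append((start, len(mask) - 1))
    return bands


def localize_text(img):
    """Return (top, bottom) pixel-row bands likely to contain text."""
    energy = haar_detail_energy(img)
    cut = two_means_cut([v for row in energy for v in row])
    mask = [[v > cut for v in row] for row in energy]
    # Each block row covers two pixel rows.
    return [(2 * top, 2 * bottom + 1) for top, bottom in text_rows(mask)]
```

Calling `localize_text` on a flat grey image with a band of high-contrast stripes returns the pixel rows of that band, while a uniformly flat image yields an empty list. A fuller version would also project along columns and use a richer coefficient statistic than block energy.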
Keywords/Search Tags: text detection, text recognition, video content analysis, wavelet feature, unsupervised classification, wavelet neural network, multiwavelet neural network, video optical character recognition, handwritten numeral optical character recognition