| Text is a valuable cultural heritage that not only connects the past with the present but also spreads human culture and thought worldwide. In recent years, numerous historical documents have been digitized and published online with the help of many libraries and archives. Studying these historical documents brings the customs and culture of their period to life. Meanwhile, with the rapid development of Internet technologies and mobile devices, many daily-life scenes that contain a lot of handwritten text, such as handwritten notes and whiteboards, are captured and saved. Detecting, recognizing and understanding this text content can empower people to do more and achieve more. To fulfill this goal, robust handwritten text detection is a crucial prerequisite. However, handwritten text detection in natural scenes and historical documents is still an unsolved problem due to its unique challenges, such as various handwriting styles (e.g., long ascenders and descenders, heterogeneous and touching strokes), complex layouts (e.g., arbitrarily oriented or curved text lines, marginalia, heterogeneous inter-line spacing), physical degradations (e.g., bleed-through, faded characters) and distortions introduced by image capturing. In this thesis, after thoroughly studying related work and analyzing its limitations, we present three contributions that address the above-mentioned challenges so that handwritten text in natural scenes and historical documents can be detected robustly. They are summarized as follows: (1) We present a robust connected component (CC) based approach for handwritten text detection from images of whiteboards and handwritten notes. The similarity-based text-line grouping algorithms of existing CC-based methods limit them in handling short, similar and sparse multi-line text scenarios, which are common in handwritten scene images. To address these problems, we estimate the text-line orientation at the position of each remaining CC by using a Fast R-CNN 
framework, based on which the difficult text-line grouping problem can be simplified to a graph pruning problem. Moreover, the remaining hard non-text CCs can be pruned effectively by the same Fast R-CNN. As a result, this approach achieves promising results on an in-house test set. (2) We propose a text-segment based approach to detecting handwritten text in natural scenes. Specifically, we propose two effective methods to improve the traditional SegLink approach and make it robust for handwritten text detection. First, we modify the label definition strategy by shrinking the short side of each text bounding box to create a text core region; text segments outside this region are labeled as non-text. This explicit label definition already suppresses most of the wrongly classified pixels and links in the space between two nearby text-lines. Second, we propose a graph-based text-line segmentation method to further separate the remaining wrongly grouped text-lines. Our experiments demonstrate that these two methods significantly improve the performance of SegLink for handwritten text detection. Owing to these improvements, this approach achieves much better performance than the previous CC-based approach. (3) We introduce the concepts of baseline bounding boxes and baseline primitives for detecting text baselines in historical documents. After analyzing the problems encountered by existing convolutional neural network (CNN) based text baseline detection approaches, we introduce the concept of baseline primitives to leverage wider context information to address these problems. Specifically, we propose a relation-network based framework to detect text baselines in historical documents, which identifies baseline primitives and learns a link relationship for each baseline primitive pair in a single neural network. Our approach handles the wrongly merged and wrongly split problems effectively and achieves state-of-the-art performance on two challenging text baseline detection benchmarks, namely 
cBAD 2017 and cBAD 2019. |
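
The text core region labeling described in contribution (2) can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes axis-aligned boxes (the thesis targets arbitrarily oriented text), a hypothetical `shrink_ratio` of 0.5, and a simple rule that a segment counts as text when its center falls inside a core region. The names `Box`, `core_region` and `label_segment` are introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned box given by its center, width and height (an assumption)."""
    cx: float
    cy: float
    w: float
    h: float

    def contains(self, x: float, y: float) -> bool:
        # Points on the border count as inside.
        return abs(x - self.cx) <= self.w / 2 and abs(y - self.cy) <= self.h / 2


def core_region(box: Box, shrink_ratio: float = 0.5) -> Box:
    """Shrink only the short side of `box`; the long side is left unchanged."""
    if box.w >= box.h:
        return Box(box.cx, box.cy, box.w, box.h * shrink_ratio)
    return Box(box.cx, box.cy, box.w * shrink_ratio, box.h)


def label_segment(seg_cx: float, seg_cy: float, text_boxes: list,
                  shrink_ratio: float = 0.5) -> int:
    """Return 1 (text) if the segment center lies in any core region, else 0."""
    return int(any(core_region(b, shrink_ratio).contains(seg_cx, seg_cy)
                   for b in text_boxes))


# Two horizontally aligned text lines whose boxes touch at y = 20.
lines = [Box(50, 10, 100, 20), Box(50, 30, 100, 20)]
print(label_segment(50, 10, lines))  # center of the first line
print(label_segment(50, 20, lines))  # gap between the two lines
```

With the full boxes, a segment centered in the gap between the two lines would still fall on both box borders. After shrinking the short side it lies outside both core regions and is labeled as non-text, which is the effect the abstract attributes to the modified label definition.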