
Research On Flowchart Recognition By Fusing Structural Model And Corner Feature

Posted on: 2019-09-15
Degree: Master
Type: Thesis
Country: China
Candidate: S S Zhang
GTID: 2428330548452317
Subject: Computer Science and Technology
Abstract/Summary:
Patent retrieval is an important activity in the process of patent writing and review. Existing patent retrieval systems mainly rely on text-based information retrieval. However, patent documents contain not only text but also various kinds of information-rich images, such as molecular structure diagrams and algorithm flowcharts. With the development of digital image processing technology, researchers have begun to explore efficient image-based patent retrieval. Flowcharts are common in patent documents and carry important semantics, but flowcharts with the same or similar semantics may have different layouts. To achieve flowchart-based patent retrieval, it is therefore necessary first to recognize the semantics of the flowchart, that is, to transform the flowchart image into text describing the flowchart. Existing studies mainly use connected-domain methods to extract and recognize the structural elements of flowcharts, but such methods cannot accurately recognize structural elements with defects such as broken edges and image-text overlap. For this reason, this paper exploits the stability of the corner features of flowcharts and analyzes in depth the close relationship between the distribution of flowchart corners and structural elements. We propose a flowchart recognition method that fuses structural models and corner features, and use the flowchart data in CLEF-IP to verify its effectiveness. Specifically, this paper completes the following research:

First, after summarizing flowchart specifications and their typical structural elements, we proposed a corner-based flowchart structure model (CBSM). The model first abstracts the structural elements of the flowchart into corner combinations, and classifies and defines these corners. It then formalizes the distribution of corners on the graphic elements and on the connection lines, and designs corner combination rules that describe the relationship between graphic elements and connections. Finally, to simplify the judgment of corner combinations and to tolerate the deviations that occur in real images, it defines corner combination constraints. The model lays a theoretical foundation for the subsequent corner detection and classification and for the recognition of flowchart structural elements.

Second, we designed and implemented a CBSM-based approach for detecting and classifying flowchart corners. We first used connected-domain labeling to separate the preprocessed flowchart into layers and extracted the flowchart structure. We then evaluated classical corner detection methods. Based on experiments comparing their detection results on linear and curved structural elements, we designed and optimized a corner detection method for flowchart structures. Finally, we extracted corner features to form a high-dimensional feature vector and, following the corner definitions in CBSM, classified the corners by machine learning, with the parameters optimized through cross-validation. According to the experimental results, the corner classification accuracy of this method can reach 91.8%.

Third, we designed a CBSM-based algorithm for recognizing structural elements. We first summarized and analyzed the causes of misrecognition, such as overlapping and nesting, in the CLEF-IP flowcharts. We then combined these conditions with the corner combination rules and constraints in CBSM to design a recognition algorithm for the graphic elements and connections of the structure. Using this algorithm, we identified the structural elements by traversing the corner combination information. Moreover, we used an OCR algorithm to recognize the text in the flowchart. The text and the recognized structure information together constitute the final text description. The experimental results show that the proposed method can effectively recognize flowcharts with broken edges and image-text overlap, and that the flowchart recognition rate can reach 89%.
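
The idea of combining classified corners under a tolerance constraint can be illustrated with a minimal sketch. This is not the thesis's algorithm: the abstract does not list the actual corner classes, combination rules, or constraint values, so the four corner labels ("TL", "TR", "BL", "BR"), the `Corner` type, the `find_rectangles` function, and the pixel tolerance `tol` are all assumptions made for illustration. The sketch groups four already-classified corners into a rectangular element when their coordinates line up within the tolerance, which is why a box can still be found when parts of its edges are broken.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Corner:
    x: float
    y: float
    kind: str  # hypothetical labels: "TL", "TR", "BL", "BR"


def close(a, b, tol):
    """True when two coordinates differ by at most tol pixels."""
    return abs(a - b) <= tol


def find_rectangles(corners, tol=3.0):
    """Group classified corners into rectangles (image coordinates, y grows downward).

    Assumed combination rule: one corner of each kind, with matching rows and
    columns within tol, and a width and height larger than tol.
    """
    by_kind = {k: [c for c in corners if c.kind == k] for k in ("TL", "TR", "BL", "BR")}
    rects = []
    for tl, tr, bl, br in product(by_kind["TL"], by_kind["TR"], by_kind["BL"], by_kind["BR"]):
        aligned = (
            close(tl.y, tr.y, tol)      # top edge is roughly horizontal
            and close(bl.y, br.y, tol)  # bottom edge is roughly horizontal
            and close(tl.x, bl.x, tol)  # left edge is roughly vertical
            and close(tr.x, br.x, tol)  # right edge is roughly vertical
        )
        big_enough = (tr.x - tl.x > tol) and (bl.y - tl.y > tol)
        if aligned and big_enough:
            rects.append((tl, tr, bl, br))
    return rects


# Four slightly misaligned corners of one box, plus a stray corner elsewhere.
corners = [
    Corner(10, 10, "TL"), Corner(110, 11, "TR"),
    Corner(9, 60, "BL"), Corner(111, 59, "BR"),
    Corner(200, 200, "TR"),
]
rectangles = find_rectangles(corners, tol=3.0)
```

The rule uses only corner positions and labels, not edge pixels, so gaps along an edge do not affect it. The exhaustive search over corner combinations is kept for clarity and would need indexing for images with many corners.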
Keywords/Search Tags: image retrieval, flowchart recognition, structural model, corner feature, corner combination