Research Of Text Keyword Retrieval Technology For Secrecy Inspection

Posted on:2018-01-08

Degree:Master

Type:Thesis

Country:China

Candidate:Z G Wang

GTID:2348330542490974

Subject:Computer Science and Technology

Abstract/Summary:

Secrecy inspection is an important means of safeguarding national information security. As inspections become more intensive, checking documents for confidential information has become the focus of current research on inspection tools. Growing storage capacity has brought massive volumes of file data, which greatly prolongs the time needed to inspect files for secret information, so traditional pattern matching algorithms struggle to meet the speed requirements of pattern matching over large file collections. In addition, most current document inspection examines only the text in a document and ignores the embedded images, which may still contain important confidential information. The inspection is therefore incomplete and falls well short of the efficiency and accuracy that confidentiality checks require. This thesis studies text keyword retrieval technology for secrecy inspection, covering text extraction from images and multi-pattern string matching. The main work is as follows:

(1) A text extraction method based on the non-subsampled Contourlet transform is designed. The method has three steps. First, the image to be processed is decomposed with a Gaussian pyramid to obtain images at different resolutions. Second, the non-subsampled Contourlet transform is used to locate the text areas, and the final text area is obtained by combining the text area positions found at each resolution. Finally, global threshold binarization is applied to the text region, and the binarized region is passed to an OCR system for recognition, which yields the extracted text.

(2) A multi-pattern string matching algorithm based on a jump table and double hashing is designed. The algorithm has two stages, a preprocessing stage and a search stage, and proceeds in three steps. First, in the preprocessing stage, a character shift table is created, which is used to move the search window during pattern matching. Then a first-level hash table and a second-level hash table are created, which are used to look up the candidate patterns to be matched. Finally, in the search stage, the text is scanned using the shift table and the two hash tables to find all positions at which patterns occur.

Experimental results show that the proposed image text extraction method, evaluated on the ICDAR data set, achieves a higher text extraction rate and accuracy than existing typical methods. The proposed multi-pattern string matching algorithm, compared with existing classical algorithms on the Reuters-21578 news data set, has relatively high time and space efficiency. The text keyword retrieval techniques can therefore be used for secrecy inspection.
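The preprocessing and search stages of the matching algorithm can be illustrated with a minimal sketch. The abstract does not give the exact table layouts or hash functions, so this sketch assumes a Wu-Manber-style design: a shift table keyed on 2-character blocks, a first-level table keyed on the last block of the search window, and a second-level table keyed on the first block of each pattern. The block size `B` and the names `preprocess` and `search` are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict

B = 2  # block size in characters (an assumption; the abstract does not state one)


def preprocess(patterns):
    """Build the shift table and the two-level hash table for a pattern set."""
    if not patterns or any(len(p) < B for p in patterns):
        raise ValueError("every pattern must have at least B characters")
    m = min(len(p) for p in patterns)  # window length = shortest pattern
    default_shift = m - B + 1          # shift for blocks that occur in no pattern
    shift = {}
    # First level: last block of a pattern's m-prefix.
    # Second level: first block of the pattern. Values: candidate patterns.
    buckets = defaultdict(lambda: defaultdict(list))
    for p in patterns:
        for end in range(B, m + 1):    # 'end' is the exclusive end of a block
            block = p[end - B:end]
            shift[block] = min(shift.get(block, default_shift), m - end)
        buckets[p[m - B:m]][p[:B]].append(p)
    return m, default_shift, shift, buckets


def search(text, patterns):
    """Return every (start_index, pattern) occurrence in text."""
    m, default_shift, shift, buckets = preprocess(patterns)
    hits = []
    pos = m                            # exclusive end of the current window
    while pos <= len(text):
        block = text[pos - B:pos]
        step = shift.get(block, default_shift)
        if step > 0:                   # no pattern can end its m-prefix here
            pos += step
            continue
        start = pos - m
        prefix = text[start:start + B]
        # Second-level lookup narrows the candidates before full comparison.
        for p in buckets[block].get(prefix, ()):
            if text.startswith(p, start):
                hits.append((start, p))
        pos += 1
    return hits


if __name__ == "__main__":
    keywords = ["secret", "top secret", "classified"]
    print(search("this top secret file is classified", keywords))
```

The shift table lets the scan skip ahead whenever the block at the end of the window cannot align with any pattern, and the second-level lookup avoids comparing the window against every pattern that shares the same final block. Whether the thesis keys its tables this way is not stated in the abstract.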

Keywords/Search Tags:

secrecy inspection, image text extraction, keyword retrieval, double hashing, non-subsampled Contourlet transform

Related items

1	Research On Image Denoising Algorithm Based On Non-subsampled Contourlet Transform And Deep Learning
2	Image Retrieval Based On Color And Non-Subsampled Contourlet Features
3	Research Of Non-subsampled Contourlet Transform And Adaptive PCNN In Image Fusion
4	Research On Image Denoising Methods Based On Non-Subsampled Contourlet Transform And Bilateral Filtering
5	Research Of Multi-focus Image Fusion Based On Non-subsampled Contourlet Transform
6	Research On The Fusion Technology Of Infrared And Low-Light Level Image
7	Application Of Non-subsampled Contourlet Transform To Multi-source Image Fusion
8	Research On Methods Of Perceptual Hashing Content Authentication For Color Image
9	Research On Image Denoising Algorithm Based On Non-subsampled Contourlet Transform And Statistical Modeling
10	Research And Implementation Of Secrecy Inspection And Monitoring System For Android Platform