Research And Implementation Of Metadata Extraction Based On SWT

Posted on: 2019-07-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y P Qian
Full Text: PDF
GTID: 2348330545958258
Subject: Electronics and Communications Engineering
Abstract/Summary:
More and more PDF documents appear on the Internet, and their number grows by tens of thousands every day. With so many documents, extracting useful information and storing it by category is important for both PDF archiving and scientific research. This thesis designs and implements an SWT-based metadata extraction tool that automatically extracts the metadata of books in PDF format and exports the data for permanent storage. Compared with manual extraction, the tool offers higher accuracy and efficiency. After comparing and analyzing several commonly used Java GUI frameworks (AWT, Swing, and SWT/JFace), SWT was chosen as the desktop development framework for the tool. For PDF text extraction, the advantages and disadvantages of two commonly used Java PDF libraries, PDFBox and iText, were compared, and PDFBox was selected for its better performance. To compensate for the shortcomings of automatic extraction, a Pinyin-assisted prompt feature based on a weight sorting algorithm was designed and implemented. To protect the intellectual property rights of the software and the legitimate rights and interests of buyers, an RSA-based login authorization verification mechanism was added to the tool. To support maintenance and upgrades, the tool uses a log management system based on log4j. To guard against catastrophic losses caused by unpredictable events such as power failures and abnormal program termination, the tool includes a disaster-tolerance mechanism that saves work automatically. Finally, extensive testing and analysis show that the tool fully meets the project requirements and greatly improves the speed and accuracy of metadata extraction.
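
The abstract does not describe how the RSA-based authorization check works. One common approach, shown here only as a minimal sketch and not as the thesis's actual design, is for the vendor to sign a license string with an RSA private key and for the tool to verify that signature with the matching public key at login. The class name `LicenseCheck`, its method names, the license text format, and the choice of `SHA256withRSA` are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical helper: names and license format are illustrative, not from the thesis.
final class LicenseCheck {
    // Assumed signature scheme; the thesis only states that RSA is used.
    private static final String ALGORITHM = "SHA256withRSA";

    // Vendor side: create the RSA key pair. Only the public key would ship with the tool.
    static KeyPair newVendorKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("RSA is not available", e);
        }
    }

    // Vendor side: sign the license text issued to a buyer.
    static byte[] sign(String licenseText, PrivateKey privateKey) {
        try {
            Signature signer = Signature.getInstance(ALGORITHM);
            signer.initSign(privateKey);
            signer.update(licenseText.getBytes(StandardCharsets.UTF_8));
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not sign license", e);
        }
    }

    // Tool side: accept the login only if the signature matches the license text.
    // Any security error (including a malformed signature) is treated as a failed check.
    static boolean verify(String licenseText, byte[] signature, PublicKey publicKey) {
        try {
            Signature verifier = Signature.getInstance(ALGORITHM);
            verifier.initVerify(publicKey);
            verifier.update(licenseText.getBytes(StandardCharsets.UTF_8));
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            return false;
        }
    }
}
```

In this sketch, a license signed for one text verifies only against that exact text, so editing the license file invalidates it unless the private key is known.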
Keywords/Search Tags: Metadata extraction, PDFBox, SWT, JFace