Design And Implementation Of Metadata Extraction Tool For Academic Paper Documents

Posted on:2018-11-23

Degree:Master

Type:Thesis

Country:China

Candidate:Y C Deng

Full Text:PDF

GTID:2428330545961123

Subject:Software engineering

Abstract/Summary:

As computer technology spreads across many fields, enterprises and organizations have come to recognize the importance of information management. The data being managed is mostly held as electronic documents, and many of these documents follow the academic paper format. As the number of such documents grows and the demands on the accuracy of document retrieval, classification and statistics rise, improving the quality of metadata extraction from paper documents becomes necessary. To address the low accuracy and weak adaptability of existing metadata extraction methods, this thesis proposes a hybrid model based on a BP (back-propagation) neural network and a support vector machine (SVM) for extracting metadata from the text content of Chinese academic paper documents.

The extraction of metadata from a paper document is recast as the classification of text blocks. A comparison of several common classification methods establishes the feasibility of the approach based on the BP neural network and the SVM. Text blocks to be classified are first preprocessed using feature rules of the text, and the abstract and keyword metadata are extracted by rule matching. For the preprocessed text, a feature vector is constructed by combining the local features of each text block with the features of its context blocks, which is intended to improve the accuracy of the extraction model. The BP neural network model classifies this feature vector and thereby identifies the corresponding metadata type. Text blocks containing affiliation address and author metadata are further split into sub-texts using the separators between blocks. The feature vector of each sub-text is built from common personal name and place name information obtained from a corpus, and the SVM model determines its metadata type.

The metadata extraction tool based on the BP neural network and SVM models is implemented in Java with the libsvm library. The experimental results show that the hybrid model achieves better performance for metadata extraction from academic paper documents.
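
The preprocessing steps described in the abstract can be sketched in Java roughly as follows. This is a minimal illustration, not the thesis's implementation: the class name `BlockFeatureSketch`, the heading patterns, the four local features (capped length, digit ratio, Han character ratio, separator flag) and the one-block context window are all assumptions chosen for the example. The BP neural network and the SVM classifier themselves are omitted, as are the corpus-based name and place features.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative sketch only; all names, rules and features here are assumptions. */
final class BlockFeatureSketch {
    static final int LOCAL_DIM = 4;

    // Hypothetical heading rules. The escapes spell the Chinese headings
    // U+6458 U+8981 ("abstract") and U+5173 U+952E U+8BCD ("keywords").
    private static final Pattern ABSTRACT_RULE =
            Pattern.compile("^\\s*(\u6458\\s*\u8981|abstract)", Pattern.CASE_INSENSITIVE);
    private static final Pattern KEYWORD_RULE =
            Pattern.compile("^\\s*(\u5173\\s*\u952E\\s*\u8BCD|key\\s*words?)", Pattern.CASE_INSENSITIVE);
    // ASCII and full-width separators that commonly divide authors or affiliations.
    private static final Pattern SEPARATOR = Pattern.compile("[,;\uFF0C\u3001\uFF1B]");

    /** Rule matching: returns "abstract", "keywords", or null when no rule fires. */
    static String matchByRule(String block) {
        if (ABSTRACT_RULE.matcher(block).find()) return "abstract";
        if (KEYWORD_RULE.matcher(block).find()) return "keywords";
        return null;
    }

    /** Local features of one block: capped length, digit ratio, Han ratio, separator flag. */
    static double[] localFeatures(String block) {
        int n = block.codePointCount(0, block.length());
        if (n == 0) return new double[LOCAL_DIM];
        long digits = block.codePoints().filter(Character::isDigit).count();
        long han = block.codePoints()
                .filter(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN)
                .count();
        return new double[] {
            Math.min(n, 100) / 100.0,
            (double) digits / n,
            (double) han / n,
            SEPARATOR.matcher(block).find() ? 1.0 : 0.0
        };
    }

    /** Concatenates the local features of the previous, current and next block. */
    static double[] featureVector(List<String> blocks, int i) {
        double[] v = new double[3 * LOCAL_DIM]; // missing neighbours stay all-zero
        for (int offset = -1; offset <= 1; offset++) {
            int j = i + offset;
            if (j < 0 || j >= blocks.size()) continue;
            double[] local = localFeatures(blocks.get(j));
            System.arraycopy(local, 0, v, (offset + 1) * LOCAL_DIM, LOCAL_DIM);
        }
        return v;
    }

    /** Splits an author/affiliation block into trimmed, non-empty sub-texts. */
    static List<String> splitBySeparators(String block) {
        List<String> parts = new ArrayList<>();
        for (String p : SEPARATOR.split(block)) {
            String t = p.trim();
            if (!t.isEmpty()) parts.add(t);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Made-up placeholder blocks, not data from the thesis.
        List<String> blocks = Arrays.asList(
                "Some Paper Title", "First Author, Second Author", "Abstract: ...");
        System.out.println(matchByRule(blocks.get(2)));
        System.out.println(Arrays.toString(featureVector(blocks, 1)));
        System.out.println(splitBySeparators(blocks.get(1)));
    }
}
```

In a pipeline of the kind the abstract describes, the vector returned by `featureVector` would be the input to the block-level classifier. Each sub-text from `splitBySeparators` would then receive its own feature vector before being passed to the SVM.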

Keywords/Search Tags:

metadata extraction, feature vector, BP neural network, Support Vector Machine

Related items

1	Convolutional Neural Network Based On Improved Support Vector Machine Research On Image Recognition Method
2	Researches On Some Problems In Nonparallel Hyperplanes Support Vector Machine And Feature Extraction
3	Research On Text Classification Algorithm Based On Support Vector Machine And Neural Network
4	Face Detection Based On Convolutional Neural Network And Improved Support Vector Machine
5	Research On Some Problems Of Support Vector Machine Learning Algorithm
6	Research On Application Of Support Vector Machine In Liver B Ultrasound Images Classification
7	The Study Of Several Issues And Application In Statistical Pattern Recognition
8	Research On The Joint Classification Based On Support Vector Machine And K-nearest Neighbor
9	Application Of Image Processing Technology And Support Vector Machine In Tobacco Grading
10	Study On Distributed Feature Extraction Of Scattering Centers And Key Technique Of Classifier Based On Kernel Method