Research On Metadata Extraction Approach From Papers Based On Ensemble Learning

Posted on:2013-06-12

Degree:Master

Type:Thesis

Country:China

Candidate:Z H Zhao

Full Text:PDF

GTID:2248330392454883

Subject:Computer application technology

Abstract/Summary:

Open access (OA) journal papers on the Internet are commonly used as the information source for building a digital library repository. Paper metadata improves both the accuracy and the speed of paper retrieval in such a repository, so extracting metadata from OA journal papers accurately and quickly is the key to constructing it. Building on a comprehensive analysis of domestic and foreign paper metadata extraction methods, and drawing on the idea of ensemble learning, this thesis studies paper metadata extraction from two angles: how the conclusions of individual learners are combined, and how the individual learners are generated.

First, a single metadata extraction model has low accuracy and weak generalization ability. To address this problem, a paper metadata extraction method based on Bayesian fusion is presented from the perspective of combining the conclusions of individual learners in ensemble learning. The individual learners use three machine learning algorithms, HMM, SVM, and CRF, to learn extraction models. The proposed method uses the generated models to extract paper metadata, calculates the posterior probability that an extracted sample belongs to each metadata class and the weighted posterior probability of each model, makes a decision based on the posterior probability using Bayesian theory, and finally extracts the paper metadata.

Second, a paper metadata extraction method based on meta-learning is proposed from the perspective of generating the individual learners in ensemble learning. A method for constructing base classifiers is presented, which combines the Support Vector Machine (SVM) with diverse base-level training sets to construct base classifiers with greater diversity. The training sets are created according to the OA journal categories. A paper metadata extraction algorithm based on meta-learning is then presented, which uses a meta-classifier to integrate the classification results of the base classifiers and generate the final extraction results. This approach outperforms single machine learning algorithms and improves the accuracy of paper metadata extraction.

Finally, the proposed methods are analyzed and validated. The experimental results show that they improve the accuracy of paper metadata extraction and have good generalization capability. Directions for future research are also outlined on the basis of these results.
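The Bayesian fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact fusion formula, so this example assumes one common reading: each model's class posterior is weighted by a normalized per-model weight, the weighted posteriors are summed, and the class with the highest fused posterior is chosen. The function names (`fuse_posteriors`, `decide`), the class labels, and all numbers are hypothetical and are not taken from the thesis.

```python
from typing import Dict

Posterior = Dict[str, float]  # metadata class -> P(class | sample, model)


def fuse_posteriors(model_posteriors: Dict[str, Posterior],
                    model_weights: Dict[str, float]) -> Posterior:
    """Combine per-model class posteriors into one fused posterior.

    Assumption: the fused posterior is the weight-averaged mixture
    sum_m w_m * P(class | sample, m), with the weights normalized to sum to 1.
    """
    total_weight = sum(model_weights[m] for m in model_posteriors)
    if total_weight <= 0:
        raise ValueError("model weights must sum to a positive value")

    fused: Posterior = {}
    for model, posterior in model_posteriors.items():
        w = model_weights[model] / total_weight
        for label, p in posterior.items():
            fused[label] = fused.get(label, 0.0) + w * p

    # Renormalize in case an input posterior did not sum exactly to 1.
    norm = sum(fused.values())
    return {label: p / norm for label, p in fused.items()}


def decide(fused: Posterior) -> str:
    """Pick the metadata class with the highest fused posterior."""
    return max(fused, key=fused.get)


# Hypothetical posteriors for one text line from a paper's front page.
posteriors = {
    "HMM": {"title": 0.5, "author": 0.4, "affiliation": 0.1},
    "SVM": {"title": 0.2, "author": 0.7, "affiliation": 0.1},
    "CRF": {"title": 0.3, "author": 0.6, "affiliation": 0.1},
}
# Hypothetical model weights, e.g. estimated from validation accuracy.
weights = {"HMM": 0.2, "SVM": 0.3, "CRF": 0.5}

fused = fuse_posteriors(posteriors, weights)
label = decide(fused)
print(label, fused)
```

In this sketch the HMM alone would label the line as a title, while the fused decision follows the two more heavily weighted models. How the thesis actually estimates the model weights is not stated in the abstract.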

Keywords/Search Tags:

Paper metadata, Metadata extraction, Statistical learning, Ensemble learning

Related items

1	Research On Metadata Extraction Approach For PDF Document Papers
2	Design And Implementation Of Metadata Extraction Tool For Academic Paper Documents
3	Design And Implementation Of A Metadata Integration System In Data Warehouse
4	Registry And Application Of Metadata In Science And Technology Resources Database
5	Research Of File System Metadata Graph
6	Research And Design Of A System For Managing Geospatial Metadata Based On XML
7	A Parallel Metadata Extraction System For Moving Objects In Surveillance Video
8	Bi Data Quality Management System Design And Research
9	Research Of Metadata Management In Multiple MetaData Servers Environment
10	Research On The Key Technology Of Metadata-based Integration For Proteomics Data Resources And The Development Of The Application Platform