
Research On Metadata Extraction Approach From Papers Based On Ensemble Learning

Posted on: 2013-06-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Zhao
Full Text: PDF
GTID: 2248330392454883
Subject: Computer application technology
Abstract/Summary:
Open access (OA) journal papers on the Internet are commonly used as the information source for building a digital library repository. Paper metadata can improve both the accuracy and the speed of paper retrieval in such a repository, so extracting metadata from OA journal papers accurately and quickly is the key to constructing it. Based on a comprehensive analysis of domestic and foreign paper metadata extraction methods, and drawing on the idea of ensemble learning, this thesis studies paper metadata extraction from two angles: how the conclusions of individual learners are combined, and how the individual learners are generated.

First, to address the low accuracy and weak generalization ability of a single metadata extraction model, a paper metadata extraction method based on Bayesian fusion is presented from the perspective of combining the conclusions of individual learners in ensemble learning. The individual learners are trained with three machine learning algorithms, hidden Markov models (HMM), support vector machines (SVM), and conditional random fields (CRF), to generate extraction models. The proposed method applies the generated models to extract paper metadata, calculates the posterior probability that the extracted sample belongs to each metadata class together with the weighted posterior probability of each model, makes a decision from these posterior probabilities using Bayesian theory, and finally extracts the paper metadata.

Second, a paper metadata extraction method based on meta-learning is proposed from the perspective of generating the individual learners in ensemble learning. A construction method for base-classifiers is presented, which combines SVM with diverse base-level training sets to construct base-classifiers with greater diversity. The training sets are created according to the OA journal categories.
Then, we present a paper metadata extraction algorithm based on meta-learning, which uses a meta-classifier to integrate the classification results of the base-classifiers and generate the final extraction results. This approach outperforms single machine learning algorithms and improves the accuracy of paper metadata extraction.

Finally, we analyze and validate the proposed methods. The experimental results show that the proposed methods improve the accuracy of paper metadata extraction and generalize well. Based on the research results, we also outline directions for future research.
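
The Bayesian fusion step of the first method can be illustrated with a minimal sketch. The abstract does not give the exact fusion formula, so this assumes one common reading: each model's class posterior is weighted by a normalized model weight and the weighted posteriors are summed, after which the class with the largest fused posterior is chosen. The function names (`normalize`, `fuse_posteriors`, `decide`), the model weights, and all probability values below are hypothetical and are not taken from the thesis.

```python
from typing import Dict

# Maps a metadata class to P(class | sample, model) for one model.
Posterior = Dict[str, float]


def normalize(weights: Dict[str, float]) -> Dict[str, float]:
    """Scale non-negative model weights so that they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("model weights must sum to a positive value")
    return {name: w / total for name, w in weights.items()}


def fuse_posteriors(posteriors: Dict[str, Posterior],
                    model_weights: Dict[str, float]) -> Posterior:
    """Assumed fusion rule: P(c | x) = sum over models m of w_m * P(c | x, m)."""
    weights = normalize({m: model_weights[m] for m in posteriors})
    classes = set()
    for per_model in posteriors.values():
        classes.update(per_model)
    fused = {c: 0.0 for c in classes}
    for model, per_model in posteriors.items():
        for c in classes:
            # A class missing from a model's output counts as probability 0.
            fused[c] += weights[model] * per_model.get(c, 0.0)
    return fused


def decide(fused: Posterior) -> str:
    """Pick the metadata class with the largest fused posterior."""
    return max(fused, key=fused.get)


# Hypothetical posteriors from the three individual learners for one text line.
posteriors = {
    "HMM": {"title": 0.5, "author": 0.4, "affiliation": 0.1},
    "SVM": {"title": 0.2, "author": 0.7, "affiliation": 0.1},
    "CRF": {"title": 0.3, "author": 0.6, "affiliation": 0.1},
}
# Hypothetical model weights, e.g. accuracies measured on held-out data.
model_weights = {"HMM": 0.80, "SVM": 0.90, "CRF": 0.95}

fused = fuse_posteriors(posteriors, model_weights)
label = decide(fused)
print(label, fused)
```

In this sketch the HMM alone would label the line as a title, while the fused decision follows the two more heavily weighted models. This is the kind of correction that combining individual learners is meant to provide.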
Keywords/Search Tags: Paper metadata, Metadata extraction, Statistical learning, Ensemble learning