Paper is an important carrier of documents and books and plays an important role in spreading culture and transmitting information. In the examination of documents and paper physical evidence, whether pages have been substituted or spliced is usually decided by establishing whether the sheets are the same type of paper. During such examinations the paper sample is often pulped to obtain a fiber suspension for identification. This not only destroys the fiber structure of the questioned document or paper evidence but also causes irreversible loss to the examination of the documents in question. Because traditional ancient books are often unique copies, studies of the production year of their paper must instead use substitute material of a similar production year. This inevitably introduces errors into the examination of ancient books and greatly weakens the experimental results. To overcome these shortcomings of traditional methods in identifying same-type paper in documents and in dating the paper of ancient books, this thesis uses a portable hyperspectral spectrometer to study two identification tasks: common paper types and the production year of book paper. The main research contents are as follows:

First, a spectrometer was used to collect hyperspectral data in the 400 to 1000 nm range from 75 paper samples of five types (offset paper, coated paper, embossed paper, white card paper and light coated paper) in order to determine whether papers are of the same type. Four spectral preprocessing methods were applied: Standard Normal Variate (SNV), Multiplicative Scatter Correction (MSC), Savitzky-Golay (SG) smoothing and First Derivative (FD). The Successive Projections Algorithm (SPA) and Competitive Adaptive Reweighted Sampling (CARS) were used to select characteristic wavelengths. Support Vector Machine (SVM), Random Forest (RF) and K-Nearest Neighbor (KNN) were each used to build a
classification model on both the full spectral range and the selected feature bands. Model performance was evaluated by the average classification accuracy on the training and test sets. The best combination was MSC preprocessing followed by SPA band selection and an SVM classifier, which reached accuracies of 100.00% on the training set and 98.88% on the test set. Hyperspectral imaging combined with MSC-SPA-SVM can therefore identify different types of paper rapidly and nondestructively.

Second, books dating from 1990 to 2010 were collected as test material. For each book, spectra were scanned at the upper, middle and lower positions of the front, middle and back ink-free sections, and these spectra formed the sample data set. The raw spectra were preprocessed with SNV, MSC, SG and FD, and SVM, RF and KNN classifiers were built on the output of each preprocessing method. The results showed that SNV preprocessing markedly improved the classification accuracy of the models. CARS and SPA were then used to select characteristic bands, and the classifiers were rebuilt on these bands with the same preprocessing. The RF model built on the CARS-selected bands identified paper year best: it greatly reduced the number of input variables while reaching accuracies of 99.70% on the training set and 99.48% on the test set. The SNV-CARS-RF combination therefore enables accurate and efficient nondestructive identification of paper age.
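The four preprocessing transforms (SNV, MSC, SG and FD) can be sketched with NumPy and SciPy as below. This is a minimal sketch under stated assumptions, not the thesis's implementation: each spectrum is one row of a 2-D array, MSC uses the mean spectrum as its reference, and the window length (11) and polynomial order (2) used for SG smoothing and for the SG-based first derivative are illustrative choices, since the abstract gives no parameters. Computing FD with a Savitzky-Golay derivative is one common option; a plain finite difference would be another.

```python
import numpy as np
from scipy.signal import savgol_filter


def snv(X):
    """Standard Normal Variate: centre and scale each spectrum (row) by its own mean and std."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)


def msc(X, reference=None):
    """Multiplicative Scatter Correction against a reference spectrum (assumed: the mean spectrum)."""
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    out = np.empty_like(X)
    for i, x in enumerate(X):
        # Least-squares fit x ~ a + b * ref, then remove the offset a and the scale b
        b, a = np.polyfit(ref, x, 1)
        out[i] = (x - a) / b
    return out


def sg(X, window=11, polyorder=2):
    """Savitzky-Golay smoothing along the wavelength axis (window/order are assumptions)."""
    return savgol_filter(np.asarray(X, dtype=float), window, polyorder, axis=1)


def first_derivative(X, window=11, polyorder=2):
    """First derivative per spectrum, computed here with a Savitzky-Golay filter (deriv=1)."""
    return savgol_filter(np.asarray(X, dtype=float), window, polyorder, deriv=1, axis=1)
```

Each function maps an (n_samples, n_wavelengths) array to an array of the same shape, so they can be swapped freely in front of any classifier.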
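SPA picks wavelengths one at a time. At each step it projects every spectral column onto the subspace orthogonal to the most recently chosen column and takes the column with the largest remaining norm, so the chosen bands carry little collinearity. The sketch below shows only that projection loop. The starting column (`start`) and the number of bands (`n_select`) are hypothetical parameters here; full SPA usually chooses both by validating a regression model on each candidate subset, which this sketch omits.

```python
import numpy as np


def spa(X, n_select, start=0):
    """Successive Projections Algorithm (projection loop only): return n_select column indices."""
    Xp = np.asarray(X, dtype=float).copy()
    selected = [start]
    for _ in range(n_select - 1):
        xk = Xp[:, selected[-1]].copy()
        # Project every column onto the orthogonal complement of the last selected column
        Xp = Xp - np.outer(xk, xk @ Xp) / (xk @ xk)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0  # never pick the same column twice
        selected.append(int(np.argmax(norms)))
    return selected
```

A column that is an exact multiple of an already selected one projects to zero and is therefore never chosen, which is the behavior that makes SPA useful for collinear spectra.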
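The classifier comparison can be sketched with scikit-learn as below: one stratified train/test split is shared by SVM, RF and KNN, and each model's training-set and test-set accuracy is reported, the two figures the abstract quotes. The 70/30 split, the RBF kernel with `C=10`, 200 trees and `k=3` are illustrative assumptions, not settings taken from the thesis, and the scaling step in front of SVM and KNN is common practice rather than something the abstract states. In use, `X` would hold the preprocessed spectra restricted to the selected bands.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_classifiers(X, y, test_size=0.3, seed=0):
    """Fit SVM, RF and KNN on one shared split; return {name: (train_acc, test_acc)}."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        np.asarray(X), np.asarray(y), test_size=test_size, stratify=y, random_state=seed)
    # Hyperparameters below are assumptions for illustration only
    models = {
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0)),
        "RF": RandomForestClassifier(n_estimators=200, random_state=seed),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3)),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        # .score returns mean accuracy for classifiers
        results[name] = (model.score(X_tr, y_tr), model.score(X_te, y_te))
    return results
```

Keeping the split fixed across models means differences in accuracy reflect the classifiers and the band selection rather than a lucky partition.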