Individual identification, which directly links a crime scene to a suspect, holds a central place in forensic science. Hair shafts are among the most common types of biological evidence at crime scenes. However, because hair shafts contain only small amounts of nuclear DNA and that DNA is severely degraded, existing DNA testing methods struggle to achieve individual identification from them. The proteins in the hair shaft are abundant and stable, and they carry genetic information. In this study, the extraction and detection method for hair shaft protein was optimized, and a single amino acid polymorphism (SAP) analysis workflow for hair shaft protein was established based on the East Asian population. SAP information in hair shaft protein was obtained, and individual identification was attempted by using hair shaft SAPs to calculate the random match probability (RMP). The main contents of the study are as follows.

(ⅰ) Two pre-processing methods for hair shaft protein, ionic liquid extraction and PCT extraction, were investigated. In terms of stability and operational convenience, ionic liquid extraction is more advantageous. A method for hair shaft protein extraction based on ionic liquid, coupled with mass spectrometry detection, was developed, and the ultrasonic crushing time for extraction was optimized. We compared the proteins detected in replicate mass spectrometry runs of the same hair shaft, in different hair shafts from the same individual, and in hair shafts from different individuals, and found that the protein differences increased in that order: replicate runs of the same hair shaft < different hair shafts of the same individual < hair shafts of different individuals. These results suggest that the amount, type and SAPs of hair shaft protein differ among individuals. The sampling site for hair shaft protein detection was fixed at the 2 cm nearest the root end, and the effect of hair diameter on hair shaft protein detection was ruled out.

(ⅱ) The SAP analysis algorithm for hair shaft 
proteins was developed, and the individual identification ability of the SAPs detected in 12 samples was analyzed. First, a SAP-containing protein sequence database for the East Asian population was established (minor allele frequency of the non-synonymous SNP corresponding to each SAP ≥ 0.1% in the East Asian population); it contained 250,000 SAPs, and a corresponding SNP-SAP annotation table was generated. Twelve 2 cm hair shaft samples (6 individuals, 2 hair shafts each) were pre-treated with the ionic liquid method and then analyzed by LC-MS/MS. The mass spectrometry data were searched against the SAP-containing protein sequence database, and a total of 321 SAPs were obtained, with an average of 131±17 SAPs per sample. The SNP genotypes corresponding to the SAPs were imputed from the SNP-SAP annotation table and then compared with the SNP results of blood exome sequencing from the same individual to verify the accuracy of SAP detection. The RMP was calculated from the population frequencies of the SNPs. The resulting RMP values ranged from 1.4×10⁻⁴ to 1.0×10⁻⁹, with a median of 1.3×10⁻⁶.

(ⅲ) To further improve the detection of SAPs in hair shaft proteins, the extraction and detection methods were optimized again. The new method extracts the hair shaft three times to dissolve the hair shaft protein fully, and it uses high-pH reverse-phase liquid chromatography to separate the pooled three extractions into 6 fractions for mass spectrometry detection. Compared with the original single-extraction method, the new method improves the coverage of mass spectrometry detection and identifies more low-abundance proteins and peptides. The number of variant peptides containing SAPs increased from 99 to 203, and the number of SAPs increased from 97 to 179. In hair shafts from 10 individuals, 3957±943 peptides and 632±243 protein groups were identified by the optimized method, and a total of 321 SAPs were 
obtained. The average number of SAPs detected per individual was 157±23. The calculated RMP values ranged from 6.53×10⁻⁴ to 3.10×10⁻¹⁴ (median = 1.37×10⁻⁸). The optimized extraction and detection methods increase the number of SAPs detected in the hair shaft and thus improve the individual identification capability.

(ⅳ) Different algorithms for calculating the RMP and assessing individual identification ability were explored, and additional parameters that could improve individual identification were considered. The following application scenario was assumed: the biological evidence at the scene is a single hair shaft, and blood samples from N suspects are available. SAPs detected in the hair shaft were compared with the SNPs from exome sequencing of the suspects' blood, and the suspects were then ranked. The accuracy of SAP-to-SNP conversion was verified by exome sequencing, and the numbers of correct and incorrect SAPs were counted. The correct SAPs or SNPs were used to calculate the RMP. The results show that the number of suspects affects the RMP ranking, and that the number of correct SAPs is largest and the number of incorrect SAPs smallest when the hair shaft and blood come from the same individual. This indicates that the RMP value, the RMP ranking, and the numbers of correct and incorrect SAPs can be integrated in future applications for individual identification. Because the sample size was small, these findings need to be validated with large-scale data in future studies in order to build a more comprehensive individual identification algorithm.

In conclusion, this work establishes a complete system of individual identification techniques for hair shaft protein, covering optimization of the extraction and detection method, SAP typing and identification, and random match probability calculation, which lays a solid technical foundation for the application of hair shaft protein to individual identification.
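
As an illustration of the RMP calculation and suspect ranking described in (ⅱ) and (ⅳ), the following is a minimal sketch and not the algorithm used in this study. It assumes that each SAP has been converted to a fully resolved biallelic SNP genotype (coded as the number of alternate alleles), that genotype frequencies follow Hardy-Weinberg proportions, and that loci are independent so the product rule applies. The function names, the rs identifiers and the allele frequencies are hypothetical.

```python
from math import prod


def genotype_frequency(alt_count: int, alt_freq: float) -> float:
    """Hardy-Weinberg frequency of a genotype with 0, 1 or 2 alternate alleles.

    Assumption: biallelic locus with alternate-allele frequency `alt_freq`.
    """
    p = alt_freq
    q = 1.0 - p
    return {0: q * q, 1: 2.0 * p * q, 2: p * p}[alt_count]


def random_match_probability(profile: dict[str, int],
                             alt_freqs: dict[str, float]) -> float:
    """Product of genotype frequencies over all loci in the profile.

    Assumption: loci are independent (no linkage disequilibrium correction).
    """
    return prod(genotype_frequency(g, alt_freqs[locus])
                for locus, g in profile.items())


def rank_suspects(hair_profile: dict[str, int],
                  suspects: dict[str, dict[str, int]]) -> list[tuple[str, int, int]]:
    """Rank suspects by concordance between hair-derived and blood genotypes.

    Returns (name, n_correct, n_incorrect) tuples, most concordant first.
    Loci missing from a suspect's exome data are skipped.
    """
    rows = []
    for name, blood in suspects.items():
        shared = [locus for locus in hair_profile if locus in blood]
        correct = sum(hair_profile[locus] == blood[locus] for locus in shared)
        rows.append((name, correct, len(shared) - correct))
    # More correct calls first; ties broken by fewer incorrect calls.
    return sorted(rows, key=lambda r: (-r[1], r[2]))


# Hypothetical example data (not taken from the study).
hair = {"rs_a": 1, "rs_b": 2, "rs_c": 0}
freqs = {"rs_a": 0.5, "rs_b": 0.1, "rs_c": 0.2}
suspects = {
    "suspect_1": {"rs_a": 1, "rs_b": 2, "rs_c": 0},
    "suspect_2": {"rs_a": 0, "rs_b": 2, "rs_c": 1},
}

rmp = random_match_probability(hair, freqs)
ranking = rank_suspects(hair, suspects)
```

In practice, a detected variant peptide shows only that at least one variant allele is present, so a real workflow would have to handle ambiguous genotypes and linked loci; the sketch ignores both for brevity.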