
The Study On Molecular Substructure Prediction Based On Metric Learning

Posted on: 2016-06-05
Degree: Master
Type: Thesis
Country: China
Candidate: Z S Zhang
Full Text: PDF
GTID: 2308330461991954
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
As more and more valuable mass spectral data are collected, progress in chemical structure elucidation accelerates, and the question arises of how to mine information about chemical structure and chemical properties from these data. Data mining techniques offer a way forward; here we focus mainly on the classical classification problem. In chemistry, classification means extracting useful information from compound databases and then classifying compounds or drugs by their molecular features and chemical fingerprints. In this thesis, we first explain the original feature representation of mass spectral data. Because this representation is high-dimensional, it is prone to over-fitting and computationally expensive, so we transform the original features with several mathematical methods. Finally, we compare the K-NN classification error on mass spectral data for several metric learning algorithms, namely Neighborhood Component Analysis (NCA), Large Margin Nearest Neighbor Classifier (LMNN), Relevant Component Analysis (RCA), Information-Theoretic Metric Learning (ITML), Maximally Collapsing Metric Learning (MCML) and Discriminative Component Analysis (DCA), alongside linear Principal Component Analysis (PCA), Multidimensional Scaling (MDS) and Isometric Mapping (ISOMAP). All of these algorithms are used to predict the presence or absence of molecular substructures for structure elucidation. Identifying compounds, or automatically recognizing structural properties, from mass spectral data is an important task in chemometrics.
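For orientation, the metric learning methods named above are usually described as learning a Mahalanobis-type distance. The following is a sketch of that standard form, not a formula taken from the thesis; the symbols x_i, x_j (feature vectors of two spectra), L (a learned linear map) and M are notation assumed here.

```latex
d_M(x_i, x_j)
  = \sqrt{(x_i - x_j)^\top M \,(x_i - x_j)}
  = \lVert L x_i - L x_j \rVert_2,
\qquad M = L^\top L \succeq 0
```

Under this reading, K-NN classification with the learned metric is ordinary Euclidean K-NN applied to the transformed vectors L x. If L has fewer rows than the input dimension, the same map also acts as a dimension reduction.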
Experiments show that, given the characteristics of mass spectral data, metric learning algorithms achieve better results. To verify this conclusion, we also use metric learning as a dimension reduction method and compare it with classic dimension reduction algorithms such as principal component analysis and manifold learning. The results again show that metric learning performs better: it reduces time complexity and, at the same time, enhances the separability of the samples.
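To make the idea of comparing K-NN error under different metrics concrete, here is a minimal Python sketch. It is not the thesis's implementation: the toy data, the function names `within_class_scales` and `loo_knn_error`, and the learning rule are all assumptions made for illustration. The rule rescales each feature by the inverse square root of its pooled within-class variance, which is only a diagonal simplification in the spirit of RCA (RCA itself whitens with a full within-chunklet covariance matrix).

```python
import math
from collections import Counter


def within_class_scales(X, y):
    """Per-feature scale 1/sqrt(pooled within-class variance).

    Assumption: a diagonal stand-in for RCA-style whitening, for illustration only.
    """
    dims = len(X[0])
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(x)
    pooled = [0.0] * dims
    for rows in by_class.values():
        for j in range(dims):
            mean = sum(r[j] for r in rows) / len(rows)
            pooled[j] += sum((r[j] - mean) ** 2 for r in rows)
    # Small constant guards against division by zero for constant features.
    return [1.0 / math.sqrt(v / len(X) + 1e-12) for v in pooled]


def loo_knn_error(X, y, scales, k=1):
    """Leave-one-out K-NN error under the diagonal metric given by `scales`."""
    errors = 0
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Squared distance after rescaling each feature; sorted nearest first.
        dists = sorted(
            (sum((s * (a - b)) ** 2 for s, a, b in zip(scales, xi, xj)), j)
            for j, xj in enumerate(X)
            if j != i
        )
        votes = Counter(y[j] for _, j in dists[:k])
        if votes.most_common(1)[0][0] != yi:
            errors += 1
    return errors / len(X)


# Hypothetical toy data: feature 0 separates the classes but has a small range,
# feature 1 is uninformative but has a large range.
X = [
    (0.0, 0.0), (0.1, 10.0), (0.2, 20.0), (0.1, 30.0),   # class 0
    (1.0, 1.0), (1.1, 11.0), (0.9, 21.0), (1.0, 31.0),   # class 1
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

euclidean_error = loo_knn_error(X, y, [1.0, 1.0])
learned_error = loo_knn_error(X, y, within_class_scales(X, y))
print("1-NN error, Euclidean metric:", euclidean_error)
print("1-NN error, rescaled metric: ", learned_error)
```

On this toy set the plain Euclidean metric is dominated by the uninformative feature, while the rescaled metric lets the informative feature decide the nearest neighbour. Real mass spectral features are far higher-dimensional, and the full methods named in the abstract learn non-diagonal transformations.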
Keywords/Search Tags: Data mining, molecular structure, metric learning, mass spectral data