Research On Molecular Retrieval And Drug Candidates Recognition In Big Data Environment

Posted on: 2017-01-11; Degree: Master; Type: Thesis
Country: China; Candidate: X Sun
GTID: 2308330503984352; Subject: Engineering, software engineering
Abstract/Summary:
In recent years, alongside the sustained growth of China's national economy and progress in the chemical industry, the continued development of combinatorial chemistry and high-throughput screening has produced a large number of compounds: many diverse molecules can now be synthesized in a short time. However, the properties and functions of these molecules are characterized comparatively slowly, which in some cases hinders research in areas such as computational chemistry, cheminformatics, and drug design. Traditional retrieval methods have had some success and can handle small-scale molecular data. With the explosive growth in the number of known molecules, however, the computing power of traditional chemical software is limited, and the rate at which molecular data can be served becomes the bottleneck. At the same time, research on optical materials and stealth materials focuses on molecular refractive index, so retrieving molecules by refractive index is of practical importance. Finally, how to select high-quality drug candidates is a research hotspot in drug research.

This thesis studies molecular retrieval and recognition, and the work is divided into two parts. In the first part, traditional retrieval methods are analyzed in the big data environment. An attribute-selection VF2 algorithm is proposed, and a distributed molecular retrieval model is established. The experimental results show that the model effectively retrieves compounds with specific information in the big data environment, with lower retrieval complexity. Building on the characteristics of molecular properties and an analysis of classical efficient retrieval algorithms, the continuous refractive index is discretized with a width-based algorithm, a high-speed hash index is then built, and a distributed massive retrieval system based on a consistent hash function is implemented. This effectively reduces the amount of data to be computed and improves efficiency. The experimental results show that molecular data can be located quickly and that the average retrieval time is reduced. In addition, the model performs stably and scales well.

In the second part, we collected 1555 molecules, including drugs and non-drugs, and further curated the molecules taken from the database. The molecular descriptors are analyzed first. Preprocessing of the molecular information then ensures that only valuable, non-redundant features are retained. The drug candidate recognition method uses a deep belief network model based on molecular descriptors. The experimental results show that the method extracts deeper feature vectors and is suitable for the drug candidate identification task. The recognition accuracy reaches 85.3%, which is higher than that of traditional methods such as support vector machines and artificial neural networks.
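
The refractive-index retrieval scheme from the first part (discretize the continuous value, index the bins by hash, distribute the bins with consistent hashing) can be illustrated with a minimal Python sketch. This is not the thesis implementation: it assumes that the width-based algorithm means equal-width binning, and the class names, node names, bin width, and virtual-node count are all hypothetical choices made for illustration.

```python
import bisect
import hashlib
from collections import defaultdict


def bin_index(value, lo, width):
    """Map a continuous value to an equal-width bin number (assumed scheme)."""
    return int((value - lo) // width)


class ConsistentHashRing:
    """Consistent hash ring with virtual nodes (replica count is illustrative)."""

    def __init__(self, nodes, replicas=64):
        ring = []
        for node in nodes:
            for i in range(replicas):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._keys = [h for h, _ in ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        # First ring position clockwise from the key's hash, wrapping around.
        pos = bisect.bisect_right(self._keys, self._hash(key)) % len(self._keys)
        return self._ring[pos][1]


class RefractiveIndexStore:
    """Hypothetical in-memory stand-in for a distributed store."""

    def __init__(self, nodes, lo=1.0, width=0.01):
        self.lo, self.width = lo, width
        self.ring = ConsistentHashRing(nodes)
        # One hash index (bin number -> molecules) per node.
        self.shards = {node: defaultdict(list) for node in nodes}

    def _bucket(self, b):
        node = self.ring.node_for(f"bin:{b}")
        return self.shards[node][b]

    def add(self, molecule_id, refractive_index):
        b = bin_index(refractive_index, self.lo, self.width)
        self._bucket(b).append((molecule_id, refractive_index))

    def query(self, low, high):
        """Return ids with low <= refractive index <= high."""
        first = bin_index(low, self.lo, self.width)
        last = bin_index(high, self.lo, self.width)
        hits = []
        for b in range(first, last + 1):
            # Only the bins overlapping the range are visited; the exact
            # comparison filters the partially covered edge bins.
            hits.extend(m for m, ri in self._bucket(b) if low <= ri <= high)
        return hits
```

A range query touches only the bins that overlap the requested interval, which is one plausible reading of how the amount of data to be computed is reduced. For example, after `store = RefractiveIndexStore(["node-a", "node-b", "node-c"])` and a few `store.add(...)` calls with illustrative values, `store.query(1.30, 1.40)` returns the ids whose stored value falls in that interval.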
Keywords/Search Tags: Molecular retrieval, Properties pre-screening, Big data, Deep Belief Network, Feature extraction