Proteins are the main carriers of life activities, and their interactions with chemical molecules are central to the discovery and development of novel drug molecules. Traditional drug discovery relies on expert intuition and large numbers of in vivo experiments, and the resulting high cost makes drug development difficult. Recently, drug screening based on computational methods and artificial intelligence models has attracted considerable attention because of its potential to greatly accelerate the drug discovery process and reduce its cost. Variational autoencoders (VAEs), one of the most popular generative frameworks, are known to suffer from a phenomenon termed posterior collapse, i.e., the variational distributions over the latent variables collapse to the prior, especially when a strong decoder network is used. In this work, we analyze the latent representations of collapsed VAEs and propose a novel model, the neighbor embedding VAE (NE-VAE), which explicitly constrains the encoder to map inputs that are close in the input space to points that are close in the latent space. We observe that VAE variants reporting similar ELBO, KL divergence, or even mutual information scores may still differ considerably in how their latent spaces are organized. In our experiments, NE-VAE produces qualitatively different latent representations in which the majority of latent dimensions remain active, which may benefit downstream latent space optimization tasks. NE-VAE prevents posterior collapse to a much greater extent than its predecessors and can be easily plugged into any autoencoder framework without introducing additional model components or complex training routines.
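The abstract does not state the NE-VAE objective, so the following is only an illustrative sketch of the two ideas it relies on, under stated assumptions. In place of the paper's loss, it swaps in a stochastic neighbor embedding (SNE) style penalty: neighbor probabilities computed from input-space distances are matched, via a KL divergence, to neighbor probabilities computed from distances between encoder means. Because each row is normalized, mapping every input to the same point (a collapsed encoder) is penalized rather than rewarded. It also counts active latent dimensions using the common convention that a dimension is active when the variance of its posterior mean across inputs exceeds 0.01. The names `ne_penalty`, `active_units`, `sigma`, and `threshold` are hypothetical, and how such a penalty would be weighted against the ELBO is left open.

```python
import numpy as np


def _pairwise_sq_dists(a):
    # Squared Euclidean distances between all rows of a, clipped at 0 for round-off.
    sq = np.sum(a ** 2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * a @ a.T, 0.0)


def _neighbor_probs(sq_dists, scale):
    # Row-wise softmax of -d_ij / scale with j == i excluded (SNE's p_{j|i}).
    logits = -sq_dists / scale
    np.fill_diagonal(logits, -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def ne_penalty(x, mu, sigma=1.0, eps=1e-12):
    """Hypothetical SNE-style neighbor-embedding penalty (assumed, not the paper's loss).

    x:  (n, input_dim) batch of inputs
    mu: (n, latent_dim) encoder means E_q(z|x)[z] for the same batch
    Returns mean_i KL(P_i || Q_i): P uses a Gaussian kernel of width sigma in
    input space, Q a unit-scale Gaussian kernel in latent space.
    """
    p = _neighbor_probs(_pairwise_sq_dists(x), 2.0 * sigma ** 2)
    q = _neighbor_probs(_pairwise_sq_dists(mu), 1.0)
    kl_rows = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)
    return float(np.mean(kl_rows))


def active_units(mu, threshold=0.01):
    # A latent dimension is "active" if its posterior mean varies across inputs.
    return int(np.sum(np.var(mu, axis=0) > threshold))


rng = np.random.default_rng(0)
x = rng.normal(size=(20, 5))
collapsed_mu = np.zeros((20, 2))   # every input encoded to the prior mean
faithful_mu = x / np.sqrt(2.0)     # latent distances mirror input distances (sigma = 1)
print(ne_penalty(x, collapsed_mu), active_units(collapsed_mu))
print(ne_penalty(x, faithful_mu), active_units(faithful_mu))
```

In this toy setting the collapsed encoder has no active units and a clearly positive penalty, while the encoder whose latent geometry reproduces the input neighborhoods drives the penalty to roughly zero with all dimensions active. This mirrors, in a very simplified form, the abstract's point that neighborhood preservation and the number of active dimensions are informative beyond ELBO or KL values alone.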