Application And Research Of Topic Model In Gene Semantic Similarity Calculation

Posted on: 2018-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: K Jia
GTID: 2348330512487252
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, when biologists discover an unknown gene, they often compare it with known genes and infer its characteristics from their similarity. Typically, sequence or structure comparison algorithms are used to find genes that are functionally similar or related. However, studies have shown that functionally similar or related genes do not necessarily show strong sequence correlation. To address this, current mainstream methods analyze and predict the characteristics of unknown genes by calculating the semantic similarity between gene products annotated with Gene Ontology (GO) terms. These methods, however, reflect the semantic similarity of genes only indirectly, through the relationships between terms in the Gene Ontology, and do not take into account the intrinsic semantic meaning of the terms themselves. This thesis proposes a gene semantic similarity algorithm based on topic models, which extracts intrinsic semantic information from text and, to a certain degree, makes up for this shortcoming of the mainstream methods.

This thesis makes the following three contributions:

1. When calculating the similarity between a pair of terms, latent semantic information is extracted from each term's description text, and the text is represented as a high-dimensional semantic topic vector. The similarity between the terms is thereby converted into the similarity between two vectors.
2. Two models, SSGTLDA and SSGTBTM, are proposed. For long texts retrieved from Google, SSGTLDA models the text-topic relation and the topic-word relation to obtain a high-dimensional topic vector for each text. For short texts taken from the GO definitions, SSGTBTM models the word pairs across the whole corpus to obtain the topic distribution of each text.
3. The SSGTLDA and SSGTBTM algorithms are implemented, and experiments are carried out on term-pair and protein-pair data sets. The experimental results show that the proposed models perform well.
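
The two ideas behind the contributions can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not state which vector similarity measure is used, so cosine similarity is an assumption here, and the topic vectors, the sample tokens, and the function names `cosine_similarity` and `biterms` are hypothetical. The first function compares two terms through their topic vectors; the second lists the unordered word pairs of one short text, which is the kind of unit a biterm topic model (BTM) counts over the whole corpus.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length topic vectors."""
    if len(u) != len(v):
        raise ValueError("topic vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector carries no topic information
    return dot / (norm_u * norm_v)


def biterms(tokens):
    """Unordered word pairs co-occurring in one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


# Hypothetical topic distributions for three GO terms (made-up numbers,
# three topics only; a real model would use many more dimensions).
term_a = [0.70, 0.20, 0.10]
term_b = [0.60, 0.30, 0.10]
term_c = [0.05, 0.15, 0.80]

sim_ab = cosine_similarity(term_a, term_b)  # terms with similar topic mixes
sim_ac = cosine_similarity(term_a, term_c)  # terms with different topic mixes

# Hypothetical tokens from a short GO-style definition.
pairs = biterms(["protein", "kinase", "activity"])
```

In this sketch `sim_ab` comes out larger than `sim_ac`, since `term_a` and `term_b` concentrate on the same topic. In practice the topic vectors would come from a trained topic model rather than being written by hand.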
Keywords: Gene Ontology, Semantic Similarity, LDA, BTM, Topic Model