TCM Literature Search Engine Based on Latent Semantic Information

Posted on: 2012-12-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Feng
GTID: 2178330332476011
Subject: Computer applications
Abstract/Summary:
TCM (Traditional Chinese Medicine) is a discipline that studies and exploits the macro functions of life and disease. Over 2,500 years of development and practice, a huge body of medical literature has been generated, and it is still growing. New tools are needed to explore and browse large collections of TCM literature. A user suddenly faced with access to millions of articles in her field is not satisfied with the simple search that current literature search engines provide. Using such collections effectively requires interacting with them in a more structured way: finding literature more easily and exploring the collection through the underlying topics that run through it. In this paper, we describe our TCM literature search engine based on latent semantic information. The system has been successfully deployed at the China Academy of Traditional Chinese Medicine. It uses the latent semantic information in the literature to provide an easy and efficient way to find literature. The system has two major features: (1) We use a topic model to find the latent semantic information in the literature. The topics provide a summary of the corpus that is impossible to obtain by hand. We can then explore the corpus by visualizing a topic, visualizing a document, and finding similar documents. (2) We extend autocompletion by supporting fuzzy matching of Chinese characters: it considers the phonetic similarity and shape similarity of Chinese characters and allows infix matches as well as prefix matches. When a user types a Chinese character string, minor mistakes are tolerated and the expected autocompletion strings are presented to the user.
Keywords/Search Tags: TCM, Search engine, Topic model, Autocompletion, Lucene
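
The fuzzy autocompletion described in feature (2) can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes a uniform cost of 1 for every character substitution, insertion, or deletion, and it does not model phonetic or shape similarity, which the thesis uses to decide which mistakes count as minor. The function names, the `max_errors` parameter, and the sample strings are hypothetical. The sketch scores each candidate by the smallest edit distance between the query and any substring of the candidate, so both prefix and infix matches are found.

```python
from typing import List


def min_substring_edit_distance(query: str, text: str) -> int:
    """Smallest edit distance between `query` and any substring of `text`."""
    # Row 0 is all zeros: a match may start at any position in `text` for free,
    # which is what allows infix as well as prefix matches.
    prev = [0] * (len(text) + 1)
    for i, qc in enumerate(query, start=1):
        curr = [i] + [0] * len(text)
        for j, tc in enumerate(text, start=1):
            cost = 0 if qc == tc else 1  # assumption: uniform substitution cost
            curr[j] = min(prev[j - 1] + cost,  # match or substitute
                          prev[j] + 1,         # drop a query character
                          curr[j - 1] + 1)     # skip a text character
        prev = curr
    # The match may also end at any position in `text`.
    return min(prev)


def autocomplete(query: str, candidates: List[str], max_errors: int = 1) -> List[str]:
    """Return candidates containing `query` with at most `max_errors` edits."""
    scored = []
    for cand in candidates:
        dist = min_substring_edit_distance(query, cand)
        if dist <= max_errors:
            # Rank by fewer errors first, then exact prefix matches before others.
            scored.append((dist, not cand.startswith(query), cand))
    return [cand for _, _, cand in sorted(scored)]


# Hypothetical sample completions, not taken from the thesis.
titles = ["中医药文献", "针灸治疗", "中药方剂", "文献检索"]
exact = autocomplete("文献", titles)   # prefix match and infix match
typo = autocomplete("文现", titles)    # second character mistyped
```

A similarity-aware variant would replace the uniform substitution cost with a lower cost for character pairs that sound or look alike, so that such mistakes are tolerated more readily than arbitrary ones.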