Knowledge discovery and hypothesis generation from biomedical literature using text mining

Posted on:2010-03-01

Degree:M.S

Type:Thesis

University:Purdue University

Candidate:Vaka, Harsha Gopal Goud


GTID:2448390002986701

Subject:Computer Science

Abstract/Summary:

Automated extraction of knowledge from large document collections is a broad research area. Text mining is a promising automated approach for extracting knowledge from unstructured data such as text. The objective of this thesis is to mine documents on Ayurveda retrieved from PubMed and to find novel transitive associations among biological objects. The thesis discusses the extraction of biological objects from the collected documents (the databank) using an Automated Vocabulary Discovery (AVD) algorithm. An effective co-occurrence-based text mining algorithm for hypothesis generation was designed by combining the AVD algorithm with tf-idf (term frequency-inverse document frequency) weighting. Using transitive text mining, this algorithm was designed to extract novel binary associations and hypergraph-based ternary associations (object1 -- object2 -- object3) among various objects such as genes, chemicals, and drugs. This research established relationships between objects from modern medicine and the traditional Indian medicine system Ayurveda. The generated hypotheses (novel associations) were assigned a co-occurrence-based significance score, and a few highly significant novel associations were validated. Finally, the ternary associations obtained were compared with and analyzed against the binary associations (object1 -- object2), which form a superset of the ternary associations.
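The idea of combining tf-idf weighting with co-occurrence and transitive inference can be illustrated with a minimal sketch. It assumes each document has already been reduced to a list of recognized object terms (the AVD step is not reproduced here), and it swaps in a Swanson-style ABC transitive model: object1 and object3 never co-occur directly, but both co-occur with object2. The pair score (sum of tf-idf products) and the chain score (product of the two pair scores) are illustrative assumptions, not the thesis's significance score, and all function and term names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def tfidf_weights(docs):
    """Return one {term: tf-idf weight} dict per document.

    tf is the raw count of a term in the document; idf = log(N / df).
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


def binary_associations(docs):
    """Score every unordered term pair that co-occurs in at least one document.

    Assumed score: sum over shared documents of the product of tf-idf weights.
    """
    scores = defaultdict(float)
    for w in tfidf_weights(docs):
        for a, b in combinations(sorted(w), 2):
            scores[(a, b)] += w[a] * w[b]
    return dict(scores)


def ternary_associations(docs):
    """Find object1 -- object2 -- object3 chains where object1 and object3
    never co-occur directly but both co-occur with object2.

    Returns (object1, object2, object3, score) tuples, highest score first.
    """
    pairs = binary_associations(docs)
    neighbours = defaultdict(dict)  # term -> {co-occurring term: pair score}
    for (a, b), s in pairs.items():
        neighbours[a][b] = s
        neighbours[b][a] = s
    chains = []
    for mid, links in neighbours.items():
        for a, c in combinations(sorted(links), 2):
            if c not in neighbours[a]:  # no direct co-occurrence: novel link
                chains.append((a, mid, c, links[a] * links[c]))
    chains.sort(key=lambda t: t[3], reverse=True)
    return chains


if __name__ == "__main__":
    # Toy corpus with hypothetical term names, not data from the thesis.
    docs = [
        ["herb_A", "compound_B", "compound_B"],
        ["compound_B", "gene_C"],
        ["gene_C", "disease_D"],
    ]
    for chain in ternary_associations(docs):
        print(chain)
```

On this toy corpus the top-ranked chain is gene_C -- compound_B -- herb_A, followed by compound_B -- gene_C -- disease_D; pairs that already co-occur directly, such as herb_A and compound_B, are never reported as novel. Every ternary chain is built from two binary associations, which mirrors the abstract's point that the binary associations form a superset of the ternary ones.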

Keywords/Search Tags:

Text mining, Associations, Discovery, Using, Documents

Related items

1	Literature-based discovery: Finding implicit associations between genes and diseases
2	Text association mining with cross-sentence inference, structure-based document model and multi-relational text mining
3	Research On Text Representation And Feature Extraction Methods Based On Conditional Co-occurrence Degree
4	Research On Social Network Media Hotspot Mining Algorithm Based On Distributed Computing
5	Facilitating knowledge discovery by integrating bottom-up and top-down knowledge sources: A text mining approach
6	The Application And Study Of Clustering Analysis In Text Mining
7	Research On Several Key Issues In Unsupervised Knowledge Discovery
8	Research On The Key Techniques Of Web Information Intelligent Acquisition
9	Study On Management Of Text Documents Based Content In Dataspace
10	Research And Implementation On Public Opinion Analysis And Attribute Discovery Oriented Internet Text Mining