Research And Application Of Information Extraction Based On Query Expansion

Posted on: 2012-02-17
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
GTID: 2218330368492652
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, information on any given topic is growing explosively across many channels. Faced with a large volume of redundant information on a topic, users find it increasingly difficult to reach the information they actually need. How to provide users with comprehensive yet concise information on a specific topic, and how to improve the efficiency of information extraction, have therefore attracted the attention of many researchers. This thesis focuses on the query expansion and information extraction techniques needed to extract and aggregate information on the side effects of Traditional Chinese medicine. The main contributions are as follows.

First, because information extraction depends on comprehensive source information, the thesis proposes a novel topic-related, query-based keyword expansion approach to address the lack of information in the original query. The method analyzes the feedback pages returned for a specified topic-related keyword, computes the weights of topic-related keywords using the TF*PSF measure with semantic weighting, and filters the extracted keywords by these weights in order to collect information. In addition, it designs an iterative keyword query expansion algorithm and adopts a keyword combination method to improve the overall strategy for collecting topical information from the web.

Second, because network information is noisy, sparse, redundant, and weakly structured, the thesis proposes a topic sentence extraction approach based on reliability calculation to extract fine-grained information on the subject, which increases the reliability of the topical information and serves to screen it. For several sub-topics of a target topic, topic sentences are extracted by a reliability calculation based on the smoothness of the topic-sentence probability distribution. In addition, AP (Affinity Propagation) clustering is applied to eliminate redundant information, and a method based on information ratio evaluation is proposed to organize the topical information in a hierarchical, structured form.

Finally, the thesis evaluates information retrieval and extraction performance in experiments on side-effect information for three drugs. The experiments show that the approach achieves good results in this specific application of web topical information extraction.
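
The keyword weighting and iterative expansion described in the abstract can be illustrated with a rough sketch. It is not the thesis's implementation. The abstract does not define PSF, so the code assumes it is the fraction of feedback pages that contain a term, and it leaves out the semantic weighting entirely. The names `score_keywords`, `expand_query`, and the `search` callable are hypothetical.

```python
from collections import Counter


def score_keywords(pages, known_terms, stopwords=frozenset()):
    """Score candidate keywords found in feedback pages.

    Assumption: score = TF * PSF, where TF is the term's share of all
    tokens and PSF is read as the share of pages containing the term.
    The thesis's actual PSF definition and semantic weighting are unknown.
    """
    term_counts = Counter()
    page_hits = Counter()
    for page in pages:
        tokens = [t for t in page.lower().split() if t not in stopwords]
        term_counts.update(tokens)
        page_hits.update(set(tokens))  # count each page once per term

    total_tokens = sum(term_counts.values()) or 1
    n_pages = len(pages) or 1
    scores = {}
    for term, count in term_counts.items():
        if term in known_terms:
            continue  # skip keywords already in the query
        scores[term] = (count / total_tokens) * (page_hits[term] / n_pages)
    return scores


def expand_query(seed, search, rounds=2, top_k=2):
    """Iteratively grow a keyword list from feedback pages.

    `search` is a hypothetical callable: list of keywords -> list of page texts.
    """
    keywords = [seed]
    for _ in range(rounds):
        pages = search(keywords)
        scores = score_keywords(pages, set(keywords))
        # Highest score first; ties broken alphabetically for determinism.
        best = sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
        if not best:
            break
        keywords.extend(best)
    return keywords
```

In this sketch, a term that appears often and on many feedback pages outranks one that is frequent on a single page, which is one plausible way to favor topic-related keywords over page-specific noise.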
Keywords/Search Tags:Query Expansion, Keyword Expansion, LDA Model, Clustering, Topical Information Extraction