
Research On The Categorization Methods Of EBM Web Documents And Its Application

Posted on: 2009-11-17
Degree: Master
Type: Thesis
Country: China
Candidate: X L Gan
GTID: 2144360248954765
Subject: Computer software and theory
Abstract/Summary:
Evidence-based medicine (EBM) has become the prevailing model in international medicine. Central to it is obtaining and selecting the best evidence and synthesizing it into EBM systematic reviews. With the rapid growth of the Internet, manual classification and management can no longer cope with the vast volume of medical documents. Effective webpage categorization methods and other intelligent information processing techniques are therefore urgently needed.

This thesis surveys current research on webpage classification, examines several key technologies, and proposes some new ideas and solutions. The main work is as follows:

Firstly, webpage preprocessing and categorization methods are studied. For English webpages in particular, several effective preprocessing methods are identified, including extraction of webpage title content and stemming of English words.

Secondly, a Weighted Naive Bayesian (WNB) method based on webpage summarization is presented. Critical information is extracted with the LUHN and LSA methods, word weights are adjusted by evaluation functions, and a WNB classifier is then constructed. Our experiments confirm that precision increases by 6 to 9 percentage points.

Thirdly, an approach that uses Boosting algorithms to improve an SVM classifier based on webpage summaries is proposed. The experimental results show that this method improves classification accuracy more effectively than a single SVM classifier.

Fourthly, a webpage classification system is designed and implemented, including webpage preprocessing, several general webpage classification methods, and some improved classification algorithms.
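
As a rough illustration of the weighted Naive Bayes idea mentioned in the abstract, the following Python sketch scales each word's log-likelihood contribution by a per-word weight. The abstract does not give the thesis's evaluation functions or the details of its LUHN/LSA summarization step, so the class name `WeightedNaiveBayes`, the `weights` dictionary, and the toy documents are all assumptions made for illustration, not the author's implementation.

```python
import math
from collections import Counter, defaultdict


class WeightedNaiveBayes:
    """Multinomial Naive Bayes with a per-word weight on each log-likelihood term.

    Hypothetical sketch: the weights stand in for whatever evaluation
    function the thesis uses, which the abstract does not specify.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, docs, labels, weights=None):
        # docs: list of token lists; labels: one class label per document
        self.weights = weights or {}
        self.vocab = {w for doc in docs for w in doc}
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        self.totals = {c: sum(self.word_counts[c].values())
                       for c in self.class_counts}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / n_docs)  # log prior
            denom = self.totals[c] + self.alpha * len(self.vocab)
            for w in doc:
                if w not in self.vocab:
                    continue  # ignore words never seen in training
                likelihood = (self.word_counts[c][w] + self.alpha) / denom
                # a weight above 1 makes this word count for more
                score += self.weights.get(w, 1.0) * math.log(likelihood)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Toy data (invented): in the thesis setting, each token list would be
# the summary extracted from a webpage rather than its full text.
docs = [
    ["trial", "randomized", "patient"],
    ["patient", "therapy", "trial"],
    ["football", "score", "match"],
    ["match", "team", "score"],
]
labels = ["medical", "medical", "sports", "sports"]
weights = {"trial": 2.0, "match": 2.0}  # assumed weights for illustration

clf = WeightedNaiveBayes().fit(docs, labels, weights)
print(clf.predict(["randomized", "trial"]))
print(clf.predict(["team", "score"]))
```

Applying the weight as a multiplier on the log-likelihood is one common way to formulate weighted Naive Bayes; setting every weight to 1.0 recovers the standard multinomial classifier.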
Keywords/Search Tags:Webpage Categorization, Text Categorization, Webpage Summary, Weight Adjustment, Boosting