
Drug Webpage Retrieval Based On Multi-instance Learning

Posted on: 2017-03-30
Degree: Master
Type: Thesis
Country: China
Candidate: M Xiao
GTID: 2308330482979458
Subject: Software engineering
Abstract/Summary:
With the widespread adoption of the Internet, the ways people work and live have been changing profoundly. Through the Internet, people can work, study, communicate with one another, and enjoy entertainment. While the Internet offers many conveniences, a large amount of harmful content has also appeared on the web and spread widely. Such content seriously hinders people's healthy development and is especially damaging to the physical and mental health of teenagers, so detecting and recognizing it is of great importance.

Harmful content takes many forms, including pornographic, violent, horror, drug-related, and reactionary information. At present, little work has addressed the detection and recognition of drug-related information, even though its harm is comparable to, or even greater than, that of other kinds of harmful content. To address this gap, we make full use of object recognition and information fusion to classify drug-related images and web pages on the Internet. The main work is as follows:

1. Drug-taking instrument recognition based on holistic features
Drug-taking instruments have distinctive shapes and can therefore be recognized by their shape features. After comparing several shape descriptors, we adopt PHOG (Pyramid Histogram of Oriented Gradients) as the shape descriptor for drug-taking instruments. With PHOG features as input, an SVM recognizes these instruments with satisfactory results.

2. Cannabis recognition based on local features
Because cannabis is a plant, its appearance varies considerably, so it is better recognized by local features. We adopt the bag-of-words (BOW) model and evaluate five of its coding schemes; on cannabis images, the hard coding scheme performs best.

3. Drug web page classification based on multi-instance learning
We propose the Forward Comparison of Relative Sizes Sorting (FOCARSS) algorithm to extract the valid images from a web page; experimental results demonstrate its robustness and satisfactory performance. Based on an analysis of web page structure, we also propose a general method for extracting the text related to each valid image, which achieves good results. Treating a web page as a bag, each valid image together with its related text can be regarded as an instance in that bag, so a multi-instance learning algorithm can then be applied to classify web pages. Experimental results demonstrate the effectiveness of this approach.
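The PHOG descriptor used for drug-taking instrument recognition can be sketched in a few lines. This is a simplified illustration, not the implementation used in this work: it assumes a grayscale image given as a list of equal-length rows, uses central-difference gradients and unsigned orientations in the range 0 to 180 degrees, and weights every pixel by its gradient magnitude, whereas the original PHOG formulation restricts the histograms to edge pixels (for example from a Canny detector). The bin count `n_bins` and pyramid depth `levels` are illustrative parameters.

```python
import math


def phog(image, n_bins=8, levels=2):
    """Simplified PHOG sketch: magnitude-weighted orientation histograms over a spatial pyramid.

    image: 2-D list of grayscale intensities (rows of equal length).
    Level l splits the image into a 2**l x 2**l grid; every cell contributes an
    n_bins histogram, and all histograms are concatenated and normalized to sum to 1.
    Simplification (assumption): all pixels are used, not only edge pixels.
    """
    h, w = len(image), len(image[0])
    mag = [[0.0] * w for _ in range(h)]
    bins = [[0] * w for _ in range(h)]
    # Per-pixel gradient via central differences (clamped at the borders)
    for y in range(h):
        for x in range(w):
            gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            mag[y][x] = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            bins[y][x] = min(int(angle / (180.0 / n_bins)), n_bins - 1)

    descriptor = []
    for level in range(levels + 1):
        cells = 2 ** level
        for cy in range(cells):
            y0, y1 = cy * h // cells, (cy + 1) * h // cells
            for cx in range(cells):
                x0, x1 = cx * w // cells, (cx + 1) * w // cells
                hist = [0.0] * n_bins
                for y in range(y0, y1):
                    for x in range(x0, x1):
                        hist[bins[y][x]] += mag[y][x]
                descriptor.extend(hist)

    total = sum(descriptor)
    # Normalize the whole descriptor to sum to 1 (a flat image stays all zeros)
    return [v / total for v in descriptor] if total > 0 else descriptor
```

With `n_bins=8` and `levels=2`, the descriptor has 8 × (1 + 4 + 16) = 168 entries. Vectors of this kind would then be passed to an SVM classifier (for example scikit-learn's `sklearn.svm.SVC`) trained on labelled instrument and non-instrument images.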
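For cannabis recognition, the effect of the BOW coding scheme is easiest to see side by side. The sketch below contrasts hard coding, where each local descriptor votes only for its nearest codeword, with kernel-codebook soft coding, where each descriptor spreads its vote over all codewords with Gaussian weights. Soft coding appears here only as a common point of comparison and is not necessarily one of the five schemes evaluated in this work. The codebook is assumed to be given (in practice it is usually learned by k-means over local descriptors), and `sigma` is a hypothetical smoothing parameter.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def hard_coding(descriptors, codebook):
    """Hard (vector-quantization) coding: each descriptor votes for its nearest codeword."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda k: squared_distance(d, codebook[k]))
        hist[nearest] += 1.0
    n = len(descriptors)
    return [v / n for v in hist] if n else hist


def soft_coding(descriptors, codebook, sigma=1.0):
    """Kernel-codebook soft coding: each descriptor splits one vote by Gaussian weights."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        dists = [squared_distance(d, c) for c in codebook]
        closest = min(dists)
        # Shifting by the smallest distance avoids underflow; the shift cancels after dividing by z
        weights = [math.exp(-(dist - closest) / (2 * sigma ** 2)) for dist in dists]
        z = sum(weights)
        for k, wgt in enumerate(weights):
            hist[k] += wgt / z
    n = len(descriptors)
    return [v / n for v in hist] if n else hist
```

Both functions return an image-level histogram whose entries sum to 1, which can then be fed to a classifier. Hard coding simply counts nearest-codeword assignments, while soft coding also credits codewords that are close but not nearest.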
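The bag/instance view of a web page can be illustrated with the standard multi-instance assumption: a bag is positive if at least one of its instances is positive. The sketch below is not the multi-instance learning algorithm used in this work; it only shows how a page-level decision follows from instance-level scores. It assumes that each instance (a valid image plus its related text) already carries an image score and a text score in [0, 1] from hypothetical upstream classifiers, fuses them with a hypothetical weight `alpha`, and labels the page by its highest fused instance score.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    """One valid image on a page plus its related text (hypothetical scores in [0, 1])."""
    image_score: float  # assumed output of a drug-image recognizer
    text_score: float   # assumed output of a drug-text classifier on the related text


def instance_score(inst: Instance, alpha: float = 0.5) -> float:
    """Hypothetical late fusion: weighted average of image and text scores."""
    return alpha * inst.image_score + (1 - alpha) * inst.text_score


def classify_page(bag: List[Instance], threshold: float = 0.5, alpha: float = 0.5) -> bool:
    """Standard multi-instance assumption: a page (bag) is a drug page if at least
    one instance is positive, i.e. its maximum fused score reaches the threshold."""
    if not bag:
        return False  # a page with no valid images has no instances to flag
    return max(instance_score(inst, alpha) for inst in bag) >= threshold


drug_page = [Instance(0.1, 0.2), Instance(0.9, 0.8)]
clean_page = [Instance(0.1, 0.2), Instance(0.3, 0.1)]
print(classify_page(drug_page), classify_page(clean_page))
```

Here the single strongly scored instance is enough to flag `drug_page`. MI-SVM, for example, learns the instance scorer from bag labels alone and predicts a bag's label from its highest-scoring instance, as in the rule above.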
Keywords/Search Tags: Multi-instance Learning, Cannabis, Drug-taking Instruments, Drug Webpages