Research And Implementation Of Content Detection System Based On Net Crawler

Posted on: 2011-09-22
Degree: Master
Type: Thesis
Country: China
Candidate: X P Huang
Full Text: PDF
GTID: 2178360308961652
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the information industry, the Internet has become increasingly intertwined with people's daily lives. Its scope has expanded from early e-mail applications into a broad platform for online entertainment, information access, communication, and business transactions. However, as the number of Internet users and online services grows, problems have become increasingly apparent, in particular the spread of unhealthy information, which seriously affects the well-being of users, especially young people. The study of content detection systems is therefore increasingly important. By detecting the content of web pages on the Internet, unhealthy resources can be identified and cleaned up. Content detection systems thus help provide users with a healthy network environment and have considerable significance and value.

This paper first analyzes the current state of Internet development and the problems that exist on the Internet. It then introduces the two kinds of content detection system implementations, active detection and passive monitoring, and briefly analyzes the characteristics and implementation methods of each. On this basis, the author proposes the approach taken in this paper: content detection based on web crawler technology. The web crawler extracts the content of web pages, and the text and images appearing on those pages are then recognized. The paper next describes the technologies related to content detection and their implementation. After a comprehensive comparison of the various technologies, the author presents the design and implementation of the content detection system. Image recognition on the crawled pages is the focus of this research.
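The extraction step described above, pulling text and images out of a crawled page, might look roughly like the following. This is a minimal sketch using only the Python standard library, not the thesis's implementation: the class name `PageExtractor`, the sample HTML, and the `example.com` URL are all hypothetical, and fetching pages over the network is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageExtractor(HTMLParser):
    """Collects visible text, image URLs, and outgoing links from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.text_parts = []   # visible text fragments, for text detection
        self.image_urls = []   # absolute image URLs, for image recognition
        self.links = []        # absolute link URLs, for the crawl frontier
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img" and attrs.get("src"):
            self.image_urls.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not script or style content.
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


# Hypothetical page content standing in for a fetched document.
SAMPLE_HTML = """
<html><head><style>p {color: red}</style></head>
<body><p>Hello crawler</p><img src="/pics/a.jpg"><a href="next.html">next</a>
<script>var x = 1;</script></body></html>
"""

extractor = PageExtractor("http://example.com/dir/index.html")
extractor.feed(SAMPLE_HTML)
```

In a design like this, `text_parts` would feed the text check, `image_urls` the image recognizer, and `links` the queue of pages still to be crawled.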
The image recognition technology used in this paper is based on support vector machine (SVM) theory. A number of features are extracted from the input image, mapping it into a high-dimensional vector space in which the images can be linearly separated by a hyperplane. The paper gives an SVM-based implementation of image recognition and tests its recognition rate. Finally, it summarizes the work and points out directions for future work.

The content detection system based on the web crawler can identify most of the non-compliant text and images that appear on the network. It plays an active part in preventing the spread of unhealthy information on the Internet and supports Internet regulation, and it therefore has strong practical significance.
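The idea of separating image feature vectors with a hyperplane can be illustrated with a small example. The sketch below is not the thesis's classifier and is not a full SVM solver: it learns a linear decision function w·x + b by subgradient steps on the hinge loss (the loss an SVM uses), but omits the margin-maximizing regularizer and any kernel. The two-dimensional feature vectors and their labels are invented purely for illustration; the abstract does not say which image features were extracted.

```python
def train_hinge_classifier(samples, labels, epochs=400, lr=0.1):
    """Learn (w, b) so that y * (w.x + b) >= 1 for every training pair.

    Simplified stand-in for SVM training: hinge-loss updates only,
    with no regularization term and no kernel.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        updated = False
        for x, y in zip(samples, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # point is inside the margin or misclassified
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
                updated = True
        if not updated:  # every point already has margin >= 1
            break
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane x falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical 2-D feature vectors; +1 = flagged image, -1 = acceptable image.
features = [(0.8, 0.7), (0.9, 0.6), (0.7, 0.9), (0.85, 0.8),
            (0.1, 0.2), (0.2, 0.1), (0.15, 0.3), (0.3, 0.2)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_hinge_classifier(features, labels)
print(predict(w, b, (0.8, 0.75)))  # lies among the flagged examples
print(predict(w, b, (0.2, 0.2)))   # lies among the acceptable examples
```

A real system would replace the toy vectors with features computed from each crawled image and would normally use an established SVM library, which also provides regularization and kernel functions.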
Keywords/Search Tags: internet, net crawler, SVM, image detection