
Research On Key Technologies Of Net Image Capture And Erotic Image Filtering

Posted on: 2009-10-06
Degree: Master
Type: Thesis
Country: China
Candidate: H W Gu
Full Text: PDF
GTID: 2178360242480237
Subject: Computer application technology
Abstract/Summary:
The development and popularization of the Internet has made information unprecedentedly accessible. On the one hand, Internet technology has greatly enriched the information available to ordinary users; on the other hand, it has given the producers and disseminators of pornography more advanced means and channels of distribution. The proliferation of Internet pornography not only seriously harms the physical and mental health of young people but also inconveniences people who use the Internet normally. Traditional techniques such as IP-based blocking and sensitive-keyword matching are no longer effective at preventing the spread of Internet pornography, so image filtering must be integrated to deal with the problem more effectively. Based on the 2004 Zhuhai Science and Technology Planning Project "Research on Content-Based Erotic Image Filtering technique and its Application in IE", we study the key technologies of content-based erotic image filtering, extract five features for classifying erotic images, and construct a classifier based on the Support Vector Machine.
This paper discusses several key technologies of content-based erotic image filtering. After studying existing research results, we design and implement an effective erotic image filter. The main work of the dissertation is as follows:
(1) Capture and reassembly of datagrams. On Ethernet, the Maximum Transmission Unit (MTU) is 1500 bytes, so an IP packet can carry at most 1480 bytes of data once the 20-byte IP header is subtracted, and a TCP segment can carry at most 1460 bytes of data once the 20-byte TCP header is also subtracted. As a result, data that exceeds the maximum length is divided into pieces.
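The payload arithmetic above can be checked with a short sketch. It assumes IP and TCP headers without options (20 bytes each, as in the text); the function name `packets_needed` and the 100,000-byte picture size are hypothetical and used only for illustration.

```python
import math

MTU = 1500        # Ethernet MTU in bytes, as stated in the text
IP_HEADER = 20    # IP header without options
TCP_HEADER = 20   # TCP header without options

ip_payload = MTU - IP_HEADER            # 1480 bytes of data per IP packet
tcp_payload = ip_payload - TCP_HEADER   # 1460 bytes of data per TCP segment


def packets_needed(data_len: int) -> int:
    """Number of full-size TCP segments needed to carry data_len bytes."""
    return math.ceil(data_len / tcp_payload)


# A hypothetical 100,000-byte picture does not fit in one segment.
print(ip_payload, tcp_payload, packets_needed(100_000))
```

Under these assumptions, any picture larger than 1460 bytes spans more than one packet, which is why reassembly is needed.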
Therefore, a picture may be split across many packets, and we need to reassemble these packets to recover the picture. First, we parse the packet header according to the IP protocol and use the source and destination addresses to determine whether the packet was sent from the website to our computer. Second, we parse the packet content according to the TCP and HTTP protocols, determine whether the content is a picture, and look for the picture's size, name, type, end marker and other information; we then reassemble the packet contents using an insertion sort algorithm. We use two different methods to determine whether the picture data has been received completely, depending on the situation. In the first situation, the server includes the keyword "Content-Length:", which gives the size of the picture, in the HTTP header carried by the first fragment of the picture in the response packets. We obtain the picture size by locating "Content-Length:" and then sum the data lengths of all the packets received for this picture; if the sum equals the picture size, the picture data is complete, and otherwise it is not. In the second situation, the server's response does not contain "Content-Length:", so we cannot judge completeness by summing the fragment lengths, because the picture size is unknown.
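The first situation can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes each captured packet has already been reduced to a `(sequence number, payload)` pair, that sequence numbers do not wrap around, and that one HTTP response carries one picture. The names `Segment`, `insert_sorted` and `picture_if_complete` are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical representation of a captured packet: (TCP sequence number, payload).
Segment = Tuple[int, bytes]


def insert_sorted(segments: List[Segment], new: Segment) -> None:
    """Insertion-sort step: keep the list ordered by sequence number."""
    i = len(segments)
    while i > 0 and segments[i - 1][0] > new[0]:
        i -= 1
    if i > 0 and segments[i - 1][0] == new[0]:
        return  # retransmitted duplicate; keep the first copy
    segments.insert(i, new)


def picture_if_complete(segments: List[Segment]) -> Optional[bytes]:
    """Return the picture bytes if Content-Length says all data has arrived."""
    stream = b"".join(payload for _, payload in segments)
    header, sep, body = stream.partition(b"\r\n\r\n")
    if not sep:
        return None  # HTTP header not fully received yet
    match = re.search(rb"Content-Length:\s*(\d+)", header, re.IGNORECASE)
    if match is None:
        return None  # no Content-Length: the second situation, not handled here
    return body if len(body) == int(match.group(1)) else None


# Usage: feed packets in arrival order, which may differ from sending order.
response = (b"HTTP/1.1 200 OK\r\nContent-Type: image/jpeg\r\n"
            b"Content-Length: 10\r\n\r\n0123456789")
received: List[Segment] = []
for seq, start, end in [(1060, 60, 75), (1000, 0, 40), (1040, 40, 60)]:
    insert_sorted(received, (seq, response[start:end]))
print(picture_if_complete(received))
```

Inserting each packet as it arrives keeps the buffer ordered at all times, so the completeness check can run after every packet without a separate sorting pass.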
We found that the last fragment of the picture ends with a seven-byte end marker indicating the end of the picture, so for a given picture, if the sequence numbers of the packets are continuous from the first fragment to the last, the picture is complete; otherwise it is not.
(2) We construct a more complete image database, containing a training set of 1154 images and a test set of 14792 images, and label the images according to the classification strategy. All the work in this paper is based on this image database.
(3) Research on the skin-color detection model. Skin-color detection seems simple but is in fact complicated, mainly because of factors such as race, illumination and noise. Three skin-color detection methods are currently in common use in the research field: the Chroma Space Algorithm, the Bayes Classifier Algorithm based on a skin-color statistical histogram, and the Seed Diffusion Algorithm based on neighborhood information. This paper chooses the Bayes Classifier Algorithm based on a skin-color statistical histogram.
(4) Feature extraction and evaluation for classifying erotic images. Before classification, we extract ten features in all that are useful for classification from the mask image and the corresponding original image, evaluate the classification capability of each feature, and then select five features as our feature set.
(5) Construction of the classifier. Common classification methods include clustering, the Bayesian method, neural networks, k-nearest neighbors, the Fisher Linear Discriminant and the Support Vector Machine. A Support Vector Machine classifies by constructing an optimal separating hyperplane in the feature space, which suits our problem of dividing images into sensitive and non-sensitive images by their feature vectors.
The Support Vector Machine is built on statistical learning theory and the principle of structural risk minimization (SRM), does not require prior knowledge of the specific problem, and works well with limited training samples, so we ultimately choose it to construct the image classifier. We evaluated four kernel functions (linear, polynomial, Gaussian and sigmoid) and chose the Gaussian function (Radial Basis Function, RBF) because it has the following advantages: first, the RBF maps the data into a high-dimensional space and can therefore handle a nonlinear relationship between labels and attributes; second, the sigmoid function behaves almost the same as the RBF for certain parameters; third, the polynomial function makes model selection more difficult because it has more parameters than the RBF.
The RBF has two parameters that need to be tuned, and different parameter values give different recognition accuracy. To find the best parameters, we used m-fold cross-validation and obtained parameters with a high recognition rate.
Experiments and analysis show that our erotic image classifier can distinguish benign images from erotic images effectively, with a precision of about 89.39% on our test set (76.61% for erotic images and 91.47% for benign images).
Many parts of our filtering system still need to be improved, such as a more efficient skin-color pixel detection model and the detection of human faces, human bodies and specific body parts; these are our future work.
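The parameter search described for the RBF kernel can be sketched as below. This is an illustration under stated assumptions, not the thesis code: the abstract does not name the two parameters, so the sketch assumes they are the penalty `C` and the kernel width `gamma`, as is common for RBF support vector machines. The `score` callback, which would train a classifier on the training indices and return its accuracy on the validation indices, is a hypothetical placeholder, and the function names are invented for illustration.

```python
import math
from itertools import product
from typing import Callable, List, Sequence, Tuple


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def m_fold_indices(n: int, m: int) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices 0..n-1 into m (train, validation) pairs."""
    folds = [list(range(i, n, m)) for i in range(m)]
    return [
        ([j for k, fold in enumerate(folds) if k != i for j in fold], folds[i])
        for i in range(m)
    ]


def grid_search(
    n: int,
    m: int,
    c_values: Sequence[float],
    gamma_values: Sequence[float],
    score: Callable[[float, float, List[int], List[int]], float],
) -> Tuple[float, float, float]:
    """Return (mean score, C, gamma) with the best m-fold mean score."""
    best = None
    for c, gamma in product(c_values, gamma_values):
        # Average the validation score over all m folds for this parameter pair.
        mean = sum(score(c, gamma, tr, va) for tr, va in m_fold_indices(n, m)) / m
        if best is None or mean > best[0]:
            best = (mean, c, gamma)
    return best
```

Each candidate pair is scored on every fold and the pair with the highest mean validation score is kept, so the chosen parameters do not depend on a single lucky train/validation split.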
Keywords/Search Tags: Technologies