The Internet is currently flooded with rapidly growing amounts of erotic and pornographic content, which badly harms the cleanliness and harmony of the online environment. Traditional techniques for restraining its spread, such as blocking by IP address or matching sensitive keywords, no longer work effectively, and research on image filtering technology has therefore developed rapidly. Supported by the 2004 Zhuhai Science and Technology Planning Project "Research on Content-Based Erotic Image Filtering technique and its realization in IE", this paper studies the key technology of content-based erotic image filtering, namely skin-color detection, and constructs a Bayes classifier model based on a skin-color statistical histogram. It then extracts five features for classifying erotic images.

Erotic images are characterized by exposed skin, so we use a skin detection model and a texture model to detect skin-colored areas and build a binary image, then extract a feature vector, and finally apply a classification algorithm to filter images. We construct a relatively complete image database, containing a bank of 1442 images with marked skin masks and a test bank of 15890 images, and label the images according to the classification strategy. All the work in this paper is based on this image bank.

The main work of the dissertation is as follows:

(1) Construct a database with two types of tables: the statistics table, which stores statistical data, and the tests table, which stores the conditional probabilities of the skin-color detection model. The data in both tables is derived from the pixels of the standard skin-masked image bank, which contains 1442 images and 0.75 billion pixels.
We build the database in Microsoft SQL Server 2000, accessed through ADO (ActiveX Data Objects); it contains 16,777,216 (256 × 256 × 256) records. These records are useful for studying the distribution of RGB values of skin and non-skin pixels, and they also provide data for further tests of the model.

(2) Research on the skin-color detection model. Skin-color detection seems simple but is in fact complicated, mainly because of factors such as race, illumination, and noise. Three methods of skin-color detection are currently in common use: the chroma space algorithm, the Bayes classifier algorithm based on a skin-color statistical histogram, and the seed diffusion algorithm based on neighborhood information. This paper improves on three weaknesses of the histogram-based Bayes classifier algorithm described by Jones and Rehg: the construction of the skin and non-skin models from images that contain skin and images that do not, the way pixel statistics are collected, and the use of 32 bins per channel in RGB color space.

First, we construct two RGB histogram models from images containing skin, using 256 bins per channel in RGB color space. With this more suitable model we improve the Bayes classifier algorithm. Then, by comparing the automatically generated mask images with the hand-generated ones, we collect and analyze the omission rate and the false positive rate, derive the required prior probability and conditional probability formulas, and finally build the Bayes classifier model. After that, by comparing the cnt values in the statistics table, we select the relatively valuable records and insert them into the tests table as conditional probabilities. The cnt value of a record gives the number of times skin pixels occur for that record.
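The histogram-based Bayes rule outlined above can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names, the in-memory `Counter` histograms (standing in for the database tables), and the threshold parameter `theta` with its default value are all assumptions made for illustration.

```python
from collections import Counter


def build_histograms(pixels, mask):
    """Count RGB occurrences separately for skin and non-skin pixels.

    pixels: iterable of (r, g, b) tuples, each channel in 0..255
    mask:   iterable of booleans, True where the pixel is hand-marked as skin
    """
    skin_hist, non_skin_hist = Counter(), Counter()
    for rgb, is_skin in zip(pixels, mask):
        if is_skin:
            skin_hist[rgb] += 1
        else:
            non_skin_hist[rgb] += 1
    return skin_hist, non_skin_hist


def skin_posterior(rgb, skin_hist, non_skin_hist):
    """Return P(skin | rgb) by Bayes' rule from the two histograms."""
    total_skin = sum(skin_hist.values())
    total_non_skin = sum(non_skin_hist.values())
    total = total_skin + total_non_skin
    if total == 0:
        return 0.0
    # Conditional probabilities P(rgb | class), estimated from the counts.
    p_rgb_skin = skin_hist[rgb] / total_skin if total_skin else 0.0
    p_rgb_non = non_skin_hist[rgb] / total_non_skin if total_non_skin else 0.0
    # Prior P(skin), estimated from the share of skin pixels.
    p_skin = total_skin / total
    evidence = p_rgb_skin * p_skin + p_rgb_non * (1.0 - p_skin)
    if evidence == 0.0:
        return 0.0  # RGB value never observed in the training data
    return p_rgb_skin * p_skin / evidence


def is_skin(rgb, skin_hist, non_skin_hist, theta=0.5):
    """Label a pixel as skin when its posterior reaches theta (hypothetical default)."""
    return skin_posterior(rgb, skin_hist, non_skin_hist) >= theta
```

With full 256-bin channels the histogram has one possible entry per RGB triple, which is why a sparse counter (or a database table keyed by RGB value) is a natural container for the counts.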
To check the correctness and completeness of this selection, we collect the omission rate and the false positive rate from the generated mask images and build a check table of omission rates. By adding the RGB values of the omitted skin pixels, we then complete the tests table and finally obtain the actual conditional probabilities for the Bayes classifier model based on the RGB histogram.

After skin detection, we adopt first-order gray-level statistics as the texture model, so that skin-colored areas such as a yellow sofa or a yellow woolen blanket are masked as non-skin. This lowers the false positive rate and provides valid features for the subsequent classification algorithm.

Compared with the model of Jones and Rehg, our model reduces the influence of non-skin pixels in skin-marked images. We determine the optimal threshold by estimating the Equal Error Rate and choose the threshold θ = 0.07 on our training set. Whereas Jones and Rehg's model achieves a correctness of 76.55% and an omission rate of 14.59%, our model achieves a skin-color detection correctness of 80.83% and an omission rate of 13.20% on the test set of 1442 images.

(3) Feature extraction and evaluation for classifying erotic images. Before classification, we extract ten candidate features from the mask images and their corresponding original images, evaluate each of them for its discriminative ability, and finally select five features to form the classification feature set. To effectively reduce the false positive rate for portrait images, a human face detection mechanism is added to the filter. Taking both precision and computing speed into account, we use the face detection mechanism proposed by P. Viola, which combines AdaBoost with a classifier cascade and is implemented with OpenCV.
The results show that adding the face detection mechanism to our erotic image classifier improves the precision of the system considerably (by about 10% on our test set).

(4) Experiments and analysis show that our erotic image classifier distinguishes benign images from erotic images effectively, with an overall precision of about 88.51% on our test set of 4624 images (71.15% for erotic images and 91.23% for benign images).

Many aspects of our filtering system still need to be improved, such as a more efficient skin-color pixel detection model, the correctness of the face detection mechanism, and the real-time performance of the system. These are left for future work.
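The threshold selection by Equal Error Rate mentioned in item (2) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the representation of pixels as per-pixel scores with boolean skin labels, and the list of candidate thresholds are hypothetical and are not taken from the system described here.

```python
def error_rates(scores, labels, theta):
    """Return (false_positive_rate, omission_rate) at threshold theta.

    scores: per-pixel skin scores, higher means more skin-like
    labels: booleans from the hand-made masks, True for skin
    A pixel is predicted as skin when its score is >= theta.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    false_pos = sum(1 for s, y in zip(scores, labels) if not y and s >= theta)
    omitted = sum(1 for s, y in zip(scores, labels) if y and s < theta)
    fpr = false_pos / negatives if negatives else 0.0
    omission = omitted / positives if positives else 0.0
    return fpr, omission


def equal_error_threshold(scores, labels, candidates):
    """Pick the candidate threshold where the two error rates are closest."""
    def gap(theta):
        fpr, omission = error_rates(scores, labels, theta)
        return abs(fpr - omission)
    return min(candidates, key=gap)
```

Raising the threshold lowers the false positive rate and raises the omission rate, so the point where the two curves meet is a common compromise when neither error type is clearly more costly.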