With the development of technology and the progress of society, biometric recognition technology has become increasingly mature, and many biometric recognition techniques are now applied in practice. Face recognition is one of the most important of them. Clustering is a kind of classification that depends on the characteristics of the data: similar objects are grouped together. It is one of the most important data mining techniques and is widely applied in life science (biology, zoology), medical science (psychiatry, pathology), social science (sociology, archeology), geoscience (geography, geognosy), and engineering science. Besides traditional clustering, fuzzy clustering and neural network clustering have both developed substantially in recent years.

When policemen find a suspect and photograph him, they want to search a face database to determine whether he is a wanted criminal at large and to obtain useful information about him. We therefore need a fast searching system that can retrieve from a large database the pictures most similar to the suspect's, so that the police can decide whether and how to arrest him. Common searching algorithms take a long time to return results and are therefore not suitable for practical applications. To solve this problem, this paper combines a clustering algorithm with face recognition technology to build a fast searching system for large-scale face databases.

The former part of this paper introduces basic knowledge of face recognition, the advantages and disadvantages of common classification methods, and basic knowledge of clustering, and analyzes the merits, shortcomings, and applicable situations of traditional clustering and fuzzy clustering from the viewpoint of face recognition.
However, every clustering method has limitations when it is used for face searching, so in the latter part of this paper we combine k-means clustering, hierarchical clustering, FCM, and others, and propose a double hierarchical face clustering algorithm based on k-means that is suitable for fast searching in a large-scale face database. The algorithm is divided into two hierarchies. The first packs highly similar data into subclasses, which ensures the correctness of the results. The second applies k-means clustering to the data packages of the first hierarchy to classify the data, then unpacks the packages so that the original data are returned in this hierarchy, which improves the searching speed.

We chose 45 persons from the FERET face database for our test. In this test set every person has 4 to 5 pictures. Each person's photos differ in appearance: they may have been taken from different viewing angles, with different equipment and scenes, or at different times, and they may differ in illumination, pose, expression, and so on. We verified that the feature data of these pictures are separable in feature space. The best result is that all the images of one person are clustered into the same class. The test showed that the algorithm is feasible: it reduces the searching scale, which improves the searching speed, and it ensures that similar pictures end up in the same class.

In this paper we designed and implemented a system for fast searching in a large-scale face database. The system has three basic parts: ① Picture Input. In this part we extract the face features from the original face pictures and store the features, the original picture, and the name, sex, age, height, weight, etc. in our face database. A batch of records can be input at once. ② Clustering. In this part we select all the feature data from the database, apply the double hierarchical face clustering to them, and save the result in the table RESULT.
③ Face Searching. The former two parts are preprocessing; only this part performs the search. We first locate the face and extract the face features, then compute the similarity between this picture and every cluster center, and select the records in the classes whose similarity to the picture is above a threshold or in the two classes most similar to the picture. The most similar pictures among these records are then found and displayed. Besides these functions, the system provides others such as Add One, Find, Change, and Look Over.

Besides using clustering, we changed the structure of some tables to improve the searching speed. In the Oracle9i database, we create indexes, put the BLOB data in a dedicated tablespace, store the clustering result and the original pictures in different tables, use as little field space as possible, and select only the useful fields when opening a table. Together, these measures improve the searching efficiency.

Finally, we applied the clustering algorithm to a database containing 110,000 pictures and tested three conditions for the query picture: first, the picture itself is in the database; second, the picture is not in the database but other pictures of the same person are; third, no picture of the person is in the database. We compared the results of searching with and without clustering. With clustering, the searching time drops from about 120 s to about 7 s, and all pictures whose similarity to the query picture exceeds 0.80 are retrieved. Pictures with low similarity cannot be of the same person, so if all the pictures in the database have low similarity to the query, the search result may not contain the most similar ones, but this does not affect our judgment. The test results show that our algorithm preserves the searching correctness and effectively improves the searching speed.
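The two hierarchies and the cluster-restricted search described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the abstract does not specify the similarity measure, the packing rule, or the k-means initialization, so the cosine similarity, the greedy packing, the deterministic initialization, the threshold values, and all function names below are assumptions chosen for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def pack(features, pack_threshold=0.95):
    """Hierarchy 1: greedily pack highly similar vectors together.

    Returns (packages, reps): lists of feature indices and their mean vectors.
    """
    packages, reps = [], []
    for i, f in enumerate(features):
        for p, rep in enumerate(reps):
            if cosine(f, rep) >= pack_threshold:
                packages[p].append(i)
                reps[p] = mean([features[j] for j in packages[p]])
                break
        else:  # no existing package is similar enough: open a new one
            packages.append([i])
            reps.append(list(f))
    return packages, reps

def nearest(point, centers):
    """Index of the center with the smallest squared Euclidean distance."""
    return min(range(len(centers)),
               key=lambda c: sum((x - y) ** 2 for x, y in zip(point, centers[c])))

def kmeans(points, k, max_iters=50):
    """Plain k-means, initialized from the first k points (assumption)."""
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iters):
        new_labels = [nearest(p, centers) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = mean(members)
    return centers, labels

def double_hierarchical_cluster(features, k, pack_threshold=0.95):
    """Hierarchy 2: k-means on the packages, then unpack to label every vector."""
    packages, reps = pack(features, pack_threshold)
    centers, pkg_labels = kmeans(reps, min(k, len(reps)))
    labels = [0] * len(features)
    for pkg, label in zip(packages, pkg_labels):
        for i in pkg:
            labels[i] = label
    return centers, labels

def search(query, features, centers, labels, threshold=0.8, top_n=3):
    """Compare only against classes above the threshold or in the top two."""
    sims = [cosine(query, c) for c in centers]
    ranked = sorted(range(len(centers)), key=lambda c: sims[c], reverse=True)
    chosen = set(ranked[:2]) | {c for c in ranked if sims[c] >= threshold}
    candidates = [i for i, l in enumerate(labels) if l in chosen]
    candidates.sort(key=lambda i: cosine(query, features[i]), reverse=True)
    return candidates[:top_n]

# Toy feature vectors: three "persons" with two pictures each (made-up data).
features = [
    [1.0, 0.0, 0.0], [0.98, 0.1, 0.0],
    [0.0, 1.0, 0.0], [0.1, 0.97, 0.05],
    [0.0, 0.0, 1.0], [0.05, 0.0, 0.99],
]
centers, labels = double_hierarchical_cluster(features, k=3)
hits = search([0.95, 0.15, 0.0], features, centers, labels, top_n=2)
print(labels, hits)
```

Because the packages are never split by the second hierarchy, pictures packed together always land in the same class, while the search only scores the members of a few selected classes instead of the whole database.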
If other physical information, such as ranges of height, weight, and age, is added to the search, the searching speed can be increased further. We expect that further improvements will make the system more practical.
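The idea of narrowing the search with physical information can be sketched as a simple pre-filter applied before any face comparison. The record fields, units, and function names here are hypothetical and are not taken from the system's actual database schema.

```python
from dataclasses import dataclass

@dataclass
class Record:
    """Hypothetical database record; field names and units are assumptions."""
    name: str
    height_cm: float
    weight_kg: float
    age: int

def in_range(value, bounds):
    """True when no bounds are given or lo <= value <= hi."""
    if bounds is None:
        return True
    lo, hi = bounds
    return lo <= value <= hi

def prefilter(records, height=None, weight=None, age=None):
    """Keep only records whose metadata lies inside every given (lo, hi) range."""
    return [r for r in records
            if in_range(r.height_cm, height)
            and in_range(r.weight_kg, weight)
            and in_range(r.age, age)]

# Made-up records for illustration.
records = [
    Record("A", 172, 68, 31),
    Record("B", 185, 90, 45),
    Record("C", 170, 65, 29),
]
shortlist = prefilter(records, height=(165, 175), age=(25, 35))
print([r.name for r in shortlist])
```

Only the shortlisted records would then need to be compared by face similarity, which shrinks the candidate set in the same spirit as the clustering step.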