Research Of Clustering And Searching Algorithm In Large Scale Face Database

Posted on:2008-05-12

Degree:Master

Type:Thesis

Country:China

Candidate:H P Sheng

Full Text:PDF

GTID:2178360215952540

Subject:Computer application technology

Abstract/Summary:

PDF Full Text Request

Withtherapiddevelopment ofthetechnologyandtheimprovement ofthesociety, the recognitional technology of biological characteristics has becomemoreandmoremature.Especiallythefacerecognition,asoneofthebiologicalcharacteristicsrecognitionaltechnologies,hasbeenwidelyappliedinthefieldsofhumanmachineinterface(HCI),videoconference,publicsecuritymonitorand control and so on because of its direct,friendly and convenienttraits.Clustering analysis is one of the most important unsupervisedclassification which depends on data attributes in the field of datamining.Clustering has attracted a lot of attention in many fields,such as lifesciences,medicine sciences and society sciences.With the size of datasetincreasing more and more rapidly,the method of linear matching strategy forface recognition becomes hard to be applied since it is time-consumable.So itis necessaryto find a fast searching algorithm which is suitable for large scaleface database.In this paper,we combine clustering analysis with facerecognitiontechnologytoclassifythefacedatabase,andthenweonlysearchinonly one or several subclasses,and this accelerates the search of large scalefacedatabase.In this paper, we first introduce some basic knowledge of the facerecognition and clustering analysis,including the frame of a typical facerecognition system,the strongpoints and difficulties in face recognition,thesteps of clustering and its application analysis.There are three main steps in atypical face recognition system:face detection,feature extraction,classifierdesign.We present a concise survey on these steps,especially on somestratigies of face detection and PCA feature,as these have very importantinfluenceontheresultoffacerecognition.Then we present a survey on the clustering algorithms of large scaledataset. Grid-based methods are very fast when the dimension of the data islow,but with the data dimension rising the complexity will increase byexponent and the partition at each dimension is not suitable for all types ofattribute. Density-based methods can find clusters of any shape using thedefinition of neighbor,which agrees with our common intuition of finding clusters in two or three data space.It is necessary to combine hierarchicalmethods with other technology for its complexity.K-means clusteringalgorithm,as a delegate of partition methods,is verysimple and fast.It employsthe euclidian diatance as its distance measure,so it tends to find hypersphereclusters.K-means clusteringalgorithm is suitableforlargescaledataset,but thepreconditionisthattheprogrammustloadalldatumintothemainmemory.We can see that the typical clustering algorithms are not suitable for ourfast search of large scale face database,so we propose the L-K menashierarchical clustering algorithm by modifying the typical K-means algorithmand combining with hierarchical classification.Our algorithm has two mainsteps: (1)It classifies the dataset recursively using the modified K-meansclustering algorithm.Especiallyit can use the one-step split algorithm which isasimplifiedK-means clusteringalgorithm whenthedatasetistoolargetobeloaded into memory totally.The number of subclasses and the decision ofcontinuingtoclassifyaredependentonthesizeofthedataset.Afterthis,wegeta hierarchical tree indexingstructure of face feature.(2)Our algorithm searchesredudant datum to ensure the accuracy rate of search. As the separation of thedataset could divide some neighbors into different classes,especially when thedistribution of the data is complex or the clustering algorithm is not verysuitable for the dataset which could decrease the accuracy rate of search,weuse this step to achieve a better accuracyrate.In this step,we search the first Nneighbors for each data in every leaf node in the indexing tree, and if theneighbors are not in the leaf node and they are near to the data in somedegree,we put them into the leaf node as redudant datum.Our algorithmemploysseveralparameters,andtheyinteractwitheachother,andmostofthemare dependant on the environment of the application.On the other hand,theymake our algorithm veryflexible,and the user can get a good tradeoff in manyaspects,suchasthecomplexityofthedataset'sdistribution,theeffectivenessofclusteringalgorithm,thespaceandtimecomplexityofthealgorithm,redudantrate of the datum,search time and accuracy rate of search.We can even get abetter result when one aspect is short,because we can compensate it in otheraspects.For example,we can get a shorter search time when we decrease themaximum size of each leaf node,but this could classify more neighbors intodifferent leaf nodes,so we must increase the reduant rate of datum to make sure a satisfying accuracy rate.There is another point to be explained:since welimitthenumberofsubclassforeachdatasetandthetimesofiteration,thetimecomplexityofclusteringstepinouralgorithmislow,andthecomplexityofouralgorithm mainly depends on the second step,so we provide a parameter toenable the user to get a tradeoff between the accuracy rate and timecomplexity.We implement a fast search system by applying our algorithm on aoracle9i database which contains more than 100,000 face image.The systemhas threemainsubsystems:(1)Input of faceimage.Inthis subsystem we extractface feature whenthereis anew faceimage,andthenclassifyit intoa subclassand put it into database for later search.We do not rebuild the hierarchicalindexing structure until there are enough new face image because the spaceand time complexity of rebuilding the indexing structure is veryhigh.(2)Building hierarchical indexing structure by using our L-K meanshierarchicalclusteringalgorithm.Wecanusethestructureinfastsearch.(3)Fastsearchoffaceimage.Givenasamplefaceimagetobesearched,wefirstextractits feature,get some leaf nodes which are nearest to the given image, load thedatum belonging to the nodes into memory and search the neighbors of thegiven face image.We stat some aspects of performance.We get a fast andsteadysearch time bylimit the data size of each leaf node.If we search in onlyone leaf node,we assume that the maximum size of each leaf node is M.Whenthe M is larger,the possibility of neighbors separated into different leaf nodesis less,so the redudant rate of datum is smaller,and the accuarcy rate ishigher;When the M is smaller,the result is reverse.The search time is linearwith M,so the search time will increase with M.When M is 5000,redudant rateof datum is 0.95,accuracy rate is 0.85 and the average search time is2.47s,which is much less than the original 120s.If we search in a few leafnodes,we can get more accurate search result.When M is 1200,if we search in5 leaf nodes,the accuracy rate is up to 0.95 from 0.77, and the search timeincreasesaccordingly.We can see from the result of experiment that our algorithm is veryeffective andflexible.However,There aresome otherproblems to bestudiedinfast search of large scale face database, such as the complexity of buildingindexingstructure.

Keywords/Search Tags:

Clustering

PDF Full Text Request

Related items

1	Research On Ensemble Clustering Algorithm Based On Bilateral Clustering
2	A Study Of Large-scale Data Clustering Based On Fuzzy Clustering And Its Application
3	The Research On The Method To Measure The Validity And To Abstract Knowledge Of Clustering
4	Research On Hybrid Algorithm Based On Subtractive Clustering
5	Research On Dynamic Clustering And Incremental In Data Mining
6	Research On Hybrid Ant Colony Clustering Algorithm
7	Study On New Data And Text Clustering Methods Based On Representatives
8	Research On Key Technology Of Clustering Analysis Optimization
9	The Study And Application Of New Clustering Algorithms In Image Processing And Text Clustering
10	The Research On Clustering Technology For Big Data