
Supervised Hashing With Latent Factor Model

Posted on: 2015-05-22 | Degree: Master | Type: Thesis
Country: China | Candidate: P C Zhang | Full Text: PDF
GTID: 2298330452464026 | Subject: Computer Science and Technology
Abstract/Summary:
Since the start of the information age, large amounts of data have been generated on the web every day. It has thus become very important to index this information so that useful items can be found quickly. Because of this need, search engines have become very successful over the last decade, and the field of data mining and analysis has drawn much attention. Approximate nearest neighbor (ANN) search is one of the fundamental problems in this field. To perform ANN search efficiently on large datasets and to avoid some issues that arise in very high-dimensional data, hashing algorithms have been proposed that transform high-dimensional feature vectors into low-dimensional binary codes. With the rapid development of machine learning in recent years, hashing algorithms based on machine learning have also been proposed.

In this paper, we make a thorough study of the existing hashing algorithms. Through this study, we find some problems that can be improved. With this concern, we propose a new supervised hashing algorithm based on a latent factor model. The experimental results show that our proposed algorithm improves considerably on the existing ones in both accuracy and time cost.

Furthermore, we build an automatic moderation system for the FML website using kNN regression with hashing. The system collects the raw data of posts from the web pages and, with the help of natural language processing tools, extracts feature vectors that describe the content of the posts. With some collected training data, our system can automatically predict the number of votes for a post, which enables automatic moderation.

Additionally, we build a general platform through the experiment process, to which various hashing algorithms can be easily added. The platform supports performance comparison of the algorithms with respect to many standard evaluation metrics. On this platform, we integrate most of the existing algorithms, implement our own proposed algorithm, and thoroughly compare their performance through a lot of experiments.
Keywords/Search Tags: hashing, latent factor, ANN search, kNN regression, machine learning
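
The general idea described in the abstract (mapping feature vectors to short binary codes, then running kNN regression in Hamming space) can be illustrated with a minimal sketch. This is not the thesis's latent factor model: it swaps in random hyperplane hashing, an unsupervised technique, purely to show the pipeline. All function names, the code length, and the toy data are assumptions made for illustration.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes (a stand-in for a learned hash function)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vec, planes):
    """Map a real-valued vector to a binary code packed into one int.

    Each bit records on which side of one hyperplane the vector lies.
    """
    code = 0
    for plane in planes:
        dot = sum(w * x for w, x in zip(plane, vec))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")


def knn_regress(query_code, train_codes, train_targets, k=3):
    """Predict a target as the mean over the k training codes nearest in Hamming distance."""
    ranked = sorted(range(len(train_codes)),
                    key=lambda i: hamming(query_code, train_codes[i]))
    nearest = ranked[:k]
    return sum(train_targets[i] for i in nearest) / len(nearest)


if __name__ == "__main__":
    # Hypothetical toy data: 4-dimensional post features and vote counts.
    features = [[0.9, 0.1, 0.0, 0.2], [0.8, 0.2, 0.1, 0.1], [-0.7, 0.9, -0.3, 0.4]]
    votes = [120.0, 100.0, 5.0]

    planes = make_hyperplanes(dim=4, n_bits=16)
    codes = [hash_code(f, planes) for f in features]

    query = hash_code([0.85, 0.15, 0.05, 0.15], planes)
    print(knn_regress(query, codes, votes, k=2))
```

Comparing codes needs only an XOR and a bit count per pair, which is what makes search over binary codes cheap compared with distance computations on the original high-dimensional vectors. A supervised method such as the one proposed in the thesis would learn the hash function from labeled data instead of drawing it at random.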