
Research On Large-scale Unstructured Data Processing Of Index And Visualization

Posted on: 2013-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: L Cao
Full Text: PDF
GTID: 2268330392970628
Subject: Software engineering
Abstract/Summary:
Feature dimension reduction is an important part of text classification. Because traditional dimension reduction methods for text classification cannot process large data sets quickly, this paper presents parallel MapReduce models for statistics, information gain, and mutual information, and evaluates them experimentally on the Apache Hadoop platform. Tests on the 2010 Mogao Grottoes dataset show that the parallel models perform very well in practice.

As technology advances, the volume of image data is growing rapidly, and managing it effectively is a great challenge. Image clustering is an important approach to managing image data. This paper proposes a MapReduce solution for SIFT image feature extraction and k-means clustering on the Hadoop platform, and achieves good results.

With the great progress of information technology, especially the rapid development of Internet technology, information is no longer limited to the traditional unstructured information. Traditional text retrieval or image retrieval technology can deal only with one particular type of unstructured information, and describing different kinds of information with the same method is a great challenge. This paper implements a hybrid index based on the R-Tree and achieves excellent results.

Because of the rapid development of information technology, information overload makes people hope for a way to fetch the content they are interested in. However, information retrieval cannot intuitively reveal the user interests behind the content. This paper studies how to use information visualization technology to show the patterns behind the data.
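To make the feature-selection step concrete, the following is a minimal single-process sketch of how information gain could be computed in a map/reduce style. It is not the thesis's implementation: the function names (`map_doc`, `reduce_counts`, `information_gain`), the whitespace tokenization, and the `(label, text)` input format are all assumptions chosen for illustration, and a real Hadoop job would distribute the map and reduce steps across nodes.

```python
import math
from collections import Counter
from itertools import chain


def map_doc(doc):
    """Map step (hypothetical): emit ((term, label), 1) once per distinct term."""
    label, text = doc
    return [((term, label), 1) for term in set(text.split())]


def reduce_counts(pairs):
    """Reduce step (hypothetical): sum the emitted values for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def entropy(counts):
    """Shannon entropy (base 2) of a list of non-negative counts."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs):
    """Return {term: information gain} for a list of (label, text) documents."""
    n = len(docs)
    class_counts = Counter(label for label, _ in docs)
    labels = list(class_counts)
    # Document frequency of each term within each class.
    term_label = reduce_counts(chain.from_iterable(map_doc(d) for d in docs))
    base = entropy(class_counts.values())
    gains = {}
    for term in {term for term, _ in term_label}:
        present = [term_label.get((term, c), 0) for c in labels]
        absent = [class_counts[c] - p for c, p in zip(labels, present)]
        n_present = sum(present)
        conditional = (n_present / n) * entropy(present) \
            + ((n - n_present) / n) * entropy(absent)
        gains[term] = base - conditional
    return gains
```

In this sketch, terms with the highest gain would be kept as features; the map and reduce functions only accumulate per-class document frequencies, which is the part that parallelizes naturally.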
Keywords/Search Tags: R-Tree, unstructured data, index, parallel