Research On Large-scale Unstructured Data Processing Of Index And Visualization

Posted on:2013-04-29

Degree:Master

Type:Thesis

Country:China

Candidate:L Cao

GTID:2268330392970628

Subject:Software engineering

Abstract/Summary:

Feature dimension reduction is an important step in text classification. Because traditional dimension reduction methods for text classification cannot process large data sets quickly, this paper presents MapReduce parallel models for statistics, information gain, and mutual information, and evaluates them experimentally on the Apache Hadoop platform. Tests on the 2010 Mogao Grottoes dataset show that the parallel models perform well in practice.

As technology advances, the volume of image data is growing rapidly, and managing it effectively is a major challenge. Image clustering is an important approach to image data management. This paper proposes a MapReduce solution for SIFT image feature extraction and k-means clustering on the Hadoop platform, and achieves good results.

With the great progress of information technology, and especially the rapid development of Internet technology, information is no longer limited to traditional unstructured information. Traditional text retrieval or image retrieval technology can handle only one specific type of unstructured information, and describing different kinds of information with a single method is a great challenge. This paper implements a hybrid index based on the R-Tree and achieves excellent results.

Because of the rapid development of information technology, information overload leaves people hoping for a way to find the content they are interested in. However, information retrieval cannot intuitively reveal the user interests behind the content. This paper studies how to use information visualization technology to show the patterns behind the data.
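The abstract names a MapReduce model for mutual-information feature selection but does not describe it. The following is a minimal single-process sketch of how such a computation is commonly split into map and reduce steps, not the thesis's implementation. It assumes the pointwise definition log(P(t, c) / (P(t) P(c))) estimated from document counts; the function names (`map_doc`, `reduce_counts`, `mutual_information`) and the input shape (a list of `(label, text)` pairs) are hypothetical, and no Hadoop API is used.

```python
import math
from collections import Counter
from itertools import chain


def map_doc(doc):
    """Map step: emit ((term, label), 1) once per distinct term in a document."""
    label, text = doc
    return [((term, label), 1) for term in set(text.split())]


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each (term, label) key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def mutual_information(docs):
    """Score each observed (term, label) pair by pointwise mutual information.

    Assumed definition: log(P(t, c) / (P(t) * P(c))), with probabilities
    estimated as document frequencies over the n input documents.
    """
    n = len(docs)
    joint = reduce_counts(chain.from_iterable(map(map_doc, docs)))

    # Marginal document frequency of each term, summed over all labels.
    term_df = Counter()
    for (term, _label), count in joint.items():
        term_df[term] += count

    # Number of documents per label.
    label_df = Counter(label for label, _text in docs)

    return {
        (term, label): math.log(count * n / (term_df[term] * label_df[label]))
        for (term, label), count in joint.items()
    }
```

In a real Hadoop job the map and reduce steps would run on separate nodes over partitioned input; here they are ordinary function calls so the data flow is easy to follow. Terms with the highest scores for a class would be the ones kept after dimension reduction.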

Keywords/Search Tags:

R-Tree, unstructured data, index, parallel

Related items

1	Study Of Distributed And Parallel Index
2	A Distributed Index Research Based On B+-Tree In Parallel Data Warehouses
3	Research And Implementation Of Parallel Index For Space Information System
4	Research On Unified Access Platform For Unstructured Data And Index Technology
5	Based On The HDFS Unstructured Data Retrieval Technology Research And Application
6	Runtime support for unstructured data accesses on coarse-grained, distributed-memory parallel machines
7	Research On Multidimensional Cloud Data Index Structure Based On KD Tree And R Tree
8	Combining Segmentation Graphs And B+ Tree Cloud Data Indexing Mechanism Research
9	Research And Application Of Improved Spatial Index On Mass Remote Sensing Data Storage Platform
10	Research On Distributed And Parallel Spatial Index Mechanism