
Research on Building a Hadoop-Based Thematic Information Storage and Multidimensional Analysis System

Posted on: 2019-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: K Chen
Full Text: PDF
GTID: 2428330551956468
Subject: Library and Information Science
Abstract/Summary:
In the era of big data, thematic information resources are large in volume and change rapidly, and traditional information processing methods alone cannot handle them. This thesis therefore proposes building a thematic information resource storage and multidimensional analysis system on the Hadoop framework. It studies the system's service model, service targets, technical architecture and related aspects in depth, analyzes the key technologies involved from the perspective of system construction and application, and on that basis designs and implements a Hadoop-based thematic information storage and multidimensional analysis system. The system stores thematic information from different sources, helps users analyze thematic information resources efficiently, and displays the analysis results. The main work of the thesis covers the following four aspects:
(1) To meet the storage requirements of heterogeneous thematic information, the thesis designs a storage scheme based on the HBase database. The scheme allows heterogeneous thematic information resources to be managed uniformly while providing efficient access to the information for processing and use.
(2) Text processing methods are designed on MapReduce to perform word segmentation, feature extraction, vectorization and other operations on thematic information resources. For word segmentation, collected domain terminology is used to train the segmentation tool, which expands the segmentation dictionary and improves segmentation quality.
(3) Statistical distribution analysis and topic analysis are carried out on thematic information resources. Statistics on the external characteristics of the resources are used to derive resource development trends and their network. The LDA model is used for topic discovery and for analyzing the evolution of topic intensity, which yields the research focuses of the field and how they evolve; the analysis results are visualized.
(4) Based on actual production requirements, a thematic information storage and multidimensional analysis system is designed and implemented on the Hadoop framework. The system supports storage, processing and analysis of thematic information resources, and functional testing verifies its usability, which gives the work practical significance.
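The text processing described in aspect (2) can be illustrated with a small, single-machine sketch. It is not the thesis's implementation: the function names, the sample documents and the `DOMAIN_TERMS` dictionary are all hypothetical, whitespace tokenization stands in for a trained word segmentation tool, and plain Python generators stand in for Hadoop mappers and reducers. It only shows the shape of the pipeline: segment with a domain dictionary, count terms in a map/reduce style, then vectorize with TF-IDF.

```python
import math
from collections import Counter, defaultdict

# Hypothetical domain dictionary: multiword terms kept as single tokens.
DOMAIN_TERMS = {"big data", "topic model"}


def segment(text, domain_terms=DOMAIN_TERMS):
    """Whitespace tokenizer that merges adjacent word pairs found in the dictionary."""
    words = text.lower().split()
    tokens, i = [], 0
    while i < len(words):
        pair = " ".join(words[i:i + 2])
        if pair in domain_terms:
            tokens.append(pair)
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens


def map_phase(doc_id, text):
    """Emit ((doc_id, term), 1) pairs, as a mapper would."""
    for term in segment(text):
        yield (doc_id, term), 1


def reduce_phase(pairs):
    """Sum the counts for each (doc_id, term) key, as a reducer would."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return counts


def tfidf(counts, n_docs):
    """Convert per-document term counts into TF-IDF vectors."""
    # Each (doc_id, term) key is unique, so this counts documents per term.
    doc_freq = Counter(term for (_, term) in counts)
    vectors = defaultdict(dict)
    for (doc_id, term), tf in counts.items():
        vectors[doc_id][term] = tf * math.log(n_docs / doc_freq[term])
    return dict(vectors)


# Illustrative sample documents (not data from the thesis).
docs = {
    "d1": "big data storage on hadoop",
    "d2": "topic model analysis on hadoop",
}
pairs = [p for doc_id, text in docs.items() for p in map_phase(doc_id, text)]
counts = reduce_phase(pairs)
vectors = tfidf(counts, len(docs))
```

In this sketch, terms that occur in every document (such as "hadoop") receive a weight of zero, while dictionary terms such as "big data" survive segmentation as single features. Vectors of this kind are the usual input to a topic model such as LDA.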
Keywords/Search Tags:Specific Information Resources, System Building, Topic Detection, Hadoop Framework, Information Resource Organization