
Design And Implementation Of A Massive Text Data Processing Tool Based On Reusable Components

Posted on: 2019-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: C Ma
GTID: 2348330545958406
Subject: Computer technology
Abstract/Summary:
With the rapid development of Internet technology, a large amount of text data is being generated, and this data holds great value. Extracting useful information from massive text data is therefore critical. The first problem to solve is how to store and efficiently process such data; the Hadoop distributed platform and the Spark computing framework address both storage and processing well. The second is extracting value from the text, which text mining algorithms can accomplish. However, Hadoop, Spark, and text mining algorithms all involve many technical details, so it is difficult to master them well enough to process and analyze massive text data. To solve this problem, this thesis designs and implements a massive text data processing tool based on reusable components. Through the tool's front-end page, users interact with the massive data processing components already implemented in the tool: they select components, set component parameters, edit a workflow built from components, and submit the workflow to complete a massive text data processing task. The tool greatly lowers the technical threshold of massive text data processing: users can carry out complex analysis tasks on massive text data without programming, simply and conveniently.

The main functions of the tool implemented in this thesis are as follows:

1. Support for massive text data: the tool supports massive text data through big data technology, covering both the storage and the processing of the data.
2. Massive text data processing components: a variety of massive text data processing operations are encapsulated as components so that users can reuse them. The thesis designs and implements the structure and operation of reusable components and, on that basis, implements common massive text data processing modules, including a data acquisition module, a text representation module, a text classification component, and a text clustering component, to meet the basic needs of massive text data processing.
3. Workflow: a workflow expresses a data processing procedure in terms of components. Building on components, the thesis designs and implements the structure and operation of workflows, so that users can define a massive text data processing task through components and a workflow and then complete the processing and analysis task.
4. Front-end interaction page: through the tool's front-end interaction page, users can browse information about the massive text data processed by the tool and about its components, and set component parameters. Workflows can also be edited from components and submitted to run.
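The component and workflow structure described above can be illustrated with a minimal sketch. This is not the thesis's implementation: the class names `Component` and `Workflow`, the methods `set_params`, `add`, and `submit`, and the two toy functions are all hypothetical, and the sketch runs in plain Python on in-memory lists rather than on Hadoop or Spark. It only shows the idea of wrapping operations as parameterised, reusable components and chaining them into a workflow that is submitted as a whole.

```python
from collections import Counter
from typing import Any, Callable, Dict, List


class Component:
    """Hypothetical reusable step: a name, user-set parameters, and a function."""

    def __init__(self, name: str, func: Callable[..., Any], **params: Any) -> None:
        self.name = name
        self.func = func
        self.params: Dict[str, Any] = dict(params)

    def set_params(self, **params: Any) -> "Component":
        # Stands in for setting component parameters on the front-end page.
        self.params.update(params)
        return self

    def run(self, data: Any) -> Any:
        return self.func(data, **self.params)


class Workflow:
    """Hypothetical linear workflow: each component's output feeds the next."""

    def __init__(self) -> None:
        self.steps: List[Component] = []

    def add(self, component: Component) -> "Workflow":
        self.steps.append(component)
        return self

    def submit(self, data: Any) -> Any:
        for step in self.steps:
            data = step.run(data)
        return data


def bag_of_words(docs: List[str], lowercase: bool = True) -> List[Counter]:
    """Toy text representation: word counts per document."""
    return [Counter((d.lower() if lowercase else d).split()) for d in docs]


def keyword_classify(vectors: List[Counter], keyword: str = "spark") -> List[str]:
    """Toy classifier: label a document by whether it contains a keyword."""
    return ["match" if v[keyword] > 0 else "other" for v in vectors]


# Build a two-step workflow from components and submit it.
workflow = (
    Workflow()
    .add(Component("text_representation", bag_of_words, lowercase=True))
    .add(Component("text_classification", keyword_classify, keyword="spark"))
)
labels = workflow.submit(["Spark processes text", "Hadoop stores files"])
print(labels)  # ['match', 'other']
```

In this sketch, reuse comes from the fact that the same `Component` wrapper can hold any operation, and changing a parameter (for example the `keyword`) requires no change to the workflow itself.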
Keywords/Search Tags: component, workflow, massive data processing tool, text mining