Research And Implementation Of Data Transformation Technology Based On Automated Text Rule Extraction

Posted on: 2019-04-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Zhu
GTID: 2428330548977440
Subject: Computer Science and Technology
Abstract/Summary:
In the era of big data, a great deal of valuable information is hidden in ever-expanding data sets, and mining it manually is impractical, so tools that assist data analysis are necessary. Although there are quite a few mature ETL (Extract-Transform-Load) tools on the market, they share a common problem: many data transformation tasks still require writing code. This work has little to do with the data analysis itself; it takes a lot of time and makes it difficult for data analysts with weak programming skills to get started.

To solve this problem, this thesis designs and implements an automated text rule extraction technology and applies it to the data transformation process. Starting from several example strings from which rules are to be extracted, the technology first generates a set of regular expressions that match all of the input strings, then filters them down to the rules the user is likely to need, and finally recommends those rules to the user. With this technology, data analysts can transform data without programming.

To verify feasibility, we implemented an interactive, visual data transformation assistance system. The system reads plain text data, takes a series of text fragments selected by the user as input, and uses automated text rule extraction to generate rules the user may need, so that users can carry out data transformation quickly. The system also provides a what-you-see-is-what-you-get interface in which users can preview the result of executing each recommended rule. It attaches a natural language description to each rule to help users understand what the rule does. After all transformations are complete, the entire process can be exported for batch processing of large data sets in the same format.

Finally, the thesis tunes the key parameters of the automated text rule generation and filtering algorithms through experiments, and compares the time needed to process several plain text data sets with our data transformation assistance system and with Excel. In these tests, the processing efficiency of our system was significantly higher than that of Excel, which indicates that the proposed technology has practical value in the data transformation phase and can help data analysts complete their work faster and better.
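The rule extraction step described above (generate regular expressions from a few example strings, then keep only those that match every example) can be illustrated with a minimal sketch. This is not the thesis's algorithm: the character classes, the three candidate levels, and the names `char_class`, `candidates`, and `extract_rules` are all assumptions made for illustration, and the thesis's filtering and recommendation stage is reduced here to a simple "matches all examples" check.

```python
import re
import string
from itertools import groupby


def char_class(ch):
    """Map a character to a coarse regex class (an assumption of this sketch)."""
    if ch in string.digits:
        return "[0-9]"
    if ch in string.ascii_letters:
        return "[A-Za-z]"
    if ch.isspace():
        return r"\s"
    return re.escape(ch)  # anything else is kept as an escaped literal


def candidates(example):
    """Build candidate regexes for one example, from specific to general."""
    # Collapse the string into runs of identical classes, e.g. 4 digits, 1 dash.
    runs = [(cls, len(list(grp))) for cls, grp in groupby(example, key=char_class)]
    literal = re.escape(example)                          # the exact text
    fixed = "".join(f"{cls}{{{n}}}" for cls, n in runs)   # fixed run lengths
    loose = "".join(f"{cls}+" for cls, _ in runs)         # any run length
    return [literal, fixed, loose]


def extract_rules(examples):
    """Return the distinct candidates that fully match every example string."""
    seen, rules = set(), []
    for example in examples:
        for pattern in candidates(example):
            if pattern in seen:
                continue
            seen.add(pattern)
            if all(re.fullmatch(pattern, e) for e in examples):
                rules.append(pattern)
    return rules


if __name__ == "__main__":
    for rule in extract_rules(["2019-04-08", "2020-12-31"]):
        print(rule)
```

For the inputs `2019-04-08` and `2020-12-31`, the literal candidates are rejected because neither matches the other string, and two patterns remain: one with fixed run lengths and one that allows runs of any length. A real system would rank such candidates and present them to the user, as the abstract describes.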
Keywords: data transformation, automated rule extraction, visualization, high efficiency