Research On Big Data Text Analysis Based On Hadoop Architecture | Posted on: 2020-11-01 | Degree: Master | Type: Thesis | Institution: University | Candidate: SORBONI MUMIN | GTID: 2428330596478137 | Subject: Communication and Information System

Abstract/Summary:

The era of "Big Data" is upon us, bringing new opportunities and challenges in handling massive data. Big Data plays an important role in modern society. To find useful information in massive data, the data must be analyzed, and much of that analysis has to extract information from unstructured data that appears on the web as text, images, videos, or social media posts. This thesis presents an overview of Big Data, its advantages, and its research scope. It introduces Big Data text analysis on the Hadoop architecture and its components, and it also concentrates on the application of Big Data in data mining.

Text analysis is one of the most complex kinds of analysis in industry analytics. The reason is that text mining has to deal with unstructured data (email, Facebook, Twitter, and LinkedIn feeds), which has no clearly defined observations and variables (rows and columns). Hence, any data analysis must first convert the unstructured data into a structured dataset and then proceed with the normal modelling framework. This additional conversion step is facilitated by a word dictionary, which is needed for any kind of information extraction. Dictionaries for sentiment analysis can be found on the web; for some specific analyses, however, you need to create a dictionary of your own.

In this thesis, two conceptual parts of text analysis are described with the Hadoop ecosystem, and with MapReduce in particular. The first part uses large text data (CSV files) collected from tweets in 2013. The tweets are a small sample extracted from Twitter using a DataSift stream, filtered to tweets that mention Apple products such as the iPhone, iPad, Apple Watch, and so on.

The second part of the thesis deals with data from an application developed at Lanzhou University of Technology. As the applied part of the research, an automatic system for determining the movement of transport vehicles between the university's two campuses (Langongping Campus and Pengjiaping Campus) was fully developed and implemented. The application is available on the iOS platform for iPhone and iPad devices. It automatically collects the current GPS data of the campus buses for further research. Feeding the application's data into the Hadoop ecosystem makes it possible to determine when traffic jams occur between the campuses and when there are too few buses to transport students. The application can also collect general information on the location of students or passengers and determine the nearest bus station or campus. In this way the research can collect a large volume of data for further analysis.

The thesis also addresses the main problems that arise when running a Text Data Processing transform as a MapReduce job. The error log of the DataServices job simply contains the error log provided by the generated Pig script, which may already point to the source of the problem. Other kinds of problems, however, might not be listed there; for those, you need to review the relevant Hadoop or MapReduce log files within Hadoop.

Keywords/Search Tags: Big Data, Hadoop, MapReduce, HDFS, Text Analysis, Hadoop Cluster
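
The dictionary-based step described in the abstract (turning unstructured tweets into a structured dataset before analysis) can be illustrated with a minimal sketch. It is not the thesis's implementation: the word list `LEXICON`, the product list `PRODUCTS`, and the functions `map_tweet`, `reduce_scores`, and `run` are hypothetical names chosen for illustration, and the map and reduce phases run in plain Python rather than on a Hadoop cluster.

```python
import re
from collections import defaultdict

# Hypothetical mini-dictionary (word -> polarity). A real analysis would
# use a full sentiment lexicon or a custom domain dictionary.
LEXICON = {"love": 1, "great": 1, "good": 1, "hate": -1, "bad": -1, "broken": -1}

# Hypothetical product terms (lower-cased), taken from the abstract's examples.
PRODUCTS = ("iphone", "ipad", "apple watch")


def tokenize(text):
    """Split raw text into lower-case word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def map_tweet(text):
    """Map step: emit one (product, score) pair per product mentioned."""
    lowered = text.lower()
    score = sum(LEXICON.get(token, 0) for token in tokenize(text))
    return [(product, score) for product in PRODUCTS if product in lowered]


def reduce_scores(pairs):
    """Reduce step: aggregate tweet count and total score per product."""
    totals = defaultdict(lambda: {"tweets": 0, "score": 0})
    for product, score in pairs:
        totals[product]["tweets"] += 1
        totals[product]["score"] += score
    return dict(totals)


def run(tweets):
    """Chain the map and reduce steps over an in-memory list of tweets."""
    pairs = [pair for tweet in tweets for pair in map_tweet(tweet)]
    return reduce_scores(pairs)
```

Each tweet becomes zero or more structured rows keyed by product, which is the rows-and-columns form the abstract says is needed before normal modelling. On a cluster, the same two functions would correspond to a mapper and a reducer, with Hadoop handling the grouping by key between them.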