Research on Parallel Community Detection and Pattern Mining for Large-Scale Telecommunication Data

Posted on: 2012-07-24 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Gao
GTID: 2178330335960544 | Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of science and technology, the world we live in is gradually being covered by large amounts of data, on a scale far beyond what the human brain can process. Extracting useful information from this ocean of data has therefore become a science of its own: data mining. Computing devices make mining data easier and easier; researchers simply write programs to process, compute statistics on, and analyze all kinds of data. However, just as society keeps progressing, the growth of data will not stop, and it quickly outpaces what a single computer can handle. To deal with this, many parallel computing methods have been proposed, one of the best known being the MapReduce computing model.

Large-scale data processing is needed in many areas, and telecommunications is one of them. With the development of telecommunication technology, and especially since the invention of the mobile phone, the volume of data in this area has exploded. These data hold considerable potential for profit, so analyzing them has naturally become an urgent requirement. Because of their scale, they must be stored and processed in a distributed way.

In this thesis, I first propose a method, based on the Hadoop platform, to extract, transform and load (ETL) raw telecommunication data. This method makes it possible to support new data formats without any change to the original software implementation. I then propose a method that detects communities in a large-scale telecommunication graph using Hadoop and its powerful computing capability. Finally, I describe a complete processing procedure for the telecommunication data: the data are first transformed into a specific format to form a large graph of people linked by telephone calls; all communities in this graph are then detected, and the results are stored in HBase, a distributed database.
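The abstract does not say which community detection algorithm the thesis uses or what its ETL format descriptions look like. The sketch below is a hypothetical illustration of the pipeline just described, not the thesis's implementation: it assumes an invented descriptor dict (`CDR_FORMAT`) that maps field positions in a raw call record, builds a call graph weighted by call duration, and finds communities with label propagation, a common MapReduce-friendly algorithm swapped in here for illustration. The map, shuffle and reduce phases are simulated in-process with Python dictionaries; Hadoop and HBase are not involved.

```python
from collections import defaultdict

# Hypothetical format descriptor (not from the thesis): a new raw-record layout
# only needs a new descriptor like this one, not a change to parse_records().
CDR_FORMAT = {"delimiter": "|", "caller": 0, "callee": 2, "duration": 3}


def parse_records(lines, fmt):
    """ETL step: turn raw call-detail lines into (caller, callee, seconds) tuples."""
    for line in lines:
        fields = line.strip().split(fmt["delimiter"])
        yield fields[fmt["caller"]], fields[fmt["callee"]], int(fields[fmt["duration"]])


def build_graph(records):
    """Undirected weighted graph: edge weight = total call seconds between two people."""
    graph = defaultdict(lambda: defaultdict(int))
    for a, b, secs in records:
        if a != b:  # ignore self-calls
            graph[a][b] += secs
            graph[b][a] += secs
    return graph


def lpa_map(node, label, neighbours):
    """Map phase: emit (destination node, (label, weight)) pairs."""
    # Self-vote weighted by the node's strongest edge damps label oscillation.
    yield node, (label, max(neighbours.values()))
    for nb, weight in neighbours.items():
        yield nb, (label, weight)


def lpa_reduce(votes):
    """Reduce phase: pick the label with the highest total weight for one node."""
    scores = defaultdict(int)
    for label, weight in votes:
        scores[label] += weight
    # Ties go to the lexicographically smallest label so results are deterministic.
    return min(scores.items(), key=lambda kv: (-kv[1], kv[0]))[0]


def label_propagation(graph, max_iters=20):
    labels = {node: node for node in graph}  # every node starts in its own community
    for _ in range(max_iters):
        grouped = defaultdict(list)  # stands in for Hadoop's shuffle-by-key
        for node, neighbours in graph.items():
            for key, value in lpa_map(node, labels[node], neighbours):
                grouped[key].append(value)
        new_labels = {node: lpa_reduce(votes) for node, votes in grouped.items()}
        if new_labels == labels:  # converged: no node changed community
            break
        labels = new_labels
    return labels


# Hypothetical raw records: caller|date|callee|seconds
raw = [
    "A|20120101|B|10", "A|20120101|C|10", "B|20120101|C|10",
    "D|20120101|E|10", "D|20120101|F|10", "E|20120101|F|10",
    "C|20120101|D|1",  # weak link between the two groups
]
communities = label_propagation(build_graph(parse_records(raw, CDR_FORMAT)))
print(communities)  # → {'A': 'A', 'B': 'A', 'C': 'A', 'D': 'D', 'E': 'D', 'F': 'D'}
```

In this sketch, supporting a new record layout means writing a new descriptor dict rather than changing `parse_records`, which mirrors the goal the abstract states for its ETL method. Synchronous label propagation can oscillate on some graphs; the self-vote and the `max_iters` cap are simple guards, not guarantees.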
A visualization module shows the relationships among these communities and, further, the relationships among the nodes within each community. In parallel, another module analyzes all the communities to compute statistics and extract useful patterns and characteristics.
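The abstract does not list which statistics the analysis module computes. As a hedged illustration only, the following sketch assumes communities are given as a node-to-label dict (for example, the output of a community detection step) together with a call graph weighted by call seconds, and summarizes each community by size, internal call weight, external call weight, and a cohesion ratio. These measures and the function name `community_stats` are chosen for illustration and are not taken from the thesis.

```python
from collections import defaultdict


def community_stats(graph, labels):
    """Summarise each community by size, internal/external call weight and cohesion."""
    stats = defaultdict(lambda: {"size": 0, "internal": 0, "external": 0})
    for node, label in labels.items():
        stats[label]["size"] += 1
    for a, neighbours in graph.items():
        for b, weight in neighbours.items():
            if a < b:  # each undirected edge appears twice; count it once
                if labels[a] == labels[b]:
                    stats[labels[a]]["internal"] += weight
                else:  # a call that crosses a community boundary
                    stats[labels[a]]["external"] += weight
                    stats[labels[b]]["external"] += weight
    for s in stats.values():
        total = s["internal"] + s["external"]
        # Share of a community's call weight that stays inside it.
        s["cohesion"] = s["internal"] / total if total else 0.0
    # Largest communities first; ties broken by label for a stable order.
    return sorted(stats.items(), key=lambda kv: (-kv[1]["size"], kv[0]))


# Hypothetical input: call graph (weights = call seconds) and community labels.
graph = {
    "A": {"B": 10, "C": 10}, "B": {"A": 10, "C": 10}, "C": {"A": 10, "B": 10, "D": 1},
    "D": {"C": 1, "E": 10, "F": 10}, "E": {"D": 10, "F": 10}, "F": {"D": 10, "E": 10},
}
labels = {"A": "A", "B": "A", "C": "A", "D": "D", "E": "D", "F": "D"}
for label, s in community_stats(graph, labels):
    print(label, s["size"], s["internal"], s["external"], round(s["cohesion"], 3))
```

A high cohesion value marks a group whose calls mostly stay inside it; sorting by size simply puts the largest groups first for inspection or visualization.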
Keywords/Search Tags:community detection, MapReduce, distributed computing, graph mining