
Design And Application Of Distributed Platform Based On DPI Large Data

Posted on: 2018-04-01 | Degree: Master | Type: Thesis
Country: China | Candidate: Z D Li | Full Text: PDF
GTID: 2348330518496839 | Subject: Electronics and Communications Engineering
Abstract/Summary:
As the Internet has developed, massive data sets have proven to hold great value, which gave rise to the concept of "big data". Compared with traditional data warehouse applications, big data analysis involves far larger data volumes and more complex queries. Research organizations and companies have proposed many solutions, such as the Lambda architecture proposed by Nathan Marz and the Kappa architecture. For a company that is just starting out with big data and wants to distinguish itself in data analysis, the primary problems to solve are how to build a distributed big data analysis platform quickly and effectively, and how to build its own data analysis applications on top of it. The core of this paper is the design, analysis and construction of a distributed data analysis platform based on DPI data. Two applications are built on this platform: automatic web page classification and group user portraits. Specifically, the paper covers three parts: the construction of the distributed data analysis platform based on DPI data, the automatic web page classification application built on the platform, and the mobile user group portrait application built on the platform. The first part introduces the platform's architecture design, functional design and the design of each module. The architecture design starts from the current practical situation and selects an appropriate technology stack. The platform architecture is divided into three layers: the data layer, the application layer and the presentation layer. The main technologies used include Hadoop, HBase, Hive and Django. The second part implements an automatic web page classification application based on the platform.
For automatic web page classification, URL category information collected by a web crawler serves as the training set, and the model is trained with the libsvm tool. The accuracy and effectiveness of the classification are evaluated and compared across parameter settings and feature selections for the SVM algorithm, and the final classification accuracy exceeds 80%. The third part is the design and implementation of a mobile user group portrait application based on the platform. Mobile user group portraits focus on label design and label mining. This paper designs a set of labels for group user portraits according to the specific requirements and the available data; the labels are mined through the platform and then rendered and displayed in it. Finally, the platform implemented in this paper runs online and has received good feedback in practice.
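The label mining step for group user portraits can be illustrated with a minimal sketch. The abstract does not give the thesis's actual label set or mining rules, so everything here is an assumption made for illustration: the input format (pairs of user ID and page category, as might come from classified DPI records), the function names `user_labels` and `group_portrait`, and the rule that a user receives a label when a category accounts for at least half of that user's visits.

```python
from collections import Counter, defaultdict

# Hypothetical input: (user_id, page_category) pairs, e.g. DPI records whose
# URLs have already been assigned a category by a page classifier.
def user_labels(visits, min_share=0.5):
    """Label each user with every category that makes up at least
    `min_share` of that user's visits (an illustrative rule only)."""
    per_user = defaultdict(Counter)
    for user_id, category in visits:
        per_user[user_id][category] += 1
    labels = {}
    for user_id, counts in per_user.items():
        total = sum(counts.values())
        labels[user_id] = sorted(
            c for c, n in counts.items() if n / total >= min_share
        )
    return labels

def group_portrait(labels):
    """Return the share of users in the group that carry each label."""
    n_users = len(labels)
    tally = Counter(label for user in labels.values() for label in user)
    return {label: count / n_users for label, count in tally.items()}

# Toy data, invented for this example.
visits = [
    ("u1", "news"), ("u1", "news"), ("u1", "video"),
    ("u2", "video"), ("u2", "video"),
    ("u3", "news"), ("u3", "shopping"),
]
labels = user_labels(visits)
portrait = group_portrait(labels)
print(labels)
print(portrait)
```

On a platform of the kind described, the same aggregation would more plausibly run as a Hive query or a Hadoop job over the stored records; the in-memory version only shows the shape of the computation.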
Keywords/Search Tags: DPI Data, Distributed Platform, SVM, User Portraits