Research And Application Of Mobile Phone Users Classification Method Based On Characteristics Of Text

Posted on: 2017-11-23 | Degree: Master | Type: Thesis
Country: China | Candidate: D C Zhong | Full Text: PDF
GTID: 2348330503468510 | Subject: Software engineering
Abstract/Summary:
We live in an era of information explosion, and text classification has become an important foundation for information retrieval and text mining. As the number of mobile Internet users grows, mobile advertising needs to be delivered precisely. Classifying mobile phone users, for example by gender, age, or occupation, is key to improving the efficiency of mobile advertising, so it is attracting more and more attention. Compared with other text classification tasks, mobile phone user classification has several distinguishing characteristics: the data takes diverse forms, is high-dimensional, and is more fragmented. Few researchers, in China or abroad, have studied the classification of mobile phone users, so it is necessary to study the problem using big data technology.

Building on existing research in text classification, this thesis focuses on classifying mobile phone users based on text features. The data set used in the experiments consists of the application lists of mobile phone users, provided by Guangzhou YouMi Technology Co., Ltd. After analyzing the classification requirements of this specific data set, I build a data classification platform that is stable, scalable, and able to support big data computing. To suit different application scenarios, the thesis studies and implements two classification models. The first is based on an improved TFIDF vector weighting algorithm, and the second is a KNN algorithm based on information entropy. The former improves classification accuracy by adding a weight parameter that accounts for part of speech.
In addition, the KNN model based on information entropy produces classification results that are easy to interpret, and its classification accuracy is higher than that of the traditional KNN algorithm. The KNN algorithm also performs well on multi-class data, so the model has good scalability. Each of the two models has its own advantages: the former classifies faster and is therefore suitable for real-time systems, while the latter achieves higher accuracy.

In the context of Internet+ and big data, the platform is built on the Hadoop distributed processing platform, which greatly expands its storage and computing capacity.
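
The abstract does not specify how information entropy enters the KNN model. The following is only a minimal sketch of one common approach, under stated assumptions: each user is a set of app names, each app is weighted by its information gain (the reduction in class entropy from knowing whether the app is present), and neighbors are ranked by a weighted Jaccard similarity. The function names, the toy data, and the choice of similarity measure are all hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(samples, labels, feature):
    """Drop in label entropy from knowing whether `feature` is present."""
    with_f = [y for x, y in zip(samples, labels) if feature in x]
    without_f = [y for x, y in zip(samples, labels) if feature not in x]
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (with_f, without_f) if part)
    return entropy(labels) - remainder

def knn_predict(samples, labels, query, k=3):
    """Majority vote among the k most similar training samples.

    Similarity is a Jaccard index in which every feature counts with its
    information gain (an assumed weighting, not the thesis's formula).
    """
    features = set().union(*samples)
    weights = {f: information_gain(samples, labels, f) for f in features}

    def similarity(a, b):
        shared = sum(weights.get(f, 0.0) for f in a & b)
        union = sum(weights.get(f, 0.0) for f in a | b)
        return shared / union if union else 0.0

    ranked = sorted(zip(samples, labels),
                    key=lambda pair: similarity(pair[0], query),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: each user is the set of apps on their phone.
samples = [
    {"makeup_cam", "shopping", "chat"},
    {"makeup_cam", "chat", "music"},
    {"football_news", "chat", "shooter_game"},
    {"football_news", "shooter_game", "music"},
]
labels = ["F", "F", "M", "M"]

prediction = knn_predict(samples, labels, {"makeup_cam", "music", "video"}, k=3)
```

Because the per-app weights are explicit, one can inspect which apps drove a prediction, which is one plausible reading of the interpretability claim above. In this toy data, an app that appears equally often in both classes receives a weight of zero and does not affect the ranking.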
Keywords/Search Tags: text classification, TFIDF, KNN, distributed platform