Research And Application Of Mobile Phone Users Classification Method Based On Characteristics Of Text

Posted on:2017-11-23

Degree:Master

Type:Thesis

Country:China

Candidate:D C Zhong

Full Text:PDF

GTID:2348330503468510

Subject:Software engineering

Abstract/Summary:

We are in an era of information explosion, and text classification technology has become an important basis for information retrieval and text mining. As the number of mobile Internet users grows, mobile advertising needs to be delivered precisely. Mobile phone user classification, such as classification by gender, age, or occupation, is the key to improving the efficiency of mobile advertising, so it is attracting more and more attention. Compared with other text classification tasks, the classification of mobile phone users has several distinguishing characteristics: the data takes diverse forms, it is high-dimensional, and it is more fragmented. Few researchers in China or abroad have studied the classification of mobile phone users, so it is necessary to study it using big data technology.

Building on existing research in text classification, this thesis focuses on the problem of classifying mobile phone users based on text features. The data set used in the experiments is the list of applications installed by mobile phone users, provided by Guangzhou YouMi Technology Co., Ltd. Based on an analysis of the classification requirements of this specific data set, I construct a data classification platform that is stable, scalable, and able to support large-scale data computation.

To adapt to different application scenarios, this thesis studies and implements two classification models. The first is based on an improved TFIDF vector-weighted classification algorithm, and the second is based on a KNN classification algorithm that uses information entropy. The former model improves classification accuracy by adding a weight parameter that accounts for part of speech.
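As a rough illustration of TFIDF weighting with an additional per-term weight, the sketch below may help. It is not the thesis's algorithm: the abstract does not give the improved formula, so the `extra_weight` multiplier, the function name `weighted_tfidf`, and the `log(N/df) + 1` form of IDF are all assumptions made for this example.

```python
import math
from collections import Counter

def weighted_tfidf(docs, extra_weight=None):
    """Return one {term: weight} dict per document.

    docs: list of token lists (for example, the apps installed on one phone).
    extra_weight: hypothetical per-term multiplier standing in for the
    additional weight parameter; terms not listed default to 1.0.
    """
    extra_weight = extra_weight or {}
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vec = {}
        for term, count in counts.items():
            tf = count / total
            # The +1 keeps terms that occur in every document from vanishing.
            idf = math.log(n_docs / df[term]) + 1.0
            vec[term] = tf * idf * extra_weight.get(term, 1.0)
        vectors.append(vec)
    return vectors

# Usage: two users, with a doubled weight on the (made-up) term "bank_app".
vectors = weighted_tfidf(
    [["chat_app", "bank_app"], ["chat_app", "game_app"]],
    extra_weight={"bank_app": 2.0},
)
```

The resulting vectors could then be passed to any vector-space classifier. Which classifier the thesis pairs with its weighting is not stated in the abstract.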
In addition, the KNN model based on information entropy gives classification results that are easy to interpret, and it is more accurate than the traditional KNN algorithm. KNN also performs well on multi-class data, so the model has good scalability.

Each of the two models has its own advantages: the former classifies faster and is suitable for real-time systems, while the latter achieves higher accuracy.

In the context of Internet Plus and big data, the platform in this thesis is built on the Hadoop distributed processing platform, which greatly expands its storage and computing capacity.
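One way information entropy could be combined with KNN is sketched below. It is not the method of the thesis, whose details the abstract does not give. The sketch assumes that each feature is weighted by how unevenly it is distributed across classes, and that neighbors are ranked by weighted feature overlap. The names `entropy_weights` and `knn_predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy_weights(samples, labels):
    """Assumed scheme: weight = 1 - H(class | feature present) / log(#classes)."""
    n_classes = len(set(labels))
    per_feature = defaultdict(Counter)
    for features, label in zip(samples, labels):
        for f in set(features):
            per_feature[f][label] += 1
    weights = {}
    for f, counts in per_feature.items():
        total = sum(counts.values())
        h = -sum((c / total) * math.log(c / total) for c in counts.values())
        # Features seen in only one class get weight 1; evenly spread ones get 0.
        weights[f] = 1.0 - h / math.log(n_classes) if n_classes > 1 else 1.0
    return weights

def knn_predict(query, samples, labels, weights, k=3):
    """Vote among the k training samples with the largest weighted overlap."""
    q = set(query)
    scored = []
    for features, label in zip(samples, labels):
        overlap = sum(weights.get(f, 0.0) for f in q & set(features))
        scored.append((overlap, label))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter()
    for score, label in scored[:k]:
        votes[label] += score  # closer neighbors contribute more to the vote
    return votes.most_common(1)[0][0]

# Usage with made-up app lists and class labels.
train = [["game", "chat"], ["game", "news"], ["shop", "chat"], ["shop", "news"]]
classes = ["A", "A", "B", "B"]
w = entropy_weights(train, classes)
predicted = knn_predict(["game", "chat"], train, classes, w, k=3)
```

Because each feature carries an explicit weight, one can inspect which features drove a given prediction. That is one possible reading of the interpretability claim above.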

Keywords/Search Tags:

text classification, TFIDF, KNN, distributed platform

Related items

1	Tfidf-based Text Classification Algorithm Research
2	Research On Text Classification Of Web Text Mining
3	Improved Term-weighting Approach In Chinese Text Classification Over Skewed Data Sets
4	Research And Implementation Of KNN Text Classification Based On CURE Clustering
5	Study On Chinese Text Classification Algorithm Based On Rough Set And It's Application
6	Research On Chinese Text Classification Algorithm Based On Improved TFIDF And LSTM
7	Application Of Improved TFIDF Algorithm In Text Analysis
8	Correlation Algorithm Research And Realization Chinese Text SVM-based Classification
9	Sentiment Classification By Combining Lexicon-based And Machine Learning Methods
10	Research On Classification Of Chinese Documents Based On Vector Space Model