
The Design And Implementation Of Call Center Text Classification System

Posted on: 2020-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: H H Zhao
Full Text: PDF
GTID: 2428330626450752
Subject: Software engineering
Abstract/Summary:
With the development of urban intelligence, government departments have established call centers to understand people's livelihood demands, and these call centers have become an important channel for the public to express their opinions. The texts of livelihood appeals contain a wealth of information about local hot events and public demands. These texts vary in length and carry a large amount of information. As the volume of data grows, finding the types of hot appeals that the public cares about has become a focus of concern for managers. This thesis uses text classification technology to analyze the call center's appeal data, and designs and implements a text classification system covering data collection, data preprocessing, text classification and visualization. The main work of this thesis includes:

(1) Data collection: raw data scattered among different data sources is collected. The thesis designs a data storage format and completes the initial import of the data. Because the call center's data source is updated with a daily summary, an incremental acquisition mode is then used to import new data.

(2) Data preprocessing: the thesis designs and implements a set of preliminary data cleaning methods and uses them to clean and filter the real data. A Chinese text processing pipeline is implemented for the appeal text, including Chinese word segmentation, stop word removal, feature selection and text representation. In the text feature representation stage, the traditional TF-IDF algorithm has the defect of ignoring how a feature is distributed within a class and between classes; to address this, the thesis combines chi-square statistics and information entropy to propose an improved TFIDF-T algorithm. The thesis also studies text representation based on Word2vec word vectors: the A_Word2vec model, which averages the word vectors, and the T_Word2vec model, which combines the word vectors with TF-IDF weights.

(3) Classification of appeal texts: the text feature representation models above are combined with the KNN, SVM and Naïve Bayes classification algorithms and compared with FastText, a deep learning text classification model based on Word2vec word vectors. The results show that using the T_Word2vec model for text feature representation with KNN as the classification algorithm gives better classification results on appeal texts. Furthermore, the appeal text classification algorithm is parallelized using the distributed parallel computing power of a Hadoop cluster.

(4) Visual display: an intuitive report is generated from the final classification results, showing the appeals by time and by region.

System testing and actual operation show that the call center text classification system implemented in this thesis achieves good accuracy and performance, and can reflect hotspots of people's livelihood in a timely manner.
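The abstract does not give formulas for A_Word2vec or T_Word2vec, so the following is only a minimal sketch of the general idea it describes: a document vector built as a plain average of word vectors, a second one built as a TF-IDF-weighted average, and a KNN vote over the weighted vectors. Everything concrete here is an assumption made for illustration: the function names, the hand-made 2-dimensional toy vectors (standing in for trained Word2vec embeddings), the smoothed IDF formula, the English tokens (standing in for segmented Chinese text), and the use of cosine similarity in KNN. The TFIDF-T weighting proposed in the thesis is not reproduced.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Smoothed inverse document frequency (an assumed formula, not the thesis's)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def a_word2vec(doc, vectors):
    """Document vector as the plain average of its known word vectors."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[t] for t in doc if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def t_word2vec(doc, vectors, idf):
    """Document vector as the TF-IDF-weighted average of its known word vectors."""
    dim = len(next(iter(vectors.values())))
    tf = Counter(t for t in doc if t in vectors)
    if not tf:
        return [0.0] * dim
    total = sum(tf.values())
    # Unseen tokens fall back to a neutral IDF of 1.0 (assumption).
    weights = {t: (c / total) * idf.get(t, 1.0) for t, c in tf.items()}
    norm = sum(weights.values())
    return [sum(w * vectors[t][i] for t, w in weights.items()) / norm
            for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(query, train_vecs, labels, k=3):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(zip(train_vecs, labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy embeddings; real ones would come from a trained Word2vec model.
toy_vectors = {
    "water": [1.0, 0.0], "pipe": [0.9, 0.1], "leak": [0.8, 0.2],
    "noise": [0.0, 1.0], "night": [0.1, 0.9], "construction": [0.2, 0.8],
    "report": [0.5, 0.5],  # appears in every document, so it gets a low IDF
}
train_docs = [
    ["water", "pipe", "leak", "report"],
    ["pipe", "leak", "report"],
    ["noise", "night", "report"],
    ["construction", "noise", "report"],
]
train_labels = ["water", "water", "noise", "noise"]

idf = idf_weights(train_docs)
train_vecs = [t_word2vec(doc, toy_vectors, idf) for doc in train_docs]

query_doc = ["report", "report", "leak"]
a_vec = a_word2vec(query_doc, toy_vectors)
t_vec = t_word2vec(query_doc, toy_vectors, idf)
prediction = knn_predict(t_vec, train_vecs, train_labels, k=3)
print(prediction)
```

In this toy corpus the word "report" occurs in every document, so the weighted average gives it less influence than the plain average does, and the query vector moves toward the rarer, more informative word "leak". That difference is the usual motivation for weighting word vectors by TF-IDF before averaging.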
Keywords/Search Tags:Hadoop, call center, text classification, Word2vec, TF-IDF