
Research On Web Text Classification Algorithm Based On Parallelism

Posted on: 2019-09-17
Degree: Master
Type: Thesis
Country: China
Candidate: P Gao
GTID: 2428330563999164
Subject: Software engineering
Abstract/Summary:
With the rapid development of information technology, large volumes of text data are generated on the web at all times, and traditional manual management methods can no longer meet the needs of society. Fast and efficient automatic text classification has therefore become an active research topic. Although text classification is widely used in spam filtering, search engines, and information management and has developed rapidly, its performance in practice is still relatively low, and there is room for improvement in both classification accuracy and efficiency. This paper studies two aspects, feature selection and the construction of text classification models, and achieves the following results:

1. An optimized weighted naive Bayes parallel classification model is proposed. When the feature set is built, a word frequency adjustment factor is added to information gain, which eliminates redundant high-frequency features and selects strongly discriminative features. An ant colony algorithm is then used to optimize the weights iteratively in search of the global optimal solution, yielding the IA-WNB classification model. The MapReduce framework is combined with feature selection, model training, and model verification respectively, so that the classification of web text data is carried out in parallel. Experiments show that the IA-WNB classification model effectively improves the classification efficiency of web texts, and that the parallel design shortens the running time while preserving accuracy.

2. A convolutional neural network parallel classification model based on semantic extension is proposed. Web short text data sets suffer from semantic ambiguity and sparse features. To address this, the textual features are semantically extended by constructing {topic-feature} two-tuples, which serve as input data for the CNN classification model. The convolutional neural network further optimizes the data features, and the Softmax function performs the classification. The MapReduce framework is then combined with feature tuple construction and parameter training, so that both data preprocessing and the tuning of the classification model parameters are parallelized. Experiments show that the convolutional neural network classification model based on semantic extension improves both accuracy and classification efficiency when processing web short text data.
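As a rough illustration of the first model described above, the following Python sketch combines plain information gain for feature selection, counting in a map/reduce shape, and a naive Bayes classifier whose per-word log-likelihoods are scaled by weights. It is a minimal sketch under stated assumptions, not the thesis's implementation: the word frequency adjustment factor and the ant colony weight search are not specified in the abstract and are not reproduced here, the weights are supplied by hand, the map and reduce steps run in a single process rather than on a MapReduce cluster, and all function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(counts):
    """Shannon entropy (in bits) of a collection of label counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def information_gain(docs, term):
    """Plain IG of the event 'term occurs in the document' w.r.t. the label."""
    labels = Counter(label for _, label in docs)
    with_term = Counter(label for words, label in docs if term in words)
    without_term = labels - with_term  # keeps only positive counts
    n = len(docs)
    conditional = sum(
        sum(part.values()) / n * entropy(part.values())
        for part in (with_term, without_term) if part
    )
    return entropy(labels.values()) - conditional

def select_features(docs, k):
    """Keep the k terms with the highest information gain."""
    terms = sorted({w for words, _ in docs for w in words})
    ranked = sorted(terms, key=lambda t: information_gain(docs, t), reverse=True)
    return set(ranked[:k])

def map_counts(doc):
    """Map step: emit ((label, word), 1) for every word of one document."""
    words, label = doc
    return [((label, w), 1) for w in words]

def reduce_counts(pairs):
    """Reduce step: sum the emitted values per (label, word) key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return totals

def train(docs, vocab):
    """Count selected words per label (map + reduce) and documents per label."""
    pairs = [p for d in docs for p in map_counts(d) if p[0][1] in vocab]
    return reduce_counts(pairs), Counter(label for _, label in docs)

def predict(words, counts, priors, vocab, weights):
    """Weighted naive Bayes: each word's log-likelihood is scaled by its weight."""
    n_docs = sum(priors.values())
    best, best_score = None, -math.inf
    for label, k in priors.items():
        total = sum(counts[(label, w)] for w in vocab)
        score = math.log(k / n_docs)
        for w in words:
            if w in vocab:
                # Laplace-smoothed estimate of P(word | label)
                p = (counts[(label, w)] + 1) / (total + len(vocab))
                score += weights.get(w, 1.0) * math.log(p)
        if score > best_score:
            best, best_score = label, score
    return best

# Toy data, invented for illustration only.
docs = [
    (["cheap", "pills", "buy"], "spam"),
    (["buy", "cheap", "watch"], "spam"),
    (["meeting", "schedule", "today"], "ham"),
    (["project", "meeting", "notes"], "ham"),
]
vocab = select_features(docs, k=3)
counts, priors = train(docs, vocab)
weights = {"cheap": 2.0}  # hand-picked; the thesis searches these with an ant colony algorithm
print(predict(["cheap", "buy"], counts, priors, vocab, weights))
```

In a real MapReduce job the map and reduce functions would run on separate workers over partitions of the corpus; here they only show which part of the counting is independent per document and which part is an aggregation.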
Keywords/Search Tags: Text Categorization, Naive Bayes, Web Text, Convolutional Neural Network Model, Parallelization