Improvement And Application Of Naive Bayes Algorithm

Posted on: 2019-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: G Ma
Full Text: PDF
GTID: 2348330545498778
Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology and the spread of big data, the demand for data analysis keeps growing. In recent years, data mining has received widespread attention: it extracts data from many fields, turns that data into information, and finds the connections between data and information. Classification is an important research topic in data mining; it assigns data items to categories using various classifiers. The Bayes algorithm is a classic classification algorithm, a probabilistic reasoning method grounded in probability and statistics that computes the posterior probability from the prior probability. Because of its relatively low error rate, it has been widely used. Bayesian methods are mainly divided into Bayes methods and Bayesian networks, and the Naive Bayes method is a simplified Bayes method. This thesis studies the Naive Bayes method and makes the following improvements to the Naive Bayes algorithm.

(1) Data sets are crucial to classification, but data is often lost during collection. To obtain a complete data set, a fast cluster algorithm is proposed to fill in the missing data. First, the data set is split into a complete data set and a missing data set. Next, the complete data set is clustered with the fast cluster algorithm. Finally, the missing values are filled in according to the similarity between each incomplete record and the cluster centers. Experimental results show that the Naive Bayes classification model based on the Fast Cluster algorithm (FCNB) can correctly fill in missing values and improve classification accuracy.

(2) In two-class classification, the proportions of positive and negative examples in the data set often differ. For this case, the thesis proposes a Naive Bayes classification model based on a value K (K Naive Bayes, K-NB). The method compares the ratio of the two class probabilities, and an unclassified sample is classified when that ratio is greater than K. Experimental results show that K-NB achieves higher accuracy on two-class problems than other improved algorithms.

(3) A Naive Bayes classification model (Supervised Learning Naive Bayes, SLNB) is proposed that combines a supervised learning algorithm with the Naive Bayes algorithm. The supervised learning algorithm classifies more efficiently, while the Naive Bayes algorithm is highly accurate. First, the data sets are clustered separately using the supervised learning algorithm, forming a number of cluster centers for the positive and negative examples. Then the model chooses between the supervised learning algorithm and the Naive Bayes algorithm according to the distance between a sample and the neighboring positive and negative cluster centers. This preserves accuracy while keeping classification efficient.

(4) Based on the above improvements, a phishing website detection model for Android mobile devices is built. First, phishing website data is collected and processed with the Fast Cluster algorithm to obtain a complete data set. Then the model uses the K-NB algorithm to classify websites. Experimental results show that the model is highly accurate and can detect phishing sites effectively.
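The missing-value filling in improvement (1) can be illustrated with a minimal sketch. The abstract does not specify the thesis's fast cluster algorithm, so plain k-means stands in for it here, and Euclidean distance over the observed features stands in for the similarity measure. The function names `kmeans` and `fill_missing`, the use of `None` for a missing value, and the toy data are all assumptions made for illustration.

```python
import math
import random


def kmeans(rows, k, iters=20, seed=0):
    """Plain k-means on complete rows (a stand-in for the fast cluster step)."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for r in rows:
            nearest = min(range(k), key=lambda c: math.dist(r, centers[c]))
            groups[nearest].append(r)
        for c, members in enumerate(groups):
            if members:  # keep the previous center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def fill_missing(data, k=2):
    """Fill None entries with the values of the most similar cluster center."""
    # Step 1: separate the complete rows from the rows with missing values.
    complete = [r for r in data if None not in r]
    # Step 2: cluster the complete rows only.
    centers = kmeans(complete, k)
    filled = []
    for row in data:
        if None not in row:
            filled.append(list(row))
            continue
        # Step 3: compare the row with each center on its observed features
        # (assumed similarity: squared Euclidean distance), then copy the
        # missing features from the closest center.
        observed = [i for i, v in enumerate(row) if v is not None]
        best = min(
            centers,
            key=lambda c: sum((row[i] - c[i]) ** 2 for i in observed),
        )
        filled.append([best[i] if v is None else v for i, v in enumerate(row)])
    return filled


# Hypothetical toy data: two well-separated groups plus two incomplete rows.
data = [
    [1.0, 1.1], [0.9, 1.0], [1.1, 0.9],
    [5.0, 5.2], [5.1, 4.9], [4.9, 5.0],
    [1.0, None], [None, 5.1],
]
filled = fill_missing(data, k=2)
```

In this toy example each incomplete row takes its missing value from the center of the group its observed feature lies closest to. The completed data set could then be passed to any Naive Bayes classifier.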
Keywords/Search Tags: Naive Bayes, Supervised Learning, Phishing Website, Android