| Classifying social media big data has significant research value for hot news event discovery, personalized recommendation, public opinion monitoring, spam filtering, brand promotion, and other applications. Most social media content contains text; even images, video, audio, and other multimedia social data usually carry text labels, or a text description can be obtained for them through multimedia information retrieval. This paper therefore focuses on classification methods for social media big data in text form and improves the Naive Bayes classification methods based on the Bernoulli model and the Multinomial model. First, the paper introduces the principles and related technologies of social media text big data classification, including text information representation, the basic structure of the classifier and the classification process, feature selection methods, and the Hadoop platform, and analyzes the principle of Naive Bayes text classification in detail on the basis of Bayes' formula. Second, for Naive Bayes classification in which feature items follow a Bernoulli distribution, the method is improved by normalizing the feature items, letting the "positive" feature items take part in the computation, and adjusting the smoothing parameter. After the improvement, classification speed increases greatly, and classification performance also improves when the samples are unevenly distributed. Third, for Naive Bayes classification in which feature items follow a Multinomial distribution, the method is improved by introducing a coefficient on the independent variable of the IDF logarithmic function in the TF-IDF algorithm, which assigns a higher weight to feature items that are beneficial to classification, and by normalizing the feature item weights. Experiments show that the improved method improves classification performance to a certain extent. Finally, to perform classification under the MapReduce framework, a parallel implementation of the improved Multinomial-model Naive Bayes classification method is developed according to the Naive Bayes classification process and the related formulas. The experimental results show that the improved method is effective and has a certain degree of scalability and reliability. |
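
As a rough illustration of the Multinomial-model idea summarized above, the following minimal sketch trains a Naive Bayes classifier on TF-IDF weights that are normalized per document. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: the IDF form `log(coef * N / df + 1)`, the parameter names `coef` and `smooth`, the L1 normalization, and the function names `train` and `predict` are all hypothetical and are not the paper's actual method.

```python
import math
from collections import Counter, defaultdict


def train(docs, labels, coef=1.0, smooth=1.0):
    """Train a TF-IDF-weighted multinomial Naive Bayes model.

    docs   : list of token lists
    labels : list of class labels, one per document
    coef   : hypothetical coefficient applied inside the IDF logarithm
    smooth : additive smoothing parameter
    """
    n_docs = len(docs)

    # Document frequency of every feature item.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    # Assumed IDF form: the coefficient scales the argument of the log.
    idf = {t: math.log(coef * n_docs / d + 1.0) for t, d in df.items()}

    # Accumulate normalized TF-IDF weights per class.
    class_weight = defaultdict(lambda: defaultdict(float))
    class_docs = Counter(labels)
    for tokens, y in zip(docs, labels):
        tf = Counter(tokens)
        weights = {t: c * idf[t] for t, c in tf.items()}
        norm = sum(weights.values()) or 1.0  # L1 normalization per document
        for t, w in weights.items():
            class_weight[y][t] += w / norm

    vocab = list(df)
    model = {"prior": {}, "cond": {}, "vocab": set(vocab)}
    for y, count in class_docs.items():
        model["prior"][y] = math.log(count / n_docs)
        total = sum(class_weight[y].values()) + smooth * len(vocab)
        # Smoothed log-probability of each feature item given the class.
        model["cond"][y] = {
            t: math.log((class_weight[y].get(t, 0.0) + smooth) / total)
            for t in vocab
        }
    return model


def predict(model, tokens):
    """Return the class with the highest log-posterior score."""
    tf = Counter(t for t in tokens if t in model["vocab"])
    scores = {
        y: model["prior"][y] + sum(c * model["cond"][y][t] for t, c in tf.items())
        for y in model["prior"]
    }
    return max(scores, key=scores.get)
```

In a MapReduce setting, the per-class weight accumulation in `train` is the part that would naturally be split into map (per-document weights) and reduce (per-class sums) steps; this single-machine sketch does not attempt that.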