Research On Advertisement Recommendation System Based On Data Mining

Posted on: 2019-08-19
Degree: Master
Type: Thesis
Country: China
Candidate: L Jiang
Full Text: PDF
GTID: 2428330596451110
Subject: Engineering
Abstract/Summary:
With the rapid development of the Internet, the volume of unstructured and semi-structured text data on the network is increasing dramatically. Users' ability to understand and process data has not grown with it, so it is particularly important to help users manage this data effectively and obtain the information they need. This problem falls within the scope of text mining. Text mining is a branch of data mining that extracts previously unknown, understandable, and potentially useful patterns and knowledge from large text collections or corpora. It can be divided into text clustering and text classification. This thesis focuses on text clustering: because clustering does not require documents to be labeled by hand in advance, it greatly reduces the time spent organizing text manually and thus improves efficiency.

Firstly, this thesis summarizes the concept, functions, and steps of data mining and the concept of clustering. After introducing the basic ideas, advantages, and disadvantages of several common clustering algorithms, it explains why the K-Means clustering algorithm was selected for text clustering, analyzes the advantages and disadvantages of K-Means, and addresses its weaknesses. To address the problem that the original K-Means algorithm selects its K initial cluster centers at random, an improved K-Means clustering algorithm is proposed. The algorithm first selects the initial cluster centers based on density and a clustering criterion, and then runs K-Means clustering from the selected centers. Finally, different data sets are used to verify the accuracy and stability of the improved algorithm. The experimental results show that the improved K-Means clustering algorithm is more accurate and more stable than the original K-Means clustering algorithm.

Secondly, this thesis describes the specific steps of text clustering and gives the specific idea behind each step. To address the problem that K-Means is very sensitive to the number of clusters K, a further improved K-Means algorithm is proposed. The algorithm first calculates the similarity between word vectors based on the co-occurrence word principle, divides the samples into K+x clusters according to a similarity threshold, and selects K+x initial cluster centers based on density and the clustering criterion. K-Means clustering is then run from these K+x initial centers. Different text data sets are used to verify the accuracy of the improved algorithm. The experiments show that the improved K-Means clustering algorithm effectively reduces the algorithm's dependence on the parameter K.

Finally, the overall design and module design of an advertising recommendation system for the OFBiz e-commerce platform are given. The advertising recommendation system is implemented on the OFBiz e-commerce platform and uses the improved K-Means clustering algorithm, and the system verifies the effectiveness of the improved algorithm.
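The abstract does not spell out how density or the clustering criterion is computed, so the following is only an illustrative sketch of density-based seeding followed by standard K-Means, not the thesis's actual procedure. It assumes that density means the number of neighbors within a radius `eps`, and that each further seed maximizes density times distance to the nearest seed already chosen. The names `density_seeds`, `kmeans`, and `eps` are hypothetical.

```python
import math


def density_seeds(points, k, eps):
    """Pick k initial centers, favoring dense points that lie far apart.

    Assumed definitions (not taken from the thesis):
    density(i) = number of other points within distance eps of point i.
    """
    n = len(points)
    density = [
        sum(1 for j in range(n) if j != i and math.dist(points[i], points[j]) <= eps)
        for i in range(n)
    ]
    # First seed: the densest point (ties go to the earliest index).
    seeds = [max(range(n), key=lambda i: density[i])]
    while len(seeds) < k:
        # Score each remaining point by density * distance to its nearest seed.
        def score(i):
            return density[i] * min(math.dist(points[i], points[s]) for s in seeds)

        candidates = [i for i in range(n) if i not in seeds]
        seeds.append(max(candidates, key=score))
    return [points[i] for i in seeds]


def kmeans(points, centers, max_iter=100):
    """Standard K-Means (Lloyd's iterations) started from the given centers."""
    centers = [tuple(c) for c in centers]
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    seeds = density_seeds(pts, k=2, eps=1.5)
    centers, clusters = kmeans(pts, seeds)
    print(seeds, centers)
```

Because the seeds are chosen deterministically, repeated runs on the same data give the same clustering, which is one plausible reading of the stability gain the abstract reports over random initialization.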
Keywords/Search Tags: K-Means algorithm, initial cluster centers, clustering criterion, text cluster mining, co-occurrence word