Improvement Of Constructive Parallel Covering Algorithm And Its Application In Service Recommendation

Posted on: 2020-04-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Zhou
GTID: 2428330575463022
Subject: Computer Science and Technology
Abstract/Summary:
Machine learning has recently become a very popular research direction. Researchers have proposed many excellent learning methods, such as deep neural networks and decision trees, and have applied them to numerous areas, including data mining, pattern recognition, and natural language processing. Among these, data mining is applied especially widely. In the big data environment, business organizations and researchers are aware of the tremendous value that big data contains, so it is vital to choose suitable approaches for processing big data and extracting valuable information from it. Clustering analysis is one method that fits this scenario.

Clustering is a classic data mining method, and the K-means clustering algorithm is one of the ten classic algorithms in data mining. However, the number of clusters k in K-means is not always easy to determine, and the selection of the initial centers is vulnerable to outliers. The covering algorithm was proposed by Chinese scholars based on the geometric meaning of neural networks. The covering clustering algorithm applies the concept of covering to aggregate samples with similar features. It is "blind": it requires neither the number of clusters to be specified in advance nor the initial cluster centers to be selected manually. However, the traditional covering clustering algorithm obtains its radius in a way that is not well founded, and it does not analyze the clustering results, which leads to unreasonable clusterings.

This thesis mainly studies the covering clustering algorithm, the K-means algorithm and its improved variants, and the application of the improved covering algorithm to service recommendation in a big data environment. The innovations of this thesis are as follows:

(1) A new strategy for computing the domain radius in the traditional covering algorithm is proposed. Each data point contributes differently to each covering domain: the farther a data point is from the cluster center, the smaller its contribution to that center, and the closer it is, the greater its contribution. The radius is therefore obtained from the contribution of each data point to the cluster center, which makes similar data more likely to fall into the same cluster and dissimilar data more likely to fall into different clusters.

(2) Based on quotient space theory, a split mechanism and a merge mechanism are proposed. The coverings produced by the improved covering algorithm based on the domain radius are treated as a preprocessing step, and these initial clustering results are then analyzed by the split and merge mechanisms. The number of clusters is obtained from quotient space theory instead of being estimated subjectively in advance, which solves the problem that the number of clusters is difficult to determine.

(3) To overcome the defects of the K-means algorithm and its improved variants, in particular the difficulty of determining the initial cluster centers, a K-means algorithm based on the improved covering clustering algorithm (C-K-means) is proposed. The experimental results show that the proposed algorithm is better in terms of accuracy and efficiency than the K-means, K-means++, and K-means|| algorithms.

(4) Some previous algorithms performed well in offline tests on small-scale datasets but did not work on large-scale datasets, so real-life large data scenarios must be considered. To adapt to the big data environment, parallel versions of the algorithms are implemented in Spark. The results demonstrate that the improved covering algorithm (CA-QGS) and the C-K-means algorithm scale well and can effectively solve large-scale data clustering problems.

(5) The application of C-K-means to service recommendation in a big data environment is studied. Building on traditional service recommendation, a new recommendation model is proposed that covers all users who have invoked different services and all services that have been invoked by different users. A new Top-k mechanism is also proposed. The clustering results are used to find the similar neighbors of the target users or services, the coverage information is then used to predict the target user's QoS value for a service, and service recommendation is performed on that basis. The recommendation accuracy and efficiency of the C-K-means algorithm are significantly better than those of other methods currently used in service recommendation.
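The idea behind innovation (1) can be illustrated with a minimal sketch. The abstract does not give the contribution formula, so everything specific here is an assumption made for illustration: the weight `1 / (1 + d)`, the choice of the sample nearest the centroid as the next center, and the function names `weighted_radius` and `covering_clusters`. It is not the thesis's CA-QGS algorithm, and it omits the split and merge mechanisms and the Spark parallelization.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def weighted_radius(center, points):
    """Contribution-weighted mean distance from center to points.

    ASSUMPTION: the contribution of a point at distance d is 1 / (1 + d),
    so farther points contribute less. The source does not state the formula.
    """
    ds = [dist(center, p) for p in points]
    ws = [1.0 / (1.0 + d) for d in ds]
    return sum(w * d for w, d in zip(ws, ds)) / sum(ws)


def covering_clusters(points):
    """Greedy covering: returns a list of (center, radius, members).

    Each round picks the uncovered sample nearest the centroid of the
    uncovered samples (an assumed selection rule), computes a weighted
    radius, and covers every uncovered sample within that radius.
    """
    remaining = list(points)
    covers = []
    while remaining:
        c0 = centroid(remaining)
        center = min(remaining, key=lambda p: dist(p, c0))
        radius = weighted_radius(center, remaining)
        members = [p for p in remaining if dist(center, p) <= radius]
        # The center is at distance 0, so each round covers at least one
        # sample and the loop always terminates.
        covers.append((center, radius, members))
        remaining = [p for p in remaining if dist(center, p) > radius]
    return covers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    for center, radius, members in covering_clusters(data):
        print(center, round(radius, 3), len(members))
```

Because far points are down-weighted, this radius is never larger than the plain mean distance, so tight groups can end up split across several small covers. That is consistent with the abstract treating the coverings as a preprocessing step whose results are then refined, and whose centers can serve as candidate initial centers for K-means.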
Keywords/Search Tags: Machine learning, Covering algorithm, Quotient space, Service recommendation