The Research And Application Of Parallel Latent Dirichlet Allocation And Clustering Algorithm

Posted on:2017-10-31

Degree:Master

Type:Thesis

Country:China

Candidate:Q Y Wan

Full Text:PDF

GTID:2348330488977974

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of the Internet, society has entered an era of data explosion. Because this data contains useful information, how to mine valuable information from big data has become a research hotspot. Traditional stand-alone data processing can no longer cope with such volumes, so people began to seek new solutions, and cloud computing and big data processing technologies emerged. Among these technologies, Spark is an in-memory computing framework for large-scale data processing that has become popular in recent years. It is widely used because it is well suited to interactive and iterative computation.

In this thesis, we design parallel machine learning methods based on Spark. The thesis also covers the calculation of word similarity, for which we improve the calculation method. Finally, these methods are applied to micro-blog ad serving to achieve targeted ad delivery.

The research is divided into the following four areas:

1. Parallel LDA topic modeling based on Spark. We use Gibbs sampling for the model. We divide the data set into several sub-data sets and assign each sub-data set to a node in the cluster for parallel processing, thereby realizing a parallel LDA model.

2. An improved Bisecting K-Means clustering algorithm and its parallel design. The selection of the initial centroids in Bisecting K-Means slows down clustering, so we improve the algorithm by choosing the two patterns with the maximum distance between them as the initial cluster centroids, which accelerates clustering. In addition, we propose a parallel version based on Spark.

3. An improved word similarity computing method. The similarity computation is based on HowNet. Through a study of HowNet, we improve the word similarity computing method; experiments show that the resulting similarities are in line with human understanding and intuition.

4. A scheme for targeted delivery of micro-blog advertising, built on the research results above. First, we use the parallel LDA algorithm and the parallel Bisecting K-Means clustering algorithm to mine users' interests. Second, we use the word similarity computing method to compute the similarity between interest words and ad keywords, so that ads can be delivered to interested users.
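
The initial-centroid choice described in item 2 can be illustrated with a minimal single-machine sketch in Python. It is not the thesis's Spark implementation: the abstract does not say how the farthest pair is found or how the work is distributed, so the brute-force pair search, the Euclidean distance, and the function names `farthest_pair`, `centroid`, and `bisect` are all assumptions made for illustration.

```python
import math
from itertools import combinations


def farthest_pair(points):
    """Return the two points with the largest Euclidean distance (brute force, O(n^2))."""
    return max(combinations(points, 2), key=lambda pair: math.dist(pair[0], pair[1]))


def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def bisect(points, max_iter=20):
    """One bisecting step: 2-means seeded with the farthest pair instead of random seeds."""
    c0, c1 = farthest_pair(points)
    left, right = list(points), []
    for _ in range(max_iter):
        # Assign every point to the nearer of the two current centroids.
        left = [p for p in points if math.dist(p, c0) <= math.dist(p, c1)]
        right = [p for p in points if math.dist(p, c0) > math.dist(p, c1)]
        if not left or not right:
            break  # degenerate input, e.g. all points identical
        new_c0, new_c1 = centroid(left), centroid(right)
        if (new_c0, new_c1) == (c0, c1):
            break  # centroids stopped moving
        c0, c1 = new_c0, new_c1
    return left, right


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
left, right = bisect(points)
```

A full Bisecting K-Means would apply `bisect` repeatedly to a chosen cluster until the desired number of clusters is reached. On large data the quadratic pair search would itself need an approximate or distributed replacement.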

Keywords/Search Tags:

Spark, LDA, Bisecting K-Means, Word similarity, micro-blog ads

Related items

1	Design And Implementation Of Micro-blog Advertising System Based On Users’ Interests
2	The Design And Implementation Of The Micro-blog Public Opinion Monitoring System Based On Spark
3	A Fast And Efficient Parallel Bisecting K-Means Algorithm
4	The Research Of Micro-Blog New Emotion Words Recognition And Orientation Judgment Based On Word2Vec
5	Research On Unknown Words Recognition And Word Meaning Discovery Based On Short Text Of Micro-blog
6	The Design And Implement Of Combined Recommendation Algorithm Based On Micro-blog Information
7	Research On Topic Detection Technology For Chinese Micro-blog
8	Based On The Micro-blog Hot Topic Extraction And Utilization Of Research
9	A Research Of Timing Events Based On Personal Micro-blog
10	Research On Particle Swarm Optimization Clustering Algorithm Oriented To Micro-blog Topic