| Short text clustering currently faces three major challenges: the sparsity of feature keywords, the complexity of processing in a high-dimensional space, and the comprehensibility of the resulting clusters. To address these challenges, this paper proposes a short text clustering algorithm improved by incorporating semantics. In this algorithm, each short text is described as a collection of words, which alleviates the sparsity of short text keywords. Cluster centers are obtained by mining the maximal frequent word sets of the short text collection together with semantic similarity. This overcomes the sensitivity of traditional clustering algorithms to the choice of initial cluster centers, makes the clusters comprehensible, and avoids operating in a high-dimensional space. In addition, because serial algorithms mine frequent itemsets inefficiently, a parallel association rule algorithm based on the Hadoop platform is proposed. Experimental results show that the proposed parallel algorithm mines frequent itemsets efficiently and that the semantics-based short text clustering algorithm outperforms traditional algorithms. |
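The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of one plausible reading of the pipeline, under several assumptions. The `CONCEPTS` synonym table is a hypothetical stand-in for the paper's semantic similarity resource; Jaccard similarity is a swapped-in measure for assigning texts to centers; `map_phase` and `reduce_phase` simulate one MapReduce counting round per Apriori level in a single process and are not the Hadoop API; `min_support=2` and all function names are illustrative. It shows texts represented as word sets, maximal frequent word sets used as cluster centers, and each text assigned to its most similar center.

```python
from collections import Counter
from itertools import combinations

# Hypothetical synonym table standing in for a real semantic resource
# (e.g. a thesaurus); maps each word to a canonical concept.
CONCEPTS = {"film": "movie", "cinema": "movie", "soccer": "football"}


def normalize(text):
    """Represent a short text as a set of concept-normalized words."""
    return frozenset(CONCEPTS.get(w, w) for w in text.lower().split())


def map_phase(doc, candidates):
    """Simulated mapper: emit (itemset, 1) for each candidate in one document."""
    return [(c, 1) for c in candidates if c <= doc]


def reduce_phase(pairs):
    """Simulated reducer: sum the emitted counts per itemset key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def frequent_itemsets(docs, min_support):
    """Level-wise (Apriori-style) mining; each level is one map/reduce round."""
    vocab = sorted({w for d in docs for w in d})
    candidates = [frozenset([w]) for w in vocab]
    result, k = {}, 1
    while candidates:
        counts = reduce_phase(p for d in docs for p in map_phase(d, candidates))
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
        items = sorted({w for s in frequent for w in s})
        # Keep a k-set only if every (k-1)-subset is frequent (Apriori pruning).
        candidates = [
            frozenset(c) for c in combinations(items, k)
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        ]
    return result


def maximal(itemsets):
    """Keep only itemsets that have no frequent proper superset."""
    return [s for s in itemsets if not any(s < o for o in itemsets)]


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster(texts, min_support=2):
    """Use maximal frequent word sets as centers; assign each text to the closest."""
    docs = [normalize(t) for t in texts]
    centers = maximal(list(frequent_itemsets(docs, min_support)))
    if not centers:
        raise ValueError("no frequent word sets; lower min_support")
    labels = [max(range(len(centers)), key=lambda i: jaccard(d, centers[i]))
              for d in docs]
    return centers, labels


texts = ["new movie review", "film review online",
         "football match tonight", "soccer match score"]
centers, labels = cluster(texts)
print(centers)
print(labels)  # labels -> [0, 0, 1, 1]
```

Here the synonym mapping lets "film" and "soccer" count toward the frequent sets {movie, review} and {football, match}, which then serve as readable cluster labels. Because every center is itself a small word set, each text is compared against a handful of centers rather than embedded in a full vocabulary-sized vector space.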