
Research On Classification Methods Of Experts Based On Clustering Analysis

Posted on: 2018-07-30 | Degree: Master | Type: Thesis
Country: China | Candidate: X Huang
GTID: 2348330542969378 | Subject: Software engineering
Abstract/Summary:
With the arrival of the big data era, data mining technology has been widely adopted in both academia and industry. This thesis grew out of the research and development of an expert recommendation system, a project of the Southeast University-Migu Big Data Joint Laboratory. With cluster analysis as its core technique, it provides the background data that support the recommendation system. The raw data on experts are unorganized and unstructured and cannot be processed automatically. The thesis therefore uses 125,499 student degree theses written between 1990 and 2016 as its data source. The research directions of the students' supervisors, who are the experts in this thesis, are inferred from the research topics of their students' theses. To classify the experts, the thesis must extract information about their fields from this large body of data. The experts' research directions cannot be known in advance, so the number of clusters is also unknown. For this reason the work is built on cluster analysis, an unsupervised learning method.

The research work of this thesis mainly covers four aspects:

1) Automatic acquisition of data sources: The Wanfang database is updated in real time and uses a uniform format. The thesis therefore uses a focused web crawler to collect the data automatically and in real time. The main obstacles are the site's access restrictions and cookie handling. The thesis resolves both so that data acquisition can run continuously without manual intervention.

2) Data preprocessing: The text data are represented with the vector space model. The raw data set is large and its dimensionality runs into the thousands, so the core preprocessing task is reducing the dimensionality of this high-dimensional, sparse data. Two approaches are used: feature selection and feature extraction. Feature selection takes a new perspective that combines machine learning and statistics. It uses CRF-based word segmentation together with TF-IDF-based dimensionality reduction to define a new feature space. Feature extraction compares two algorithms, PCA and LDA, for reducing dimensionality.

3) Hybrid clustering algorithm based on fuzzy clustering and the Dirichlet process: Given the nature of expert data, hard clustering cannot reflect the fact that a single expert may work in several research fields. The thesis therefore uses fuzzy clustering, which allows an expert to belong to several clusters. It proposes FCM-DP, a hybrid clustering method that combines fuzzy clustering with the Dirichlet process. FCM-DP improves both the accuracy and the efficiency of data processing and determines cluster themes more effectively. The hybrid clustering results are then refined by post-processing.

4) Results evaluation and method validation: The experiments cover the design and implementation of the work described in 2) and 3). The results are assessed with the relevant evaluation metrics, comparing the FCM-DP hybrid clustering method with other classical clustering methods.

In summary, the thesis proposes the FCM-DP clustering algorithm, based on fuzzy clustering and the Dirichlet process, to classify experts by research direction. It validates both the clustering results and the method against a baseline method and a validation data set. The approach achieves multi-class, dynamic classification of experts and a high silhouette coefficient. The thesis also uses LDA to define the theme of each cluster. It builds an expert network by analyzing the geographic distribution of the experts, so that its results can serve as background data for the recommendation platform.
Keywords/Search Tags: text mining, web crawler, classification of experts, FCM-DP hybrid clustering algorithm
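The feature-selection step in item 2) uses TF-IDF to pick a reduced feature space. The abstract does not specify the weighting variant or the selection rule, so the following is only a minimal Python sketch under stated assumptions:

- Documents arrive already segmented into tokens. The thesis uses CRF segmentation for this, which is not reproduced here.
- Weights use the common form tf * log(N / df).
- Terms are ranked by their highest weight in any single document.

The function names `tfidf_vectors` and `select_features` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document.

    docs: documents ALREADY segmented into tokens (assumption: the CRF
    segmentation step used in the thesis is not reproduced here).
    weight = (count / document length) * log(N / document frequency)
    """
    n_docs = len(docs)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({term: (count / len(doc)) * math.log(n_docs / df[term])
                        for term, count in tf.items()})
    return vectors


def select_features(vectors, k):
    """Keep the k terms whose highest weight in any document is largest.

    Hypothetical selection rule; ties are broken alphabetically so the
    result is deterministic.
    """
    best = {}
    for vec in vectors:
        for term, weight in vec.items():
            best[term] = max(best.get(term, 0.0), weight)
    return sorted(best, key=lambda t: (-best[t], t))[:k]
```

A term that occurs in every document gets log(N / N) = 0 and can never be selected. This is how TF-IDF suppresses words that do not distinguish one research field from another.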
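Feature extraction in item 2) compares PCA and LDA. This dependency-free Python sketch illustrates only the PCA side and is not code from the thesis. It assumes small, dense row vectors; a corpus of 125,499 documents would call for a sparse, library-based implementation.

The sketch works in three steps:

1. Centre the data.
2. Build the sample covariance matrix.
3. Find the top-k eigenvectors by power iteration with deflation.

`pca` and its parameters `iters` and `seed` are hypothetical.

```python
import math
import random


def pca(data, k, iters=300, seed=0):
    """Project the rows of `data` onto their top-k principal components.

    Centres the data, builds the sample covariance matrix, and extracts
    eigenvectors one at a time by power iteration followed by deflation.
    Assumes k does not exceed the number of directions with non-zero variance
    and that the leading eigenvalues are reasonably well separated.
    """
    n, dim = len(data), len(data[0])
    mean = [sum(row[d] for row in data) / n for d in range(dim)]
    X = [[row[d] - mean[d] for d in range(dim)] for row in data]
    cov = [[sum(X[r][a] * X[r][b] for r in range(n)) / (n - 1)
            for b in range(dim)] for a in range(dim)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.random() for _ in range(dim)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                raise ValueError("k exceeds the number of non-zero-variance directions")
            v = [x / norm for x in w]
        # Rayleigh quotient: the eigenvalue belonging to the converged vector.
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(dim) for b in range(dim))
        components.append(v)
        # Deflation: remove this component so the next pass finds the next one.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(dim)]
               for a in range(dim)]
    # Coordinates of each centred row in the new k-dimensional space.
    return [[sum(x[d] * comp[d] for d in range(dim)) for comp in components]
            for x in X]
```

Power iteration is used here only to keep the sketch free of third-party packages. In practice a library SVD routine would be the usual choice.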
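Item 3) relies on fuzzy clustering so that one expert can belong to several research fields. The abstract does not describe how FCM-DP couples fuzzy c-means with the Dirichlet process. In general, a Dirichlet process is used so that the data determine the number of clusters. This sketch therefore shows only standard fuzzy c-means (Bezdek) with a fixed cluster count `c` and fuzzifier `m`; it is not the thesis's FCM-DP. The function name, the random initialisation and the default parameters are all assumptions.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means. Returns (centers, memberships).

    memberships[j][i] is the degree to which point j belongs to cluster i.
    Each row sums to 1, so one point can belong partly to several clusters.
    m > 1 is the fuzzifier; larger m gives softer memberships.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalised so that each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    power = 2.0 / (m - 1.0)
    centers = []
    for _ in range(max_iter):
        # Centre update: mean of all points weighted by membership ** m.
        centers = []
        for i in range(c):
            w = [u[j][i] ** m for j in range(n)]
            sw = sum(w)
            centers.append([sum(w[j] * points[j][d] for j in range(n)) / sw
                            for d in range(dim)])
        # Membership update: u[j][i] = 1 / sum_k (d_ji / d_jk) ** (2 / (m - 1)).
        new_u = []
        for j in range(n):
            dists = [math.dist(points[j], ctr) for ctr in centers]
            if 0.0 in dists:
                # The point sits exactly on a centre: full membership there.
                hit = dists.index(0.0)
                new_u.append([1.0 if i == hit else 0.0 for i in range(c)])
            else:
                new_u.append([1.0 / sum((dists[i] / dists[k]) ** power
                                        for k in range(c))
                              for i in range(c)])
        shift = max(abs(new_u[j][i] - u[j][i])
                    for j in range(n) for i in range(c))
        u = new_u
        if shift < tol:  # memberships have stopped changing
            break
    return centers, u
```

Each row of the returned membership matrix sums to 1. An expert whose row reads, for example, 0.6 and 0.4 is therefore assigned partly to both clusters. Taking the largest entry recovers a hard label when one is needed.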
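The abstract reports a high silhouette coefficient for the clustering results. For reference, this plain-Python sketch computes the usual definition with Euclidean distance:

s(i) = (b(i) - a(i)) / max(a(i), b(i))

averaged over all points, where a(i) is the mean distance to the point's own cluster and b(i) is the mean distance to the nearest other cluster. The sketch takes hard labels. The abstract does not state how labels are derived from fuzzy memberships, so taking each point's highest membership would be an assumption. `mean_silhouette` is a hypothetical stand-alone function, not the scikit-learn `silhouette_score`.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).

    For point i: a = mean distance to the other points in its own cluster,
    b = smallest mean distance to the points of any other cluster,
    s(i) = (b - a) / max(a, b). Points in singleton clusters count as 0.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)
```

Values near 1 mean points sit well inside their own cluster. Values near 0 or below indicate overlapping or misassigned clusters.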