Course Similarity Calculation Using Efficient Manifold Ranking

Posted on: 2017-03-05
Degree: Master
Type: Thesis
Country: China
Candidate: B J Zhao
GTID: 2348330488453527
Subject: Computer Science and Technology
Abstract/Summary:
Course similarity calculation aims to quantify the degree of overlap between the knowledge points that two courses contain. Similarity among different majors is needed in many situations: freshmen choosing a major, students pursuing interdisciplinary studies, and graduates deciding on a field of employment while job hunting all need a reference for how similar the various majors are. One of the most significant measures in calculating major similarity is the relevance among the majors' curricula. Course similarity calculation is therefore the most critical step of major similarity calculation and has important research significance. The main challenge lies in the polysemy and synonymy of the extracted knowledge points.

Existing course similarity calculation methods are mainly based on traditional text mining approaches such as Latent Semantic Indexing (LSI) and Term Frequency-Inverse Document Frequency (TFIDF). However, these methods calculate the similarity between two courses simply from their absolute pairwise distance, which significantly limits how well they capture the semantic relevance among all the courses. In this work, we propose a novel course similarity calculation method using Efficient Manifold Ranking (EMR). It improves on the traditional methods by measuring course similarity with respect to the underlying intrinsic manifold structure of the whole data set, so that the semantic relevance among all courses is mined more fully.

The proposed method consists of three steps. The first step is preprocessing. Each course sample consists of two parts: course name and course content. We use a word segmentation component to extract the knowledge points from each course; after stop word removal and stemming, we obtain the collection of index knowledge points that characterizes each course. The second step is data modeling: we construct a TFIDF-weighted vector space model (VSM) over the whole data set. This step has three stages: first, we determine the m index dimensions of the VSM; then, based on these m dimensions, we compute a TFIDF-weighted feature vector for each course from the knowledge points it contains; finally, the feature vectors of all courses together form the VSM. In the third step, the EMR algorithm is applied to the VSM to calculate the similarity among courses. We use the K-means clustering algorithm to compute a number of cluster centers over the whole data set and then construct a weighted graph connecting all course samples to the cluster centers. Based on this manifold structure, we run the EMR algorithm to calculate the similarities among different courses.

Experimental results on a real-world course database show that the proposed method achieves higher accuracy than the traditional text mining methods. Furthermore, we extend the proposed method to major similarity calculation and give an example of the major similarity calculation process based on real-world data.
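
The three steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the abstract gives no formulas, so the TFIDF smoothing, the Epanechnikov kernel for the anchor weights, and the closed-form ranking solution are taken from the commonly published EMR formulation and may differ from what the author used. All function names and the toy course data are hypothetical, the courses are assumed to be already segmented, stop-word filtered, and stemmed, and K-means uses a deterministic initialisation purely for reproducibility.

```python
import numpy as np

def tfidf_matrix(docs):
    """Step 2: L2-normalised TFIDF matrix (n_courses x m) from non-empty token lists."""
    vocab = sorted({t for d in docs for t in d})
    col = {t: j for j, t in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for t in d:
            tf[i, col[t]] += 1.0
    df = (tf > 0).sum(axis=0)
    idf = np.log(len(docs) / df) + 1.0  # assumed smoothing variant
    X = tf * idf
    return X / np.linalg.norm(X, axis=1, keepdims=True), vocab

def kmeans(X, k, iters=20):
    """Plain Lloyd iterations; evenly spaced rows serve as initial centers."""
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return centers

def anchor_weights(X, anchors, s=2):
    """Z (n_anchors x n_courses): each course is linked to its s nearest
    anchors with Epanechnikov-kernel weights; every column sums to 1."""
    dist = np.sqrt(((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2))
    n, d = dist.shape
    Z = np.zeros((d, n))
    for i in range(n):
        order = np.argsort(dist[i])
        nearest = order[:s]
        # Bandwidth: distance to the (s+1)-th anchor, or just past the s-th.
        lam = dist[i, order[s]] if s < d else dist[i, order[s - 1]] * 1.01
        w = np.maximum(0.75 * (1.0 - (dist[i, nearest] / max(lam, 1e-12)) ** 2), 0.0)
        if w.sum() == 0.0:  # degenerate ties: fall back to uniform weights
            w = np.ones(len(nearest))
        Z[nearest, i] = w / w.sum()
    return Z

def emr_scores(Z, query, alpha=0.99):
    """Step 3: ranking scores r = (I - alpha*S)^-1 y with S = H^T H,
    solved through a small (n_anchors x n_anchors) system."""
    d, n = Z.shape
    degree = Z.T @ Z.sum(axis=1)      # row sums of W = Z^T Z
    H = Z / np.sqrt(degree)           # scale column i by degree_i^(-1/2)
    y = np.zeros(n)
    y[query] = 1.0
    inner = np.linalg.solve(H @ H.T - np.eye(d) / alpha, H @ y)
    return y - H.T @ inner

# Hypothetical, already preprocessed courses (step 1 is assumed done).
courses = [
    ["matrix", "vector", "eigenvalue", "linear"],
    ["matrix", "vector", "gradient", "optimization"],
    ["vector", "linear", "regression", "gradient"],
    ["poem", "novel", "rhetoric", "prose"],
    ["novel", "prose", "criticism", "poem"],
    ["rhetoric", "criticism", "essay", "prose"],
]
X, vocab = tfidf_matrix(courses)
anchors = kmeans(X, k=2)
Z = anchor_weights(X, anchors, s=1)
scores = emr_scores(Z, query=0)       # similarity of every course to course 0
ranking = np.argsort(-scores)
```

The point of routing everything through the anchors is that the linear system being solved has the size of the number of cluster centers rather than the number of courses, which is what makes the ranking step cheap on a large course database. With `s=1` each course is tied to a single cluster, so courses in other clusters receive a score of zero; a larger `s` lets similarity propagate across neighbouring clusters.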
Keywords/Search Tags: course similarity calculation, text mining, manifold ranking algorithm, EMR, major similarity calculation