Course Similarity Calculation Using Efficient Manifold Ranking

Posted on:2017-03-05

Degree:Master

Type:Thesis

Country:China

Candidate:B J Zhao

Full Text:PDF

GTID:2348330488453527

Subject:Computer Science and Technology

Abstract/Summary:

Course similarity calculation aims to quantify the degree to which the knowledge points of two courses overlap. There is often a pressing need to determine how similar different majors are. Freshmen choosing a major, students pursuing interdisciplinary studies, and graduates deciding which fields to seek employment in all need a reference for the similarities among majors. One of the most important measures of major similarity is the relevance of their curricula, so course similarity calculation is the most critical step in major similarity calculation and a worthwhile research problem. Its main challenge lies in the polysemy and synonymy of the extracted knowledge points.

Existing course similarity calculation methods are mainly based on traditional text mining approaches such as Latent Semantic Indexing (LSI) and Term Frequency-Inverse Document Frequency (TFIDF). However, these methods compute the similarity between two courses solely from their pairwise distance, which considerably limits their ability to capture the semantic relevance among all courses. In this thesis, we propose a course similarity calculation method based on Efficient Manifold Ranking (EMR). It improves on the traditional methods by measuring course similarity with respect to the intrinsic manifold structure of the whole dataset, which allows the semantic relevance among all courses to be mined more fully.

The proposed method consists of three steps. First, all course samples in the dataset are preprocessed. Each course sample consists of two parts: the course name and the course content.
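For comparison, the kind of pairwise TFIDF baseline criticized above can be sketched in a few lines of Python. This is a minimal illustration, not the exact baseline used in the thesis: the abstract does not name the distance measure, so cosine similarity is an assumption, as are the smoothed IDF formula, the function names `tfidf_vectors` and `cosine`, and the toy course data. The point it shows is that the score for a pair of courses depends only on those two vectors, so courses with no literally shared knowledge points always score zero, regardless of the rest of the dataset.

```python
import math
from collections import Counter


def tfidf_vectors(courses):
    """courses: list of knowledge-point lists, one per course (hypothetical input).
    Returns (vocabulary, TF-IDF vectors); the smoothed IDF is an assumption."""
    vocab = sorted({kp for c in courses for kp in c})
    n = len(courses)
    df = Counter(kp for c in courses for kp in set(c))
    idf = {kp: math.log((1 + n) / (1 + df[kp])) + 1 for kp in vocab}
    vectors = []
    for c in courses:
        tf = Counter(c)
        length = max(len(c), 1)  # guard against an empty course
        vectors.append([tf[kp] / length * idf[kp] for kp in vocab])
    return vocab, vectors


def cosine(u, v):
    """Pairwise cosine similarity: uses only the two vectors being compared."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Illustrative, made-up knowledge points for three courses.
courses = [
    ["array", "linked list", "tree", "graph", "sorting"],       # data structures
    ["sorting", "graph", "dynamic programming", "complexity"],  # algorithms
    ["matrix", "vector", "eigenvalue"],                         # linear algebra
]
_, vecs = tfidf_vectors(courses)
print(round(cosine(vecs[0], vecs[1]), 3), cosine(vecs[0], vecs[2]))
```

The data structures and algorithms courses share two knowledge points and get a positive score, while the linear algebra course scores exactly zero against data structures; no other course in the dataset can change either value.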
In the preprocessing step, a word segmentation component extracts the knowledge points from each course in the dataset; after stop-word removal and stemming, the remaining knowledge points form the index set that characterizes the course.

The second step is data modeling: constructing a TFIDF-weighted vector space model (VSM) over the whole dataset. This step has three parts. First, the m index dimensions of the VSM are determined. Next, a TFIDF-weighted feature vector is computed for each course over these m dimensions, according to the knowledge points it contains. Finally, the feature vectors of all courses together form the VSM of the dataset.

In the third step, the EMR algorithm is applied to the VSM to calculate the similarity between courses. K-means clustering computes a number of cluster centers over the whole dataset, and a weighted graph is then constructed between all course samples and these cluster centers. EMR is run on this graph, which captures the manifold structure of the data, to compute the similarities among courses.

Experimental results on a real-world course database show that the proposed method achieves higher accuracy than traditional text mining methods. We further extend the method to major similarity calculation and give an example of the major similarity calculation process on real-world data.
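The third step can be sketched as follows, assuming the TFIDF feature vectors from the second step are already available. This is a minimal, dependency-free sketch of the general anchor-graph formulation of EMR, not the thesis's implementation. K-means centroids serve as anchors; each course is linked to its s nearest anchors through a d x n weight matrix Z; the graph is W = Z^T Z; and the ranking vector r = (I - alpha*S)^-1 y, with S = D^-1/2 W D^-1/2, is obtained by solving only a d x d system via the Woodbury identity. The Gaussian kernel, the farthest-first k-means seeding, the parameter values (`n_anchors`, `s`, `alpha`, `sigma`), the function names, and the toy vectors are all illustrative assumptions. The score vector for a query course is read as its similarity to every other course.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-first seeding (an assumption)."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda c: sq_dist(p, centers[c]))].append(p)
        for c, g in enumerate(groups):
            if g:  # keep the previous center if a cluster ends up empty
                centers[c] = [sum(col) / len(g) for col in zip(*g)]
    return centers


def anchor_matrix(points, anchors, s, sigma):
    """d x n matrix Z: each point is linked to its s nearest anchors with
    Gaussian weights that sum to 1 (the kernel choice is an assumption)."""
    Z = [[0.0] * len(points) for _ in anchors]
    for i, p in enumerate(points):
        near = sorted((sq_dist(p, a), k) for k, a in enumerate(anchors))[:s]
        # Shift by the smallest distance so the nearest anchor gets weight 1;
        # this avoids underflow and leaves the normalized weights unchanged.
        w = [math.exp(-(dd - near[0][0]) / (2 * sigma ** 2)) for dd, _ in near]
        for (_, k), wk in zip(near, w):
            Z[k][i] = wk / sum(w)
    return Z


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small d x d system."""
    m = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][c] * x[c] for c in range(r + 1, m))) / M[r][r]
    return x


def emr_scores(features, query, n_anchors=2, s=2, alpha=0.99, sigma=0.5):
    """Scores of every course against course index `query`:
    r = (I - alpha*S)^-1 y with S = D^-1/2 Z^T Z D^-1/2 = H^T H."""
    X = [normalize(v) for v in features]
    n = len(X)
    anchors = kmeans(X, n_anchors)
    Z = anchor_matrix(X, anchors, s, sigma)
    d = len(anchors)
    # Degrees of W = Z^T Z without forming the n x n matrix: W 1 = Z^T (Z 1).
    row_sums = [sum(row) for row in Z]
    deg = [sum(Z[k][i] * row_sums[k] for k in range(d)) for i in range(n)]
    H = [[Z[k][i] / math.sqrt(deg[i]) for i in range(n)] for k in range(d)]
    y = [1.0 if i == query else 0.0 for i in range(n)]
    Hy = [sum(H[k][i] * y[i] for i in range(n)) for k in range(d)]
    # Woodbury: (I_n - a H^T H)^-1 y = y + a H^T (I_d - a H H^T)^-1 H y
    A = [[float(k == l) - alpha * sum(H[k][i] * H[l][i] for i in range(n))
          for l in range(d)] for k in range(d)]
    c = solve(A, Hy)
    return [y[i] + alpha * sum(H[k][i] * c[k] for k in range(d)) for i in range(n)]


# Made-up TF-IDF vectors over 4 knowledge points: courses 0-2 and 3-5 form two groups.
courses = [[1, 1, 0, 0], [1, 0.8, 0, 0], [0.9, 1, 0.1, 0],
           [0, 0, 1, 1], [0, 0.1, 1, 0.9], [0, 0, 0.8, 1]]
scores = emr_scores(courses, query=0)
print([round(r, 2) for r in scores])
```

With these toy vectors, the two other courses in the query's group score clearly higher than the three courses in the other group. Because the graph is built through d anchors rather than all n x n course pairs, forming H H^T costs O(d^2 n) and the solve costs O(d^3), so for a fixed number of anchors the work grows roughly linearly with the number of courses.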

Related items

1	The Study On Ranking And Similarity Calculation In Information Retrieval
2	Research On Text Similarity Calculation Method And Its Application In Financial Field
3	Research And Application Of Sentence Similarity Calculation Based On Distributed System
4	Research On Calculation Method Of Text Similarity Based On Deep Learning In Intelligent Question Answering System
5	Content Analysis Based Patent Mining Research
6	Study On Similarity-based Text Clustering Algorithm And Its Application
7	Forum Data Extraction Based On Similarity Calculation
8	Research On Personalized Recommendation Of University Library Based On Comparison Of Text Similarity
9	Research And Implementation Of College Enrollment Question And Answer Service System Based On Deep Learning
10	Research On The Calculation Method Of Han-Thai Bilingual News Text Similarity With News Elements