Educational Text Retrieval System Based On LDA

Posted on: 2015-07-02  Degree: Master  Type: Thesis
Country: China  Candidate: Z H Wu  Full Text: PDF
GTID: 2298330422977183  Subject: Software engineering
Abstract/Summary:
With the development of information technology, educational resources now appear in many formats, e.g. Word, PowerPoint, PDF, Flash and so on. However, many of these resources are poorly managed. Traditional retrieval systems based on the vector space model (VSM) cannot handle the semantics of the content, so their results are not relevant enough, and learners have to search more when studying a new course or technology. Improving the quality of the retrieval system is therefore of great significance.

This paper comes from the NSFC project "Research on Key Enabling Technologies for Cloud Based Education". The project works on the development, application and sharing of quality educational resources, using cloud computing technology to merge educational resources and offer them as services. My main work is divided into three parts. First, research and design a retrieval method based on LDA. Second, design and implement a prototype educational text retrieval system based on LDA. Third, compare the results of the prototype system with those of the traditional retrieval method based on the TF-IDF VSM model, and give a method for connecting the prototype system to the cloud education platform.

The traditional VSM model produces sparse, high-dimensional representations and cannot capture the semantics of the text well. This paper addresses these problems using LDA. With Gibbs sampling, each text can be represented as a probability distribution over a set of topics at low computational cost. JS (Jensen-Shannon) distance and K-Means are then applied in the topic space so that more relevant files can be returned.

In this paper, the system first preprocesses the collected educational files. The main purpose of preprocessing is to remove stop words and special characters with regular expressions, and to obtain the word vector representation of each text. The system then uses LDA to model the files and obtain their topic space and word space. Finally, it combines the LDA model with K-Means to return the documents relevant to the query item.

Experimental results show that, compared with VSM, the relevance of the results is improved.
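
The abstract does not give the formula or code used for the JS distance step, so the following is only a minimal sketch of how documents could be ranked against a query in topic space. It assumes that each document and the query are already represented as topic distributions (for example, the output of an LDA model), and it takes "JS distance" to mean the square root of the base-2 Jensen-Shannon divergence; the thesis may use the divergence itself. The function names, file names and three-topic vectors are hypothetical and invented for illustration.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence, in [0, 1]."""
    # m is the midpoint distribution; m_i > 0 wherever p_i or q_i > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(divergence, 0.0))  # guard against tiny negatives

def rank_documents(query_topics, doc_topics):
    """Return document ids ordered from nearest to farthest from the query."""
    return sorted(doc_topics,
                  key=lambda d: js_distance(query_topics, doc_topics[d]))

# Hypothetical 3-topic distributions (assumed data, not from the thesis).
docs = {
    "intro_to_java.pdf": [0.8, 0.1, 0.1],
    "linear_algebra.ppt": [0.1, 0.8, 0.1],
    "java_generics.doc": [0.7, 0.2, 0.1],
}
query = [0.75, 0.15, 0.10]
ranking = rank_documents(query, docs)
```

In a system like the one described, the candidate set passed to `rank_documents` could be restricted to the K-Means cluster nearest the query, which would avoid comparing the query against every document; that pairing is an assumption about how the two steps fit together, not a detail stated in the abstract.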
Keywords/Search Tags: Semantic analysis model, LDA model, clustering, K-Means clustering, JS distance, VSM model