Educational Text Retrieval System Based On LDA

Posted on:2015-07-02

Degree:Master

Type:Thesis

Country:China

Candidate:Z H Wu

Full Text:PDF

GTID:2298330422977183

Subject:Software engineering

Abstract/Summary:

With the development of information technology, educational resources now appear in many formats, e.g. Word, PowerPoint, PDF, and Flash. However, many of these resources lack management. A traditional retrieval system based on the vector space model (VSM) cannot handle the semantics of the content, so its results are not relevant enough. Learning a new course or technology therefore takes more searching. Improving the quality of the retrieval system is thus of great significance.

This paper comes from the NSFC project Research on Key Enabling Technologies for Cloud Based Education. The project works on the development, application, and sharing of quality educational resources, using cloud computing technology to merge educational resources and offer them as services. My main work is divided into three parts. First, research and design a retrieval method based on LDA. Second, design and implement a prototype educational text retrieval system based on LDA. Third, compare the results of the prototype system with those of the traditional retrieval method based on the TF-IDF VSM model, and give a method for connecting the prototype system to the cloud education platform.

The traditional VSM model produces sparse, high-dimensional representations and cannot capture the semantics of the text well. This paper addresses these problems using LDA. With Gibbs sampling, a text can be represented as a probability distribution over the topic set at low computational complexity. JS (Jensen-Shannon) distance and K-Means are then applied in the topic space so that more relevant files can be returned.

In this paper, the system first preprocesses the collected educational files. The main purpose of preprocessing is to remove stop words and special characters with regular expressions and to obtain the word vector representation of each text. The system then uses LDA to model the files, obtaining the topic space and word space of these files. Finally, it combines the LDA model with K-Means to retrieve relevant documents according to the query item. Experimental results show that, compared with VSM, relevance is improved.
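
The ranking step mentioned in the abstract, comparing texts by Jensen-Shannon distance in topic space, can be illustrated with a minimal sketch. This is not the thesis implementation: the file names, the three-topic distributions, and the function names are hypothetical, and the topic distributions are assumed to have already been produced by an LDA model. The sketch uses the Jensen-Shannon divergence with base-2 logarithms. Some definitions take its square root as the "distance", which would not change the ranking.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and within [0, 1] for log base 2."""
    # m is the midpoint distribution; m_i > 0 whenever p_i > 0 or q_i > 0,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def rank_by_js(query_topics, doc_topics):
    """Return (doc_id, divergence) pairs, most similar document first."""
    scored = [
        (doc_id, js_divergence(query_topics, theta))
        for doc_id, theta in doc_topics.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical document-topic distributions over three topics,
# standing in for the output of an LDA model.
doc_topics = {
    "calculus_notes.pdf": [0.80, 0.15, 0.05],
    "java_slides.ppt": [0.05, 0.90, 0.05],
    "history_essay.doc": [0.10, 0.10, 0.80],
}

# Hypothetical topic distribution inferred for the query.
query_topics = [0.70, 0.20, 0.10]

ranking = rank_by_js(query_topics, doc_topics)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.4f}")
```

In a fuller pipeline, K-Means could first group the document-topic vectors so that only documents in the cluster nearest to the query need to be scored. That step is omitted here for brevity.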

Keywords/Search Tags:

Semantic analysis model, LDA model, clustering, K-Means clustering, JS distance, VSM model

Related items

1	Chinese Text Clustering Based On Latent Semantic And Its Applications
2	Application Researches On Independent Component Analysis Based Semantic Clustering In Information Retrieval
3	Research On The Distribution Characteristics Of Flora Information Based On The Probabilistic Topic Model
4	Study Of Chinese Text Clustering On Improved K-means Algorithm
5	Quantitative Research, Particle-based Clustering Algorithm Context
6	Clustering Lars And Clustering Coordinante Descend Simulation And Case Analysis
7	Document Topic Clustering Analysis Based On Improved K-means Method
8	The Construction Of Condensed Semantic Tree Model And Its Application In The Analysis Of Video Key Frame Clustering
9	Accurate Marketing Of Credit Card Customers Based On AP Clustering Algorithm
10	Research On Domain Resource Clustering Based On Semantic Field Model And Its Application