
Research And Realization Of The Search Engine In The Field Of Education Based On C-LDA

Posted on: 2019-04-30 | Degree: Master | Type: Thesis
Country: China | Candidate: F F Li
GTID: 2428330545969477 | Subject: Computer technology
Abstract/Summary:
With the rapid development of Internet technology, Web resources have grown explosively, bringing a huge and diverse body of information to people in many fields. Education, one of the important fields on the Internet, provides people with abundant learning resources. However, as the amount of data increases, problems have gradually emerged. General-purpose search engines are usually built on keyword-based inverted indexes, and their results cover too broad a range and include many ads and spam pages, so they cannot return satisfactory results that match the user's search intent. It is therefore of great practical significance to build a vertical search engine oriented to the education field, one that better meets user needs and returns higher-quality information. This paper analyzes and discusses the LDA topic model and search engine ranking algorithms. First, it proposes an LDA Optimal Topic Number Selection Method (C-LDA) based on a frequent word network. It then designs a User Interest Improvement Model based on C-LDA and, on top of these algorithms, builds a search engine system for the education domain that lets users find more of the educational information that interests them. The main research contents of this paper are as follows:

(1) Because the LDA topic model cannot determine the optimal number of topics on its own, this paper proposes setting the number of topics input to LDA to the number of communities in a frequent word-set network. The method extracts frequent word pairs from the documents and builds a word co-occurrence network from them, then applies an unsupervised community partitioning algorithm to divide the network into communities, and finally uses the number of communities as the topic number of the LDA topic model. This specifies the number of hidden topics in LDA more accurately, improves topic accuracy and recall, and reduces topic perplexity.

(2) To help users find information that interests them through the search engine, this paper proposes a User Interest Improvement Model based on C-LDA. First, the C-LDA topic model is used to compute the implicit topics (interests) of users and of courses, and a similarity is computed from the two interest distributions. This similarity serves as the interest similarity between the user and the course, and it is combined with the Lucene ranking score to obtain the course's final ranking score. Compared with the traditional search engine algorithm, this algorithm achieves higher interest accuracy and interest recall.

(3) Based on the algorithms proposed in (1) and (2), this paper builds a search engine system for education. To solve the problem of data acquisition for the search engine, it designs a distributed crawler system using HttpClient, Quartz and ActiveMQ. It uses the relational database MySQL for storage, and Ajax and the SpringMVC framework to design a complete search engine solution for education. Finally, the functional modules of the C-LDA-based education search engine are implemented, the proposed algorithms are applied in the system, and the algorithms are further verified on the crawled data.
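The topic number selection step in (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: documents are assumed to be already segmented into words with stop words removed, a "frequent" word pair is assumed to be one that co-occurs in at least `min_support` documents, and weighted label propagation is used as a stand-in for the unspecified unsupervised community partitioning algorithm. All function names and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def frequent_word_pairs(docs, min_support=2):
    """Count, for each unordered word pair, how many documents contain both words.

    Assumption: a pair is "frequent" if it co-occurs in at least min_support documents.
    """
    counts = Counter()
    for doc in docs:
        counts.update(combinations(sorted(set(doc)), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


def build_cooccurrence_graph(pairs):
    """Undirected weighted graph as an adjacency dict: word -> {neighbor: weight}."""
    graph = defaultdict(dict)
    for (u, v), weight in pairs.items():
        graph[u][v] = weight
        graph[v][u] = weight
    return graph


def label_propagation(graph, max_iter=100):
    """Weighted label propagation, used here as a stand-in community detector.

    Nodes are visited in sorted order and ties are broken deterministically,
    so the result is reproducible.
    """
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            # Sum edge weights per neighboring label.
            scores = Counter()
            for neighbor, weight in graph[node].items():
                scores[labels[neighbor]] += weight
            best = max(scores.values())
            candidates = {label for label, s in scores.items() if s == best}
            # Keep the current label if it is among the best; otherwise take the smallest.
            if labels[node] not in candidates:
                labels[node] = min(candidates)
                changed = True
        if not changed:
            break
    return labels


def select_topic_number(docs, min_support=2):
    """Use the number of word-network communities as the LDA topic number K."""
    graph = build_cooccurrence_graph(frequent_word_pairs(docs, min_support))
    return len(set(label_propagation(graph).values()))


if __name__ == "__main__":
    docs = (
        [["algebra", "equation", "function"]] * 3
        + [["grammar", "vocabulary", "essay"]] * 3
        + [["function", "essay"]] * 2
        + [["algebra", "poem"]]  # co-occurs only once, so it is filtered out
    )
    print(select_topic_number(docs))  # 2
```

In this toy corpus the two word groups are joined by a weaker link between "function" and "essay", so a plain connected-components count would report one group, while label propagation separates the two denser clusters and returns K = 2. The returned K would then be passed as the topic count to whatever LDA implementation is used.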
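The ranking step in (2) can likewise be sketched. The abstract does not state which similarity measure is used or how the interest similarity is combined with the Lucene score, so this sketch assumes cosine similarity between topic distributions and a linear blend weighted by a hypothetical parameter `alpha`, with Lucene scores divided by the largest score in the result list so that both terms fall in [0, 1]. The topic vectors are assumed to come from a C-LDA model trained elsewhere, and Lucene scores are passed in as plain numbers; no Lucene API is called.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two topic (interest) distributions."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def rank_courses(user_topics, candidates, alpha=0.5):
    """Re-rank Lucene hits by blending text relevance with interest similarity.

    candidates: list of (course_id, lucene_score, course_topics) tuples.
    alpha: hypothetical weight on the normalized Lucene score (0..1).
    """
    if not candidates:
        return []
    # Lucene scores are unbounded, so scale them by the best score in this result list.
    max_lucene = max(score for _, score, _ in candidates) or 1.0
    ranked = []
    for course_id, lucene_score, course_topics in candidates:
        interest = cosine_similarity(user_topics, course_topics)
        final = alpha * (lucene_score / max_lucene) + (1 - alpha) * interest
        ranked.append((course_id, final))
    # Highest final score first.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


if __name__ == "__main__":
    user = [0.7, 0.2, 0.1]  # hypothetical 3-topic interest distribution
    hits = [
        ("poetry", 10.0, [0.1, 0.1, 0.8]),
        ("calculus", 8.0, [0.8, 0.1, 0.1]),
    ]
    for course_id, score in rank_courses(user, hits):
        print(course_id, round(score, 3))
```

Here "calculus" ranks above "poetry" even though its Lucene score is lower, because its topic distribution is much closer to the user's interests. Setting `alpha=1.0` falls back to pure Lucene ordering, which makes the weight easy to tune against interest accuracy and recall.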
Keywords/Search Tags: C-LDA, Lucene, SpringMVC, Frequent Word Network, User Interest Model