Personalized Vertical Search Engine For Basic Education Resource

Posted on: 2015-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Wan
Full Text: PDF
GTID: 2298330452453144
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet and the exponential growth of data resources, it has become very difficult for users to find the information they need, and a technique is needed to address this problem. The emergence and development of search engine technology lets users find information and resources on the Internet more conveniently. At present, most common search engines are based on keyword matching and do not make full use of information about the individual user, so the results returned are not all relevant, and the user has to spend extra effort filtering out unwanted information. Borrowing the idea of personalized recommendation systems, this thesis applies personalization technologies to vertical search engines, so that users can find the resources they need in a specialized field more efficiently and accurately and have a better search experience.

The thesis first studies the theory of search engines and then focuses on the key technologies of personalized search engines: topical web crawling, web information extraction, and user interest modeling. It also uses the user interest model to improve the ranking algorithm in Lucene. Finally, it designs a personalized vertical search engine system model and applies it to the field of basic education.

The main work of this thesis is:
(1) Analyze the open-source web crawler Heritrix and extend it, designing a topical web crawler model based on domain analysis and a thesaurus-based link crawling strategy.
(2) Analyze the open-source search tool Lucene, including its architecture, index structure, data flow, structure, and functions. Focusing on Lucene's ranking algorithm, improve it using the personalized information in the user interest model, and design a retrieval model.
(3) Study techniques for extracting relevant information, such as regular expressions and the open-source toolkit HTMLParser, and combine Web data with actual needs to design an information extraction model.
(4) Study the theory of user interest modeling, and design an algorithm that builds the user interest model by mining users' behavior when using educational resources.
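
The abstract does not give the ranking formula or the interest-model algorithm, so the following is only a minimal sketch of the general idea in items (2) and (4): build a term-weighted interest profile from user behavior, then blend a keyword-based relevance score with the similarity between each document and that profile. The behavior weights, the blending parameter alpha, and all function names are hypothetical and are not taken from the thesis or from Lucene's API.

```python
import math
from collections import Counter

# Hypothetical weights: stronger actions count more toward interest.
BEHAVIOR_WEIGHTS = {"view": 1.0, "bookmark": 2.0, "download": 3.0}


def build_interest_profile(events):
    """Build a term -> weight profile from (action, terms) events."""
    profile = Counter()
    for action, terms in events:
        weight = BEHAVIOR_WEIGHTS.get(action, 0.0)
        for term in terms:
            profile[term] += weight
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(results, profile, alpha=0.7):
    """Re-rank (doc_id, base_score, terms) results.

    The final score is alpha * normalized base score
    + (1 - alpha) * similarity to the interest profile (assumed blend).
    """
    if not results:
        return []
    max_score = max(score for _, score, _ in results) or 1.0
    rescored = []
    for doc_id, score, terms in results:
        interest = cosine(Counter(terms), profile)
        final = alpha * (score / max_score) + (1 - alpha) * interest
        rescored.append((doc_id, final))
    return sorted(rescored, key=lambda item: item[1], reverse=True)


events = [("download", ["fraction", "math"]), ("view", ["math"])]
profile = build_interest_profile(events)
results = [
    ("essay-guide", 2.0, ["history", "essay"]),
    ("fraction-worksheet", 1.8, ["math", "fraction"]),
]
ranking = rerank(results, profile)
```

In this toy example the worksheet has a slightly lower keyword score but matches the user's interests, so it moves ahead of the essay guide. With an empty profile the order falls back to the keyword scores alone.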
Keywords/Search Tags: Personalized vertical search engine, Lucene, Heritrix, User interest model