
The Research And Implementation Of Educational Resource Oriented Search Engine

Posted on: 2016-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: X D Yang
Full Text: PDF
GTID: 2308330473455637
Subject: Software engineering
Abstract/Summary:
With the advent of the information age and the rapid growth of online resources, the Internet has become an important channel through which educators and learners obtain educational information and resources. As the volume of data grows, traditional general-purpose search engines, which cover a very wide search area, often return a large amount of irrelevant information. Vertical search engines, by contrast, target a specific domain and user group and can therefore provide more accurate retrieval results. The main problem addressed in this thesis is how to find educational resources more effectively.

Based on the retrieval demands for educational resources, this thesis introduces and compares general search engines and vertical search engines, and studies the system architecture, working principles, key technologies, workflow and other related theory of vertical search engines.

Then, based on a requirements analysis covering the objective, subject scope, and foreground and background business requirements, the thesis designs the structure and workflow of the system, divides the system into modules and gives their detailed designs.

The thesis implements and integrates an educational resource oriented vertical search engine by extending Heritrix, HTMLParser and Lucene. The main contents include:
1. Crawling web pages with Heritrix, which is extended and customized to perform URL filtering, and downloading the pages to local storage;
2. Extracting the contents of web pages with regular expressions and HTMLParser, forming topic words by processing sample pages, and then performing content filtering with the vector space model;
3. Using Lucene for indexing and search, and optimizing the indexing speed and the ordering of search results by increasing the weight of the title and the weight of documents that are more relevant to the topic;
4. Realizing full-text retrieval of website database resources by building and optimizing the index and keeping the database and the index consistent;
5. Testing the system, which shows that it meets the requirements.

The system provides an educational resource oriented retrieval service for the project of which it is a part.
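The abstract mentions forming topic words from sample pages and then filtering page content with the vector space model, but gives no details. The following is only a minimal sketch of that general idea, not the thesis's implementation: the function names, the raw term-frequency weighting, the `top_k` cut-off and the `threshold` value are all assumptions made for illustration, and the simple regular-expression tokenizer would have to be replaced by a proper word segmenter for Chinese pages.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_vector(text):
    """Represent a text as a sparse term-frequency vector."""
    return Counter(tokenize(text))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_topic_vector(sample_pages, top_k=20):
    """Keep the top_k most frequent terms of the sample pages as topic words.

    top_k is a hypothetical parameter; no stop-word removal is done here.
    """
    counts = Counter()
    for page in sample_pages:
        counts.update(tokenize(page))
    return Counter(dict(counts.most_common(top_k)))


def is_on_topic(page_text, topic_vector, threshold=0.2):
    """Accept a page when its similarity to the topic vector reaches threshold.

    The threshold of 0.2 is an arbitrary illustrative value.
    """
    return cosine_similarity(term_vector(page_text), topic_vector) >= threshold


samples = [
    "course syllabus and lecture notes for students",
    "teacher lesson plans and course exercises for students",
]
topic = build_topic_vector(samples)
relevant = is_on_topic("lecture notes and exercises for the course", topic)
irrelevant = is_on_topic("cheap flights hotel deals", topic)
```

In this sketch a page that shares no term with the topic words scores 0 and is rejected, while a page that reuses the sample vocabulary is kept. A real filter would typically use TF-IDF weights and a threshold tuned on labelled pages.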
Keywords/Search Tags: Educational resource, Vertical search engine, Full-text retrieval, Crawler