
Design And Implementation Of Search Engine In Digital Library Of A University

Posted on: 2015-11-13  Degree: Master  Type: Thesis
Country: China  Candidate: L H Ding  Full Text: PDF
GTID: 2308330473450515  Subject: Software engineering
Abstract/Summary:
With the development of computer science and Internet technology, libraries are transforming from traditional to digital. Information that was once printed in books is now stored in digital form on hard disks and made available over the Internet. Readers can access digital books conveniently anywhere and at any time, instead of first searching the bookshelves. One of the most important techniques in a digital library is information retrieval. This thesis discusses the key techniques of information retrieval and then designs and implements a search engine for a university's digital library.

After introducing the relevant theoretical background, the thesis analyzes the system requirements based on the actual conditions of the university's digital library. Its main content covers the construction of the full-text search engine, the detailed design of each part, and the working process of the system. The first part discusses programming spiders over HTTP using Java threads. In this module, hyperlinks in web pages are traversed according to BFS (Breadth-First Search). The task queue is stored in a SQL DBMS, which is accessed and edited through JDBC. The second part uses Lucene's Chinese word segmentation and its API to implement the Indexer. This module involves the text-mining techniques of HTMLParser and text-mining tools that handle many file types, such as HTML, TXT, Word, and PDF. The third part discusses the implementation of the Searcher and the user interface. The Searcher provides English keyword search, Chinese keyword search, secondary search, relevant search, and multi-keyword search.
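The breadth-first traversal used by the spider can be illustrated with a minimal sketch. This is not the thesis's code: the class name `BfsCrawlSketch`, the method `crawlOrder`, and the `maxPages` limit are hypothetical, an in-memory map stands in for pages fetched over HTTP, and an `ArrayDeque` stands in for the task queue that the thesis stores in a SQL DBMS.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of BFS link traversal; names are not from the thesis.
final class BfsCrawlSketch {

    // Returns the pages in the order a breadth-first crawl would visit them.
    // `links` maps each page to the hyperlinks found on it (assumed input;
    // a real spider would fetch and parse each page instead).
    static List<String> crawlOrder(String seed,
                                   Map<String, List<String>> links,
                                   int maxPages) {
        Deque<String> queue = new ArrayDeque<>(); // FIFO task queue
        Set<String> seen = new HashSet<>();       // pages already queued
        List<String> visited = new ArrayList<>();

        queue.add(seed);
        seen.add(seed);

        while (!queue.isEmpty() && visited.size() < maxPages) {
            String url = queue.poll();            // oldest task first: breadth-first
            visited.add(url);
            for (String next : links.getOrDefault(url, Collections.emptyList())) {
                // Set.add returns false for duplicates, so each page is queued once.
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return visited;
    }
}
```

Because the queue is first-in, first-out, every page one link away from the seed is visited before any page two links away. Replacing the in-memory queue with a database table, as the thesis describes, would let several spider threads share the same pending tasks.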
The Searcher also analyzes the query string automatically and highlights the matched terms in the text. Finally, system testing is performed to verify the effectiveness of the design. The test results indicate that the full-text search engine can facilitate accurate retrieval without vocabulary segmentation, and that retrieval comprehensiveness and accuracy are improved.
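Query-term highlighting of the kind described above might look like the following sketch. The abstract does not say which mechanism the thesis uses, so this example deliberately avoids Lucene and relies only on the Java standard library. The class `HighlightSketch`, the method `highlight`, the whitespace-based splitting of the query, and the `<b>` markers are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of keyword highlighting; names are not from the thesis.
final class HighlightSketch {

    // Wraps every case-insensitive occurrence of a query term in <b>...</b>.
    // The query is split on whitespace (an assumption; a real searcher would
    // use the same analyzer as the indexer). The text is not HTML-escaped here.
    static String highlight(String text, String query) {
        List<String> quoted = new ArrayList<>();
        for (String term : query.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                quoted.add(Pattern.quote(term)); // treat each term literally
            }
        }
        if (quoted.isEmpty()) {
            return text; // nothing to highlight
        }
        Pattern pattern = Pattern.compile(
                String.join("|", quoted),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher matcher = pattern.matcher(text);
        return matcher.replaceAll("<b>$0</b>"); // $0 is the matched term as it appears in the text
    }
}
```

For example, calling `HighlightSketch.highlight("Lucene builds the index", "lucene index")` wraps both `Lucene` and `index` while keeping the original capitalization of the text.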
Keywords/Search Tags: Spider, Full-text Search, Information Retrieval, Lucene