Key Technologies of Information Search in Digital Libraries

Posted on: 2011-08-13
Degree: Master
Type: Thesis
Country: China
Candidate: J Li
GTID: 2208360305497636
Subject: Software engineering
Abstract/Summary:
Information retrieval is one of the most important technologies in a digital library (DL). This paper discusses the key information retrieval techniques of a DL. We first introduce the background of the study, the theoretical basis of the relevant techniques, and the overall design of the full-text search engine. The main body of the paper covers the detailed design and implementation of each part of the engine.

The first part uses HTTP and Java threads to implement the spiders. This module traverses the hyperlinks in web pages breadth-first (BFS, Breadth-First Search), stores the task queue in a SQL DBMS, and accesses the DBMS through JDBC.

The second part implements the indexer on top of Lucene's API and its Chinese segmentation support. The module integrates the text extraction capabilities of the HTMLParser and TextMining tools, so it can handle many file types, including HTML, TXT, Word and PDF.

The third part implements the searcher. Its functions include English search, exact Chinese search, multi-keyword search, secondary search (searching within previous results) and relevant search. The module automatically analyzes the query string against the main content of the index files and highlights the matched text in the query results.

The full-text search engine finally implemented in this paper does not depend on a segmentation vocabulary (dictionary), which facilitates accurate retrieval and improves both retrieval speed and accuracy.
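The spider module described above (BFS over hyperlinks, Java threads, a task queue) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's code: an in-memory map from URL to HTML stands in for HTTP fetches, a regular expression stands in for a real HTML parser when extracting `href` links, and the queue lives in memory rather than in a SQL DBMS accessed through JDBC. The names `BfsSpider`, `crawl` and `extractLinks` are hypothetical. Each BFS level is fetched concurrently on a fixed thread pool, and results are consumed in submission order so the visit order stays deterministic.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical BFS spider sketch. Assumption: the `pages` map (URL -> HTML)
 * stands in for HTTP fetches, and the queue is kept in memory, not in a DBMS.
 */
public class BfsSpider {
    // Naive href extraction; a production spider would use an HTML parser.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    /** Visits at most maxPages pages breadth-first from seed, fetching each level on a thread pool. */
    public static List<String> crawl(Map<String, String> pages, String seed,
                                     int maxPages, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<String> visited = new ArrayList<>();
            Set<String> seen = new HashSet<>();
            seen.add(seed);
            List<String> level = List.of(seed);
            while (!level.isEmpty() && visited.size() < maxPages) {
                // One fetch task per URL on the current BFS level.
                List<Future<String>> fetches = new ArrayList<>();
                for (String url : level) {
                    fetches.add(pool.submit(() -> pages.getOrDefault(url, "")));
                }
                List<String> next = new ArrayList<>();
                // Consume results in submission order so the visit order is deterministic.
                for (int i = 0; i < level.size() && visited.size() < maxPages; i++) {
                    String html;
                    try {
                        html = fetches.get(i).get();
                    } catch (InterruptedException | ExecutionException e) {
                        if (e instanceof InterruptedException) {
                            Thread.currentThread().interrupt(); // preserve interrupt status
                        }
                        throw new IllegalStateException("fetch failed: " + level.get(i), e);
                    }
                    visited.add(level.get(i));
                    for (String link : extractLinks(html)) {
                        if (seen.add(link)) {
                            next.add(link); // enqueue each URL only once
                        }
                    }
                }
                level = next;
            }
            return visited;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, String> pages = Map.of(
                "/", "<a href=\"/a\">A</a> <a href=\"/b\">B</a>",
                "/a", "<a href=\"/c\">C</a> <a href=\"/\">home</a>",
                "/b", "<a href=\"/c\">C</a>",
                "/c", "no links");
        System.out.println(crawl(pages, "/", 10, 4)); // prints [/, /a, /b, /c]
    }
}
```

Processing one level at a time keeps the traversal strictly breadth-first even though the fetches run in parallel. Replacing the in-memory `level` list and `seen` set with database tables would let the queue survive a restart.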
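The indexer's first step, turning files of different types into plain text, can be sketched with the JDK alone. The thesis used HTMLParser and TextMining for this; the code below is a hypothetical stand-in that strips HTML with regular expressions, which is only adequate for simple, well-formed pages, and it deliberately does not handle Word or PDF, which need dedicated parsers. The names `TextExtractor`, `extract` and `htmlToText` are illustrative, and UTF-8 input is assumed.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical text extractor that dispatches on file extension (UTF-8 input assumed). */
public class TextExtractor {
    // Drop <script> and <style> blocks entirely, including their contents.
    private static final Pattern SCRIPT_STYLE =
            Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    public static String extract(String fileName, byte[] content) {
        String name = fileName.toLowerCase(Locale.ROOT);
        String raw = new String(content, StandardCharsets.UTF_8);
        if (name.endsWith(".txt")) {
            return SPACES.matcher(raw).replaceAll(" ").trim();
        }
        if (name.endsWith(".html") || name.endsWith(".htm")) {
            return htmlToText(raw);
        }
        // Word and PDF need dedicated parsers (the thesis used third-party tools).
        throw new UnsupportedOperationException("no extractor for " + fileName);
    }

    public static String htmlToText(String html) {
        String s = SCRIPT_STYLE.matcher(html).replaceAll(" ");
        s = TAG.matcher(s).replaceAll(" ");
        // Decode a few common entities; &amp; goes last so "&amp;lt;" becomes "&lt;", not "<".
        s = s.replace("&lt;", "<").replace("&gt;", ">")
             .replace("&nbsp;", " ").replace("&amp;", "&");
        return SPACES.matcher(s).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><style>p{color:red}</style><body><p>Digital&nbsp;Library</p>"
                + "<script>track()</script> &amp; IR</body></html>";
        System.out.println(htmlToText(html)); // prints Digital Library & IR
    }
}
```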
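Dictionary-free Chinese segmentation, multi-keyword search and secondary search can be illustrated together with a toy inverted index. This is an assumption-laden sketch, not Lucene's implementation: runs of Han characters are cut into overlapping two-character tokens (the approach taken by Lucene's `CJKAnalyzer`), so no segmentation vocabulary is needed; other letters and digits form lowercase words. A query matches a document only if all of its tokens occur there (an AND query, without the positional check a true exact-phrase search would add), and secondary search intersects a new query with a previous result set. `BigramIndex`, `tokenize`, `search` and `searchWithin` are hypothetical names, and only BMP characters are handled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory inverted index with dictionary-free CJK bigram tokens. */
public class BigramIndex {
    // term -> ids of documents containing it
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private int nextId = 0;

    /** Han runs become overlapping bigrams; other letter/digit runs become lowercase words. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder han = new StringBuilder();
        StringBuilder word = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.UnicodeScript.of(c) == Character.UnicodeScript.HAN) {
                flushWord(word, tokens);
                han.append(c);
            } else {
                flushHan(han, tokens);
                if (Character.isLetterOrDigit(c)) {
                    word.append(Character.toLowerCase(c));
                } else {
                    flushWord(word, tokens);
                }
            }
        }
        flushHan(han, tokens);
        flushWord(word, tokens);
        return tokens;
    }

    private static void flushHan(StringBuilder han, List<String> out) {
        if (han.length() == 1) {
            out.add(han.toString()); // a lone character is its own token
        }
        for (int i = 0; i + 1 < han.length(); i++) {
            out.add(han.substring(i, i + 2));
        }
        han.setLength(0);
    }

    private static void flushWord(StringBuilder word, List<String> out) {
        if (word.length() > 0) {
            out.add(word.toString());
        }
        word.setLength(0);
    }

    /** Indexes a document and returns its id. */
    public int add(String text) {
        int id = nextId++;
        for (String t : tokenize(text)) {
            postings.computeIfAbsent(t, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    /** Multi-keyword search: ids of documents containing every query token. */
    public Set<Integer> search(String query) {
        return searchWithin(query, null);
    }

    /** Secondary search: like search, restricted to previous (null means all documents). */
    public Set<Integer> searchWithin(String query, Set<Integer> previous) {
        List<String> terms = tokenize(query);
        if (terms.isEmpty()) {
            return new TreeSet<>();
        }
        Set<Integer> result = previous == null ? null : new TreeSet<>(previous);
        for (String t : terms) {
            Set<Integer> hits = postings.getOrDefault(t, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BigramIndex idx = new BigramIndex();
        idx.add("数字图书馆 全文检索");      // id 0
        idx.add("图书馆 Lucene 索引");        // id 1
        idx.add("Lucene full-text search"); // id 2
        Set<Integer> first = idx.search("图书馆");
        System.out.println(first);                             // prints [0, 1]
        System.out.println(idx.searchWithin("lucene", first)); // prints [1]
    }
}
```

Bigrams trade index size for independence from a dictionary: "数字图书馆" yields 数字, 字图, 图书 and 书馆, so a query for 图书馆 matches without anyone having listed that word in advance.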
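Highlighting the matched text in the query results can be sketched in the same hedged way. Lucene has its own highlighter module; the code below is not that API but a small stand-in under assumptions: the query has already been split into terms (for example into overlapping Chinese bigrams), matching is case-insensitive where that keeps character positions aligned, and `<b>` is an arbitrary choice of markup. `Highlighter` and `highlight` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical result highlighter: marks every occurrence of each term and wraps
 * each maximal marked run in <b>...</b>, so overlapping matches merge into one span.
 */
public class Highlighter {
    public static String highlight(String text, List<String> terms) {
        // Case-fold only when lowercasing preserves the length (true for ASCII and CJK),
        // so match positions in the folded copy still line up with text.
        String folded = text.toLowerCase(Locale.ROOT);
        boolean fold = folded.length() == text.length();
        String hay = fold ? folded : text;
        boolean[] marked = new boolean[text.length()];
        for (String term : terms) {
            String t = fold ? term.toLowerCase(Locale.ROOT) : term;
            if (t.isEmpty()) {
                continue;
            }
            for (int i = hay.indexOf(t); i >= 0; i = hay.indexOf(t, i + 1)) {
                Arrays.fill(marked, i, i + t.length(), true);
            }
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            if (marked[i] && (i == 0 || !marked[i - 1])) {
                out.append("<b>"); // start of a marked run
            }
            out.append(text.charAt(i));
            if (marked[i] && (i + 1 == text.length() || !marked[i + 1])) {
                out.append("</b>"); // end of a marked run
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The bigram terms for 图书馆 overlap, so they merge into a single highlight.
        System.out.println(highlight("Lucene builds the 图书馆 index",
                List.of("lucene", "图书", "书馆")));
        // prints <b>Lucene</b> builds the <b>图书馆</b> index
    }
}
```

Marking characters first and emitting tags afterwards guarantees that tags never nest or overlap. Text shown in a web page would additionally need HTML escaping around the inserted tags.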
Keywords/Search Tags: Bot, Spider, Full-text Search, Information Retrieval, Keyword Search, Lucene