The Research And Application Of Segmentation And Sorting In Vertical Search Engine

Posted on: 2015-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhang
GTID: 2268330428976508
Subject: Software engineering
Abstract/Summary:
With the rapid development of computer and network technology, society has entered the Information Age. The volume of data and information in every field has increased dramatically: massive data enriches people's lives but also increases the time users spend screening information. Mining potentially valuable information from massive, disorganized, and noisy data poses unprecedented challenges to human information-processing capability. In specialized fields, the information coverage and retrieval precision of general-purpose search engines keep falling, while users' demand for more accurate and more detailed professional information keeps growing. In response to these challenges, vertical search engines offering domain-specific and personalized information retrieval have emerged. Lucene-based vertical search has become a hot and difficult topic in current search engine and Web data mining research, and this thesis studies its key technologies.
First, the thesis analyzes the research progress and current status of vertical search engines, introduces their components, and describes how they work. It then gives an overview of the technologies behind the full-text search engine Lucene, including the Lucene framework, its indexing and search mechanisms, and a comparison between Lucene indexes and database indexes.
Second, because general-purpose word segmentation systems perform poorly on domain-specific text, the thesis studies Chinese word segmentation algorithms for vertical search engines. Based on an analysis of the characteristics of book-domain vocabulary, it proposes a double-character hash dictionary mechanism that records word lengths, and uses this mechanism to improve the forward maximum matching (FMM) word segmentation algorithm.
Third, because Lucene's result sorting (ranking) considers only the content of a web page and ignores the importance of the page itself, the thesis studies link-based page ranking algorithms. Taking into account the importance of books and the characteristics of the page data, it improves Lucene's ranking on the basis of PageRank.
Fourth, building on the improved segmentation and ranking, the thesis designs and implements a vertical search engine for book information. Its main functions include web crawling, web information extraction, index construction, and a query interface. A comparison of results verifies that the improved segmentation and ranking perform better.
Finally, the thesis summarizes its main contents, points out the remaining problems of the system, and outlines directions for future work.
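To make the comparison between Lucene indexes and database indexes more concrete: Lucene is built around an inverted index, which maps each term to the list of documents containing it, whereas a conventional database index (such as a B-tree on a column) is keyed on whole column values. The sketch below is a toy in-memory inverted index, not Lucene's actual data structures; the functions `build_inverted_index` and `search_and` are hypothetical, and the whitespace tokenizer is a stand-in for a real analyzer (Chinese text would need a word segmenter to produce the terms).

```python
from collections import defaultdict


def build_inverted_index(docs, tokenize):
    """Map each term to a sorted postings list of (doc_id, term_frequency)."""
    postings = defaultdict(dict)  # term -> {doc_id: frequency}
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term][doc_id] = postings[term].get(doc_id, 0) + 1
    return {term: sorted(freqs.items()) for term, freqs in postings.items()}


def search_and(index, terms):
    """Return the ids of documents containing every query term (postings intersection)."""
    result = None
    for term in terms:
        ids = {doc_id for doc_id, _ in index.get(term, [])}
        result = ids if result is None else result & ids
    return sorted(result or [])


docs = {
    1: "lucene in action",
    2: "vertical search with lucene",
    3: "search engine design",
}
index = build_inverted_index(docs, str.split)  # whitespace split: illustrative only
print(index["lucene"])                          # [(1, 1), (2, 1)]
print(search_and(index, ["lucene", "search"]))  # [2]
```

A term lookup here is a single dictionary access followed by a set intersection, which is why full-text queries suit an inverted index better than a substring scan over a database column.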
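The segmentation improvement combines a hash dictionary keyed on a word's first two characters with forward maximum matching. The sketch below is one plausible reading of that idea, not the thesis's implementation: each two-character bucket stores its words and the longest word length in the bucket, so FMM only tries lengths that can actually match. The class `DoubleCharDict`, the function `fmm_segment`, and the sample dictionary are all hypothetical.

```python
from collections import defaultdict


class DoubleCharDict:
    """Hypothetical dictionary bucketed by each word's first two characters."""

    def __init__(self, words):
        self.buckets = defaultdict(set)  # two-char prefix -> words with that prefix
        self.max_len = defaultdict(int)  # two-char prefix -> longest word length
        for w in words:
            if len(w) < 2:
                continue  # single characters are the fallback anyway
            key = w[:2]
            self.buckets[key].add(w)
            self.max_len[key] = max(self.max_len[key], len(w))

    def longest_match(self, text, start):
        """Return the longest dictionary word beginning at text[start], or None."""
        key = text[start:start + 2]
        if key not in self.buckets:  # `in` does not create a defaultdict entry
            return None
        upper = min(self.max_len[key], len(text) - start)
        for length in range(upper, 1, -1):  # try the longest candidate first
            candidate = text[start:start + length]
            if candidate in self.buckets[key]:
                return candidate
        return None


def fmm_segment(text, dictionary):
    """Forward maximum matching: scan left to right, always taking the longest match."""
    tokens = []
    i = 0
    while i < len(text):
        word = dictionary.longest_match(text, i)
        if word is None:
            word = text[i]  # no dictionary word starts here: emit one character
        tokens.append(word)
        i += len(word)
    return tokens


book_dict = DoubleCharDict(
    ["垂直", "垂直搜索", "搜索", "搜索引擎", "引擎", "计算机", "计算机网络", "网络", "教程"]
)
print(fmm_segment("计算机网络教程", book_dict))  # ['计算机网络', '教程']
print(fmm_segment("垂直搜索引擎", book_dict))    # ['垂直搜索', '引擎']
```

The second example shows the greedy nature of FMM: once "垂直搜索" is taken, "搜索引擎" can no longer be matched, which is one reason domain dictionaries and matching strategies need tuning.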
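The ranking improvement adds page importance, derived from links, to Lucene's content score. The abstract does not give the exact combination, so the sketch below assumes a simple linear blend: standard power-iteration PageRank (damping 0.85, rank from pages without out-links spread evenly), normalized by the maximum rank, mixed with a content score assumed to lie in [0, 1]. The functions `pagerank`, `blended_score`, and `rerank`, the weight `alpha`, and the sample graph and scores are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict: page -> list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no out-links is redistributed evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for src in pages:
            targets = links.get(src, [])
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank


def blended_score(content_score, page_rank, max_rank, alpha=0.7):
    """Assumed linear blend of content relevance and normalized PageRank."""
    return alpha * content_score + (1.0 - alpha) * (page_rank / max_rank)


def rerank(hits, links, alpha=0.7):
    """Re-order search hits (doc -> content score) by the blended score."""
    pr = pagerank(links)
    max_rank = max(pr.values())
    scored = {doc: blended_score(s, pr.get(doc, 0.0), max_rank, alpha)
              for doc, s in hits.items()}
    return sorted(scored, key=scored.get, reverse=True)


link_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
content_hits = {"A": 0.9, "C": 0.8, "D": 0.85}  # hypothetical normalized content scores
print(rerank(content_hits, link_graph))  # ['A', 'C', 'D']
```

By content score alone the order would be A, D, C; the heavily linked page C moves above D once link importance is blended in. The weight `alpha` controls that trade-off and would need to be tuned on real data.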
Keywords/Search Tags: Vertical Search Engine, Full-Text Search Engine Lucene, Domain-Specific Word Segmentation, Result Sorting