Research And Implementation Of Text Mining System Based On Vertical Search Engine

Posted on: 2015-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: Q H Zheng
Full Text: PDF
GTID: 2208330428981148
Subject: Computer application technology
Abstract/Summary:
General search engines offer users broad search services over massive amounts of information; however, they cannot meet users' needs for specific, in-depth information within a particular field. To solve this problem, an increasing number of vertical search engines have been designed to meet the demand for information retrieval in specific areas. With the development and popularization of network technology, the amount of text information on the Internet is increasing rapidly. Hence, it is essential to find methods of deriving high-quality information via text mining. Moreover, it is of great significance to carry out research on text mining in specific fields and to analyze the results.

This paper mainly discusses vertical search engine technology and how to apply text mining algorithms and text clustering techniques to an actual system. The paper covers three aspects:

(1) Propose a vertical search engine based on Heritrix, Lucene and WebKit, and implement information acquisition, preprocessing, indexing and retrieval for specific fields. Among the key web crawler technologies, WebKit is used to parse dynamic webpages and extract structured information.

(2) Introduce various clustering algorithms for text mining, and propose an improved single-pass clustering algorithm based on analysis and research. This approach incorporates the idea of hierarchical clustering: it first generates initial clusters and then completes the clustering with the single-pass algorithm. Analysis of the results showed that the improved algorithm increased clustering precision by 10%, recall by 12% and F1-measure by 11%.

(3) Provide details of the design and implementation of a text mining system based on a vertical search engine. In terms of system design, the text mining system consists of four main parts: a text acquisition module, a text preprocessing module, a text mining module, and a text service module. In terms of system implementation, the paper gives the overall system diagram and the implementation process of each module. The system implements text mining on mobile phone reviews and vertical search over mobile phone evaluation information.
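
The two-stage idea in aspect (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not specify the similarity measure, how the initial clusters are seeded, or any thresholds. The sketch assumes raw term-frequency vectors, cosine similarity against cluster centroids, a hierarchical (agglomerative) pass over the first `seed_size` documents, and a single shared `threshold`. All function and parameter names are hypothetical.

```python
import math
from collections import Counter


def tf(text):
    """Term-frequency vector of a whitespace-tokenized text (assumed weighting)."""
    return dict(Counter(text.lower().split()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds the weights term by term
    return {t: w / len(vectors) for t, w in total.items()}


def initial_clusters(seed, threshold):
    """Stage 1 (assumed form): agglomerative merging of the seed documents.

    Repeatedly merges the two clusters with the most similar centroids
    until no pair reaches `threshold`.
    """
    clusters = [[v] for v in seed]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(centroid(clusters[i]), centroid(clusters[j]))
                if s > best:
                    best, pair = s, (i, j)
        if best < threshold:
            break
        i, j = pair  # i < j, so popping j leaves index i valid
        clusters[i].extend(clusters.pop(j))
    return clusters


def single_pass(docs, clusters, threshold):
    """Stage 2: classic single-pass assignment of the remaining documents.

    Each document joins the most similar existing cluster if the similarity
    reaches `threshold`; otherwise it starts a new cluster.
    """
    for v in docs:
        sims = [cosine(v, centroid(c)) for c in clusters]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            clusters[best].append(v)
        else:
            clusters.append([v])
    return clusters


def cluster(texts, seed_size, threshold=0.3):
    """Run both stages; returns a list of clusters of term-frequency vectors."""
    vectors = [tf(t) for t in texts]
    clusters = initial_clusters(vectors[:seed_size], threshold)
    return single_pass(vectors[seed_size:], clusters, threshold)
```

Called as `cluster(reviews, seed_size=4)` on a list of review strings, the first four reviews form the initial clusters and the rest are assigned in a single pass. Starting from pre-built clusters rather than from the first document alone is one way to make the result less sensitive to input order, which is a known weakness of plain single-pass clustering.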
Keywords/Search Tags: Vertical search engine, Text mining, Lucene, Heritrix, Single-pass algorithm