
Research And Implementation Of Vast Amounts Of Heterogeneous Data Search

Posted on: 2014-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: K Hu
GTID: 2248330398970742
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the development of information, communication, and Internet technology, the volume of information on the Internet has grown explosively. How to find the information a user needs quickly and accurately within this vast amount of data has become a shared concern of many companies and research institutions. The emergence of search engine technology has greatly improved retrieval efficiency and shortened search time.

This thesis builds on the eugenic knowledge base system developed under the National Eleventh Five-Year Technology Support Program project "A safe and reliable reproductive health carrier-grade Operations Support System key technology research". It discusses the background of that system and the state of heterogeneous data search engine technology at home and abroad, and then reworks the existing eugenic knowledge base system to address its shortcomings in searching heterogeneous data. We call the new system "the heterogeneous eugenic knowledge-based system".

This paper’s main work is reflected in the following aspects:
1. Unified text processing of heterogeneous documents: open source tools were evaluated and used to convert documents of different types (PDF, WORD, XML) into a single format.
2. Several Chinese word segmentation tools were studied and performance-tested to identify one suited to the heterogeneous eugenic knowledge base search system, and that tool was then improved.
3. To rank search results over heterogeneous documents, the PageRank algorithm was improved and applied to sorting the results of this system.
4. The heterogeneous eugenic knowledge-based system was coded and tested.

The main contribution of the paper is to solve the following defects of the original eugenic knowledge base system:
1. The data came entirely from a single relational database.
2. Segmentation and parsing of the user’s query keywords were imperfect.
3. Index maintenance was difficult: the system could not build incremental indexes or delete index entries.
4. Search results were not sorted, so the user could not get the most important results in a timely manner.

Finally, tests of the heterogeneous eugenic knowledge-based system show that it can retrieve vast amounts of heterogeneous data.
Keywords/Search Tags: Search Engine, Heterogeneous data, Index maintenance, Ranking of documents without links
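
The abstract names incremental indexing and index deletion as capabilities the original system lacked, but it does not describe how the new system implements them. As an illustration only, the following minimal Python sketch shows one common way an inverted index can support both operations. The class name IncrementalIndex, its methods, the sample file names, and the whitespace tokenizer are all assumptions made for this example; the thesis itself uses a Chinese word segmentation tool and its actual index structure is not given here.

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy in-memory inverted index (hypothetical; not the thesis's design)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.doc_terms = {}               # doc id -> terms indexed for that document

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase whitespace splitting stands in for a real
        # Chinese word segmenter.
        return text.lower().split()

    def add(self, doc_id, text):
        """Index one document without rebuilding the rest of the index."""
        if doc_id in self.doc_terms:
            self.delete(doc_id)  # re-adding a document replaces its old entry
        terms = set(self.tokenize(text))
        self.doc_terms[doc_id] = terms
        for term in terms:
            self.postings[term].add(doc_id)

    def delete(self, doc_id):
        """Remove one document; unknown ids are ignored."""
        for term in self.doc_terms.pop(doc_id, set()):
            docs = self.postings[term]
            docs.discard(doc_id)
            if not docs:
                del self.postings[term]  # drop terms that no document uses

    def search(self, query):
        """Return ids of documents that contain every query term."""
        terms = self.tokenize(query)
        if not terms:
            return set()
        result = None
        for term in terms:
            docs = self.postings.get(term, set())
            result = set(docs) if result is None else result & docs
        return result


index = IncrementalIndex()
index.add("guide.pdf", "prenatal screening guide")
index.add("plan.xml", "screening schedule")
print(sorted(index.search("screening")))
index.delete("guide.pdf")
print(sorted(index.search("screening")))
```

Keeping a per-document term set alongside the postings is what makes deletion cheap in this sketch: only the posting lists that actually mention the document are touched, so neither adding nor deleting requires a full rebuild.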