The Key Technologies Of Search Engines And Implementation

Posted on: 2009-11-17
Degree: Master
Type: Thesis
Country: China
Candidate: Z G Wang
Full Text: PDF
GTID: 2208360272459190
Subject: Computer software and theory
Abstract/Summary:
With the development of the computer industry, an increasing amount of information is stored on computer storage devices. Basically, these data fall into two main categories: structured data and unstructured data. Examples of structured data include enterprise financial accounts, production data, and student score data. Unstructured data includes text, images, sound, and so on. According to statistical analysis, unstructured data accounts for more than 80% of all information in the world. For structured data, an RDBMS is the best way to maintain it. However, an RDBMS has inherent shortcomings when it is used to manage large amounts of unstructured data; in particular, the response time is unbearable when users query such data. The cause of this flaw lies in the underlying structure of the RDBMS. With full-text retrieval technology, unstructured data can be managed efficiently. After years of development, full-text retrieval technology has evolved into powerful software that manages unstructured data in an integrated way, ranging from primitive strings to newer kinds of unstructured data such as large text, voice, images, and video. Essentially, search engine technology is a major application of full-text retrieval technology. Currently, the search engine is the second most popular application on the Internet after e-mail. The search engine derives from traditional full-text retrieval theory: a program scans each word in each page, builds index information for the words, and stores it in an inverted file. The search procedure then checks the inverted file to find the pages that match the keyword, ranks the matching pages according to the frequency and probability with which the keyword appears in each of them, and outputs the sorted results.
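The index-then-search procedure described above can be sketched in a few lines of Python. This is only a minimal illustration, not the thesis's implementation: the function names `build_inverted_index` and `search`, the sample pages, and the tokenizer are all assumptions, and ranking here uses raw keyword frequency alone, whereas the abstract also mentions probability.

```python
import re
from collections import Counter, defaultdict


def build_inverted_index(pages):
    """Scan every word of every page and map word -> {page_id: count}."""
    index = defaultdict(dict)
    for page_id, text in pages.items():
        # Hypothetical tokenizer: lowercase runs of letters and digits.
        counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
        for word, count in counts.items():
            index[word][page_id] = count
    return index


def search(index, keyword):
    """Return ids of pages containing the keyword, most occurrences first."""
    postings = index.get(keyword.lower(), {})
    # Ties on frequency are broken by page id so the order is deterministic.
    return sorted(postings, key=lambda pid: (-postings[pid], pid))


# Hypothetical sample pages, used only to exercise the sketch.
pages = {
    "p1": "search engine builds an inverted index",
    "p2": "the index maps each word to pages; the index is inverted",
    "p3": "relational databases store structured data",
}
index = build_inverted_index(pages)
print(search(index, "index"))
```

The lookup touches only the posting list of the queried word, which is why an inverted file avoids scanning every page at query time.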
Full-text retrieval technology is the core supporting technology of the search engine. Based on an excellent full-text retrieval model, IRST (Inter-Relevant Successive Tree), this thesis studies the combination of the Inter-Relevant Successive Tree model with search engine technology and the implementation of key search engine technologies. We focus mainly on three topics: match-degree computation, associated queries between the search engine and an RDBMS, and ranking. We propose two unified formulas for computing the match degree, which not only simplify the calculation but also cover all possible matching cases. By introducing the concept and technology of the memory database, we successfully implement associated queries between the search engine and the RDBMS, which lets users find what they really need more efficiently, conveniently, and quickly. Finally, we propose and implement a dynamic partition and multi-value sorting algorithm, which improves sorting efficiency by eliminating unnecessary sorting operations: it extracts only the page data that is needed and ranks that data. The combination of the Inter-Relevant Successive Tree model and search engine technology makes the Inter-Relevant Successive Tree model a new method and theory in the search field.
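The idea of ranking only the page data that is actually requested, rather than sorting every hit, can be illustrated as follows. This sketch is not the dynamic partition algorithm proposed in the thesis, whose details the abstract does not give; it substitutes a standard heap-based partial selection (`heapq.nsmallest`) to show the general principle. The function `ranked_page`, the field names `score`, `updated`, and `doc_id`, and the sample hits are all hypothetical.

```python
import heapq


def ranked_page(hits, page, page_size):
    """Return result page `page` (1-based) without fully sorting all hits."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    needed = page * page_size
    # Multi-value key: higher score first, then newer, then doc_id for ties.
    key = lambda h: (-h["score"], -h["updated"], h["doc_id"])
    # Only the best `needed` hits are selected and ordered.
    best = heapq.nsmallest(needed, hits, key=key)
    return best[(page - 1) * page_size:]


# Hypothetical hits, used only to exercise the sketch.
hits = [
    {"doc_id": "a", "score": 0.9, "updated": 20090101},
    {"doc_id": "b", "score": 0.7, "updated": 20090301},
    {"doc_id": "c", "score": 0.9, "updated": 20090501},
    {"doc_id": "d", "score": 0.7, "updated": 20090301},
    {"doc_id": "e", "score": 0.4, "updated": 20090601},
]
print([h["doc_id"] for h in ranked_page(hits, page=2, page_size=2)])
```

When the requested page is near the top of the ranking, the selection orders far fewer items than a full sort of all matching pages would.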
Keywords/Search Tags: Search Engine, Full-Text Retrieval, IRST (Inter-Relevant Successive Tree)