
Research On Fast Text Retrieval Methods And Optimization Of Engineering Realization

Posted on: 2019-04-24  Degree: Master  Type: Thesis
Country: China  Candidate: X N Zhu
GTID: 2438330563457634  Subject: Electronics and Communications Engineering
Abstract:
Search engines are an important part of today's Internet and a major research topic in intelligent information processing. As the information age develops, the volume of data is growing explosively and the share of unstructured information keeps increasing. Massive scale and lack of structure have therefore become two defining characteristics of information on the network, and this study addresses the key question of how to quickly retrieve the required information from large amounts of unstructured data. Information retrieval usually refers to the retrieval of text, covering how information is created, stored, organized, represented, queried and accessed; its core is the index structure and the retrieval model for text. Information retrieval techniques include inverted index construction and storage, retrieval models and methods, ranking, and so on, and Chinese information retrieval additionally involves word segmentation. Based on a study of Chinese information retrieval techniques, the thesis is organized as follows.

First, taking retrieval over a large collection of ancient literature as its research object, the thesis describes in detail the organization of the text retrieval index, how it works, the data structures stored in it, and the algorithms used to build it. It proposes an efficient inverted-file storage structure based on a single-character index, together with a matching retrieval method. To meet practical requirements, a hierarchical index retrieval mechanism is used to design and implement a three-level index. The first-level index maps each character to its exact positions in the text; the second-level index maps each character to the documents that contain it; the third-level index indexes the first-level index itself, recording for each character the byte position of its entry and the length of the data to be read. A query keyword is resolved by searching across these levels.

The experiments use an ancient-literature corpus of 2.1 billion characters. On this corpus the inverted-index construction algorithm is improved, and a single-character inverted index structure for ancient-literature retrieval is designed and implemented. This approach improves the information retrieval mechanism and establishes the hierarchical index as an effective retrieval mechanism, so that retrieval over large volumes of unstructured text can be handled. Finally, the experimental system is tested and the overall results are reported, verifying the feasibility of the system design and its algorithms.
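The three-level design described above can be illustrated with a small in-memory sketch. The abstract does not specify the thesis's file formats, so everything here is an assumption: the names `build_index`, `read_postings` and `search` are hypothetical, as are the packing of each posting as two 32-bit integers and the use of a Python dict for the third-level directory. Level 1 stores each character's (document, position) postings in one byte blob; level 2 lists the documents containing each character; level 3 records the byte offset and length of each character's level-1 entry, so only that slice needs to be read. A multi-character keyword is matched by first intersecting the level-2 document lists, then checking that the characters' positions are consecutive.

```python
import struct
from collections import defaultdict

# Assumed layout: each level-1 posting is a (doc_id, position) pair stored
# as two little-endian unsigned 32-bit integers.
POSTING = struct.Struct("<II")


def build_index(docs):
    """Build a hypothetical three-level single-character index over `docs`."""
    positions = defaultdict(list)  # char -> [(doc_id, pos), ...]
    for doc_id, text in enumerate(docs):
        for pos, ch in enumerate(text):
            positions[ch].append((doc_id, pos))

    blob = bytearray()  # level 1: all positional postings, concatenated
    directory = {}      # level 3: char -> (byte offset, byte length) in blob
    doc_lists = {}      # level 2: char -> sorted ids of documents containing it
    for ch in sorted(positions):
        postings = positions[ch]
        start = len(blob)
        for doc_id, pos in postings:
            blob += POSTING.pack(doc_id, pos)
        directory[ch] = (start, len(blob) - start)
        doc_lists[ch] = sorted({doc_id for doc_id, _ in postings})
    return bytes(blob), directory, doc_lists


def read_postings(blob, directory, ch):
    """Use the level-3 directory to slice one character's level-1 postings."""
    offset, length = directory[ch]
    return list(POSTING.iter_unpack(blob[offset:offset + length]))


def search(keyword, blob, directory, doc_lists):
    """Return sorted (doc_id, start) pairs for every exact occurrence of `keyword`."""
    if not keyword or any(ch not in directory for ch in keyword):
        return []
    # Level 2: keep only documents that contain every character of the keyword.
    candidates = set(doc_lists[keyword[0]])
    for ch in keyword[1:]:
        candidates &= set(doc_lists[ch])
    if not candidates:
        return []
    # Level 1 (located via level 3): keep start positions where the i-th
    # keyword character appears exactly i positions later in the same document.
    hits = {(d, p) for d, p in read_postings(blob, directory, keyword[0])
            if d in candidates}
    for i, ch in enumerate(keyword[1:], start=1):
        shifted = {(d, p - i) for d, p in read_postings(blob, directory, ch)
                   if d in candidates}
        hits &= shifted
    return sorted(hits)
```

For example, with `docs = ["学而时习之", "温故而知新", "学而不思则罔"]`, calling `search("学而", *build_index(docs))` returns `[(0, 0), (2, 0)]`. Indexing single characters sidesteps Chinese word segmentation at the cost of longer posting lists. In a disk-based system the blob would live in a file, and each directory lookup would become a seek plus a bounded read. Real posting lists would usually also be compressed, which this sketch omits.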
Keywords: information retrieval, inverted index, retrieval, hierarchical index