Study On Hadoop-based Inverted Index

Posted on: 2012-04-09
Degree: Master
Type: Thesis
Country: China
Candidate: C C Dong
GTID: 2178330338953827
Subject: Computer software and theory
Abstract/Summary:
With the arrival of the information age, vast amounts of new data are generated every day, and the IT industry is paying close attention to how to process such volumes of data effectively. An inverted index is a data structure that enables fast retrieval, and it is currently the most widely used index in search-engine information retrieval systems. Meanwhile, Hadoop, with its strong distributed storage and computing capabilities, is currently the most widely used distributed system platform. It is therefore natural to store and process huge amounts of data on this platform and to keep the associated inverted index files on it as well, which gives the study of a Hadoop-based inverted index considerable significance. Targeting the Hadoop platform, this thesis studies inverted index techniques as follows:
1) Because the HDFS file system supports only write-once-read-many access and does not allow modification at arbitrary positions, this thesis designs MIIS (Multi-level Inverted Index Structure), whose main idea is to store the inverted index files in HDFS and maintain the index in multiple levels. This supports batch updates of documents and improves query speed.
2) Considering the correlation between documents and their inverted index, this thesis proposes AMPS (Align and Merge Placement Strategy), which reduces the communication cost between nodes and allows a document to be located quickly on the local node when searching for keywords.
3) Combining MIIS and AMPS, this thesis designs algorithms for constructing the inverted index, for adding and deleting index entries in batch, and for searching the index, making the inverted index more practical on Hadoop.
4) Experiments on a Hadoop cluster verify MIIS and AMPS: they improve the efficiency of finding keywords and locating documents, reduce the communication cost between nodes, and speed up batch updates.
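The abstract does not spell out the construction algorithm itself. For reference, the sketch below shows the conventional MapReduce pattern for building an inverted index (mappers emit term/document pairs, the shuffle groups them by term, reducers write posting lists), simulated in a single Python process rather than on a Hadoop cluster. The function names are hypothetical, are not Hadoop APIs, and this is not presented as the thesis's own algorithm.

```python
import re
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc_id: int, text: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit one (term, doc_id) pair per distinct term in a document."""
    for term in set(re.findall(r"[a-z0-9]+", text.lower())):
        yield term, doc_id


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, List[int]]]:
    """Sort by term and group, imitating the framework's shuffle/sort step."""
    for term, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield term, [doc_id for _, doc_id in group]


def reduce_phase(term: str, doc_ids: List[int]) -> Tuple[str, List[int]]:
    """Reducer: turn the grouped ids into a sorted, de-duplicated posting list."""
    return term, sorted(set(doc_ids))


def build_index(docs: Dict[int, str]) -> Dict[str, List[int]]:
    """Run map -> shuffle -> reduce in one process (no cluster involved)."""
    pairs = (pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text))
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs))
```

For example, with `docs = {1: "Hadoop stores data in HDFS", 3: "HDFS files are write-once"}`, `build_index(docs)["hdfs"]` returns `[1, 3]`.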
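The abstract gives only the main idea of MIIS. The following minimal sketch assumes one plausible reading: because HDFS files cannot be modified in place, every batch of additions or deletions is written as a new immutable level, queries combine the levels from oldest to newest, and the levels are periodically merged into a single rewritten level. In-memory `NamedTuple` segments stand in for HDFS files, and `MultiLevelIndex`, `add_batch`, `delete_batch`, `compact`, and `max_levels` are hypothetical names.

```python
from typing import Dict, FrozenSet, Iterable, List, NamedTuple, Set


class Segment(NamedTuple):
    """One immutable index level; stands in for a write-once HDFS file (assumption)."""
    postings: Dict[str, FrozenSet[int]]
    deleted: FrozenSet[int]  # ids invalidated in all older levels


class MultiLevelIndex:
    def __init__(self, max_levels: int = 4) -> None:
        self.levels: List[Segment] = []  # ordered oldest -> newest
        self.max_levels = max_levels

    def add_batch(self, docs: Dict[int, str]) -> None:
        """Write a batch as a new level; re-added ids also hide their older postings."""
        postings: Dict[str, Set[int]] = {}
        for doc_id, text in docs.items():
            for term in set(text.lower().split()):
                postings.setdefault(term, set()).add(doc_id)
        frozen = {term: frozenset(ids) for term, ids in postings.items()}
        self._append(Segment(frozen, frozenset(docs)))

    def delete_batch(self, doc_ids: Iterable[int]) -> None:
        """Record deletions as a tombstone-only level instead of editing old files."""
        self._append(Segment({}, frozenset(doc_ids)))

    def _append(self, segment: Segment) -> None:
        self.levels.append(segment)
        if len(self.levels) > self.max_levels:
            self.compact()

    def search(self, term: str) -> List[int]:
        """Replay levels oldest -> newest: apply tombstones, then add postings."""
        result: Set[int] = set()
        for seg in self.levels:
            result -= seg.deleted
            result |= seg.postings.get(term, frozenset())
        return sorted(result)

    def compact(self) -> None:
        """Merge all levels into one freshly written level (no in-place edits)."""
        merged: Dict[str, Set[int]] = {}
        for seg in self.levels:
            for ids in merged.values():
                ids -= seg.deleted
            for term, ids in seg.postings.items():
                merged.setdefault(term, set()).update(ids)
        postings = {term: frozenset(ids) for term, ids in merged.items() if ids}
        self.levels = [Segment(postings, frozenset())]
```

Because `add_batch` also tombstones the ids it writes, re-adding a document behaves like delete-then-insert, so a revised document stops matching its old terms. Merging only when `max_levels` is exceeded keeps each batch update cheap (one new file) while bounding how many levels a query must read; the threshold here is an arbitrary choice.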
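The abstract says only that AMPS aligns documents with their inverted index so that inter-node communication drops and documents can be located locally. The sketch below assumes this means co-location: each document is assigned to a node, and that node indexes only its own documents, so every posting points to a document on the same node. A term-partitioned layout, in which a term's postings live on a node chosen by hashing the term, is included as a baseline that counts how many hits would need a cross-node fetch. Nodes are simulated as dictionary keys, and all names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple

LocalIndex = Dict[int, Dict[str, Set[str]]]  # node -> term -> doc ids


def node_of(key: str, num_nodes: int) -> int:
    """Deterministic placement; crc32 avoids Python's per-process salted hash()."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def colocated_placement(docs: Dict[str, str], num_nodes: int) -> Tuple[LocalIndex, Dict[str, int]]:
    """Store each document and the postings that point to it on the same node."""
    local_index: LocalIndex = defaultdict(lambda: defaultdict(set))
    doc_node: Dict[str, int] = {}
    for doc_id, text in docs.items():
        node = node_of(doc_id, num_nodes)
        doc_node[doc_id] = node
        for term in set(text.lower().split()):
            local_index[node][term].add(doc_id)
    return local_index, doc_node


def search_colocated(local_index: LocalIndex, doc_node: Dict[str, int], term: str) -> Tuple[List[str], int]:
    """Each node answers from its own index; a hit's document is on that same node."""
    hits: List[str] = []
    remote_fetches = 0
    for node, index in local_index.items():
        for doc_id in index.get(term, set()):
            hits.append(doc_id)
            if doc_node[doc_id] != node:  # never true under co-location
                remote_fetches += 1
    return sorted(hits), remote_fetches


def search_term_partitioned(docs: Dict[str, str], doc_node: Dict[str, int], term: str, num_nodes: int) -> Tuple[List[str], int]:
    """Baseline: postings for `term` sit on node_of(term); other hits need a fetch."""
    index_node = node_of(term, num_nodes)
    hits = sorted(d for d, text in docs.items() if term in text.lower().split())
    remote_fetches = sum(1 for d in hits if doc_node[d] != index_node)
    return hits, remote_fetches
```

The sketch also makes a trade-off visible: co-location keeps document lookups local, but a keyword query has to be sent to every node, whereas the term-partitioned baseline contacts a single index node and then fetches matching documents from wherever they are stored.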
Keywords/Search Tags: Hadoop, Inverted Index, MIIS, AMPS, multi-level index