The Research Of Mass Text Management System Based On P2P

Posted on: 2015-02-17  Degree: Master  Type: Thesis
Country: China  Candidate: J J Feng  Full Text: PDF
GTID: 2268330431956827  Subject: Computer application technology
Abstract/Summary:
With the development of information technology, the volume of data that people need to process and store is growing explosively. Market research from IDC shows that from 2006 to 2011 the global amount of data increased from 200 EB to 2 ZB, about 10-fold in 5 years. By 2015 the amount of data will exceed 8 ZB, and it will reach 40 ZB by 2020. Text data makes up a large proportion of this volume. With the widespread adoption of Web 2.0 and the popularity of social networking, the amount of text data is growing rapidly. How to store and retrieve massive unstructured text data has become an important topic in the data management field.
Mass data storage and sharing based on P2P has been a hot topic in data storage and is considered one of the most promising applications of P2P technology. Because of their peer-to-peer design, P2P storage systems have the following advantages over traditional storage systems. Firstly, the system does not depend on any special node, so it scales better and has no single-point performance bottleneck. Secondly, every node has an equal function, so the whole system can still work after losing any node and is highly fault tolerant. Thirdly, high scalability and high fault tolerance make it possible to build large-scale, high-performance storage services from low-cost machines. Lastly, without central control, a P2P storage system can greatly reduce the total cost of the storage system, and each node can greatly increase transmission speed by taking advantage of bandwidth at the edge of the network. In a P2P data management system, the most important problem to solve is how to locate data quickly while maintaining the index with low network overhead. Therefore, the paper focuses on text storage and location in detail.
It first gives an overview and analysis of existing implementations of several P2P storage modes, and a comprehensive introduction to the relevant background knowledge for P2P networks, such as full-text indexing and Bloom filters. Then it presents a Double-Chord Text Storage Model (DCTSM) and a Text Retrieval Model based on Counting Bloom Filter (CBFTRM). It elaborates the mathematical description, the structure and composition, and the related algorithms of the two models. The simulation results show that the P2P-based text data management model designed in the paper has high scalability and fault tolerance, as well as lower index maintenance overhead and higher data retrieval efficiency than other data management models. Finally, the paper designs a Mass Text Management System based on DCTSM and CBFTRM and describes its software structure and data processing. Future work will focus on how to sort search results by the semantics of the text and how to manage data access privileges.
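For readers unfamiliar with the structure the retrieval model builds on, a generic counting Bloom filter can be sketched as follows. This is a minimal illustration of the general technique only, not the thesis's CBFTRM, whose details the abstract does not give. The class name, the parameters `num_counters` and `num_hashes`, and the use of salted SHA-256 to derive slot positions are all assumptions made for this example.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter variant with one counter per slot, so items can be removed.

    Hypothetical sketch: names and parameters are illustrative, not from the thesis.
    """

    def __init__(self, num_counters=1024, num_hashes=4):
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, item):
        # Derive num_hashes slot indexes by salting SHA-256 with a seed
        # (an assumed hashing scheme chosen for simplicity).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_counters

    def add(self, item):
        for pos in self._positions(item):
            self.counters[pos] += 1

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all(self.counters[pos] > 0 for pos in self._positions(item))

    def remove(self, item):
        # Refuse to remove items that are definitely absent,
        # so no counter can drop below zero.
        if item not in self:
            return False
        for pos in self._positions(item):
            self.counters[pos] -= 1
        return True


cbf = CountingBloomFilter()
cbf.add("storage")
cbf.add("retrieval")
cbf.remove("storage")
print("retrieval" in cbf)  # True: removing one item never hides another
```

Replacing the bits of a plain Bloom filter with counters is what allows deletion, which matters when an index must be updated as documents leave the network. The cost is extra space per slot.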
Keywords/Search Tags: structured P2P, data management system, Double-Chord, Counting Bloom Filter, search tree