Research On Distributed Index Construction Method For Data Space

Posted on:2022-09-06

Degree:Master

Type:Thesis

Country:China

Candidate:P Liu

Full Text:PDF

GTID:2518306353477334

Subject:Computer Science and Technology

Abstract/Summary:

With the development of science and technology, the volume of data that data management systems must handle keeps growing, and traditional relational databases are increasingly unable to keep up. Moreover, data rarely comes from a single source; it is distributed across many data sources that differ in data format and semantic relationships. Users must spend considerable time and I/O resources processing the data and cannot quickly obtain valuable information from multi-source heterogeneous data. The data space, a new data management model, can be used to address these difficulties. Taking a personal data space management system as an example, users no longer need to attend to the complex and changeable underlying data formats and semantic relationships, and can obtain valuable information from the data directly and efficiently.

Inverted indexes are widely used in practical information retrieval systems, and how to use them to quickly retrieve valuable data from the multi-source heterogeneous data in a data space is the focus of current research on index architectures. This thesis analyzes a variety of index architectures and proposes a distributed index construction method based on query records. The method mines users' historical query records, clusters high-frequency search terms into groups of controllable size and load according to users' query preferences, and dynamically assigns high-frequency terms and cache copies to the processor nodes according to the processing capability of each node. Once query records have accumulated to a certain extent under the index query strategy, a dynamic update and adjustment strategy for the inverted index partitioning maintains load balance among the processor nodes of the distributed index system and improves parallel retrieval capability.

After the inverted index has been partitioned across the processor nodes, the data on each node keeps accumulating: each node holds too many terms and each inverted list grows too long, which reduces query performance. Building on an analysis of horizontal and vertical partitioning of traditional inverted indexes, this thesis proposes a hybrid partitioned index based on frequent pattern mining. It introduces a new data structure, the dynamic frequent pattern tree, together with the corresponding creation and update-adjustment algorithms, to improve the traditional frequent pattern mining algorithm FP-growth; this improves the performance of the structure updates caused by conversions between frequent and infrequent itemsets when data is updated. Suitable terms are mined from the dynamic frequent pattern tree for vertical partitioning, and horizontal partitioning is then applied on top of the vertical partitioning to build a hybrid partitioned inverted index that improves index efficiency.
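The idea of assigning high-frequency query terms to processor nodes in proportion to each node's processing capability can be illustrated with a small sketch. This is not the algorithm from the thesis: it swaps in a simple greedy heuristic (heaviest term first, placed on the node with the lowest resulting load-to-capacity ratio), and the function names, the sample query log, and the capacity values are all assumptions made for illustration.

```python
from collections import Counter


def term_frequencies(query_log):
    """Count how often each term appears across the logged queries."""
    counts = Counter()
    for query in query_log:
        counts.update(query.lower().split())
    return counts


def assign_terms(freqs, capacities):
    """Greedily place terms on nodes, most frequently queried term first.

    Each term goes to the node whose load-to-capacity ratio would be
    lowest after receiving it (ties broken by node name).
    """
    load = {node: 0 for node in capacities}
    placement = {}
    # Sort by descending frequency; break ties by term for determinism.
    for term, freq in sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0])):
        node = min(
            capacities,
            key=lambda n: ((load[n] + freq) / capacities[n], n),
        )
        placement[term] = node
        load[node] += freq
    return placement, load


# Hypothetical query log and relative node capacities (assumed values).
query_log = [
    "inverted index",
    "inverted index compression",
    "data space index",
    "load balancing",
    "data space",
]
capacities = {"node_a": 2.0, "node_b": 1.0}

freqs = term_frequencies(query_log)
placement, load = assign_terms(freqs, capacities)

for node in sorted(capacities):
    terms = sorted(t for t, n in placement.items() if n == node)
    print(node, "load:", load[node], "terms:", terms)
```

In this sketch, a node with twice the capacity ends up carrying roughly twice the query load. Re-running the assignment as the query log grows would be one simple way to approximate the dynamic adjustment the abstract describes, although the thesis's actual update strategy is not specified here.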

Keywords/Search Tags:

Data space, Inverted index, Load balancing, Partition index

Related items

1	Research On Partition-based Inverted Index Compression Algorithm
2	Design And Implementation Of Multi-Keyword Parallel Ciphertext Retrieval System Based On Inverted Index
3	Research On Inverted List Parallel Query Method Based On Dataspaces
4	Space- And Time-efficient Compression And Intersection Algorithms For Inverted Index
5	Research On Inverted Index Compression Algorithms Based On Pattern Coding
6	Research On Key Technologies Of Full-text Index Compression In Cloud Environment
7	Some Research On On-Line Index For Dynamic Text
8	The Research Of Index Technology Based On Semantic Web Document
9	The Design And Realization Of Image Set Compression Method Based On Inverted Index
10	Research And Implementation Of Inverted Index For Large-scale Visual Search