
Research On Machine Learning Enhanced Query Techniques

Posted on: 2022-11-15
Degree: Master
Type: Thesis
Country: China
Candidate: J W Cai
GTID: 2518306611495674
Subject: Automation Technology
Abstract/Summary:
Query processing plays a vital role in database applications. It is a fundamental problem in many fields, such as information retrieval (an important research direction in data analysis and application), and it is also used in information analysis and mining. Different application scenarios call for different query types. For example, a multi-set query finds the set that contains the target data item among multiple sets, and a k-nearest neighbor query finds the k data points closest to the query point. Query results can then feed other stages of data processing. Because queries matter so much in data application and management, many researchers have studied indexing techniques that support big data queries. Although existing index techniques effectively solve many big data queries, they still fall short: they cannot simultaneously satisfy all the properties of a good index, namely query efficiency, precision, and low storage, and they do not extend well to high-dimensional data. With the recent development of artificial intelligence, combining data structures with machine learning (ML) has opened up a new research direction: learning-based indexing techniques. The core idea is to treat the index as a "model" that predicts the "location" of a related data object, i.e., to approximate the cumulative distribution function (CDF) of the data with a machine learning model such as a deep neural network. Once the model is trained, its execution time is usually negligible in practice, and its space footprint is greatly reduced, so it can fit in memory and reduce I/O cost. Prior work has shown that learning-based indexing techniques can outperform traditional indexing techniques by exploiting patterns in the data distribution. Inspired by recent learned indexes, this thesis takes big data as its application setting and studies learning-based index enhancement techniques, with the aims of improving accuracy and reducing query time and index space. The main contributions are as follows:

(1) This thesis proposes a new indexing framework, LMQF, and studies how learning-based network models can improve the performance of traditional multi-set query processing. The key idea is to cast the problem as classification: a network model is trained to predict which set contains the queried data item. To guarantee accurate query results, we combine the learning-based network model with standard Bloom filters and exact lookup indexes that capture the data items the model cannot identify correctly. Theoretical proof and experimental results show that our algorithm uses less memory than current state-of-the-art algorithms at comparable speed while achieving 100% accuracy.

(2) This thesis proposes HKC+-index, a general index augmentation framework for the k-nearest neighbor problem and the first indexing framework to solve this problem with a learning-based approach. The method first builds a traditional tree-based index and uses it for query processing. Training data are then obtained from the original tree structure and used to train a convolutional neural network. Extensive experiments on various real high-dimensional datasets show that HKC+-index improves running time by a factor of 6 over the traditional tree index while maintaining high precision, and that its index is 8 times smaller than the traditional index structure.

(3) This thesis presents a method that further improves the learning-based index by refining the structure of the network model. The method makes fuller use of the original input features of the data, and it adopts a multi-level classification scheme to decide the final category. This model structure reduces memory usage by at least half relative to the original space size.
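To make the learned-index idea concrete, here is a minimal Python sketch under simplifying assumptions: a single least-squares line stands in for the deep neural network and maps a key to its predicted position in a sorted array, which is a scaled approximation of the CDF. The class `LinearLearnedIndex` and its methods are hypothetical and are not taken from the thesis. Recording the model's maximum error at build time lets each lookup search only a bounded window around the prediction, so lookups stay exact.

```python
from bisect import bisect_left


class LinearLearnedIndex:
    """Hypothetical single-model learned index: a fitted line maps key -> position."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        # Least-squares fit of position ~ slope * key + intercept,
        # i.e. a linear approximation of n * CDF(key).
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst-case prediction error over the stored keys bounds the search window.
        self.max_err = max(abs(self._predict(k) - p) for p, k in enumerate(self.keys))

    def _predict(self, key):
        # Round the model output and clamp it to a valid array position.
        pos = round(self.slope * key + self.intercept)
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key in the sorted keys, or None if absent."""
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        # Binary search only inside the error window around the prediction.
        i = bisect_left(self.keys, key, lo, hi)
        if i < hi and self.keys[i] == key:
            return i
        return None
```

For example, `LinearLearnedIndex([3, 7, 8, 15, 21, 40, 41, 57]).lookup(21)` returns position 4, and looking up an absent key such as 22 returns `None`. Learned indexes in the literature often stack several models instead of one line, but the predict-then-correct pattern is the same.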
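The classification-plus-backup structure in contribution (1) can be sketched as follows. This is an illustrative design under stated assumptions, not the LMQF implementation: the sets are assumed disjoint, the trained network is replaced by an arbitrary classifier function, and a Bloom filter guards an exact dictionary that stores only the items the classifier misclassifies. The class names `BloomFilter` and `LearnedMultiSetIndex` are hypothetical. Because a Bloom filter has no false negatives, every misclassified item is routed to the exact backup, so queries for stored items are always answered correctly.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter over hashable items, using k salted SHA-256 hashes of str(item)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


class LearnedMultiSetIndex:
    """Hypothetical learned multi-set index: classifier plus a Bloom-filtered exact backup."""

    def __init__(self, sets, classifier, num_bits=1024, num_hashes=3):
        self.classifier = classifier
        self.backup = {}  # exact lookup, only for items the classifier gets wrong
        self.filter = BloomFilter(num_bits, num_hashes)
        for set_id, items in enumerate(sets):
            for item in items:
                if classifier(item) != set_id:
                    self.backup[item] = set_id
                    self.filter.add(item)

    def query(self, item):
        """Return the id of the set that contains item (item assumed to be stored)."""
        # A Bloom-filter false positive only costs one dictionary miss.
        if self.filter.might_contain(item):
            set_id = self.backup.get(item)
            if set_id is not None:
                return set_id
        return self.classifier(item)
```

For instance, with sets `[[1, 5, 42, 150], [120, 300, 7]]` and the toy classifier `lambda x: 0 if x < 100 else 1`, only 150 and 7 end up in the backup, and `query` returns the correct set id for all seven items. The backup grows with the number of classification errors, which is why a more accurate model yields a smaller structure.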
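The workflow in contribution (2), where a tree index supplies the training labels for a learned model, might look like the following sketch. Several assumptions apply: the tree is a simple kd-style median split, each point's label is the id of the leaf that holds it, and a nearest-centroid classifier stands in for the convolutional neural network. The names `build_leaves`, `fit_nearest_centroid`, and `LearnedKnnIndex` are hypothetical. A query predicts one leaf and scans only that leaf, so the result is approximate whenever true neighbors fall in other leaves.

```python
import math


def build_leaves(points, leaf_size):
    """kd-style tree: split on the median of one dimension per level until leaves are small."""
    leaves = []

    def split(bucket, depth):
        if len(bucket) <= leaf_size:
            leaves.append(bucket)
            return
        dim = depth % len(bucket[0])  # cycle through the dimensions
        bucket = sorted(bucket, key=lambda p: p[dim])
        mid = len(bucket) // 2
        split(bucket[:mid], depth + 1)
        split(bucket[mid:], depth + 1)

    split(list(points), 0)
    return leaves


def fit_nearest_centroid(training):
    """Stand-in for the CNN: learn one centroid per label from (point, label) pairs."""
    sums, counts = {}, {}
    for point, label in training:
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, x in enumerate(point):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


class LearnedKnnIndex:
    def __init__(self, points, leaf_size=4):
        self.leaves = build_leaves(points, leaf_size)
        # Training data come from the tree: each point is labelled with its leaf id.
        training = [(p, leaf_id) for leaf_id, leaf in enumerate(self.leaves) for p in leaf]
        self.centroids = fit_nearest_centroid(training)

    def predict_leaf(self, query):
        return min(self.centroids, key=lambda label: math.dist(query, self.centroids[label]))

    def knn(self, query, k):
        """Approximate k-NN: scan only the leaf the model predicts."""
        leaf = self.leaves[self.predict_leaf(query)]
        return sorted(leaf, key=lambda p: math.dist(query, p))[:k]
```

With two well-separated clusters of four points each and `leaf_size=4`, the tree produces one leaf per cluster, and `knn((0.2, 0.1), 2)` returns `[(0, 0), (1, 0)]`. Replacing the tree traversal with a single model prediction is what trades a little precision for speed and size in this kind of design.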
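Contribution (3) decides the final category with multi-level classification. One common way to realize this, assumed here for illustration and not necessarily the thesis's design, is to factor a flat class id into a coarse group id and a fine index within the group, and to predict each part with its own small output head. The helper names below are hypothetical; the parameter counter only shows how such a factorization can shrink the final layer.

```python
def split_label(label, fine_size):
    """Factor a flat class id into (group id, index within the group)."""
    return divmod(label, fine_size)


def join_label(group, fine, fine_size):
    """Inverse of split_label."""
    return group * fine_size + fine


def output_layer_params(hidden, num_classes, fine_size=None):
    """Count weights and biases in the final classification layer(s)."""
    if fine_size is None:
        # Flat design: one dense layer with num_classes outputs.
        return hidden * num_classes + num_classes
    groups = -(-num_classes // fine_size)  # ceiling division
    # Two-level design: a group head plus one fine head shared by all groups.
    return (hidden * groups + groups) + (hidden * fine_size + fine_size)


def predict_two_level(group_scores, fine_scores, fine_size):
    """Pick the best group, then the best index within it, and rebuild the class id."""
    group = max(range(len(group_scores)), key=group_scores.__getitem__)
    fine = max(range(len(fine_scores)), key=fine_scores.__getitem__)
    return join_label(group, fine, fine_size)
```

With a 64-unit hidden layer and 1,024 classes, `output_layer_params` gives 66,560 parameters for the flat layer and 4,160 for the two-level layout with groups of 32. These figures describe only this hypothetical layout and are not measurements from the thesis.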
Keywords/Search Tags:Processing queries, multi-set query, high-dimensional data, k-nearest neighbor, neural network