| As the big data market rapidly grows in size and value, combining multiple data sources for joint querying, analysis, and modeling has brought better analytical results and higher economic benefits to many data applications. However, privacy issues have become one of the biggest obstacles to joint querying, owing to conflicts of interest between entities and to privacy laws and regulations. It is therefore imperative to design efficient secure protocols for different joint querying applications. Today, a vast amount of structured data from enterprises, social organizations, and governments is stored in databases, and many applications, such as advertising recommendation and credit investigation, seek to integrate multiple databases behind a unified query interface. However, existing joint database querying systems suffer from weak security guarantees, high communication and computation complexity in their security protocols, and difficulty handling complex query statements, which makes them hard to apply to large-scale data. Additionally, the queries issued by requesters are diverse and grow in number over time, while applications such as medical research require a joint querying system with minimal latency and overhead. Existing methods, however, face performance bottlenecks caused by the heavy computation and communication of the underlying secure computation techniques, as well as by the difficulty of storing and reusing intermediate results during querying, which makes it challenging to respond quickly. Finally, as intelligent devices grow in number and capability, unstructured data is growing rapidly, and so is the demand for activity-level private matching and querying of unstructured data. However, because the content of unstructured data is rich, diverse, and highly correlated, it is difficult to construct a unified representation and to implement cross-modal querying. Therefore, this work
focuses on designing efficient secure querying protocols for different application scenarios and requirements. Specifically, the main contributions and innovations of this work are summarized as follows: 1. This dissertation proposes a highly available and maliciously secure multi-source database joint querying scheme. We design secure protocols with lower computation and communication complexity for SQL operators such as join and group-by aggregation, and support the automatic generation and optimization of secure execution plans for complex SQL statements. Compared with the general join protocol in the benchmark work, whose communication cost is O(n²), the general join protocol proposed in this dissertation has a communication cost of O(n log² n + m), where m is the upper bound on the result size of the join operator and is usually of the same order of magnitude as the input size n. Furthermore, the communication cost of the primary-foreign key join protocol can be reduced to O(n log² n). To address the high cost of oblivious sorting in many SQL operator protocols, this dissertation designs oblivious permutation and oblivious distribution protocols, which can replace expensive sorting operations or complete data reshuffling at a lower cost. To address the high round complexity of group traversal operations, this dissertation designs secure protocols, inspired by the parallel prefix network, that require O(log n) communication rounds and O(n) communication cost. Furthermore, this dissertation designs secure protocols with O(1) round complexity for the sum and count functions in group-by aggregation. Based on cost functions and execution plan optimization strategies, we can generate the minimum-cost secure plan for a complex query statement, answer complex SQL queries by sequentially invoking the secure protocols for the individual SQL operators, and ensure that no party can obtain information beyond its prior knowledge during the
query process. 2. This dissertation proposes a low-latency approach for multi-source database joint queries. It designs a secure join-group-aggregation algorithm for complex conditional query statements, which generates reusable shared mappings and responds to query requests quickly with less online execution time. In applications with multiple data sources, the same query statements are often issued repeatedly, and the requester expects to receive the results quickly. We take the most typical application, federated querying of medical databases, as the starting point and propose a powerful conditional query paradigm for complex medical queries involving kinship relationships. We also design a secure join-group-aggregation algorithm to obtain query results efficiently. For complex conditional query requests, the algorithm incorporates two optimizations that greatly reduce the computation and communication overhead of the security protocol: (1) transforming the foreign key join operation into a primary key join operation; (2) saving and reusing the association mappings between shared input tables. This dissertation designs a constant-round primary key join protocol that satisfies malicious security, and then builds on it an association-mapping establishment protocol that obtains shared mappings based on primary keys with O(n) communication overhead and O(1) communication rounds. During the initialization phase, the algorithm restructures and sorts the shared input tables, then establishes and saves the shared mappings. During the querying phase, the algorithm generates a topology graph and the corresponding secure execution plan for each specific conditional query. If the topology graph of a query statement has v nodes, the system can obtain the query result with O(vn) communication overhead. Experimental results verify the efficiency and low latency of our system. 3. This dissertation proposes an activity-level unstructured data privacy query
approach. It establishes a unified representation paradigm for interrelated multimodal data through semantic graphs and designs two private query protocols with different security levels. Unstructured data, as a carrier of rich information, is generated in various social activities, and its contents are often interrelated. Classifying and querying data at the activity level has become an important requirement. To address the diversity of data modalities, the difficulty of expressing social activities, and the difficulty of cross-modal querying, this dissertation designs a plaintext multimodal aggregation algorithm that accurately aggregates the multimodal data generated in the same activity, together with a mechanism for handling conflicting semantic tags. This dissertation proposes the activity-semantic graph as a unified description of activities, which unifies multimodal data at the semantic level and supports cross-modal querying at the activity level through similarity calculations between semantic graphs. We design private data querying protocols at two security levels. The low-security protocol completes data queries efficiently while leaking only the semantic tags known to both parties, whereas the high-security protocol further protects the semantic tags of user data at a slightly higher computation and communication cost. |
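
The O(log n)-round group traversal mentioned in contribution 1 can be illustrated with a plaintext segmented prefix sum. This is only a minimal sketch under stated assumptions, not the dissertation's protocol: it operates on cleartext values rather than secret shares, it assumes the rows are already sorted by group key, and it uses a simple doubling-style scan whose total work is O(n log n) rather than the O(n) communication cost reported above. The function names `segmented_prefix_sum` and `group_totals` are hypothetical.

```python
def segmented_prefix_sum(keys, values):
    """Running sum within each group of equal, adjacent keys.

    Assumes `keys` is sorted so that equal keys are adjacent.
    Returns (sums, rounds), where rounds is about log2(n).
    """
    n = len(keys)
    # flags[i] is True once the partial result at i already reaches
    # the first row of its group and must not absorb anything further.
    flags = [i == 0 or keys[i] != keys[i - 1] for i in range(n)]
    sums = list(values)
    rounds = 0
    step = 1
    while step < n:
        new_flags = flags[:]
        new_sums = sums[:]
        # All positions update from the previous round's state, so a
        # whole round could run in parallel (one communication round).
        for i in range(step, n):
            if not flags[i]:
                new_sums[i] = sums[i - step] + sums[i]
                new_flags[i] = flags[i - step]
        flags, sums = new_flags, new_sums
        step *= 2
        rounds += 1
    return sums, rounds


def group_totals(keys, values):
    """Read each group's total from the last row of that group."""
    sums, _ = segmented_prefix_sum(keys, values)
    n = len(keys)
    return {
        keys[i]: sums[i]
        for i in range(n)
        if i == n - 1 or keys[i] != keys[i + 1]
    }


if __name__ == "__main__":
    keys = ["a", "a", "a", "b", "b", "c"]
    values = [1, 2, 3, 10, 20, 7]
    print(segmented_prefix_sum(keys, values))
    print(group_totals(keys, values))
```

Each loop iteration reads only the previous round's state, which is what allows a secure variant to batch one round's updates into a single exchange between parties; the per-position branch on `flags[i]` would have to be replaced by an oblivious selection in any real protocol.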