Issues On The Query Processing Of Corpora Based On Relational Model

Posted on: 2016-12-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D J Liu
GTID: 1108330503954918
Subject: Computer application technology
Abstract/Summary:
With the advancement of natural language processing technologies and descriptive linguistic studies in information science and social science, research institutions, enterprises, organizations and even individuals have developed, compiled and accumulated a large quantity of language corpora. These corpora are valuable repositories: they are the major resources supporting data analysis in language teaching, translation studies and language acquisition, and a major means of improving natural language processing methods. The study of corpora has evolved into a new paradigm, Corpus Linguistics. Hence, managing and extracting information from corpora has become a critical issue, and the modeling and query processing of corpus data has emerged as a challenging task.

This dissertation presents a systematic study of the following major issues: data modeling; the basic query problem and its query processing; the keyword query problem and its query processing; the sentence-based semantic query problem and its query processing; and the corpus query system architecture and the development of its prototype system. The research is outlined below.

Firstly, in accordance with the conceptual definition of a corpus, a formal definition is given. By combining this formal definition with the relational model, a logical data model for corpus data, D-corpus, is proposed, and the completeness of the model is proved.

Secondly, based on the D-corpus model and an analysis of the search semantics of traditional KWIC (keyword in context) output, the basic query problems are formally defined and their data complexity is analyzed. For these query problems, a set of corpus-query-oriented algebra operations is described, comprising selection, projection, join, union, difference, Cartesian product, renaming and recursion. Based on these operations, a non-recursive query processing algorithm and a recursive query processing algorithm are proposed. The experimental results demonstrate the effectiveness and efficiency of these methods.

Thirdly, a novel keyword query problem over the corpus and its query processing method are proposed. Traditional approaches to keyword search over relational databases cannot answer user keyword queries that contain recursive semantics. Thus, a new data graph model describing the inner-relation tuple join connections, a ranking strategy supporting recursive semantics, and a tuple tree enumeration algorithm based on a dynamic strategy are proposed. The experimental results also validate the effectiveness and efficiency of the proposed method.

Fourthly, a sentence-based semantic query problem and its query processing method are proposed. Existing solutions are deficient in extracting semantically similar sentences from a corpus effectively. To solve this problem, the dissertation first proposes a semantic similarity measure for sets of dependency relations based on WordNet. Second, an effective dependency structure similarity measure is proposed. Third, a sentence similarity measure is proposed that combines semantic similarity and syntactic similarity. Fourth, a query processing algorithm is proposed based on these sentence similarity measures, and an experiment is conducted to validate the effectiveness of the proposed methods.

Finally, the architecture of the corpus query system is presented. A prototype corpus query system, R-CQS, is developed to test the validity of the methods proposed in this dissertation.
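As a rough illustration of what a KWIC-style basic query over relationally stored corpus data might look like, the sketch below stores tokens in a single table and retrieves each keyword hit with its context through a self-join. The `token(sent_id, pos, word)` schema and the function names `build_corpus` and `kwic` are assumptions made for this example only; they are not the D-corpus model or the algebra defined in the dissertation.

```python
import sqlite3


def build_corpus(sentences):
    """Load whitespace-tokenized sentences into a hypothetical
    token(sent_id, pos, word) table held in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE token (sent_id INTEGER, pos INTEGER, word TEXT)")
    rows = [
        (sent_id, pos, word)
        for sent_id, sentence in enumerate(sentences)
        for pos, word in enumerate(sentence.split())
    ]
    conn.executemany("INSERT INTO token VALUES (?, ?, ?)", rows)
    return conn


def kwic(conn, keyword, window=2):
    """Return (left context, keyword, right context) for every exact match,
    using a selection on the keyword and a self-join for the context."""
    sql = """
        SELECT hit.sent_id, hit.pos, ctx.pos, ctx.word
        FROM token AS hit
        JOIN token AS ctx
          ON ctx.sent_id = hit.sent_id
         AND ctx.pos BETWEEN hit.pos - ? AND hit.pos + ?
        WHERE hit.word = ?
        ORDER BY hit.sent_id, hit.pos, ctx.pos
    """
    hits = {}  # (sent_id, hit_pos) -> (left words, right words)
    for sent_id, hit_pos, ctx_pos, word in conn.execute(sql, (window, window, keyword)):
        left, right = hits.setdefault((sent_id, hit_pos), ([], []))
        if ctx_pos < hit_pos:
            left.append(word)
        elif ctx_pos > hit_pos:
            right.append(word)
    # Dict insertion order follows the ORDER BY clause of the query.
    return [(" ".join(l), keyword, " ".join(r)) for l, r in hits.values()]


if __name__ == "__main__":
    corpus = build_corpus([
        "the corpus query system answers the query",
        "a keyword query over the corpus",
    ])
    for left, key, right in kwic(corpus, "query", window=1):
        print(f"{left:>10} [{key}] {right}")
```

The sketch covers only the non-recursive case with selection and join; the recursive operations and the ranking strategy described in the abstract are outside its scope.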
Keywords/Search Tags: Corpus, Relational Model, Query Processing, Keyword Query, Semantic Query