Identification Of Regular Expressions Based On Continuous Repeated Substring Left Alignment

Posted on:2020-03-15

Degree:Master

Type:Thesis

Country:China

Candidate:G Lin

Full Text:PDF

GTID:2415330590963047

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

The inference of formal language is devoted to the study of how to derive the definition of the language from the limited information.Regular language is a class of formal language widely used.The inference algorithm for regular expressions is widely used in gene sequence identification,XML schema inference,graph database path query learning,information extraction,etc.Therefore,research on the inference algorithm for regular expressions not only has important theoretical significance,but also has great practical application value.Identification in the limit is a classical model of language learning.Studying regular expressions inference algorithms in this model ensures the good characteristics of the algorithm.This paper studies regular expressions learning algorithms based on this model.The contributions are listed as follows.(1)This paper has proposed a learning framework of inference regular expressions based on continuous repeated substring detection.Firstly,the framework seeks for the longest continuous repeated substring.Secondly,the framework blocks based on continuous repeated substring.Thirdly,the framework matches the blocks leftmost.Finally,the framework generalizes the results to regular expressions.Studying regular expressions learning algorithms in this framework can identify a class of regular expressions for which unary operator can act on multiplied characters.This overcomes the limitation of most existing algorithms.(2)This paper has proposed two generalization strategies and implemented two algorithms for standard regular expressions and regular expressions with counting respectively.This paper has analyzed the subclass of standard regular expressions and the subclass of regular expressions with counting that can be identified from the two algorithms.This paper also has provided the construction method of characteristic sample for each algorithm.In order to verify the theoretical analysis results,this paper has developed a set of tools for the generation of expressions and their characteristic sample.The experimental results verified the correctness of the theoretical analysis.This paper has showed the advantages of the algorithms by comparing to the existing algorithm.(3)This paper has discussed the application of the algorithm in graph database query learning and analyzed that for a class of reachability query,the path constraint which is defined by regular expressions is consistent with the subclass of regular expressions with counting as proposed in this paper.This paper has discussed the general steps of reachability query learning and investigated the application of the algorithm in reachability query learning by examples.

Keywords/Search Tags:

regular expressions, learning, identification in the limit, continuous repeated substring, reachability query

PDF Full Text Request

Related items

1	Web-based Study On New Words And Expressions
2	Research On Catastrophic Forgetting In Continuous Learning Based On Evolutionary Strategy
3	Chinese - Thai Translation And Extension Of Cross - Language Query
4	The Research Of The Limit Scope Adverb
5	The Research On The New Flctlonal Documentary
6	The Influence Of Repeated Reading On New Words In Different Contexts
7	Limit The Field Of Basic Declarative Sentence Parsing
8	The Novel Word Learning During Repeated Picture-book Reading For Chinese Children:Effect Of Phonetic Regularity
9	Effects Of Repeated Study And Reward On Judgement Of Learning And Accuracy
10	Exploring The Relationship Between Implicit And Explicit Memory By Event-related Potential (ERP) Effects Of Repeated Learning