The Key Technology Of RFC's Rule Extraction Based On NLP

Posted on: 2020-04-22
Degree: Master
Type: Thesis
Country: China
Candidate: W Z Han
GTID: 2428330602951048
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet applications and encryption technology, encrypted communication has become the main way of sending messages on the Internet. Certificate validation systems, which secure this encrypted communication, are now widely used in Internet applications. However, the structure of the certificates used for encryption is very complex, and certificate attributes are subject to many mutual constraints. Existing certificate validation systems have difficulty implementing all of these constraints, which allows illegal certificates to slip through.

To protect users who obtain certificates online, this thesis designs and implements RFCcert NLP, a tool for testing certificate validation systems. The tool builds on NLP techniques: it uses tokenization, sentence splitting, and part-of-speech tagging to handle the unstructured RFC text in which the standard is defined, and it uses a relation extraction model to make information extraction more automated.

In the data preprocessing stage, RFC texts serve as the input, and an algorithm is designed to extract sentences from the unstructured text. NLP techniques are applied in three steps: (1) removing page headers, page footers, and other irrelevant information; (2) using tokenization and sentence splitting to recover complete sentences; and (3) using part-of-speech tagging to classify the sentences.

In the information extraction stage, the thesis defines a quintuple <degree, condition-class, condition-value, result-class, result-value> to represent RFC rules and builds an end-to-end model that extracts this quintuple from sentences of the RFC text. During relation extraction, anaphora resolution replaces each pronoun with the name of the attribute it refers to, which makes the extracted information more usable. Finally, the tool uses Dropout to reduce overfitting.

Based on this design, the thesis implements NLP-based RFC rule extraction and carries out three comparative experiments: a performance experiment on the sentence extraction algorithm, an effectiveness experiment on the machine learning model, and an experiment on bug-finding ability. The results show that the tool improves substantially on existing methods. For rule sentence extraction, the algorithm substantially outperforms other tools in both time complexity and space complexity. For rule extraction, the machine learning model improves precision, recall, and F1 score. For testing certificate validation systems, RFCcert NLP uses 32 rules to produce 89 certificates. When compared with RFCcert on the same 21 rules, RFCcert NLP finds 38 bugs, whereas RFCcert finds only 24. Overall, the tool is effective and finds more bugs in existing certificate validation systems.
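The preprocessing steps and the rule quintuple described in the abstract can be illustrated with a small Python sketch. This is not the thesis's implementation. It assumes the usual plain-text RFC layout (a footer line ending in "[Page N]", then a header line starting with "RFC NNNN"), uses a naive regular-expression sentence splitter, and swaps the part-of-speech-based sentence classification for a simple match on RFC 2119 requirement keywords. All function names, the `RuleQuintuple` class, the reading of "degree" as a requirement level, and the sample text are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed RFC plain-text layout: footers end with "[Page N]",
# headers look like "RFC NNNN   <title>   <month> <year>".
FOOTER_RE = re.compile(r"\[Page \d+\]\s*$")
HEADER_RE = re.compile(r"^RFC \d+\s{2,}.*\d{4}\s*$")
# RFC 2119 requirement keywords; a stand-in for the thesis's POS-based classifier.
KEYWORD_RE = re.compile(
    r"\b(MUST NOT|SHALL NOT|SHOULD NOT|MUST|SHALL|SHOULD|"
    r"REQUIRED|RECOMMENDED|MAY|OPTIONAL)\b"
)


def strip_page_furniture(text: str) -> str:
    """Drop page headers/footers and blank lines, then join the rest."""
    kept = []
    for line in text.replace("\f", "\n").splitlines():
        if FOOTER_RE.search(line) or HEADER_RE.match(line):
            continue
        kept.append(line.strip())
    return " ".join(part for part in kept if part)


def split_sentences(text: str) -> List[str]:
    """Naive split: a period, whitespace, then an uppercase letter."""
    parts = re.split(r"(?<=\.)\s+(?=[A-Z])", text)
    return [p.strip() for p in parts if p.strip()]


def rule_sentences(text: str) -> List[str]:
    """Keep only sentences that contain a requirement keyword."""
    cleaned = strip_page_furniture(text)
    return [s for s in split_sentences(cleaned) if KEYWORD_RE.search(s)]


@dataclass(frozen=True)
class RuleQuintuple:
    """Hypothetical container for the abstract's five-field rule."""
    degree: str            # assumed here to be the requirement level
    condition_class: str
    condition_value: str
    result_class: str
    result_value: str


# Made-up sample: one sentence split across a page break.
SAMPLE = (
    "   Conforming CAs MUST mark this extension as\n"
    "\n"
    "Author, et al.              Standards Track                    [Page 27]\n"
    "\f\n"
    "RFC 0000            Example Certificate Profile            May 2008\n"
    "\n"
    "   critical. This section describes the format.\n"
)

if __name__ == "__main__":
    for sentence in rule_sentences(SAMPLE):
        print(sentence)
```

Removing headers and footers before sentence splitting matters because a rule sentence can span a page break, as in the sample; splitting first would leave two fragments, neither of which is a complete rule.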
Keywords: Certificate Validation, X.509 Certificate, Information Extraction, Rule Extraction, RFC Text