Development Of A Machine Learning Algorithm To Compare RAMS Standards Documents

Posted on:2019-09-11

Degree:Master

Type:Thesis

Country:China

Candidate:L Pierre-Victor

Full Text:PDF

GTID:2428330590951790

Subject:Management Science and Engineering

Abstract/Summary:

Language is a tool that humans use to communicate with one another. Although it is common to all people, our knowledge and culture directly influence the way we communicate, so different sentences can carry the same meaning. Natural Language Processing is the field of study that focuses on interactions between computers and human languages. It has attracted significant interest over the last decades with the rising importance of computing tools, which have made it easier and quicker to analyze fragments of text. In particular, text comparison is a key task in many applications, such as machine translation, information retrieval and question answering. The main difficulty of this task is ensuring that a computer program processes a text fragment or a large corpus well enough to capture the meaning of a sentence. In this research work, we apply the Paraphrase Identification task (a binary judgement task that classifies pairs of sentences as paraphrases or non-paraphrases) to the comparison of RAMS standards documents. Our approach uses a large number of lexical, syntactic and semantic features. We examine the influence of these features on the performance of our models, in particular when they are combined to capture a global understanding of each pair of sentences. We then train two kinds of models that use those features: a majority-wins algorithm and several machine learning classifiers (linear and non-linear). We find that feature selection and feature combination are the key steps for good performance on the Paraphrase Identification task. We also conclude that, although the more traditional, empirical majority-wins method works reasonably well, it is outperformed by almost all of the machine learning classifiers we implemented. By tuning a Support Vector Classifier trained on our model, we achieve state-of-the-art results for the Paraphrase Identification task.
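
The majority-wins idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the three lexical features, the function names and the thresholds below are all assumptions chosen for illustration, and the syntactic and semantic features are left out. Each feature casts one vote when its similarity score reaches its threshold, and a sentence pair is labelled a paraphrase when more than half of the features vote for it.

```python
import re


def tokens(sentence):
    """Lowercase alphanumeric word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def char_ngrams(sentence, n=3):
    """Set of character n-grams over the normalized sentence."""
    text = " ".join(tokens(sentence))
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def features(s1, s2):
    """Three illustrative lexical similarity scores, each in [0, 1]."""
    t1, t2 = tokens(s1), tokens(s2)
    longest = max(len(t1), len(t2))
    return {
        "word_jaccard": jaccard(t1, t2),
        "char_trigram_jaccard": jaccard(char_ngrams(s1), char_ngrams(s2)),
        "length_ratio": min(len(t1), len(t2)) / longest if longest else 0.0,
    }


# Hypothetical thresholds, picked for illustration only (not from the thesis).
THRESHOLDS = {
    "word_jaccard": 0.5,
    "char_trigram_jaccard": 0.5,
    "length_ratio": 0.7,
}


def majority_wins(s1, s2, thresholds=THRESHOLDS):
    """Label a pair as a paraphrase when most features vote for it."""
    scores = features(s1, s2)
    votes = sum(scores[name] >= limit for name, limit in thresholds.items())
    return votes > len(thresholds) / 2
```

A machine learning classifier would replace the fixed thresholds and the vote count with a decision function learned from labelled sentence pairs, taking the same feature scores as input.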

Keywords/Search Tags:

natural language processing, paraphrase identification, machine learning classifier, feature selection, RAMS standards

Related items

1	Research On Paraphrase Processing Methods Based On Neural Networks
2	Research On Feature Space Backdoor Attack Methods For Natural Language Processing Models
3	Research On Paraphrase Identification Method Based On Deep Semantic Understanding
4	Research On Machine Learning For Natural Language Processing And Transmission
5	Research On Controllable Paraphrase Generation
6	Research On The Technology And Key Problems Of Automatic Video Clip And Mixing Based On Natural Language Processing
7	Research On Patent Value Classification Prediction Model Based On Machine Learning
8	Research On Text Classification Based On Natural Language Processing And Machine Learning
9	Research On Fine-grained Chinese Paraphrase Extraction Technology Based On Deep Learning
10	The Design And Implementation Of Hidden Hazard Analysis System Based On Natural Language Processing