Development Of A Machine Learning Algorithm To Compare RAMS Standards Documents

Posted on: 2019-09-11 Degree: Master Type: Thesis
Country: China Candidate: L Pierre-Victor
GTID: 2428330590951790 Subject: Management Science and Engineering
Abstract/Summary:
Language is a tool that humans use to communicate with one another. Although it is common to all people, our knowledge and culture directly influence the way we communicate, so different sentences can carry the same meaning. Natural Language Processing is the field of study concerned with interactions between computers and human languages. It has attracted significant interest over the last decades with the rising importance of computing tools, which have made it easier and quicker to analyze fragments of text. More precisely, text comparison is a key task in many applications, such as machine translation, information retrieval and question answering. The main difficulty of this task is to ensure that a computer program processes a text fragment or a large corpus well enough to capture the meaning of a sentence. In this research work, we focus on applying the Paraphrase Identification task (a binary judgement task that aims to classify pairs of sentences as paraphrases or non-paraphrases) to the comparison of RAMS standards documents. Our approach uses a large number of lexical, syntactic and semantic features. We examine the influence of these features on the performance of our models, in particular when they are combined to give a global understanding of each pair of sentences. We then train two kinds of models that make use of these features: a majority-wins algorithm and several machine learning classifiers (linear and non-linear). We find that feature selection and feature combination are the key steps for achieving good performance on the Paraphrase Identification task. We also conclude that, although the more traditional, empirical majority-wins method works fairly well, it is outperformed by almost all of the machine learning classifiers implemented. By tuning a Support Vector Classifier trained on our model, we achieve state-of-the-art results for the Paraphrase Identification task.
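The majority-wins idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the three lexical features, the function names and the thresholds below are all assumptions chosen for illustration, and the syntactic and semantic features the abstract refers to are left out. Each feature casts one vote, and a pair is labelled a paraphrase when more than half of the votes agree.

```python
import re


def tokens(sentence):
    """Lowercase a sentence and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def jaccard(a, b):
    """Jaccard overlap of two collections, each treated as a set."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def char_ngrams(sentence, n=3):
    """Character n-grams of the normalised sentence."""
    text = " ".join(tokens(sentence))
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def features(s1, s2):
    """Three illustrative lexical similarity scores in [0, 1]."""
    t1, t2 = tokens(s1), tokens(s2)
    longest = max(len(t1), len(t2))
    return {
        "word_jaccard": jaccard(t1, t2),
        "char_trigram_jaccard": jaccard(char_ngrams(s1), char_ngrams(s2)),
        "length_ratio": min(len(t1), len(t2)) / longest if longest else 1.0,
    }


# Hypothetical thresholds; in practice they would be tuned on labelled pairs.
THRESHOLDS = {
    "word_jaccard": 0.5,
    "char_trigram_jaccard": 0.5,
    "length_ratio": 0.7,
}


def majority_wins(s1, s2, thresholds=THRESHOLDS):
    """Return True when more than half of the features vote 'paraphrase'."""
    feats = features(s1, s2)
    votes = sum(feats[name] >= limit for name, limit in thresholds.items())
    return votes * 2 > len(thresholds)
```

The same feature dictionary could instead be turned into a numeric vector and passed to a trained classifier, which is the alternative the abstract reports as performing better.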
Keywords/Search Tags: natural language processing, paraphrase identification, machine learning classifier, feature selection, RAMS standards