On The Application Of Tolerance Rough Sets In Natural Language Processing

Posted on: 2021-05-12 | Degree: Master | Type: Thesis
Country: China | Candidate: H H Jiang
GTID: 2428330614458629 | Subject: Systems Science
Abstract/Summary:
With the rapid development of information technology, the volume of text data keeps growing. Unlike numerical data, text data is complex and difficult to process. The Tolerance Rough Set Model (TRSM) is an extension of classical rough set theory in which the equivalence relation on the universe is replaced by a tolerance relation, which makes it well suited to natural language processing. Starting from the uncertainty and imprecision of text data, this thesis uses tolerance rough sets to study text processing problems in natural language processing.

Firstly, because document representation is one of the foundations of natural language processing, we propose two Bag-of-Words models based on tolerance rough sets, called TRBoW1 and TRBoW2, which differ in how term weights are calculated. They address the sparsity of the traditional Bag-of-Words model and its lack of latent semantic relations, and they learn document representations without any training or prior knowledge. Comparative text classification experiments against various document representation methods on different datasets verify the performance of our methods.

Secondly, we improve the traditional tolerance rough set model, obtaining a variant that has lower time complexity and can be updated incrementally. We then introduce the probabilistic tolerance rough set to improve the text representation algorithm and apply it to sentence similarity. The resulting sentence similarity model treats text data from the perspective of uncertainty, is unsupervised, and can mine latent semantic information. Each sentence is represented by a pair consisting of an upper approximation and a lower approximation. For each sentence pair, the upper approximation similarity and the lower approximation similarity are computed separately, and the final sentence similarity is modeled as a linear combination of the two. Experiments on the SICK 2014 sentence similarity task show that the model is effective and efficient.

Finally, because Z-numbers can measure the reliability of uncertain events, we combine them with tolerance rough sets and put forward a method for evaluating machine translation quality. The method computes a machine translation score by transforming Z-numbers into conventional fuzzy numbers, applying a fuzzy comprehensive evaluation algorithm, and integrating the tolerance rough set. A worked instance is analyzed at the end.
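The approximation-based similarity described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's method: it uses the standard co-occurrence tolerance relation of TRSM rather than the probabilistic variant, and the Jaccard measure, the threshold `theta`, the weight `alpha`, and all function names are assumptions introduced only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to its tolerance class: the term itself plus every
    term that co-occurs with it in at least `theta` documents."""
    cooc = defaultdict(int)
    vocab = set()
    for doc in docs:
        terms = set(doc)
        vocab |= terms
        # Count each unordered pair once per document.
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1
    classes = {t: {t} for t in vocab}  # reflexive: t is in its own class
    for (a, b), n in cooc.items():
        if n >= theta:  # symmetric: add both directions
            classes[a].add(b)
            classes[b].add(a)
    return classes


def lower_approx(terms, classes):
    """Terms whose whole tolerance class lies inside the given term set."""
    terms = set(terms)
    return {t for t, cls in classes.items() if cls <= terms}


def upper_approx(terms, classes):
    """Terms whose tolerance class overlaps the given term set.
    Terms outside the corpus vocabulary are ignored."""
    terms = set(terms)
    return {t for t, cls in classes.items() if cls & terms}


def jaccard(a, b):
    """Jaccard overlap of two sets (assumed measure; 0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def sentence_similarity(s1, s2, classes, alpha=0.5):
    """Linear combination of upper- and lower-approximation similarity.
    `alpha` is a hypothetical mixing weight, not a value from the thesis."""
    up = jaccard(upper_approx(s1, classes), upper_approx(s2, classes))
    low = jaccard(lower_approx(s1, classes), lower_approx(s2, classes))
    return alpha * up + (1 - alpha) * low
```

With a toy corpus such as `[["cat", "pet"], ["cat", "pet"], ["dog", "pet"], ["dog", "bark"]]` and `theta=2`, "cat" and "pet" fall into the same tolerance class, so the upper approximation of the single-word sentence `["cat"]` also contains "pet". This is the sense in which the upper approximation enriches a sparse representation with related terms.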
Keywords/Search Tags:tolerance rough sets, document representation, sentence similarity, machine translation evaluation