On The Application Of Tolerance Rough Sets In Natural Language Processing

Posted on:2021-05-12

Degree:Master

Type:Thesis

Country:China

Candidate:H H Jiang

Full Text:PDF

GTID:2428330614458629

Subject:Systems Science

Abstract/Summary:


With the rapid development of information technology, the amount of text data keeps growing. Compared with numerical data, text data is more complex and harder to process. The Tolerance Rough Set Model (TRSM) is an extension of classical rough set theory in which the equivalence relation on the universe is replaced by a tolerance relation, which makes it well suited to natural language processing. Starting from the uncertainty and imprecision of text data, this thesis uses tolerance rough sets to study text processing problems in natural language processing.

First, because document representation is one of the foundations of natural language processing, we propose two tolerance rough set based Bag-of-Words models, TRBoW1 and TRBoW2, which differ in how term weights are calculated. They alleviate the sparsity of the traditional Bag-of-Words model and its lack of latent semantic relations, and they learn document representations without any training or prior knowledge. Comparative text classification experiments with various document representation methods on different datasets verify the performance of our methods.

Second, we improve the traditional tolerance rough set model so that it has lower time complexity and supports incremental updates. We then introduce the probabilistic tolerance rough set to improve the text representation algorithm and apply it to sentence similarity calculation. The resulting sentence similarity model, built on the probabilistic tolerance rough set model from the perspective of the uncertainty of text data, is unsupervised and can mine latent semantic information. Each sentence is represented by a pair of upper and lower approximations. For each sentence pair, the upper approximation similarity and the lower approximation similarity are computed separately, and the final sentence similarity is modeled as a linear combination of the two. Experiments on the SICK2014 sentence similarity task show that the model is both effective and efficient.

Finally, since Z-numbers can measure the reliability of uncertain information, we combine them with the tolerance rough set and propose a machine translation quality evaluation method based on tolerance rough sets and Z-numbers. To compute the score of a machine translation, the method transforms Z-numbers into conventional fuzzy numbers, applies a fuzzy comprehensive evaluation algorithm, and integrates the tolerance rough set. A worked instance is analyzed at the end.
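The abstract does not give the thesis's formulas, so the following is only a minimal sketch of the classical TRSM ideas it builds on, assuming the usual definitions: a term's tolerance class contains the terms it co-occurs with in at least `theta` documents (plus itself), the upper approximation of a term set X collects the terms whose tolerance class meets X, and the lower approximation collects those whose class lies entirely inside X. The membership-weighted bag of words, the Jaccard measure, and the weight `alpha` are illustrative assumptions; this is not TRBoW1, TRBoW2, or the probabilistic model, and every function name here is hypothetical.

```python
from itertools import combinations


def tolerance_classes(docs, theta):
    """Classical TRSM tolerance classes (illustrative, not the thesis's variant).

    docs: list of token sets. Term t_j joins the class of t_i when the two
    terms co-occur in at least `theta` documents; each term is in its own class.
    """
    counts = {}
    for doc in docs:
        for a, b in combinations(sorted(doc), 2):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    classes = {t: {t} for doc in docs for t in doc}
    for (a, b), n in counts.items():
        if n >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approx(x, classes):
    # terms whose tolerance class shares at least one term with X
    return {t for t, cls in classes.items() if cls & x}


def lower_approx(x, classes):
    # terms whose tolerance class is entirely contained in X
    return {t for t, cls in classes.items() if cls <= x}


def membership_bow(x, classes):
    # hypothetical weighting: rough membership |I(t) & X| / |I(t)|,
    # which is nonzero exactly on the upper approximation of X
    return {t: len(cls & x) / len(cls) for t, cls in classes.items() if cls & x}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sentence_similarity(s1, s2, classes, alpha=0.5):
    # assumed: Jaccard similarity on each approximation, combined linearly
    up = jaccard(upper_approx(s1, classes), upper_approx(s2, classes))
    low = jaccard(lower_approx(s1, classes), lower_approx(s2, classes))
    return alpha * up + (1 - alpha) * low


docs = [{"cat", "sat", "mat"}, {"cat", "mat"}, {"dog", "sat"}]
classes = tolerance_classes(docs, theta=2)
print(sorted(membership_bow({"cat"}, classes).items()))  # [('cat', 0.5), ('mat', 0.5)]
print(sentence_similarity({"cat"}, {"mat"}, classes))    # 0.5
```

In this toy corpus the sentences {cat} and {mat} share no words, yet they get a similarity of 0.5 because the two terms co-occur in two documents and therefore share an upper approximation. The representation of {cat} likewise gains a nonzero weight for "mat", which illustrates how a tolerance relation can reduce Bag-of-Words sparsity. Terms outside the corpus vocabulary are simply ignored by this sketch.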
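The evaluation method is described only at the level of its steps. As a hedged illustration of the first two steps, the sketch below assumes triangular fuzzy numbers, the Z-number conversion proposed by Kang et al. (2012) (the reliability part B is reduced to its centroid alpha, and the restriction A is scaled by the square root of alpha), centroid defuzzification, and the weighted-average operator for fuzzy comprehensive evaluation. The abstract does not state which conversion or operator the thesis uses, the tolerance rough set step is not shown, and the criteria, grades, numbers, and function names are hypothetical.

```python
import math


def z_to_fuzzy(a, b):
    """Convert a Z-number Z = (A, B) with triangular A and B into a triangular
    fuzzy number, assuming the conversion of Kang et al. (2012)."""
    alpha = sum(b) / 3.0          # centroid of triangular B = (b1, b2, b3)
    scale = math.sqrt(alpha)      # A is stretched by sqrt(alpha)
    return tuple(scale * x for x in a)


def centroid(tri):
    # centroid defuzzification of a triangular fuzzy number (a1, a2, a3)
    return sum(tri) / 3.0


def fce_score(z_weights, membership, grade_scores):
    """Weighted-average fuzzy comprehensive evaluation (illustrative).

    z_weights: one Z-number (A, B) per criterion giving its importance.
    membership: membership[i][j] = degree to which criterion i rates grade j.
    grade_scores: crisp score attached to each grade.
    """
    crisp = [centroid(z_to_fuzzy(a, b)) for a, b in z_weights]
    total = sum(crisp)
    w = [c / total for c in crisp]                  # normalized weights
    n_grades = len(grade_scores)
    # evaluation vector B = W o R with the weighted-average operator:
    # b_j = sum_i w_i * r_ij
    b = [sum(w[i] * membership[i][j] for i in range(len(w)))
         for j in range(n_grades)]
    return sum(bj * v for bj, v in zip(b, grade_scores))


# hypothetical criteria (adequacy, fluency) and grades (good, fair, poor)
z_weights = [((0.6, 0.7, 0.8), (0.7, 0.8, 0.9)),   # adequacy: "high", "likely"
             ((0.4, 0.5, 0.6), (0.8, 0.9, 1.0))]   # fluency: "medium", "very likely"
membership = [[0.7, 0.2, 0.1],
              [0.5, 0.3, 0.2]]
print(round(fce_score(z_weights, membership, [100, 60, 20]), 2))  # 78.83
```

Converting each Z-number to an ordinary fuzzy number first lets the reliability part shrink or preserve an expert's stated importance before the usual fuzzy comprehensive evaluation runs on crisp, normalized weights.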

Keywords/Search Tags:

tolerance rough sets, document representation, sentence similarity, machine translation evaluation


Related items

1	Design And Implementation Of Heuristic Analogy Translation Mechanism In IHSMTS
2	Research On Document-Level Neural Machine Translation
3	Research On Example-Based Automatic Machine Translation For English-Chinese Patent
4	An Approach Of Measuring Sentence Similarity Based On Word Vector And Its Application To Example-based Machine Translation
5	Research And Application Of Multi-document Automatic Summarization
6	The Design And Study Of The Electric-Document Translation Assisting Tool
7	Two Direction Machine Translation Based On Sentence Semantic Embedding And Its Evaluation
8	Research And Application Of Machine Translation Technology On Recurrent Neural Network
9	Study On Technology Of Corpus Processing And Its Quality Evaluation For Statistical Machine Translation
10	Based On The Instance Of English-chinese Translation System