Research And Application Of Unsupervised Text Simplification Based On Pre-trained Language Modeling

Posted on:2022-11-20

Degree:Master

Type:Thesis

Country:China

Candidate:F Zhang

GTID:2518306614954559

Subject:Library and Information Science

Abstract/Summary:

The main purpose of the text simplification (TS) task is to reduce the complexity of a text's content and syntax while preserving the main information and meaning of the source text. Its main function is to help people with less linguistic knowledge better understand text content. Most current text simplification methods are based on neural network models, and such methods usually require large-scale parallel corpora for training. However, the existing text simplification corpora have many problems, such as sentence pairs in which the simple side is barely simplified and complex-simple sentence pairs whose meanings differ, which make the trained text simplification models ineffective. The research contents of this paper are as follows:

(1) An unsupervised statistical text simplification method based on a phrase-based machine translation system (UnsupPBMT) has achieved good performance. However, this method also has some disadvantages: 1) UnsupPBMT aligns a large number of non-similar words to initialize the phrase tables, which brings noise into the simplification system. 2) Many simple, highly similar words are hard to find with word embedding models, which runs against the aim of the TS task. To address these problems, this paper proposes an unsupervised statistical text simplification method based on pre-trained language modeling. By using a pre-trained language model to find synonyms or highly similar words, this method significantly outperforms all unsupervised text simplification methods and has performance comparable to strong supervised methods.

(2) Current English lexical simplification (LS) methods adopt one simplification scheme for all users, regardless of individual users' language proficiency. To solve this problem, this paper proposes a personalized English LS method for Chinese users based on their English certification. This method simplifies only the words that need to be simplified and uses both the pre-trained language model BERT and a thesaurus to generate the best substitutes. This is the first work that focuses on personalized English LS for Chinese users, and the experimental results show that this method obtains the most suitable simplifications compared with the baselines.

(3) This paper designs an English text simplification system based on the Django framework, which can simplify complex English text input by users. The system includes three modules. The system configuration module is mainly used to set various parameters of the model and the system server. The preprocessing module pre-processes the English text and includes text segmentation, text data cleaning, and text feature extraction. The text simplification module mainly uses the pre-trained language model BERT to simplify the text and returns the simplified text to the front-end page for display.
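The personalized lexical simplification idea in (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: the level names, the word lists in `KNOWN_WORDS`, the candidate table `CANDIDATES`, and the function `simplify_for_user` are all hypothetical. In particular, the small hand-written candidate table stands in for the BERT-plus-thesaurus substitute generation described above. The sketch only shows the selection logic: a word is replaced only when it falls outside the vocabulary assumed for the user's level, and only by a candidate the user is assumed to know.

```python
import re

# Hypothetical toy word lists per proficiency level (not from the thesis).
KNOWN_WORDS = {
    "basic": {"the", "doctor", "will", "check", "patient", "and", "buy", "medicine"},
    "advanced": {
        "the", "doctor", "will", "check", "examine", "patient",
        "and", "buy", "purchase", "medicine",
    },
}

# Hypothetical ranked substitute candidates. In the thesis these come from
# BERT and a thesaurus; here they are hard-coded for illustration.
CANDIDATES = {
    "examine": ["scrutinize", "check", "inspect"],
    "purchase": ["acquire", "buy"],
}


def simplify_for_user(sentence, level):
    """Replace only the words the user is assumed not to know."""
    known = KNOWN_WORDS[level]

    def replace(match):
        word = match.group(0)
        lower = word.lower()
        if lower in known:
            return word  # the user already knows this word: leave it alone
        for candidate in CANDIDATES.get(lower, []):
            if candidate in known:
                return candidate  # first ranked candidate the user knows
        return word  # no suitable substitute: keep the original word

    return re.sub(r"[A-Za-z]+", replace, sentence)


sentence = "The doctor will examine the patient and purchase medicine"
print(simplify_for_user(sentence, "basic"))
print(simplify_for_user(sentence, "advanced"))
```

With this toy data, the same sentence is rewritten for a "basic" user and left unchanged for an "advanced" user, which is the behavior a one-scheme-for-all LS method cannot provide.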

Keywords/Search Tags:

Text simplification, Pretrained language model, Word embedding model

Related items

1	The Optimization Of Extractive Text Summarization Based On Pretrained Language Model
2	The Research Of Text Classification Based On Word2Vec Language Model And Graph Kernel
3	Research On Jointly Learning Word Embeddings And Latent Topics In Text
4	Research And Implementation Of Full-text Retrieval Combining Word Matching And Context Interaction
5	Research On News Text Summarization Algorithm Based On Pre-trained Language Model
6	Research On Chinese Word Segmentation Method Based On Word Embedding
7	Research On Joint Learning Of Topic And Embedding Model
8	Automatic Text Generation System For English Scientific Papers
9	Research On Short Text Topic Model Based On Semantic Information And Word Triangle
10	Improving Sentence Simplification Models Based On Sequence To Sequence Model