Unsupervised Extractive Text Summarization Using Sentence Embedding

Posted on: 2022-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: Ahmad Shehzad
Full Text: PDF
GTID: 2518306602475994
Subject: Computer Science and Technology
Abstract/Summary:
The aim of this study is to perform text summarization on emails and papers using Python. Most publicly available text summarization datasets cover long documents and posts. Because the structure of long papers and articles differs significantly from that of short emails, models trained with supervised methods may suffer from a lack of domain adaptation. As a test, this thesis investigates unbiased summary prediction using unsupervised methods. Dense vector representations of words and, more recently, of sentences have been shown to boost performance in various NLP tasks. We propose a method for performing unsupervised extractive text summarization using sentence embeddings. We tested it on two datasets and found that performance improved dramatically over a simple baseline, approaching a competitive baseline. In NLP, there have been numerous efficient implementations of dense vector representations of words. Dense vector representations of sentences have recently been shown to be accurate for textual similarity estimation, entailment estimation, and text categorization. Using sentence embeddings, we propose a method for detecting paraphrases in text summarization. Text summarization involves condensing a source text into a shorter version while maintaining its content quality and overall context. Given the abundance of unstructured text available on the Web and people's inability to assimilate vast quantities of information, efficient methods of summarizing text are needed. Document summarization techniques typically employ several mechanisms to either recognize or delete redundant sentences in the text. We propose finding semantically related groups of sentences by projecting the sentences into a high-dimensional vector space and clustering them, then selecting members from these clusters to form a summary.
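The clustering idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not specify the encoder, the clustering algorithm, or the selection rule, so everything here is an assumption. In particular, `embed` is a hypothetical placeholder that builds normalised bag-of-words vectors where a real system would use a trained sentence encoder (the keywords mention skip thought vectors), the clustering is a plain k-means written out by hand, and the summary takes the sentence closest to each cluster centroid. The function names `embed`, `sq_dist`, `kmeans`, and `summarize` are illustrative.

```python
import math
import re
from collections import Counter


def embed(sentences):
    """Placeholder encoder (assumption): L2-normalised bag-of-words vectors."""
    tokenised = [re.findall(r"[a-z0-9']+", s.lower()) for s in sentences]
    vocab = {w: i for i, w in enumerate(sorted({w for t in tokenised for w in t}))}
    vectors = []
    for tokens in tokenised:
        vec = [0.0] * len(vocab)
        for word, count in Counter(tokens).items():
            vec[vocab[word]] = float(count)
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain k-means; centroids start at evenly spaced input vectors."""
    centroids = [list(vectors[i * len(vectors) // k]) for i in range(k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def summarize(sentences, k=3):
    """Return up to k sentences, one per cluster, in original order."""
    if not sentences:
        return []
    k = max(1, min(k, len(sentences)))
    vectors = embed(sentences)
    centroids, labels = kmeans(vectors, k)
    chosen = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            # Selection rule (assumption): the member nearest the centroid.
            chosen.append(min(members,
                              key=lambda i: sq_dist(vectors[i], centroids[c])))
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    email = [
        "The meeting is moved to Friday at ten.",
        "Please note the meeting is now on Friday.",
        "The budget report is due next week.",
        "Send the budget report to finance when it is ready.",
    ]
    for sentence in summarize(email, k=2):
        print(sentence)
```

Selecting one sentence per cluster is what limits redundancy here: sentences that say similar things tend to share a cluster, so only one of them reaches the summary. Swapping `embed` for a real sentence encoder would leave the rest of the sketch unchanged.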
Keywords/Search Tags: text summarization, word embedding, natural language processing, sentence embedding, skip thought vectors