Headline Generation Based On Deep Semantic Mining

Posted on: 2020-07-21
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Chi
GTID: 2428330575957064
Subject: Intelligent Science and Technology
Abstract/Summary:
With the development of information technology, people can do many things conveniently. However, the web is also full of redundant and unstructured text, which makes it hard for people to extract information from web pages. Automatic summarization helps people capture information quickly, and headline generation yields an even more concise summary. Headline generation is the main task of this thesis. The research mines text semantics deeply through two approaches, text feature representation and a cascaded model, using both a short-text corpus and a long-text corpus for headline generation.

First, the thesis studies text feature representation based on word embeddings. On top of word embeddings, we extend the text features by introducing common Summary Oriented Features (SOF), such as word frequency, word position, cluster, and hierarchical distribution features, which makes the text feature representation task-oriented for headline generation. Different combinations and representations of the text features are also studied. Introducing SOF lets the research mine semantics from both linguistic and statistical perspectives. In text feature representation, the word embedding is mainly concatenated directly with the introduced features. Building on that, we train a special word embedding for headline generation by constructing a new training corpus using Named Entity Recognition (NER), Part of Speech (POS), topic features, and other linguistic information, so that the newly trained word embedding contains both linguistic and statistical information. SOF and word embeddings complement each other in expressing semantics for headline generation, since they mine semantics from different perspectives. The results of many comparison experiments show the effectiveness of the proposed method. In addition, the text representation of sparse words is also explored, and experimental results show that this operation on the text feature representation can improve performance slightly.

Second, the thesis proposes a cascaded model to generate headlines for the short-text and long-text corpora. The cascaded model consists of a sampling algorithm and sequence mapping, which together mine important semantic information. The sampling algorithm includes a sampling probability formula and Determinantal Point Processes (DPPs) sampling. A seq2seq model is used for sequence mapping. For the short-text corpus, the cascaded model first samples words to mine important semantics and then maps sequence to sequence with the seq2seq model. For the long-text corpus, DPPs sampling is used to sample sentences from the input text. Dynamic DPPs and static DPPs are proposed to increase semantic diversity. Experimental results demonstrate that when the number of words after DPPs sampling is limited to a certain number, the quality of the generated headlines improves.
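
The abstract says that word embeddings are combined directly with the introduced SOF features. As a rough illustration only, the sketch below appends two such features (relative word frequency and relative word position) to each token's embedding. The function names, the toy embedding table, and the normalization of both features are assumptions made for this example and are not taken from the thesis.

```python
from collections import Counter
from typing import Dict, List


def sof_features(tokens: List[str]) -> List[List[float]]:
    """Return [relative frequency, relative position] for each token.

    Both features and their scaling are illustrative assumptions.
    """
    counts = Counter(tokens)
    n = len(tokens)
    feats = []
    for i, tok in enumerate(tokens):
        freq = counts[tok] / n                # share of the text taken by this word
        pos = i / (n - 1) if n > 1 else 0.0   # 0.0 = first token, 1.0 = last token
        feats.append([freq, pos])
    return feats


def stitch(tokens: List[str],
           embeddings: Dict[str, List[float]],
           dim: int) -> List[List[float]]:
    """Concatenate each token's embedding with its SOF features.

    Tokens missing from the (hypothetical) embedding table get a zero vector.
    """
    unk = [0.0] * dim
    rows = []
    for tok, extra in zip(tokens, sof_features(tokens)):
        rows.append(list(embeddings.get(tok, unk)) + extra)
    return rows


# Toy 2-dimensional embeddings, invented purely for the example.
toy_embeddings = {"stocks": [0.1, 0.3], "fall": [0.2, -0.4]}
tokens = ["stocks", "fall", "as", "stocks"]
vectors = stitch(tokens, toy_embeddings, dim=2)
```

Each row of `vectors` has the embedding dimension plus two extra entries, so a downstream model sees the embedding and the task-oriented features in a single input vector. Other SOF features named in the abstract, such as cluster or hierarchical distribution features, would extend the row in the same way.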
Keywords/Search Tags: headline generation, cascaded model, feature representation, sampling algorithm, deep semantic mining