
Research on Chinese Automatic Summarization Based on Word2vec

Posted on: 2018-10-22
Degree: Master
Type: Thesis
Country: China
Candidate: X F Wang
Full Text: PDF
GTID: 2428330566998803
Subject: Microelectronics and Solid State Electronics
Abstract/Summary:
With the information explosion of the Internet era, text summarization technology plays an important role in document analysis, information compression, and content synthesis. Manually written abstracts depend on experienced experts and suffer from inconsistency, high cost, and low efficiency. Replacing manual summarization with automatic text summarization can effectively reduce both the labor cost of producing abstracts and the variation among them. For the task of extractive Chinese summarization, this paper implements automatic summary extraction in three steps: text preprocessing, keyword extraction, and graph-based sentence ranking.

Because Chinese text is written without spaces as word delimiters, accurate segmentation of the input text is necessary to generate high-quality summaries. An encoder-decoder model is used to encode and decode the input text. To address the difficulty recurrent neural networks have in capturing textual context, a bidirectional long short-term memory (BiLSTM) network is used to learn the text in both directions. To improve the model's learning ability, an attention mechanism is introduced between the encoder and decoder modules. To make effective use of sentence-level information during sequence labeling, the tagging task is completed by combining the bidirectional network with a conditional random field (CRF). Experiments show that the deep-learning-based preprocessing method used in this paper achieves good results on the word segmentation and part-of-speech tagging tasks.

Since a few sentences in a text can express its main idea, this paper proposes an extraction-based automatic summarization scheme. The preprocessing method is applied for sentence segmentation, word segmentation, part-of-speech tagging, stop-word filtering, and part-of-speech filtering. Keywords characterize the subject of a text and are an important preparation for summary extraction. Word similarity can be expressed by the spatial distance between word vectors, and Word2vec, which generates the word vectors, is used to find the words most similar to each candidate word. Candidate words are then weighted according to whether their similar words appear in the text, and those with higher weights are selected as keywords.

Exploiting the similarity between sentences in a text, this paper uses an improved graph-ranking TextRank algorithm to build a graph whose nodes are sentences and whose edges carry the similarity between sentence nodes, and it updates each sentence's weight according to its position and the keywords it contains. To preserve the readability of the output summary, the ranked sentences are output in their original order in the text.

The summaries extracted by the traditional method and by the improved method proposed here are evaluated against manually written abstracts. The experimental results show that the precision, recall, and F-measure of the improved summarization method all increase, which demonstrates the validity and correctness of the proposed method.
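As a rough illustration of the preprocessing stage described above, the following sketch shows a bidirectional LSTM tagger in PyTorch that produces per-character emission scores for segmentation and part-of-speech tags. It is a minimal sketch, not the thesis's implementation: the class name, dimensions, and tag count are assumptions, and the CRF layer that the thesis places on top of the bidirectional network is omitted.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Bidirectional LSTM that emits a score for each tag at each character.

    A CRF layer (as described in the abstract) would normally sit on top of
    these emission scores to decode a consistent tag sequence.
    """

    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_tags=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, char_ids):          # char_ids: (batch, seq_len)
        x = self.embed(char_ids)          # (batch, seq_len, embed_dim)
        out, _ = self.lstm(x)             # (batch, seq_len, 2 * hidden_dim)
        return self.fc(out)               # (batch, seq_len, num_tags)

# Example: score a toy batch of two 5-character sentences (ids are dummies).
model = BiLSTMTagger(vocab_size=5000)
emissions = model(torch.randint(0, 5000, (2, 5)))
print(emissions.shape)  # torch.Size([2, 5, 4])
```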
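The keyword-selection step can be pictured with the short Python sketch below. It assumes word vectors have already been trained (for example with a Word2vec model) and are supplied as a plain dictionary; the abstract does not spell out the exact weighting scheme, so counting how many of a candidate's nearest neighbours occur in the text is only an illustrative stand-in.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def select_keywords(candidates, text_words, vectors, top_k=5, n_neighbors=10):
    """Score each candidate by how many of its nearest vocabulary words
    (under cosine similarity of the word vectors) also appear in the text,
    then return the top_k highest-scoring candidates as keywords."""
    text_set = set(text_words)
    scores = {}
    for cand in candidates:
        if cand not in vectors:
            continue
        # Rank every other word in the embedding vocabulary by similarity.
        sims = sorted(
            ((cosine(vectors[cand], vec), word)
             for word, vec in vectors.items() if word != cand),
            reverse=True,
        )
        nearest = [word for _, word in sims[:n_neighbors]]
        # Similar words that actually occur in the text raise the weight.
        scores[cand] = sum(1 for word in nearest if word in text_set)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy usage with made-up 3-dimensional "word vectors".
vectors = {w: np.random.rand(3) for w in ["summary", "text", "model", "graph", "word"]}
print(select_keywords(["summary", "graph"], ["text", "summary", "word"], vectors))
```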
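Likewise, the improved TextRank ranking can be sketched as a PageRank-style iteration over a sentence-similarity matrix, followed by position and keyword boosts, with the selected sentences output in their original order. The multiplicative boosts and all numbers below are assumptions for illustration, not the thesis's actual formula.

```python
import numpy as np

def improved_textrank(sim, position_boost, keyword_boost,
                      d=0.85, n_iter=100, tol=1e-6):
    """PageRank-style iteration over a sentence-similarity graph, followed by
    the adjustments sketched in the abstract: sentences in prominent positions
    and sentences containing keywords receive higher weights."""
    n = sim.shape[0]
    # Column-normalise the similarity matrix so each column sums to 1.
    col_sums = sim.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0] = 1.0
    m = sim / col_sums
    scores = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        new_scores = (1 - d) / n + d * m.dot(scores)
        if np.abs(new_scores - scores).sum() < tol:
            scores = new_scores
            break
        scores = new_scores
    # Apply position and keyword boosts (multiplicative here; an assumption).
    return scores * position_boost * keyword_boost

def pick_summary(sentences, scores, k=2):
    """Keep the k top-ranked sentences but output them in original order,
    as the abstract requires for readability."""
    top = sorted(np.argsort(scores)[-k:])
    return [sentences[i] for i in top]

# Toy usage: 4 sentences, a made-up similarity matrix, and made-up boosts.
sentences = ["S0", "S1", "S2", "S3"]
sim = np.array([[0, .2, .1, .0],
                [.2, 0, .4, .1],
                [.1, .4, 0, .3],
                [.0, .1, .3, 0]], dtype=float)
pos_boost = np.array([1.2, 1.0, 1.0, 1.1])  # e.g. opening/closing sentences
kw_boost = np.array([1.0, 1.3, 1.0, 1.0])   # sentence 1 contains a keyword
scores = improved_textrank(sim, pos_boost, kw_boost)
print(pick_summary(sentences, scores, k=2))
```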
Keywords/Search Tags: abstract, extraction, sequence labeling, neural network, word embedding, graph sorting