
Research on Chinese Automatic Summarization Based on Word2vec

Posted on: 2018-10-22
Degree: Master
Type: Thesis
Country: China
Candidate: X F Wang
Full Text: PDF
GTID: 2428330566998803
Subject: Microelectronics and Solid State Electronics
Abstract/Summary:
With the information explosion of the Internet era, text summarization technology plays an important role in document analysis, information compression, and content synthesis. Manually written abstracts depend on experienced experts and suffer from inconsistency, high cost, and low efficiency. Replacing manual summarization with automatic text summarization can effectively reduce both the labor cost of producing abstracts and the variation among them. For the task of extractive Chinese summarization, this paper implements automatic summary extraction in three steps: text preprocessing, keyword extraction, and graph-based sentence ranking.

Because Chinese text is written without spaces as word delimiters, accurate segmentation of the input text is necessary to generate high-quality summaries. An encoder-decoder model is used to encode and decode the input text. To address the difficulty recurrent neural networks have in capturing textual context, a bidirectional long short-term memory (BiLSTM) network is used to learn the text in both directions. To improve the model's learning ability, an attention mechanism is introduced between the encoder and decoder modules. To make effective use of sentence-level information during sequence labeling, the tagging task is completed by combining the bidirectional network with a conditional random field (CRF). Experiments show that the deep-learning-based preprocessing method used in this paper achieves good results on the word segmentation and part-of-speech tagging tasks.

Since a few sentences in a text can express its main idea, this paper proposes an extraction-based automatic summarization scheme. The preprocessing method is applied for sentence segmentation, word segmentation, part-of-speech tagging, stop-word filtering, and part-of-speech filtering. Keywords characterize the subject of a text and are an important preparation for summary extraction. Word similarity can be expressed by the spatial distance between word vectors, and Word2vec, which generates the word vectors, is used to find the words most similar to each candidate word. Candidate words are then weighted according to whether their similar words appear in the text, and those with higher weights are selected as keywords.

Exploiting the similarity between sentences in a text, this paper uses an improved graph-ranking TextRank algorithm to build a graph whose nodes are sentences and whose edges carry the similarity between sentence nodes, and it updates each sentence's weight according to its position and the keywords it contains. To preserve the readability of the output summary, the ranked sentences are output in their original order in the text.

The summaries extracted by the traditional method and by the improved method proposed here are evaluated against manually written abstracts. The experimental results show that the precision, recall, and F-measure of the improved summarization method all increase, which demonstrates the validity and correctness of the proposed method.
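As a rough illustration of the preprocessing stage described above, the following sketch shows a bidirectional LSTM tagger in PyTorch that produces per-character emission scores for segmentation and part-of-speech tags. It is a minimal sketch, not the thesis's implementation: the class name, dimensions, and tag count are assumptions, and the CRF layer that the thesis places on top of the bidirectional network is omitted.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Bidirectional LSTM that emits a score for each tag at each character.

    A CRF layer (as described in the abstract) would normally sit on top of
    these emission scores to decode a consistent tag sequence.
    """

    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_tags=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, char_ids):          # char_ids: (batch, seq_len)
        x = self.embed(char_ids)          # (batch, seq_len, embed_dim)
        out, _ = self.lstm(x)             # (batch, seq_len, 2 * hidden_dim)
        return self.fc(out)               # (batch, seq_len, num_tags)

# Example: score a toy batch of two 5-character sentences (ids are dummies).
model = BiLSTMTagger(vocab_size=5000)
emissions = model(torch.randint(0, 5000, (2, 5)))
print(emissions.shape)  # torch.Size([2, 5, 4])
```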
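The keyword-selection step can be pictured with the short Python sketch below. It assumes word vectors have already been trained (for example with a Word2vec model) and are supplied as a plain dictionary; the abstract does not spell out the exact weighting scheme, so counting how many of a candidate's nearest neighbours occur in the text is only an illustrative stand-in.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def select_keywords(candidates, text_words, vectors, top_k=5, n_neighbors=10):
    """Score each candidate by how many of its nearest vocabulary words
    (under cosine similarity of the word vectors) also appear in the text,
    then return the top_k highest-scoring candidates as keywords."""
    text_set = set(text_words)
    scores = {}
    for cand in candidates:
        if cand not in vectors:
            continue
        # Rank every other word in the embedding vocabulary by similarity.
        sims = sorted(
            ((cosine(vectors[cand], vec), word)
             for word, vec in vectors.items() if word != cand),
            reverse=True,
        )
        nearest = [word for _, word in sims[:n_neighbors]]
        # Similar words that actually occur in the text raise the weight.
        scores[cand] = sum(1 for word in nearest if word in text_set)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy usage with made-up 3-dimensional "word vectors".
vectors = {w: np.random.rand(3) for w in ["summary", "text", "model", "graph", "word"]}
print(select_keywords(["summary", "graph"], ["text", "summary", "word"], vectors))
```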
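Likewise, the improved TextRank ranking can be sketched as a PageRank-style iteration over a sentence-similarity matrix, followed by position and keyword boosts, with the selected sentences output in their original order. The multiplicative boosts and all numbers below are assumptions for illustration, not the thesis's actual formula.

```python
import numpy as np

def improved_textrank(sim, position_boost, keyword_boost,
                      d=0.85, n_iter=100, tol=1e-6):
    """PageRank-style iteration over a sentence-similarity graph, followed by
    the adjustments sketched in the abstract: sentences in prominent positions
    and sentences containing keywords receive higher weights."""
    n = sim.shape[0]
    # Column-normalise the similarity matrix so each column sums to 1.
    col_sums = sim.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0] = 1.0
    m = sim / col_sums
    scores = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        new_scores = (1 - d) / n + d * m.dot(scores)
        if np.abs(new_scores - scores).sum() < tol:
            scores = new_scores
            break
        scores = new_scores
    # Apply position and keyword boosts (multiplicative here; an assumption).
    return scores * position_boost * keyword_boost

def pick_summary(sentences, scores, k=2):
    """Keep the k top-ranked sentences but output them in original order,
    as the abstract requires for readability."""
    top = sorted(np.argsort(scores)[-k:])
    return [sentences[i] for i in top]

# Toy usage: 4 sentences, a made-up similarity matrix, and made-up boosts.
sentences = ["S0", "S1", "S2", "S3"]
sim = np.array([[0, .2, .1, .0],
                [.2, 0, .4, .1],
                [.1, .4, 0, .3],
                [.0, .1, .3, 0]], dtype=float)
pos_boost = np.array([1.2, 1.0, 1.0, 1.1])  # e.g. opening/closing sentences
kw_boost = np.array([1.0, 1.3, 1.0, 1.0])   # sentence 1 contains a keyword
scores = improved_textrank(sim, pos_boost, kw_boost)
print(pick_summary(sentences, scores, k=2))
```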
Keywords/Search Tags: abstract, extraction, sequence labeling, neural network, word embedding, graph sorting