A Deep Dictionary Learning Model Based On Tensor

Posted on:2018-01-17

Degree:Master

Type:Thesis

Country:China

Candidate:F Wang

Full Text:PDF

GTID:2428330569485434

Subject:Computer technology

Abstract/Summary:


In an era of complex and overwhelming Internet information, obtaining important, concise information quickly is a valuable research question. Multi-document automatic summarization is a technology that arose against this background. It compresses multiple documents on the same topic and extracts their key information, so that people can quickly and accurately obtain the essential information about that topic without reading each document in detail, which makes their access to information more efficient.

However, multi-document automatic summarization is still a very complicated task for Chinese, for the following reasons. First, Chinese words are often polysemous, which makes document quantification (representing documents numerically) difficult. Second, it is difficult to extract, from different documents, the central sentences that best express the gist of the articles. Third, because the extracted sentences come from many documents, ordering them for output is also a difficult problem.

In this thesis, text clustering is used to implement automatic multi-document summarization of news. In text preprocessing, to account for polysemy, sentence similarity is calculated based on HowNet word similarity, and word features are combined with semantic methods to make the sentence similarity calculation more reasonable. In clustering, the Canopy algorithm is used to solve the problem that the k-means algorithm cannot by itself determine the initial cluster centers or the value of K. In key sentence extraction, the TextRank algorithm is combined with the most important features of news text to calculate the weights of all sentences in a cluster and extract the key sentences, taking into account sentence context and the influence of text structure. Finally, the extracted key sentences are ordered according to relevant rules to produce the summary.

Experiments were carried out on the multi-document automatic summarization corpus of Harbin University of Science and Technology and on data from practical public opinion analysis projects conducted by the laboratory. The results show that on the Harbin University of Science and Technology corpus the F value is raised by 1 to 3 percentage points. On a practical public opinion analysis project, after the project data are processed according to the system's specific requirements, the F value is raised by 1 to 4 percentage points, which basically meets users' needs for summaries.
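
The clustering step in the abstract (Canopy to choose K and the initial centers, followed by k-means) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not give the item representation, the distance measure, or the Canopy thresholds. The sketch therefore assumes every item is already a numeric vector, uses Euclidean distance, and treats the loose and tight thresholds `t1 > t2` as hand-picked, hypothetical parameters. The names `canopy` and `kmeans` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    # Assumed distance measure; the abstract does not specify one.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy(points: List[Vector], t1: float, t2: float) -> List[Tuple[int, List[int]]]:
    """Canopy clustering with loose threshold t1 and tight threshold t2 (t1 > t2).

    Returns (center_index, member_indices) pairs; the number of canopies is used as K.
    """
    if t1 <= t2:
        raise ValueError("Canopy requires t1 > t2")
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        c = remaining[0]  # next unassigned point becomes a new canopy center
        members = [i for i in remaining if euclidean(points[i], points[c]) < t1]
        canopies.append((c, members))
        # Points within the tight threshold can no longer start their own canopy.
        remaining = [i for i in remaining
                     if i != c and euclidean(points[i], points[c]) >= t2]
    return canopies


def kmeans(points: List[Vector], init_centers: List[Vector],
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Standard k-means (Lloyd's algorithm) seeded with the given centers."""
    centers = [list(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center for every point.
        new_labels = [
            min(range(len(centers)), key=lambda k: euclidean(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    canopies = canopy(pts, t1=5.0, t2=3.0)
    labels, centers = kmeans(pts, [pts[c] for c, _ in canopies])
    print(len(canopies), labels)  # prints: 2 [0, 0, 0, 1, 1, 1]
```

Each canopy contributes one center, so K is simply the number of canopies; a smaller `t2` removes fewer points per pass and therefore tends to produce more canopies and a larger K.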
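
The sentence-weighting step in the abstract relies on TextRank. Below is a hedged sketch of weighted TextRank over a sentence-similarity graph, using the update WS(V_i) = (1 - d) + d * sum over j of [w_ji / (sum over k of w_jk)] * WS(V_j), with damping d = 0.85. Two parts are assumptions rather than the thesis's method: word-overlap (Jaccard) similarity stands in for the HowNet-based sentence similarity, which requires the HowNet lexicon; and the optional `feature_scores` list mixed in with weight `alpha` is a hypothetical way to add news-specific features, since the abstract does not say which features are used or how they are combined. Sentences are assumed to be pre-segmented token lists (for Chinese, the output of a word segmenter).

```python
from typing import Callable, List, Optional, Sequence

Tokens = Sequence[str]


def jaccard(a: Tokens, b: Tokens) -> float:
    """Word-overlap similarity; a stand-in for HowNet-based sentence similarity."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def textrank(sim: List[List[float]], d: float = 0.85,
             tol: float = 1e-8, max_iter: int = 200) -> List[float]:
    """Weighted TextRank computed by power iteration on a similarity matrix."""
    n = len(sim)
    if n == 0:
        return []
    # Total outgoing edge weight of each node (self-loops excluded).
    out_weight = [sum(sim[j][k] for k in range(n) if k != j) for j in range(n)]
    scores = [1.0] * n
    for _ in range(max_iter):
        new = []
        for i in range(n):
            acc = 0.0
            for j in range(n):
                if j != i and out_weight[j] > 0:
                    acc += sim[j][i] / out_weight[j] * scores[j]
            new.append((1 - d) + d * acc)
        delta = max(abs(x - y) for x, y in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores


def rank_sentences(sentences: List[Tokens],
                   feature_scores: Optional[List[float]] = None,
                   alpha: float = 0.8,
                   similarity: Callable[[Tokens, Tokens], float] = jaccard) -> List[int]:
    """Return sentence indices ordered from most to least important."""
    n = len(sentences)
    if n == 0:
        return []
    sim = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    tr = textrank(sim)
    if feature_scores is None:
        combined = tr
    else:
        # Hypothetical linear mix of normalized TextRank score and a feature score.
        top = max(tr)
        combined = [alpha * t / top + (1 - alpha) * f
                    for t, f in zip(tr, feature_scores)]
    return sorted(range(n), key=lambda i: combined[i], reverse=True)


if __name__ == "__main__":
    cluster = [["flood", "river", "city"], ["flood", "river"],
               ["flood", "city"], ["stock", "market"]]
    print(rank_sentences(cluster))
```

Following the abstract, such a ranking would be computed separately for each cluster; a sentence that shares words with many others in its cluster scores highest, while an isolated sentence receives only the baseline score 1 - d.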

Keywords/Search Tags:

Multi-document automatic summarization, Text clustering, Sentence semantic similarity, TextRank algorithm


Related items

1	Design And Implementation Of Automatic Summarization System Based On Textrank Algorithm
2	Joint Scoring Automatic Text Summarization Generation Based On TextRank Algorithm
3	Research On Short Text Automatic Summarization Algorithm Based On TextRank And Word2Vec
4	Research And Application Of Multi-document Automatic Summarization
5	Chinese Multi-document Automatic Summarization Extraction Based On The Combination Of LDA And TextRank
6	Research On Automatic Multi-document Summarization Based On Statistics And Semantic Analysis
7	Research On Automatic Text Summarization Technique Of News Documents
8	The Approach For Event-based Multi-document Automatic Summarization
9	Research Of Web Multi-document Automatic Summarization
10	Chinese Text Clustering Based On Text Similarity