A Deep Dictionary Learning Model Based On Tensor

Posted on: 2018-01-17
Degree: Master
Type: Thesis
Country: China
Candidate: F Wang
Full Text: PDF
GTID: 2428330569485434
Subject: Computer technology
Abstract/Summary:
In an era of overwhelming Internet information, obtaining the important points quickly and concisely is a valuable research problem. Multi-document automatic summarization is a technology that grew out of this need: it compresses a set of documents on a single topic and extracts their key information, so that readers can grasp the essentials of the topic quickly and accurately without reading every document in detail, which improves the efficiency of information access. However, multi-document automatic summarization remains a very difficult task for Chinese, for three reasons. First, Chinese words are highly polysemous, which makes it hard to represent documents quantitatively. Second, it is difficult to extract, from different documents, the central sentences that best express the gist of the articles. Third, because the extracted sentences come from many documents, deciding how to order them in the output is also a difficult problem.

In this thesis, we use text clustering to implement multi-document automatic summarization for news. In text preprocessing, to account for polysemy, sentence similarity is computed on the basis of HowNet word similarity, and word features are combined with semantic information to make the similarity calculation more reasonable. In clustering, the canopy algorithm is used to address the fact that the k-means algorithm cannot determine the initial cluster centers or the value of K by itself. In key-sentence extraction, the TextRank algorithm is combined with the most important features of news text to compute a weight for every sentence in a cluster and to extract the key sentences, taking into account both sentence context and the influence of text structure. Finally, the extracted key sentences are ordered according to a set of rules to produce the summary.

Experiments were carried out on the multi-document automatic summarization corpus of Harbin University of Science and Technology and on data from a practical public opinion analysis project conducted by the laboratory. On the Harbin University of Science and Technology corpus, the F value improves by 1 to 3 percentage points. On the public opinion analysis project, after the project data is processed according to the system's specific requirements, the F value improves by 1 to 4 percentage points, which basically meets users' needs for the summary.
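
The clustering step described above, where canopy supplies k-means with both K and the initial centers, can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `canopy_centers`, the thresholds `t1` and `t2`, and the use of Euclidean distance on toy 2-D points are all assumptions made for illustration, whereas the thesis works with sentences and a HowNet-based similarity.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t1, t2, dist=euclidean):
    """Group points into canopies using a loose threshold t1 and a tight threshold t2.

    Returns a list of (center, members) pairs. The number of canopies can be
    used as K, and the canopy centers as initial centers, for k-means.
    Hypothetical helper: names and thresholds are illustrative assumptions.
    """
    if t1 < t2:
        raise ValueError("t1 (loose) must be >= t2 (tight)")
    remaining = list(points)
    canopies = []
    while remaining:
        # Take the next unassigned point as a new canopy center.
        center = remaining.pop(0)
        members = [center]
        still_remaining = []
        for p in remaining:
            d = dist(center, p)
            if d < t1:
                # Within the loose threshold: p belongs to this canopy.
                members.append(p)
            if d >= t2:
                # Outside the tight threshold: p may still seed or join other canopies.
                still_remaining.append(p)
        remaining = still_remaining
        canopies.append((center, members))
    return canopies


# Toy usage: three well-separated groups of 2-D points.
toy_points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
toy_canopies = canopy_centers(toy_points, t1=2.0, t2=1.0)
k = len(toy_canopies)                           # candidate K for k-means
initial_centers = [c for c, _ in toy_canopies]  # candidate initial centers
print(k, initial_centers)
```

In practice the thresholds would need tuning for the similarity measure in use, and a distance derived from sentence similarity (for example, one minus similarity) could be passed in through the `dist` parameter.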
Keywords/Search Tags: Multi-document automatic summarization, Text clustering, Sentence semantic similarity, TextRank algorithm