
Research And Implementation Of Document Summarization Based On Combined Multi-Feature

Posted on: 2018-11-11
Degree: Master
Type: Thesis
Country: China
Candidate: T W Huang
Full Text: PDF
GTID: 2348330518995562
Subject: Intelligent Science and Technology
Abstract/Summary:
With the arrival of the Internet Plus era, the amount of information is growing explosively. Information is becoming larger in volume, more complex in content, and more diverse in its forms of presentation. These trends create an urgent demand for techniques that can process information in a timely manner and extract the important content from the ocean of information in a concise form that is convenient for people to browse. Automatic document summarization is a powerful tool for this purpose: it helps people cope with information overload, discover critical data, capture significant information, and clarify important events. However, existing automatic document summarization technology does not yet achieve this vision, and it remains limited by the development of many other technologies. Traditional techniques, especially those for extractive summarization, rarely analyze semantic content. Extractive summarization is the focus of this dissertation. We use the hierarchical Latent Dirichlet Allocation (hLDA) algorithm for document modeling. Based on a study of the modeling results, we propose a shallow semantic feature, called Level Distribution, for document summarization systems. We analyze in detail the performance of the proposed Level Distribution feature in various summarization tasks and systems, namely single-document summarization and multilingual multi-document summarization, and demonstrate the effectiveness of the new feature through extensive experiments. Building on this work, we compare and combine the new feature with other traditional features for document summarization, which further demonstrates the soundness and novelty of the approach taken in this dissertation.
Keywords/Search Tags: automatic document summarization, hierarchical Latent Dirichlet Allocation, Level Distribution, extractive summarization, multilingual multi-document summarization
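The abstract does not define the Level Distribution feature or say how features are combined, so the following is only a minimal sketch of the general idea of multi-feature extractive summarization, not the dissertation's method. It scores each sentence as a weighted sum of three features and keeps the top-scoring sentences in document order. Everything named here is an assumption made for illustration: the function names, the default weights, and especially `level_feature`, which is a placeholder that takes a caller-supplied word-to-level mapping (as might be read off an hLDA topic tree) and a caller-supplied weight per level.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def position_feature(index, total):
    """Traditional feature: earlier sentences score higher (1.0 for the first)."""
    return 1.0 - index / total


def frequency_feature(tokens, doc_freq, max_freq):
    """Traditional feature: mean document frequency of the tokens, scaled to [0, 1]."""
    if not tokens or max_freq == 0:
        return 0.0
    return sum(doc_freq[t] for t in tokens) / (len(tokens) * max_freq)


def level_feature(tokens, word_level, level_weight):
    """Placeholder for a level-based feature (NOT the thesis's definition).

    word_level maps a word to a level of a topic tree; level_weight maps a
    level to a score. Returns the mean weight over tokens that have a level.
    """
    weights = [level_weight[word_level[t]] for t in tokens if t in word_level]
    return sum(weights) / len(weights) if weights else 0.0


def summarize(text, word_level, level_weight, weights=(0.2, 0.4, 0.4), k=2):
    """Return the k best sentences, in original order, by combined feature score."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    token_lists = [tokenize(s) for s in sentences]
    doc_freq = Counter(t for tokens in token_lists for t in tokens)
    max_freq = max(doc_freq.values(), default=0)

    w_pos, w_freq, w_level = weights  # illustrative weights, not tuned values
    scores = []
    for i, tokens in enumerate(token_lists):
        score = (
            w_pos * position_feature(i, len(sentences))
            + w_freq * frequency_feature(tokens, doc_freq, max_freq)
            + w_level * level_feature(tokens, word_level, level_weight)
        )
        scores.append(score)

    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]
```

A linear combination is used here only because it is the simplest way to compare features against each other: setting one weight to zero removes that feature, which makes it easy to see what each one contributes.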