Multi-Document Automatic Summarization Of Chinese

Posted on: 2014-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: P Wang
Full Text: PDF
GTID: 2248330398971583
Subject: Computer technology
Abstract/Summary:
As the volume of online information grows, people must read through large amounts of text to find what they actually want. This wastes effort and often causes important information to be overlooked. Multi-document automatic summarization therefore has considerable practical value.

Most existing automatic summarization methods identify important information by how frequently it appears in the text, so their output carries a large degree of uncertainty and reads poorly. To address these two problems, this thesis presents an automatic summarization method based on shallow dependency relationships.

The research work is summarized as follows:

(1) System design of the multi-document automatic summarization system. The system mainly comprises data preprocessing, text tree storage, natural language processing (NLP), syntactic knowledge matching, extraction, and summary generation. The text preprocessing step is text classification.

(2) Key technologies of multi-document automatic summarization. These mainly include text classification, NLP, the ontology of text structure, and the summary generation method. Text classification uses the naive Bayes method. Feature extraction uses TF-IDF as the selection criterion and restricts the candidate range to the words in each sentence's three-tuples, which reduces the feature dimensionality naturally. NLP includes text segmentation, tagging, parsing, three-tuple extraction, and so on. The ontology of text structure is modeled using character descriptions as the sample domain. The summary generation method uses the results of text structure matching to abstract information layer by layer and to combine isomorphic information, which both retains the important information and keeps the summary readable.

(3) Results and evaluation of multi-document summarization. Taking character descriptions as an example, the generated summaries are judged on three aspects: information coverage, readability, and accuracy. Under a manual scoring scheme, the automatic summarization system achieves good results. Evaluated with the TAC evaluation method as the standard, the results are also good.
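
The classification step named in item (2), TF-IDF feature selection followed by Bayes classification, can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes documents are already segmented into tokens, ranks terms by summed TF-IDF weight over the whole corpus, and uses a multinomial naive Bayes model with add-one smoothing. It does not reproduce the restriction of candidate terms to sentence three-tuples. The names `tfidf_top_terms` and `NaiveBayes` and the toy English data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tfidf_top_terms(docs, k):
    """Return the k terms with the highest summed TF-IDF weight over all docs."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = defaultdict(float)
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / df[term])
            scores[term] += tf * idf
    # Highest score first; ties broken alphabetically for a deterministic order.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels, vocabulary):
        self.vocabulary = vocabulary
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            # Only terms kept by feature selection contribute to the model.
            self.term_counts[label].update(t for t in doc if t in vocabulary)
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.class_counts[label] / total_docs)
            for term in doc:
                if term in self.vocabulary:
                    score += math.log((counts[term] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, pre-tokenized training data (hypothetical).
train_docs = [
    ["actor", "film", "award", "film"],
    ["director", "film", "festival"],
    ["striker", "goal", "match", "goal"],
    ["coach", "match", "league"],
]
train_labels = ["cinema", "cinema", "sport", "sport"]

vocabulary = tfidf_top_terms(train_docs, k=6)
model = NaiveBayes().fit(train_docs, train_labels, vocabulary)
print(model.predict(["goal", "match"]))
print(model.predict(["film", "director"]))
```

For Chinese input, the token lists would come from a word segmentation step, which is outside the scope of this sketch.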
Keywords/Search Tags: multi-document automatic summarization, shallow dependency relationship, NLP, ontology of text structure