
Research Of Non Domain Knowledge Dependent Text Summarization Method

Posted on: 2016-09-21
Degree: Master
Type: Thesis
Country: China
Candidate: S S Xie
Full Text: PDF
GTID: 2308330461968115
Subject: Computer software and theory
Abstract/Summary:
With the arrival of the information age, a mass of information fills people's lives. At the same time, technology and civilization keep advancing and the pace of life keeps accelerating. How to obtain the most useful information in a short time, so as to cope with survival and competition in society, has drawn growing attention. Information is disseminated and documented mainly through multimedia and natural language text, so text summarization has become a widely studied research field. It has far-reaching implications for social management and business value, and it also affects each individual's outlook and ability to obtain information. Text summarization has about 57 years of research history. Methods have gradually progressed from simple word-frequency statistics to today's machine learning-based and knowledge-based approaches. Their results keep improving, but a large gap remains between them and summaries written by human experts. Frontier work concentrates on machine learning methods and knowledge-based methods. Knowledge-based methods have some effect on entity recognition and disambiguation, but they show no significant improvement over machine learning-based methods. Moreover, introducing a large knowledge base makes summary generation slower and more memory-hungry, and it hinders conversion across languages. Because semantic relations are not well described, such methods also over-interpret text and cannot recognize new concepts or new domains.
Therefore, until good knowledge bases have been constructed, text summarization methods that focus on text features and shallow semantic scene analysis, aiming at greater accuracy and wider applicability, appear to have more value and significance.
Almost all existing text summarization methods depend on text features, and more such features continue to be discovered as research proceeds. However, because of the diversity of language, the variety of referring relations and word forms in a text interferes with the ability of the word-frequency feature, and of other features such as overlap with the title, to indicate the important information in a document. Meanwhile, existing methods either use surface features or rely on a domain knowledge base to supply topic-related information as features for judging importance; they lack the ability, grounded in linguistics and functional grammar, to discover potentially important semantics. To address these two problems, this thesis proposes a method called Left Align to handle the diversity of textual expression, together with two potential scene analysis algorithms: LAPS and its extension, LAPSx. These aim to identify potential semantic scenes more accurately and to provide a new way of extracting the important parts of natural language text.
In our research, we first focus on the phenomenon that language diversity reduces the ability of existing text features to indicate important content, and propose a text summarization method based on Left Align processing. Subsequently, drawing on the theoretical basis of linguistics and functional grammar, we propose the Left Align potential scene analysis algorithm (LAPS) and its extensions.
In later work, we present LAPSx, a potential scene analysis algorithm based on manifold ranking and LAPS, and its extensions.
The research work is as follows:
First, we propose a text summarization method based on Left Align processing. It addresses the problem that the diversity of language expression reduces the ability of many text features to indicate important information. The method first resolves co-references in the text, then aligns every non-stop word with the first word of its synonym chain, and then converts the result to a unified part-of-speech word form. This reduces noise in features whose calculation and statistics depend on words or on varied expression.
Second, we propose the Left Align potential scene analysis algorithm (LAPS). It starts from how text is generated: linguistic theory holds that text is constructed on the foundation of functional grammar rules. The algorithm builds a language model from functional grammar theory, computes the language model weights with a Markov chain model, derives scene sentence weights from those language model weights, and from these infers the existence and importance of potential scenes.
Third, we propose the LAPSx algorithm, based on manifold ranking and LAPS. Within the LAPS framework, it fuses traditional text features to determine initial language model weights, then applies the manifold ranking algorithm to those initial weights to compute the final language model weights. Finally, it determines scene sentence weights from the model weights in order to identify important potential scenes.
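The alignment step of the first contribution can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the stop-word list, the synonym chains, and the function names `build_alignment_map` and `left_align` are hypothetical, co-reference resolution is omitted, and word-form unification is approximated by listing inflected forms inside each chain. The sketch only shows how mapping every non-stop word to the first word of its chain concentrates word-frequency counts.

```python
from collections import Counter

# Hypothetical resources; the abstract does not specify which stop-word
# list or synonym source the thesis actually uses.
STOP_WORDS = {"the", "a", "an", "is", "was", "of", "and", "to"}
SYNONYM_CHAINS = [
    ["car", "automobile", "vehicle"],
    # Inflected forms are listed here as a stand-in for word-form unification.
    ["buy", "purchase", "bought", "purchased"],
]


def build_alignment_map(chains):
    """Map every word in a chain to the chain's first (leftmost) word."""
    mapping = {}
    for chain in chains:
        head = chain[0]
        for word in chain:
            # Keep the first assignment if a word appears in several chains.
            mapping.setdefault(word, head)
    return mapping


def left_align(tokens, mapping, stop_words=STOP_WORDS):
    """Drop stop words and replace each remaining word by its chain head."""
    aligned = []
    for token in tokens:
        word = token.lower()
        if word in stop_words:
            continue
        aligned.append(mapping.get(word, word))
    return aligned


tokens = "He bought a car and she purchased an automobile".split()
mapping = build_alignment_map(SYNONYM_CHAINS)
aligned = left_align(tokens, mapping)

# After alignment, different surface forms contribute to one frequency count.
counts = Counter(aligned)
print(aligned)
print(counts.most_common(2))
```

Without alignment, "car" and "automobile" would each be counted once; after alignment they share a single count, which is the noise reduction the method aims for.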
As a follow-up, we propose a secondary fusion method to correct the feature coverage phenomenon that arises because LAPSx uses text features only indirectly when predicting important scenes.
Experimental results show that, compared with traditional text summarization algorithms, the proposed ideas and algorithms are better at determining the importance of scenes and judge important information more accurately. The ideas behind these algorithms may also offer a new perspective for future researchers.
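The manifold ranking step that LAPSx builds on can be sketched as follows. This is the commonly used iterative form of manifold ranking, f = alpha * S * f + (1 - alpha) * y with S the symmetrically normalized affinity matrix. It is not necessarily the exact formulation in the thesis. The affinity matrix `W`, the initial weights `y`, and the value of `alpha` are illustrative assumptions; in LAPSx the initial weights would come from fused text features.

```python
import math


def manifold_rank(W, y, alpha=0.5, iterations=100):
    """Iterate f = alpha * S * f + (1 - alpha) * y.

    W is a symmetric, non-negative affinity matrix with a zero diagonal,
    y holds the initial weights, and S = D^(-1/2) W D^(-1/2).
    """
    n = len(W)
    degree = [sum(row) for row in W]

    # Symmetric normalization of the affinity matrix.
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if W[i][j] > 0 and degree[i] > 0 and degree[j] > 0:
                S[i][j] = W[i][j] / math.sqrt(degree[i] * degree[j])

    f = list(y)
    for _ in range(iterations):
        f = [
            alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
            for i in range(n)
        ]
    return f


# Hypothetical affinities between four items (for example, language models).
W = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.2, 0.0],
    [0.1, 0.2, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.0],
]
# Hypothetical initial weights: only item 0 starts with evidence.
y = [1.0, 0.0, 0.0, 0.0]

scores = manifold_rank(W, y)
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
print(ranking)
```

Items that are strongly connected to highly weighted items receive more score than weakly connected ones, so the initial weights are propagated along the structure of the graph.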
Keywords/Search Tags:text summarization, left align processing, potential scene analysis, natural language processing