Textual Theme Content Analysis Towards Reading Comprehension

Posted on: 2017-07-08
Degree: Master
Type: Thesis
Country: China
Candidate: L N Xun
GTID: 2348330512951087
Subject: Computer technology
Abstract/Summary:
As question answering systems have developed, they are no longer limited to answering simple factual questions. Current systems pay increasing attention to complex questions that require understanding the text, and among these, reading comprehension is the most difficult. To produce an accurate answer, single-document reading comprehension must work from one background passage and combine natural language processing with information extraction techniques. Because only a single document is available, most research on complex questions in this setting still relies on rule-based methods. This thesis therefore proposes discourse topic analysis of a single document as a way to answer complex questions. Its main contents are as follows:

(1) Dividing the document into topic segments with k-means clustering. Three similarity measures are compared: one based on Cilin, one based on bag-of-words, and one based on word2vector. In the word2vector method, word vectors are trained on the single document, combined by weighting into sentence vectors, and then combined into paragraph vectors. K-means clustering over these vectors then divides the text into topic segments.

(2) Extracting sub-topic sentences, using a different extraction strategy for each method of dividing topic segments. The topic sentence is then extracted from the sub-topic sentences.

(3) Answering complex questions through textual topic analysis. The thesis mainly addresses the question-answering and summarization (generalization) questions in the language reading comprehension part of the university entrance exam.

The thesis also builds a textual theme analysis system, implemented in Java, that supports human-computer interaction. The system consists of three modules: topic segment division, sub-topic sentence extraction, and topic sentence extraction. In tests on 100 articles drawn from Chinese prose in the university entrance exam, mock exams, and related essays, k-means clustering based on word2vector performed best at both topic segment division and topic sentence identification. The strict accuracy of topic segment division reaches 40%, the loose accuracy reaches 55%, and the sub-topic sentence recognition accuracy is 54.02%. The system also achieves comparatively good results when tested on language reading comprehension questions.
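
Step (1) of the abstract, the word2vector variant, might be sketched in Java roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the class name `TopicSegmentSketch`, the method names, the toy vectors, the weights, the use of squared Euclidean distance, and the choice of the first k paragraphs as initial centroids are all hypothetical, since the abstract does not specify the weighting scheme or the clustering details.

```java
import java.util.Arrays;

public class TopicSegmentSketch {

    // Weighted average of vectors, e.g. word vectors -> sentence vector.
    // The weights are an assumption; the abstract does not say how they are chosen.
    public static double[] weightedAverage(double[][] vectors, double[] weights) {
        int dim = vectors[0].length;
        double[] result = new double[dim];
        double totalWeight = 0.0;
        for (int i = 0; i < vectors.length; i++) {
            for (int d = 0; d < dim; d++) {
                result[d] += weights[i] * vectors[i][d];
            }
            totalWeight += weights[i];
        }
        for (int d = 0; d < dim; d++) {
            result[d] /= totalWeight;
        }
        return result;
    }

    // Squared Euclidean distance between two vectors of equal length.
    public static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Plain k-means over paragraph vectors; returns one cluster label per paragraph.
    // Centroids start at the first k points (an assumption, chosen for determinism).
    public static int[] kMeans(double[][] points, int k, int maxIterations) {
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }
        int[] labels = new int[points.length];
        Arrays.fill(labels, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: attach each point to its nearest centroid.
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(points[i], centroids[c])
                            < squaredDistance(points[i], centroids[best])) {
                        best = c;
                    }
                }
                if (labels[i] != best) {
                    labels[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // assignments are stable, so the clustering has converged
            }
            // Update step: move each centroid to the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                counts[labels[i]]++;
                for (int d = 0; d < dim; d++) {
                    sums[labels[i]][d] += points[i][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // leave the centroid of an empty cluster unchanged
                }
                for (int d = 0; d < dim; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Toy 2-dimensional "paragraph vectors": the first two and last two are similar.
        double[][] paragraphs = {
            {1.0, 0.0}, {0.9, 0.1}, {0.0, 1.0}, {0.1, 0.9}
        };
        System.out.println(Arrays.toString(kMeans(paragraphs, 2, 20)));
    }
}
```

On this toy input the first two paragraphs fall into one cluster and the last two into another. Note that plain k-means does not force a cluster to consist of adjacent paragraphs; the abstract does not say how the thesis turns clusters into contiguous topic segments, so that step is left out of the sketch.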
Keywords: Topic segmentation, Sub-topic sentence, Topic sentence