Research And Realization On Chinese Text Topic Analysis Technology

Posted on: 2009-11-22
Degree: Master
Type: Thesis
Country: China
Candidate: W T Liang
GTID: 2178360272474297
Subject: Computer software and theory

Abstract:
In the era of global information integration, the continuous growth of network resources gives people access to a large number of electronic texts. People can gain a great deal of knowledge and skills from these texts, but the volume of information is so large that they do not have enough time to read them. Although many search sites let people find relevant information by keyword, the search results still contain too much information, so people often discover only after reading all the texts that many of them are neither interesting nor needed. Therefore, how to perform text topic analysis effectively has become an urgent problem.

This paper mainly studies two parts of text topic analysis technology, text topic partition and text topic identification, and includes the following work.

Firstly, this paper reviews the research status, related concepts and current techniques of text topic analysis, and studies how to apply the evaluation methods of natural language processing to text topic analysis.

Secondly, a new paragraph similarity method based on SVO is proposed and applied to text topic partition. The paper then proposes a text topic identification method based on key sentences, which is used after text topic partition. Text topic partition divides the text into several text blocks, each with a relatively independent topic, and a key sentence is then identified for each text block. If the key sentence is semantically or structurally incomplete, it is processed further to restore its integrity. Together, the text topic partition method and the text topic identification method are referred to as text topic analysis technology based on statistics. Experimental results show that the precision of this text topic partition is higher than that of the traditional partition method, which builds a paragraph vector space model to compute paragraph similarity. The key sentences found by the text topic identification method are, to a certain extent, superior to the key sentences found by Microsoft Word.

In addition, the paper proposes another text topic analysis technology, based on statistics and knowledge, in order to avoid missing some of the topics of a text. This technology uses synonym knowledge and topic knowledge. It first performs text topic identification and then text topic partition; after the partition, text topic identification based on key sentences is applied. The two kinds of topic results are combined as the topics of the entire text. The technology improves the precision of text topic partition and text topic identification to some extent.

Afterwards, this paper combines VC++ 6.0 and Matlab to implement a text topic analysis system that outputs the text topics and the text partition results.

Finally, the paper concludes by summarizing the research and indicating future work.
Keywords: Topic Identification, Topic Partition, Paragraphic Similarity, Topic Knowledge, Synonymous Processing
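
As a rough illustration of the kind of pipeline the abstract describes, the following Python sketch implements only the traditional baseline it mentions: a paragraph vector space model with cosine similarity for topic partition, followed by a simple frequency-based key sentence per block. It is not the thesis's SVO-based similarity method, nor its synonym or topic knowledge step. All function names, the similarity threshold, and the sentence scoring rule are assumptions made for this example, and the regular-expression tokenizer is a placeholder (Chinese text would need a proper word segmenter).

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Placeholder tokenizer (assumption): lowercase runs of word characters.
    # Chinese text would need a real word segmenter instead.
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-frequency vectors (Counters).
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def partition(paragraphs, threshold=0.2):
    # Group consecutive paragraphs into blocks; start a new block when the
    # similarity between adjacent paragraphs falls below the threshold.
    # The threshold value is an arbitrary assumption for this sketch.
    if not paragraphs:
        return []
    vectors = [Counter(tokenize(p)) for p in paragraphs]
    blocks = [[paragraphs[0]]]
    for i in range(1, len(paragraphs)):
        if cosine(vectors[i - 1], vectors[i]) < threshold:
            blocks.append([paragraphs[i]])
        else:
            blocks[-1].append(paragraphs[i])
    return blocks


def key_sentence(block):
    # Pick the sentence whose terms have the highest average frequency
    # within the block (a hypothetical scoring rule, not the thesis's).
    text = " ".join(block)
    sentences = [s.strip() for s in re.split(r"[.!?。！？]+", text) if s.strip()]
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    return max(sentences, key=score)


if __name__ == "__main__":
    paragraphs = [
        "Cats sleep a lot. Cats like warm places.",
        "Cats also hunt mice. Many cats sleep during the day.",
        "Stock markets fell today. Markets react to interest rates.",
    ]
    for block in partition(paragraphs):
        print(len(block), "paragraph(s); key sentence:", key_sentence(block))
```

In this sketch a topic boundary is placed wherever adjacent paragraphs share too little vocabulary, which is the weakness the abstract attributes to the vector space baseline: paragraphs on the same topic that use different words are split apart.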