
Heuristic Chinese Sentence Compression Algorithm Based On Hot Word

Posted on: 2015-03-10  Degree: Master  Type: Thesis
Country: China  Candidate: J Han  Full Text: PDF
GTID: 2268330428960006  Subject: Computer application technology
Abstract/Summary:
Text compression is a promising technology, and sentence compression is its core technique, so sentence compression is worth studying. Sentence compression trims redundant information from the original sentence while retaining its main topic, making it easy for readers to get the most valuable information. Existing sentence compression approaches are of two kinds: rule-based and statistics-based. However, the existing rule-based methods are mostly designed for English and cannot be applied to Chinese sentence compression directly, and the statistics-based approaches rely on "original-compression" parallel corpora, which are difficult to obtain. The lack of parallel corpora therefore makes the problem harder to study.

After a study of these traditional methods, a linguistically motivated heuristic Chinese sentence compression algorithm is proposed. By analysing human-produced compressions together with linguistic knowledge, two sets of rules are derived: one at the word level and the other at the clause level. Both sets of rules operate on the parse tree and the word dependencies to compress a sentence, and the algorithm is enhanced with hot words to keep it flexible and accurate. In the last step, the compression result is cleaned and repaired. Human-produced compression, WeiXu’s algorithm, the rule-only algorithm and the hot-word-enhanced algorithm are compared, and the results are evaluated on compression rate, grammaticality, informativeness and heat.

The experimental results show that the heuristic Chinese sentence compression algorithm gives better compression results than the existing rule-based algorithms. The hot-word-based algorithm improves not only the heat of the compression results but also the accuracy of the algorithm. The best weight for each rule is obtained with the simulated annealing algorithm, and using these weights when compressing sentences gives the best compression results.
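
The abstract does not give the details of the weight search, so the following is only a minimal sketch of how simulated annealing could tune one weight per rule. The function names, the weight range [0, 1], the geometric cooling schedule and the toy scoring function are all assumptions made for illustration; in the thesis setting the score would presumably come from evaluating compressions against human-produced ones.

```python
import math
import random


def anneal_weights(score, n_rules, steps=2000, t_start=1.0, t_end=0.01,
                   step_size=0.1, seed=0):
    """Search for rule weights in [0, 1] that maximise score(weights).

    `score` is a hypothetical callable; higher values mean better compressions.
    """
    rng = random.Random(seed)
    current = [0.5] * n_rules              # assumed neutral starting point
    current_score = score(current)
    best, best_score = list(current), current_score

    for i in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (i / max(1, steps - 1))

        # Perturb one randomly chosen weight and clamp it to [0, 1].
        candidate = list(current)
        k = rng.randrange(n_rules)
        candidate[k] = min(1.0, max(0.0, candidate[k]
                                    + rng.uniform(-step_size, step_size)))
        cand_score = score(candidate)

        # Always accept improvements; accept worse moves with
        # probability exp(delta / t), which shrinks as t cools.
        delta = cand_score - current_score
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = list(current), current_score

    return best, best_score


def toy_score(weights):
    """Stand-in objective: negative squared distance to made-up target weights."""
    target = [0.8, 0.2, 0.6]               # illustrative values, not from the thesis
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


if __name__ == "__main__":
    weights, value = anneal_weights(toy_score, n_rules=3)
    print(weights, value)
```

Accepting some worse candidates early on lets the search escape poor local optima, which is the usual reason to prefer annealing over plain hill climbing when rule weights interact.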
Keywords/Search Tags: Chinese sentence compression, hot word, simulated annealing