
Chinese Sentence Compression Based On Probability Statistics And Dependency Analysis

Posted on: 2013-12-03 | Degree: Master | Type: Thesis
Country: China | Candidate: Q Zhao
GTID: 2248330371466680 | Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet and the widespread use of digital devices, people place higher demands on accessing and receiving information. Search engines greatly speed up the search for information, but their results are links related to the query, and much of the linked content is repeated or similar, so the relevant information still has to be identified manually, which is time-consuming.

To deliver the main message quickly and accurately, while meeting the display requirements of mobile terminals and digital multimedia technology, many Internet products have emerged, such as content syndication (RSS), e-mail alerts, and subtitle generation, all of which use text rewriting technology. In recent years this technology has been applied in many fields, such as multi-document summarization, question answering systems, machine translation, and other natural language processing tasks. Sentence compression occupies an important position among these techniques: it aims to retain the key information of the original sentence while generating a shorter sentence that is still grammatical.

Our Chinese sentence compression tool is based on the noisy-channel model and is also inspired by the work of Jing and McKeown. That work uses a set of sentences aligned with human-written compressed versions, which is similar to our parallel training corpus, and applies a syntactic parser to analyze the syntactic structure of the input sentences. We instead use the collapsed dependencies produced by the Stanford Parser and compute the probability of each dependency in order to extract compression rules. Building on previous studies, and taking into account some particular features of the Chinese corpus, we preprocess the original sentences with several knowledge resources, such as a glossary of abbreviations and a number-to-digit conversion table.
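The idea of computing a probability for each dependency from a parallel corpus can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes the sentences are already parsed into (relation, head, dependent) triples, uses hand-written English toy data in place of Stanford Parser output on Chinese text, and replaces the noisy-channel model with a simple relative-frequency estimate of how often the dependent of each relation survives in the human-written compression. The function names `estimate_keep_probs` and `compress` and the `threshold` parameter are hypothetical.

```python
from collections import Counter

# A dependency is a (relation, head_word, dependent_word) triple.
# Assumption: triples are given; a real system would obtain them from a parser.

def estimate_keep_probs(pairs):
    """Estimate P(dependent kept | relation) from aligned sentence pairs.

    pairs: iterable of (dependencies, compressed_words), where
    compressed_words is the human-written compressed sentence as a word list.
    """
    total = Counter()
    kept = Counter()
    for deps, compressed in pairs:
        kept_words = set(compressed)
        for rel, _head, dep in deps:
            total[rel] += 1
            if dep in kept_words:
                kept[rel] += 1
    return {rel: kept[rel] / total[rel] for rel in total}

def compress(words, deps, keep_probs, threshold=0.5):
    """Drop dependents whose relation is kept less often than `threshold`.

    Relations never seen in training are kept (probability defaults to 1.0).
    Words governed by a dropped word are dropped as well, so that no
    orphaned modifiers remain. Assumes each word occurs once per sentence.
    """
    drop = {dep for rel, _head, dep in deps
            if keep_probs.get(rel, 1.0) < threshold}
    changed = True
    while changed:  # propagate deletion down the dependency tree
        changed = False
        for _rel, head, dep in deps:
            if head in drop and dep not in drop:
                drop.add(dep)
                changed = True
    return [w for w in words if w not in drop]

# Toy parallel corpus (hypothetical data, for illustration only).
training = [
    ([("nsubj", "bought", "he"), ("advmod", "bought", "yesterday"),
      ("dobj", "bought", "book"), ("amod", "book", "new")],
     ["he", "bought", "book"]),
    ([("nsubj", "left", "she"), ("advmod", "left", "quickly")],
     ["she", "left"]),
    ([("nsubj", "ate", "they"), ("dobj", "ate", "rice"),
      ("amod", "rice", "fried")],
     ["they", "ate", "fried", "rice"]),
]

probs = estimate_keep_probs(training)

sentence = ["we", "really", "like", "green", "tea"]
sentence_deps = [("nsubj", "like", "we"), ("advmod", "like", "really"),
                 ("dobj", "like", "tea"), ("amod", "tea", "green")]

print(compress(sentence, sentence_deps, probs))
print(compress(sentence, sentence_deps, probs, threshold=0.6))
```

In this toy corpus adverbial modifiers are always deleted and adjectival modifiers are deleted half of the time, so raising the threshold makes the compression more aggressive. A rule extracted this way is only as reliable as the counts behind it, which is one reason a real system would combine it with a language model or other grammaticality check.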
Keywords/Search Tags: text rewriting, sentence compression, probability statistics, dependency analysis, knowledge base