
Keyword Extraction Based On Statistic And Syntactic Parsing

Posted on: 2013-11-03
Degree: Master
Type: Thesis
Country: China
Candidate: Q Wu
Full Text: PDF
GTID: 2268330401982979
Subject: Computer software and theory
Abstract/Summary:
With the continuous development of the Internet, vast amounts of information appear every day. This explosive growth of information is a difficult and important problem in the field of natural language processing. How to manage massive data effectively, and how to accurately identify the information people actually need, has become a problem that must be solved. Keyword extraction addresses this problem: if high-quality keywords can be extracted from an article, they help people identify and distinguish relevant content within vast amounts of information. Automatic keyword extraction can be widely applied in many fields, such as text classification, information feedback systems, network information filtering systems, information retrieval, digital libraries, and automatic summarization.

This thesis employs a keyword extraction algorithm based on TF (term frequency) statistics and syntactic parsing. The work covers Chinese word segmentation, syntactic parsing, syntactic analysis, and keyword extraction. The main contents are as follows:

1. It elaborates on the theoretical approaches to automatic Chinese keyword extraction and their experimental analysis, and proposes the keyword extraction algorithm based on TF statistics and parsing.
2. It introduces Chinese word segmentation technology in detail and summarizes the types of segmentation ambiguity. It then describes and compares several of the more mature word segmentation algorithms in use today. Based on the experimental data, the Chinese Academy of Sciences segmentation system, whose results were significantly better than those of the other algorithms tested, was selected as the tool for the preliminary work. A statistical method motivated by the practical application is then proposed to further divide the initial segmentation produced by the Chinese Academy of Sciences system.
3. It describes in detail the two most popular approaches to syntactic analysis: rule-based and statistics-based. After comparing the two approaches and drawing on the research and analysis of other scholars, a combination of the two is used to build the treebank.
4. For the parsing algorithm, it briefly introduces the more popular methods and describes the widely recognized Chart algorithm in detail.
5. In syntactic analysis, information about sentence structure is extracted using the University of Pennsylvania's Penn corpus. According to the practical application of Chinese grammar, sentence elements are assigned different levels of value.
6. Finally, through statistical and grammatical analysis, six kinds of feature values are used as weight parameters, and each is explained and analyzed in detail.
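
The combination of TF statistics with sentence-element values described in items 5 and 6 might be sketched roughly as follows. This is only an illustration under stated assumptions: the role names, the weight values, and the scoring formula (a role-weighted term frequency) are hypothetical, since the abstract does not give them, and the thesis uses six feature values rather than the single one shown here. The input is assumed to be already segmented and tagged with a sentence element by some parser.

```python
from collections import defaultdict

# Hypothetical weights per sentence element; the abstract does not
# state the actual values assigned in the thesis.
ROLE_WEIGHTS = {
    "subject": 1.0,
    "object": 0.9,
    "predicate": 0.8,
    "modifier": 0.5,
}
DEFAULT_WEIGHT = 0.3  # assumed weight for any other sentence element


def score_keywords(tagged_tokens, top_k=3):
    """Rank words by role-weighted term frequency.

    tagged_tokens: list of (word, role) pairs, one per token occurrence.
    Each occurrence contributes weight(role) / total_token_count, so a
    word scores higher when it is frequent and appears in heavily
    weighted sentence elements.
    """
    if not tagged_tokens:
        return []
    total = len(tagged_tokens)
    scores = defaultdict(float)
    for word, role in tagged_tokens:
        scores[word] += ROLE_WEIGHTS.get(role, DEFAULT_WEIGHT) / total
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    tokens = [
        ("keyword", "subject"),
        ("extraction", "predicate"),
        ("algorithm", "object"),
        ("keyword", "modifier"),
        ("of", "other"),
    ]
    for word, score in score_keywords(tokens):
        print(f"{word}\t{score:.3f}")
```

Normalizing by the total token count keeps scores comparable across documents of different lengths; a real system would likely also filter stop words before ranking.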
Keywords/Search Tags: Keyword extraction, Syntax analysis, Segmentation