Chinese Keyphrases Extraction Technique

Posted on: 2011-10-03
Degree: Master
Type: Thesis
Country: China
Candidate: W M Liang
Full Text: PDF
GTID: 2178360308452440
Subject: Computer software and theory
Abstract/Summary:
Keyphrases extracted from news articles help readers browse faster, but they are rarely available for news articles because manual annotation is expensive in both labor and time. This paper proposes a practical approach to extracting keyphrases from Chinese news articles using TextRank and knowledge mined from query logs. Our system covers all areas of news. Experimental results show that our algorithm outperforms traditional methods.

A phrase is more informative than a single word. Previous work is word-based, while our approach uses the phrase as its basic element. We generate phrases by applying several statistical criteria, with a large volume of queries serving as the training corpus. We also optimize the implementation of these criteria in code to speed up phrase generation. Experimental results show that both precision and recall improve.

We use TextRank, a graph-based ranking algorithm, to extract keyphrases from Chinese news articles. In addition, several informative features (phrase length, phrase position, pseudo-semantic links between phrases, and background information) are incorporated into the TextRank model. Experimental results demonstrate that these additions improve performance significantly.

An official specification for Chinese keyphrase extraction is currently lacking. We therefore draft such a specification and contribute a gold-standard corpus for Chinese keyphrase extraction research. All articles in the corpus were manually assigned keyphrases by a third party.
Keywords/Search Tags: Chinese keyphrase extraction, TextRank, phrase generation, query log, feature integration, Chinese keyphrase extraction specification
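
The graph-based ranking step mentioned in the abstract can be illustrated with a minimal sketch of plain, unweighted TextRank over a sequence of candidate phrases. This is not the thesis's implementation: the function name `textrank`, the co-occurrence window, the damping factor of 0.85, and the assumption that the article has already been segmented into phrases are all illustrative choices, and none of the additional features described above (phrase length, position, pseudo-semantic links, background information) are modeled here.

```python
from collections import defaultdict


def textrank(phrases, window=3, damping=0.85, iterations=50, tol=1e-6):
    """Rank candidate phrases with plain, unweighted TextRank.

    `phrases` is the article as an ordered list of candidate phrases
    (segmentation and phrase generation are assumed to be done already).
    All parameter names and defaults are illustrative assumptions.
    """
    nodes = sorted(set(phrases))
    if not nodes:
        return []

    # Build an undirected co-occurrence graph: two distinct phrases are
    # linked if they fall inside the same sliding window of `window` items.
    neighbors = defaultdict(set)
    for i, p in enumerate(phrases):
        for q in phrases[i + 1 : i + window]:
            if p != q:
                neighbors[p].add(q)
                neighbors[q].add(p)

    # Iterate the PageRank-style update until the scores stabilize.
    scores = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new_scores = {}
        for n in nodes:
            rank = sum(scores[m] / len(neighbors[m]) for m in neighbors[n])
            new_scores[n] = (1 - damping) + damping * rank
        delta = max(abs(new_scores[n] - scores[n]) for n in nodes)
        scores = new_scores
        if delta < tol:
            break

    # Highest-scoring phrases first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `textrank(phrase_list)[:5]` would return the five highest-scoring phrases with their scores. Extra features such as phrase position could be folded in as edge weights or as a biased restart term, but how the thesis does this is not stated in the abstract.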