Research On Keyword Extraction From Chinese News Web Pages Based On Compose Features

Posted on: 2014-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: X W Mao
GTID: 2248330398456786
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet, the volume of online information is exploding, and news pages have become an important way for people to obtain information. How to retrieve and process the information in news pages quickly and efficiently has become an important research problem. Web content extraction and keyword extraction are the basis for automatic text processing in search engines. This thesis first introduces the theoretical background of keyword extraction, including the concept of keyword extraction, natural language processing, and web content extraction. Second, it introduces the concept of compound words and methods for generating them. It then proposes a method for extracting keywords from Chinese news web pages based on compose features. Starting from the word segmentation result of the web page text, the method obtains candidate keywords by calculating a weight for each word from its text features. These candidates are combined with compound words produced by a compound word generation algorithm based on a directed graph, and the method then decides which words become the final keywords. Finally, the method is tested on real web pages. The experimental results show that the proposed method can extract keywords from news web pages effectively.
Keywords/Search Tags: keyword extraction, compose feature, compound word, directed graph, news web page
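
The pipeline outlined in the abstract (weight segmented words by text features, generate compound words from a directed graph, then pick final keywords) can be illustrated with a minimal sketch. The abstract does not specify the features or the graph algorithm, so everything concrete here is an assumption made for illustration: the weight is term frequency with a hypothetical title bonus, the directed graph links each word to the word that directly follows it, and a compound word is any adjacent pair seen at least `min_count` times. The function names and parameters are hypothetical and are not taken from the thesis.

```python
from collections import Counter, defaultdict


def score_candidates(tokens, title_tokens, title_bonus=2.0):
    """Assumed feature weight: term frequency, boosted for words in the title."""
    counts = Counter(tokens)
    title = set(title_tokens)
    return {w: c * (title_bonus if w in title else 1.0) for w, c in counts.items()}


def build_adjacency_graph(tokens):
    """Directed graph: edge a -> b counts how often b directly follows a."""
    graph = defaultdict(Counter)
    for a, b in zip(tokens, tokens[1:]):
        graph[a][b] += 1
    return graph


def compound_words(graph, min_count=2):
    """Assumed rule: an edge seen at least min_count times forms a compound."""
    return [(a, b) for a, succ in graph.items()
            for b, n in succ.items() if n >= min_count]


def extract_keywords(tokens, title_tokens, top_k=3, min_count=2):
    """Rank single words and compounds together and return the top_k."""
    scores = score_candidates(tokens, title_tokens)
    compounds = compound_words(build_adjacency_graph(tokens), min_count)

    # Drop words absorbed into a compound, then add each compound with the
    # summed weight of its two parts (an assumed combination rule).
    absorbed = {w for pair in compounds for w in pair}
    merged = {w: s for w, s in scores.items() if w not in absorbed}
    for a, b in compounds:
        merged[a + b] = scores[a] + scores[b]

    # Sort by descending weight, breaking ties by the word itself.
    ranked = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:top_k]]


# Pre-segmented text; a real system would first run a Chinese word segmenter.
tokens = ["关键词", "提取", "方法", "关键词", "提取", "网页", "新闻", "网页"]
title_tokens = ["关键词", "提取"]
keywords = extract_keywords(tokens, title_tokens, top_k=2)
# keywords == ['关键词提取', '网页']
```

In this toy input, "关键词" and "提取" appear next to each other twice, so they merge into the compound "关键词提取", which outranks the single words.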