Research on Chinese Keyword Extraction Algorithm Based on News Reports

Posted on: 2017-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: Q Hu
GTID: 2308330503957666
Subject: Software engineering
Abstract/Summary:
Keywords help readers grasp the main content and theme of an article. They save browsing time and help users decide whether a news report is worth reading in full. The Internet has become an important channel for disseminating news, yet most news pages provide no keywords. Online news has also gradually broken away from the writing conventions of traditional media and developed its own style. Existing keyword extraction methods are therefore not fully suited to web pages, and choosing an appropriate method can improve users' browsing speed and satisfaction.

With these considerations in mind, this thesis surveys domestic and international research and compares several keyword extraction methods. It then improves existing methods according to the writing characteristics of news reports and proposes a new keyword extraction method adapted to their content and writing structure.

First, building on a study of the writing structure of news reports, the thesis improves the feature-statistics approach to keyword extraction. General statistical methods mainly use features such as position, part of speech and word frequency. The position feature divides an article into parts: title, abstract, first paragraph, final paragraph and body text. Each word then receives a weight according to the part in which it appears, and this weight serves as a keyword-recognition feature. This kind of position analysis suits news reports and other web text poorly, because many such texts lack an abstract or title, or consist of a single paragraph.
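The conventional position-weighted feature statistics described above can be sketched in Python as follows. This is a minimal sketch. The section labels, the weights in `SECTION_WEIGHTS` and `POS_WEIGHTS`, and the functions `feature_scores` and `top_k` are hypothetical, because the abstract does not give the thesis's actual values. Input tokens are assumed to be already segmented and POS-tagged.

```python
from collections import defaultdict

# Hypothetical per-section weights (assumption; not the thesis's values).
SECTION_WEIGHTS = {
    "title": 5.0,
    "abstract": 3.0,
    "first_paragraph": 2.0,
    "last_paragraph": 1.5,
    "body": 1.0,
}

# Hypothetical part-of-speech weights: nouns and verbs kept, others dropped.
POS_WEIGHTS = {"n": 1.0, "v": 0.6}


def feature_scores(tokens):
    """tokens: iterable of (word, pos_tag, section) triples.

    Each occurrence adds section_weight * pos_weight, so word frequency
    is accounted for implicitly by summing over occurrences.
    """
    scores = defaultdict(float)
    for word, pos, section in tokens:
        pos_w = POS_WEIGHTS.get(pos, 0.0)
        if pos_w == 0.0:
            continue  # skip function words and other unweighted tags
        scores[word] += SECTION_WEIGHTS.get(section, 1.0) * pos_w
    return dict(scores)


def top_k(scores, k):
    """Return the k highest-scoring words, ties broken alphabetically."""
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

The score depends on the `section` labels. In a single-paragraph news item with no title or abstract, every word therefore gets the body weight, and position contributes nothing. This is the weakness noted above.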
Because conventional position features fit news text poorly, the thesis analyzes in depth where keywords occur in a text. It proposes a word-spacing feature that better matches the characteristics of news reports.

Second, building on a study of news content, the thesis improves the clustering-based keyword extraction method. News reports cover recent events, so they often contain newly coined everyday words and Internet terms that have not yet been added to the knowledge base. These words cannot be recognized when word similarity is computed. The thesis therefore considers both semantic similarity and word association, and adds the computation and selection of mutual information to the general clustering method to improve extraction accuracy. In the experiments, the proposed method improves both accuracy and recall. This indicates that the improved algorithm is effective for the content and writing structure of news reports.
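The abstract does not define the word-spacing feature. One common way to capture how widely a word is spread through a text is assumed here. It takes the span between a word's first and last occurrence and normalizes it by document length. This formulation needs no title or section markup. Below is a sketch in Python with a hypothetical `spacing_feature` function; it is not the thesis's exact definition.

```python
def spacing_feature(tokens):
    """Hypothetical word-spacing (span) feature.

    tokens: list of words in document order (already segmented).
    Returns word -> (last_index - first_index) / (len(tokens) - 1),
    a value in [0, 1]. Words that occur only once get 0.0.
    """
    first, last = {}, {}
    for i, word in enumerate(tokens):
        first.setdefault(word, i)  # remember the first occurrence only
        last[word] = i             # keep overwriting with the latest one
    n = len(tokens)
    if n < 2:
        return {w: 0.0 for w in first}
    return {w: (last[w] - first[w]) / (n - 1) for w in first}
```

A word mentioned near both the start and the end of a one-paragraph report scores high even though the report has no title or abstract. That is the situation the position feature handles poorly.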
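The following sketch shows, under stated assumptions, how a mutual-information term could supplement knowledge-base similarity before clustering. It estimates normalized pointwise mutual information (NPMI) from sentence-level co-occurrence. For a word missing from the knowledge base, such as a new Internet term, it falls back to that association score. Several pieces are illustrative choices, not the thesis's formulation: NPMI itself, the sentence-level estimate, the blend weight `alpha`, and the functions `sentence_npmi` and `word_similarity`. `semantic_sim` stands in for whatever knowledge-base similarity is used.

```python
import math
from itertools import combinations


def sentence_npmi(sentences):
    """NPMI in (-1, 1] for every word pair that co-occurs in a sentence.

    sentences: list of token lists. p(w) is the fraction of sentences
    containing w; p(x, y) is the fraction containing both.
    Keys are alphabetically ordered pairs (x, y) with x < y.
    """
    n = len(sentences)
    df, pair_df = {}, {}
    for sent in sentences:
        words = sorted(set(sent))
        for w in words:
            df[w] = df.get(w, 0) + 1
        for pair in combinations(words, 2):
            pair_df[pair] = pair_df.get(pair, 0) + 1
    npmi = {}
    for (x, y), c in pair_df.items():
        p_xy = c / n
        if p_xy == 1.0:
            npmi[(x, y)] = 1.0  # both words appear in every sentence
            continue
        pmi = math.log(p_xy / ((df[x] / n) * (df[y] / n)))
        npmi[(x, y)] = pmi / -math.log(p_xy)
    return npmi


def word_similarity(x, y, semantic_sim, npmi, alpha=0.5):
    """Hypothetical combined similarity for a clustering step.

    semantic_sim(x, y) returns a score in [0, 1], or None if either
    word is absent from the knowledge base. Pairs that never co-occur
    get the minimum association (NPMI = -1, i.e. 0 after rescaling).
    """
    pair = (x, y) if x < y else (y, x)
    assoc = (npmi.get(pair, -1.0) + 1.0) / 2.0  # rescale to [0, 1]
    sem = semantic_sim(x, y)
    if sem is None:
        return assoc  # unknown word: rely on co-occurrence alone
    return alpha * sem + (1.0 - alpha) * assoc
```

The resulting pairwise similarity can be fed to any clustering algorithm over candidate words. Words that the knowledge base cannot recognize then still receive a meaningful similarity from their co-occurrence with other words in the report.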
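The evaluation above can be illustrated with a small helper. This sketch assumes that the abstract's "accuracy" refers to precision, as is usual in keyword-extraction evaluation. It also assumes that extracted keywords are compared with a manually annotated reference set by exact string match. The F1 score and the function name `precision_recall` are additions for illustration.

```python
def precision_recall(extracted, reference):
    """Keyword-level precision, recall and F1 (exact string match).

    precision = |extracted & reference| / |extracted|
    recall    = |extracted & reference| / |reference|
    """
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)
    p = hits / len(extracted) if extracted else 0.0
    r = hits / len(reference) if reference else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

Suppose four keywords are extracted and two of them match a three-word reference set. This gives precision 1/2, recall 2/3 and F1 4/7.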
Keywords/Search Tags: keyword extraction, word spacing, feature statistics, clustering analysis, mutual information, news report