Method Of Webpage Keyword Extraction Based On Word Span

Posted on:2016-01-06

Degree:Master

Type:Thesis

Country:China

Candidate:G Xu

Full Text:PDF

GTID:2308330470960228

Subject:Computer Science and Technology

Abstract/Summary:

Keywords are commonly used to index the main content of papers, and information retrieval systems use keyword sets to help readers look up documents. In today’s Internet era, the volume of information on web pages is enormous and network applications are increasingly rich, so keywords matter more than ever.

Research on web page keyword extraction started earlier outside China. H. P. Luhn of IBM in the United States first proposed automatic keyword indexing, and the field has now developed for nearly 60 years. Turney was the first to apply a genetic algorithm and the C4.5 decision tree, a machine learning method, to the automatic extraction of key phrases. Methods designed specifically for web pages build on the differences between web pages and ordinary text, making full use of the various markup tags of a page to extract its keywords automatically.

Commonly used keyword extraction algorithms include statistics-based methods, semantics-based methods, and word-network-based methods. Building on these existing algorithms, this thesis proposes a web page keyword extraction method based on word span. The method relies on the special characteristics of web pages: it analyzes the various page tags to identify the body text. It then uses the positions of a word's first and last occurrence in the text, the ratio of the paragraphs involved to the total number of paragraphs in the text, and other factors to improve the weight formula, which helps reduce the influence of local keywords on the extraction results. The proposed method also takes full account of word frequency, part of speech (POS), word location, word length, whether a word appears after a prompt word, and other feature factors, and extracts keywords by combining these factors in a weight calculation. In addition, the high-frequency word combinations generated by the algorithm help to improve its accuracy.

Traditional methods consider fewer feature items, so their overall performance is not as good as that of the proposed algorithm. The results show that, compared with traditional algorithms, the proposed algorithm significantly improves recall and precision, and that the test results become more detailed as the size of the test set increases. The algorithm is also highly stable across texts of different lengths and types, with no sharp deterioration in results for any particular type of test set.
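
The abstract does not state the weight formula, so the following is only a minimal sketch of how a word-span factor might scale a frequency-based weight. It assumes that span is the fraction of paragraphs lying between a word's first and last occurrence, and that the weight is frequency multiplied by span. The function names `word_span_weights` and `top_keywords` are hypothetical, the tokenizer handles only lowercase ASCII words, and the POS, location, word length, and prompt-word factors mentioned above are left out.

```python
import re
from collections import Counter


def word_span_weights(paragraphs):
    """Score each word by its frequency scaled by its paragraph span.

    Assumed formula (an illustration, not taken from the thesis):
        span(w)   = (last(w) - first(w) + 1) / number_of_paragraphs
        weight(w) = frequency(w) * span(w)
    """
    first, last, freq = {}, {}, Counter()
    for index, paragraph in enumerate(paragraphs):
        for word in re.findall(r"[a-z]+", paragraph.lower()):
            freq[word] += 1
            first.setdefault(word, index)  # paragraph of first occurrence
            last[word] = index             # paragraph of latest occurrence
    total = len(paragraphs)
    return {
        word: freq[word] * (last[word] - first[word] + 1) / total
        for word in freq
    }


def top_keywords(paragraphs, k=3):
    """Return the k highest-weighted words, ties broken alphabetically."""
    weights = word_span_weights(paragraphs)
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


if __name__ == "__main__":
    paragraphs = [
        "span span span local local local local",
        "other text here",
        "span again",
    ]
    print(top_keywords(paragraphs, k=1))
```

In this toy input, "span" and "local" both occur four times, but "local" is confined to the first paragraph while "span" reaches from the first paragraph to the last. Under the assumed formula "span" therefore receives the higher weight, which illustrates how a span factor can dampen words that are frequent only locally.
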

Keywords/Search Tags:

Keyword extraction, web page keywords, weight calculation, word span

Related items

1	Chinese Keyword Extraction Method Based On Word Span And Its Application In Text Classification
2	Sort Of Facing Pages Keyword Weight Calculation
3	Study On Extraction Of Uygur Keywords In Public Opinion Analysis
4	TF-IDF And Rules Based Automatic Extraction Of Chinese Keywords
5	Word Network-based Keywords, Automatic Extraction Methods, And In The Chinese Web Page Classification In The Study
6	Research On Keyword Extraction From Chinese News Web Pages Based On Compose Features
7	An Intelligence Searching Method Of Scientific Articles
8	Chinese Keyword Extraction And Analysis Based On Tourism Weibo
9	Research Of Answer Ranking Method Based On Weighted Keywords
10	Design And Implementation Of Core Word Extraction System In Search Engine