Keyword Extraction Based On Statistical Features And Collaborative Filtering

Posted on:2016-08-31

Degree:Master

Type:Thesis

Country:China

Candidate:H C Li

Full Text:PDF

GTID:2348330488957088

Subject:Engineering

Abstract/Summary:

With the development of the Internet, vast amounts of information are created on the network every day. The explosive growth of text information in particular has become an important issue in natural language processing, and accurately finding the information people need in this flood of information is a pressing problem. To retrieve such large volumes of text, keywords must first be extracted from documents effectively and accurately. Keyword extraction therefore plays an important role and is widely applied in information retrieval, document classification, information feedback systems, and automatic summarization.

This thesis focuses on keyword extraction algorithms. First, based on the structural characteristics of Chinese text, an improved word segmentation algorithm built on the ICTCLAS word segmentation system is proposed. The statistical characteristics of documents are then analyzed and discussed, and four features are selected from the commonly used statistical features: word frequency, part of speech, word position, and word length. A formula is proposed to calculate a statistical feature score for each word, and keywords are selected by comparing these scores.

In addition to statistical features, this thesis also considers the similarity between documents and proposes a keyword extraction algorithm based on collaborative filtering. The algorithm first trains on documents that already have keywords, then uses collaborative filtering to calculate the similarity between the document whose keywords are to be extracted and the documents that already have keywords. It then takes the keywords of the documents with high similarity as candidate keywords. Finally, it calculates the composite statistical score of each candidate keyword in the document to select the keywords.

Finally, the statistics-based and collaborative-filtering-based algorithms are combined. When the database contains many content-related documents, the combined algorithm uses the collaborative-filtering-based extraction; when it contains only a few, it uses the statistics-based extraction. Experiments showed that the combined algorithm is more general. This thesis also discusses the values of the parameters that appear in the algorithm and compares the performance of several different algorithms. Finally, it describes a validation tool designed for validating and analyzing the algorithm, and introduces the tool's framework and functional modules.
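
The combined approach described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives neither the scoring formula nor the similarity measure, so the weighted linear score, the weights in WEIGHTS, the part-of-speech scores in POS_SCORE, the cosine similarity over word counts, and the sim_threshold and min_neighbors parameters are all assumptions made for illustration. The input is assumed to be already segmented and tagged as (word, part-of-speech tag) pairs; no segmentation system such as ICTCLAS is called here.

```python
import math
from collections import Counter

# Hypothetical weights and part-of-speech scores; the abstract does not give the real values.
WEIGHTS = {"freq": 0.4, "pos": 0.2, "position": 0.2, "length": 0.2}
POS_SCORE = {"n": 1.0, "v": 0.6, "a": 0.4}


def statistical_scores(tokens):
    """Score each word of a segmented document given as (word, pos_tag) pairs."""
    if not tokens:
        return {}
    counts = Counter(word for word, _ in tokens)
    max_count = max(counts.values())
    max_len = max(len(word) for word in counts)
    first_seen, tag_of = {}, {}
    for index, (word, tag) in enumerate(tokens):
        first_seen.setdefault(word, index)  # earliest position of the word
        tag_of.setdefault(word, tag)        # tag of its first occurrence
    scores = {}
    for word, count in counts.items():
        features = {
            "freq": count / max_count,                           # word frequency
            "pos": POS_SCORE.get(tag_of[word], 0.1),             # part of speech
            "position": 1.0 - first_seen[word] / len(tokens),    # earlier is higher
            "length": len(word) / max_len,                       # word length
        }
        # Assumed form: a weighted linear combination of the four features.
        scores[word] = sum(WEIGHTS[name] * value for name, value in features.items())
    return scores


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity of word-count vectors (an assumed similarity measure)."""
    a = Counter(word for word, _ in tokens_a)
    b = Counter(word for word, _ in tokens_b)
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def extract_keywords(tokens, labeled_docs, top_k=3, sim_threshold=0.3, min_neighbors=2):
    """Pick keywords for `tokens`; `labeled_docs` is a list of (tokens, keywords) pairs."""
    scores = statistical_scores(tokens)
    pool = set(scores)  # default: statistics-based extraction over all words
    neighbor_keywords = [
        keywords
        for doc_tokens, keywords in labeled_docs
        if cosine_similarity(tokens, doc_tokens) >= sim_threshold
    ]
    if len(neighbor_keywords) >= min_neighbors:
        # Enough similar documents: restrict candidates to their keywords
        # that also occur in this document.
        candidates = {kw for kws in neighbor_keywords for kw in kws if kw in scores}
        if candidates:  # assumed fallback to statistics if no candidate occurs
            pool = candidates
    # Rank candidates by statistical score, breaking ties alphabetically.
    return sorted(pool, key=lambda word: (-scores[word], word))[:top_k]
```

Calling extract_keywords with an empty labeled_docs list exercises the statistics-only path, while passing several sufficiently similar documents with known keywords switches to the collaborative-filtering path. The switch between the two paths is controlled here by a simple neighbor count, which is only one possible reading of "many" versus "only a few" related documents.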

Keywords/Search Tags:

Keyword extraction, collaborative filtering, Chinese word segmentation

Related items

1	Research And Application Of Collaborative Filtering Algorithm Based On Keyword Extraction Technology
2	Chinese Participle Algorithm Research Based On Word Table Structure
3	Zhengzhou Tv Automatic Chinese Word Segmentation System Realization
4	Research On Chinese Word Segmentation And Keyword Extraction Model Based On Deep Learning
5	Research On Keyword Extraction And Sentiment Analysis For Chinese Text
6	Research On Simultaneous Text Summarization And Keyword Extraction Based On Hypergraph
7	Research On Keyword Extraction From Chinese News Web Pages Based On Compose Features
8	Research On Chinese Automatic Summarization Based On Keyword Filtering And Text Structure
9	Chinese Text Keyword Extraction Algorithm Based On Graph And LDA
10	The Research Of Chinese Web Text Orientation Classification