Research On Chinese Semantic Keyword Extraction Method Based On Multiple Features

Posted on:2020-10-25

Degree:Master

Type:Thesis

Country:China

Candidate:L J Li

Full Text:PDF

GTID:2428330623467256

Subject:Computer Science and Technology

Abstract/Summary:

The rapid development of information technology has driven exponential growth in network data, making it increasingly difficult to search and use text information effectively. Faced with massive amounts of information, and especially with the explosive growth of text, efficiently capturing useful information from large text collections has become an urgent problem. One way to address it is to extract from a text the central words that reflect its theme; these words are called keywords. Because keywords capture the author's ideas and the theme of an article, they let readers quickly grasp its main content, so an effective automatic keyword extraction method is of great significance. As the core content of a text, keywords should reflect not only the importance of words but also the relevance between the text and its theme. However, little research addresses the topic relevance of keywords; most work focuses on probabilistic language models of words or on dictionary-based methods, which cannot mine the implicit semantic characteristics of words. In addition, most text in the real world comes without tagged keywords. Manual labeling is inefficient, time-consuming, and laborious, and the annotators' subjective judgment strongly affects the results, so assigning keywords by hand is a tedious task. For these reasons, this paper focuses on the topic relevance of keywords and on the scarcity of labeled corpora. The main contents are as follows:

(1) This paper proposes a method for computing the correlation between words and the text topic. The text is first preprocessed to obtain its sequence of candidate keywords, and a list of word vectors is trained on the text corpus combined with domain knowledge. Using this word-vector list, the text is mapped to its corresponding sequence of word vectors, and the vectors of the individual words in the text are clustered to obtain the text's cluster centers. Finally, the similarity between each candidate keyword and the cluster centers is computed and used as the semantic relevance between the word and the text topic.

(2) Because keywords extracted by existing methods are only weakly related to the topic, this paper proposes a keyword extraction method that integrates semantic features. The research focuses on extracting features of the candidate keywords in a text. Building on previous studies, it extracts four kinds of features of each candidate keyword (word frequency, length, position, and linguistic information), including the similarity between the word and the text topic, and uses them as training samples for a keyword classification model. Experimental results show that, compared with the traditional TF-IDF method, the keyword extraction method with semantic feature fusion improves accuracy by 16.2% and F-score by 20.5%. The extracted keywords reflect not only the importance of words but also their relevance to the topic.

(3) To address the scarcity of labeled corpora, this paper combines the multi-feature keyword extraction method with semi-supervised learning and proposes an improved semi-supervised keyword extraction method. The algorithm improves the selection of the initial training samples and uses cross-validation to extract high-confidence training samples, which improves the accuracy of the model. Experiments show that, on the given experimental data, the supervised algorithm can learn patterns only from labeled samples, whereas the semi-supervised algorithm learns the patterns of the labeled samples and also uncovers the underlying patterns of the unlabeled samples.
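
The word-topic relevance step in contribution (1) can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: it assumes word vectors are already trained and supplied as a dictionary, and the choice of k-means, the seeding rule (first k vectors), the number of clusters `k`, and taking the maximum cosine similarity over all cluster centers are all assumptions, since the abstract does not specify them. The function names `kmeans` and `topic_relevance` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[Vector]:
    """Plain k-means seeded with the first k points (assumed), so results are deterministic."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in centers]
        for p in points:
            # Assign each word vector to its nearest center (squared Euclidean distance).
            best = min(range(len(centers)),
                       key=lambda i: sum((x - c) ** 2 for x, c in zip(p, centers[i])))
            groups[best].append(p)
        # Recompute each center as the mean of its group; keep the old one if the group is empty.
        new_centers = [[sum(col) / len(g) for col in zip(*g)] if g else c
                       for g, c in zip(groups, centers)]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers


def topic_relevance(doc_words: List[str], candidates: List[str],
                    vectors: Dict[str, Vector], k: int = 2) -> Dict[str, float]:
    """Score each candidate by its highest cosine similarity to any topic center (assumed rule)."""
    points = [vectors[w] for w in doc_words if w in vectors]
    if not points:
        return {}
    centers = kmeans(points, min(k, len(points)))
    return {w: max(cosine(vectors[w], c) for c in centers)
            for w in candidates if w in vectors}
```

With toy 2-D vectors (hypothetical), a candidate that lies close to the document's dominant cluster scores higher than an off-topic word, which is the property the semantic feature is meant to capture. Real word vectors from a trained embedding model would have hundreds of dimensions, but the computation is the same.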
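
Contribution (2) turns each candidate keyword into a feature vector for the classifier. The sketch below is hedged in several ways: it assumes the text has already been segmented and part-of-speech tagged (the tag names `n`, `v`, `u` and treating nouns and verbs as the "linguistic information" feature are hypothetical), and that the word-topic similarities from contribution (1) are passed in as a dictionary. The exact encoding of each feature (normalized frequency, character length, relative first position) is an assumption; the abstract names only the feature types and does not say which group holds the similarity, so here it is simply appended as the last column. `candidate_features` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# (token, part-of-speech tag) pairs, e.g. from a Chinese word segmenter; tags are hypothetical.
Token = Tuple[str, str]


def candidate_features(tokens: List[Token], topic_sim: Dict[str, float],
                       keep_pos: Tuple[str, ...] = ("n", "v")) -> Dict[str, List[float]]:
    """Build [frequency, length, position, POS flag, topic similarity] for each distinct word."""
    n = len(tokens)
    counts: Dict[str, int] = {}
    first_pos: Dict[str, int] = {}
    pos_tag: Dict[str, str] = {}
    for i, (word, tag) in enumerate(tokens):
        counts[word] = counts.get(word, 0) + 1
        first_pos.setdefault(word, i)  # index of the first occurrence
        pos_tag.setdefault(word, tag)  # tag of the first occurrence
    features: Dict[str, List[float]] = {}
    for word, c in counts.items():
        features[word] = [
            c / n,                                      # normalized term frequency
            float(len(word)),                           # length in characters
            1.0 - first_pos[word] / n,                  # earlier first occurrence -> higher
            1.0 if pos_tag[word] in keep_pos else 0.0,  # linguistic (POS) feature
            topic_sim.get(word, 0.0),                   # word-topic similarity from step (1)
        ]
    return features
```

For example, in the tagged sequence `[("关键词", "n"), ("提取", "v"), ("的", "u"), ("关键词", "n")]` with a similarity of 0.9 for 关键词, that word gets the vector `[0.5, 3.0, 1.0, 1.0, 0.9]`, while the function word 的 gets `[0.25, 1.0, 0.5, 0.0, 0.0]`. Each vector, paired with a keyword/non-keyword label, becomes one training sample.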
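
The semi-supervised step in contribution (3) can be sketched as a self-training loop. The abstract says cross-validation is used to pick high-confidence training samples but gives no rule, so this sketch assumes one reading: a model is trained on each cross-validation fold, and an unlabeled sample receives a pseudo-label only when all fold models agree. The thesis keywords name a support vector machine; to stay within the standard library, a nearest-centroid classifier is swapped in here, and the thesis's improved initial-sample selection is not reproduced. The names `train`, `predict`, and `self_train`, and the `folds` and `rounds` parameters, are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (feature vector, label): 1 = keyword, 0 = not a keyword
Model = Dict[int, List[float]]    # label -> centroid


def train(data: Sequence[Sample]) -> Model:
    """Nearest-centroid classifier, a stand-in for the SVM used in the thesis."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(model: Model, x: Sequence[float]) -> int:
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda y: math.dist(x, model[y]))


def self_train(labeled: List[Sample], unlabeled: List[List[float]],
               folds: int = 3, rounds: int = 5) -> Tuple[Model, List[Sample]]:
    """Self-training that accepts a pseudo-label only when all fold models agree."""
    labeled, pool = list(labeled), list(unlabeled)
    if folds < 2 or len(labeled) < folds:
        raise ValueError("need folds >= 2 and at least one labeled sample per fold")
    for _ in range(rounds):
        # One model per fold, each trained on the labeled set minus that fold.
        models = [train([s for j, s in enumerate(labeled) if j % folds != f])
                  for f in range(folds)]
        accepted, rest = [], []
        for x in pool:
            votes = {predict(m, x) for m in models}
            if len(votes) == 1:  # unanimous vote is treated as high confidence
                accepted.append((x, votes.pop()))
            else:
                rest.append(x)
        if not accepted:  # nothing confident left to add
            break
        labeled.extend(accepted)
        pool = rest
    return train(labeled), labeled
```

Requiring agreement across fold models is one conservative way to limit error propagation, since a wrongly pseudo-labeled sample is fed back into every later round; a probability or margin threshold from an SVM would be the more direct analogue when such a model is available.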

Keywords/Search Tags:

keyword extraction, word embedding, semantic features, support vector machine, semi-supervised learning

Related items

1	Research On Semi-Supervised Support Vector Machine Learning Methods
2	Research On Supervised And Semi-Supervised Support Vector Machine
3	Research On Models And Algorithms Of Semi-supervised Support Vector Machine
4	Research On Semi-supervised Support Vector Machine Learning Algorithms
5	Sparse Laplacian Support Vector Machine For Semi-supervised Learning
6	Research On Classification Method Of Semi-Supervised Support Vector Machine
7	The Web Pages Classification Method Based On Semi-supervised Support Vector Machine
8	Studies Of Some Problems In Support Vector Machines And Semi-supervised Learning
9	Automatic Classification Of Semantic Relation Between Nominals
10	Research And Application Of Image Classification Algorithm Based On Semi-supervised Learning