Research On Extraction Methods Of Chinese Article And Topic Key Phrases

Posted on: 2020-06-23
Degree: Master
Type: Thesis
Country: China
Candidate: C H Liu
GTID: 2428330596479598
Subject: Probability theory and mathematical statistics
Abstract/Summary:
With the development of Internet technology, the text generated by a growing number of Internet users needs to be processed in a timely and effective manner, so efficient text mining technology has become a key research topic. Extracting article key phrases and topic key phrases is a basic task in text mining and affects the quality of text mining applications in many fields. At present, article and topic key phrase extraction is widely used in keyword search engines, AI speech recognition, text sentiment analysis, intelligent user recommendation, and other areas. This work draws on statistics, natural language processing, and machine learning. Building on three classical algorithms, it proposes three improved key phrase extraction methods. The specific research content and results are as follows.

In the first scheme, a Chinese key phrase extraction method based on TF-IDF and multi-feature constraints is proposed. First, the limitations of the TF-IDF statistic are analyzed, and further constraints reflecting the characteristics of Chinese words are added to form a set of multi-feature constraints. Then a sequential combination technique is added to compensate for the inability of TF-IDF to extract phrases. On this basis, a Chinese word segmentation system and an improved phrase ranking technique are incorporated to form the main body of the scheme. The specific parameters of the algorithm are then determined through extensive experiments. Finally, the proposed scheme is compared experimentally with classical algorithms from China and abroad, and the numerical results show that it mines key phrases significantly better than the comparison algorithms.

The second scheme addresses the low accuracy, high ambiguity, and limited information coverage of key phrases extracted by classical key phrase extraction algorithms. First, inspired by the English key phrase extraction algorithm TAKE, a Chinese word segmentation system is added to give the original algorithm Chinese word segmentation capability. Then a new word recognition technique based on multi-domain specificity is incorporated to improve the final segmentation result. On this basis, word filtering and feature calculation are fused to form an improved TAKE algorithm, which is applied to key phrase mining of Chinese text. Finally, comparisons with various traditional key phrase extraction algorithms show that the proposed scheme improves significantly on them in extraction precision, recall, and F-value.

In the third scheme, another key phrase extraction method is proposed. Because the original algorithm, KERT, has insufficient Chinese word segmentation ability, L statistics are first introduced to improve its segmentation, and a key phrase extraction algorithm based on a shared topic is proposed to extract key phrases from multiple articles under the same topic. To resolve the word-order ambiguity of the phrases produced by FP-Growth in KERT, a constraint merging algorithm is proposed, and the framework is then completed with an improved ranking algorithm. Finally, the experimental results show that the proposed scheme outperforms the comparison algorithms in topic key phrase mining.
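To make the first scheme's pipeline (TF-IDF scoring, constraints on candidate words, sequential combination of adjacent words into phrases, phrase ranking) concrete, here is a minimal Python sketch. It assumes the text has already been segmented into words (the thesis uses a Chinese word segmentation system, not reproduced here). The stopword list, the minimum-length constraint, the `max_len` cap, and scoring a phrase as the sum of its words' TF-IDF scores are illustrative assumptions, not the thesis's actual constraints or ranking; the names `passes_constraints`, `candidate_phrases`, and `extract_key_phrases` are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter

# Hypothetical constraint set standing in for the thesis's multi-feature
# constraints: a small stopword list plus a minimum token length.
STOPWORDS = {"的", "了", "和", "是", "在"}


def passes_constraints(token: str) -> bool:
    """Keep tokens of at least two characters that are not stopwords."""
    return len(token) >= 2 and token not in STOPWORDS


def tf_idf(doc: list[str], corpus: list[list[str]]) -> dict[str, float]:
    """Term frequency in `doc` times a smoothed inverse document frequency."""
    counts = Counter(doc)
    n_docs = len(corpus)
    scores = {}
    for token, count in counts.items():
        df = sum(1 for d in corpus if token in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[token] = (count / len(doc)) * idf
    return scores


def candidate_phrases(doc: list[str], max_len: int = 3) -> list[tuple[str, ...]]:
    """Sequential combination: merge adjacent tokens that pass the constraints
    into runs of at most `max_len` tokens; a failing token ends the run."""
    runs: list[tuple[str, ...]] = []
    run: list[str] = []
    for token in doc:
        if passes_constraints(token) and len(run) < max_len:
            run.append(token)
            continue
        if run:
            runs.append(tuple(run))
        # A passing token that hit the length cap starts the next run.
        run = [token] if passes_constraints(token) else []
    if run:
        runs.append(tuple(run))
    return runs


def extract_key_phrases(doc: list[str], corpus: list[list[str]], top_k: int = 3) -> list[str]:
    """Score each candidate as the sum of its words' TF-IDF scores."""
    scores = tf_idf(doc, corpus)
    ranked = {"".join(run): sum(scores[t] for t in run) for run in candidate_phrases(doc)}
    return sorted(ranked, key=ranked.get, reverse=True)[:top_k]


corpus = [
    ["文本", "挖掘", "是", "关键", "技术"],
    ["关键", "短语", "抽取", "是", "文本", "挖掘", "的", "基础"],
    ["中文", "分词", "和", "关键", "短语"],
]
print(extract_key_phrases(corpus[1], corpus))
# → ['关键短语抽取', '文本挖掘', '基础']
```

Summing word scores favors longer phrases, which is one reason a cap such as `max_len` (or a length-normalized score) is usually needed; the thesis's improved phrase ranking may handle this differently.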
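The second scheme is evaluated on precision, recall, and F-value. A minimal sketch of how these are computed for one document, assuming each document has a hand-labeled set of gold key phrases, that an extracted phrase counts only on an exact match, and that "F-value" means the balanced F1 score (the thesis may use a different matching rule or weighting):

```python
from __future__ import annotations


def precision_recall_f1(predicted: list[str], gold: list[str]) -> tuple[float, float, float]:
    """Exact-match precision, recall, and balanced F1 for one document."""
    pred, ref = set(predicted), set(gold)
    hits = len(pred & ref)  # phrases that were both extracted and labeled
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


p, r, f = precision_recall_f1(
    ["关键短语抽取", "文本挖掘", "基础"],
    ["关键短语抽取", "文本挖掘", "中文分词", "特征约束"],
)
print(round(p, 3), round(r, 3), round(f, 3))  # → 0.667 0.5 0.571
```

Corpus-level figures are typically obtained either by averaging these per-document values (macro-averaging) or by pooling hit counts across documents (micro-averaging); which one the thesis reports is not stated in this abstract.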
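In KERT, FP-Growth returns frequent itemsets, and itemsets are unordered, so a mined set of words does not by itself say in which order the words form a phrase; this is the word-order ambiguity the third scheme targets. The sketch below is a simple frequency-based stand-in, not the thesis's constraint merging algorithm: it assumes the itemsets have already been mined and the documents are already segmented, and it keeps whichever ordering occurs most often as consecutive tokens. The function names `count_contiguous` and `order_itemset` and the `min_count` threshold are hypothetical.

```python
from __future__ import annotations

from itertools import permutations


def count_contiguous(seq: tuple[str, ...], docs: list[list[str]]) -> int:
    """Number of positions where `seq` occurs as consecutive tokens."""
    n = len(seq)
    return sum(
        1
        for doc in docs
        for i in range(len(doc) - n + 1)
        if tuple(doc[i : i + n]) == seq
    )


def order_itemset(
    itemset: set[str], docs: list[list[str]], min_count: int = 1
) -> tuple[str, ...] | None:
    """Pick the word order of an unordered frequent itemset that occurs most
    often as a contiguous phrase; return None if no order reaches `min_count`."""
    best, best_count = None, 0
    # sorted() makes tie-breaking deterministic; permutations grow as k!,
    # which is acceptable only for short phrases.
    for perm in permutations(sorted(itemset)):
        count = count_contiguous(perm, docs)
        if count > best_count:
            best, best_count = perm, count
    return best if best_count >= min_count else None


docs = [
    ["文本", "挖掘", "技术"],
    ["中文", "文本", "挖掘"],
    ["挖掘", "文本", "信息"],
]
print(order_itemset({"文本", "挖掘"}, docs))  # → ('文本', '挖掘')
print(order_itemset({"技术", "中文"}, docs))  # → None
```

Returning `None` when no ordering appears contiguously also filters out itemsets whose words co-occur in a document but never form a phrase, which is one plausible reading of a "constraint" on merging.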
Keywords/Search Tags: Text mining, Word segmentation, Key phrase, Feature constraint, Phrase ranking