Research on a Multi-Feature-Based Text Keyword Extraction Algorithm

Posted on: 2021-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: S X Lin
Full Text: PDF
GTID: 2428330626958932
Subject: Software engineering
Abstract/Summary:
In recent years, the exponential growth of literature and short news has steadily increased the volume of text information. How to extract keywords from this information, use information extraction technology to classify text, and meet the needs of information retrieval has therefore become a research hotspot in natural language processing. Traditional keyword extraction mainly uses the TF-IDF algorithm, setting different thresholds for different fields to extract text keywords. Although this method is fast and well suited to search engines, it is limited by chance and domain, and its accuracy is unstable. Semantics-based keyword extraction algorithms can reach the level of word-meaning analysis, but people with different needs understand text differently, so reading the same article can yield different keywords. If information extraction technology is combined with a variety of word features on a semantic basis, and the user's subjective preference is inferred, keyword extraction can meet the needs of different people and suit text in different scenarios, improving the accuracy and stability of keyword extraction. This thesis studies that problem in depth. To meet the need for keywords under different preferences, it makes the following four contributions:

1. A DIP semantic similarity algorithm based on the WordNet semantic dictionary is proposed, which makes use of the five structural relationships among words in the dictionary. By extracting information from the path factor, information content, and attribute factor, it improves the traditional definition of information and quantifies the correlation between words. This improves the accuracy of keyword extraction and addresses the one-sidedness of traditional semantic algorithms.

2. An SA semantic analysis algorithm based on semantic similarity is proposed, using the number of word references as a moderating factor. After meaningless words are removed, it counts how many DIP semantic similarities between the senses of the word at the key position and the keywords exceed a specific threshold, and uses this count in place of the traditional word-meaning coverage rate. This allows the meaning of a polysemous word to be located accurately.

3. A five-tuple multi-feature structure is put forward (word frequency, word length, word span, word position, semantic similarity). Setting different feature values meets the needs of different scenarios, fields, and preferences.

4. A decision tree of feature gain values and an iterative calculation method for the user's subjective preference are proposed. From the user's feedback on results, the method infers the proportions among the feature values in the five-tuple that match the user's needs, which improves the accuracy of keyword extraction and ensures that results meet users' expectations.

Finally, to verify the accuracy, rationality, and domain independence of the proposed algorithm, a multi-feature text keyword extraction system was built. It used the English abstracts of 200 papers from ten disciplines under five major classifications in How Net as the information source, with the keywords given by the authors as the reference. After the literature was read, 100 rounds of secondary iterative calculation were carried out, followed by 100 rounds of automatic extraction. Keyword accuracy was calculated and compared with traditional semantics-based similarity algorithms from China and abroad. The results show that the multi-feature keyword extraction algorithm improves accuracy and that the extracted keywords satisfy different preferences.
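
The five-tuple weighting described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names (`extract_features`, `rank_keywords`), the stopword list, and the normalization of each feature to the range 0 to 1 are all assumptions made for illustration. The semantic-similarity feature is left as an optional callable (`semantic_score`, hypothetical) because the abstract does not give the DIP formula; when it is omitted, that feature is simply 0.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption, not taken from the thesis).
STOPWORDS = {"a", "an", "and", "also", "the", "of", "in", "to", "is", "for"}


def extract_features(text, semantic_score=None):
    """Return {word: (frequency, length, span, position, semantic)}.

    Each feature is scaled to [0, 1]. The scaling choices are assumptions.
    `semantic_score` is a hypothetical callable word -> float standing in
    for a semantic-similarity measure; it defaults to 0.0 when absent.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    positions = defaultdict(list)
    for index, token in enumerate(tokens):
        if token not in STOPWORDS:
            positions[token].append(index)
    if not positions:
        return {}

    total = len(tokens)
    max_freq = max(len(p) for p in positions.values())
    max_len = max(len(word) for word in positions)

    features = {}
    for word, pos in positions.items():
        features[word] = (
            len(pos) / max_freq,               # word frequency
            len(word) / max_len,               # word length
            (pos[-1] - pos[0] + 1) / total,    # word span: first to last use
            1.0 - pos[0] / total,              # word position: earlier is higher
            semantic_score(word) if semantic_score else 0.0,
        )
    return features


def rank_keywords(text, weights, top_k=5, semantic_score=None):
    """Rank words by a weighted sum of the five features.

    `weights` is a 5-tuple in the same order as the features. Ties are
    broken alphabetically so the ranking is deterministic.
    """
    features = extract_features(text, semantic_score)
    scores = {
        word: sum(w * f for w, f in zip(weights, values))
        for word, values in features.items()
    }
    return sorted(scores, key=lambda word: (-scores[word], word))[:top_k]


if __name__ == "__main__":
    sample = (
        "Keyword extraction uses semantic similarity. "
        "Keyword extraction also uses word frequency and word position."
    )
    # Two different preference profiles over the same text.
    print(rank_keywords(sample, (1, 0, 0, 0, 0), top_k=3))
    print(rank_keywords(sample, (0, 1, 0, 0, 0), top_k=3))
```

Changing the weight tuple changes which words rank highest for the same text, which is the sense in which a weighted five-tuple can reflect different preferences. Inferring those weights from user feedback, as the fourth contribution describes, is outside the scope of this sketch.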
Keywords/Search Tags:Semantic analysis, information extraction, multi feature, subjective preference, keyword extraction