Research On Chinese Text Feature Extraction Method Based On Semantics

Posted on: 2018-05-08  Degree: Master  Type: Thesis
Country: China  Candidate: Q Yu  Full Text: PDF
GTID: 2348330542990825  Subject: Engineering
Abstract/Summary:
With the development of technology, the amount of information available to people is growing geometrically, and most of it circulates as text. In such an age of information explosion, it is imperative to extract target information quickly and efficiently from massive amounts of data. Text classification is an effective text mining method: it sorts texts into clear categories by topic and thereby makes target information available more promptly. Feature extraction is a critical part of text classification. Its main function is to reduce the dimension of the text feature space by selecting the feature words that carry the most information about the topic of the text. The selected set of feature words then serves as the basis for determining the category of a text.

Traditional feature extraction methods are mostly based on simple mathematical statistics and assume that keywords are independent of one another. They therefore ignore the importance of text structure and semantics in keyword selection, so semantic factors play no role in feature extraction, which hurts the accuracy of text classification.

To address this lack of semantics in traditional methods, this thesis presents a semantics-based Chinese text feature extraction method. First, the set of feature words obtained after preprocessing is represented as a weighted semantic network in which each word is a node, and words whose distance within a sentence is less than or equal to 2 are connected by an edge. Edge weights are calculated from the semantic relatedness of the two words, computed using the Wikipedia knowledge base. Second, to effectively extract feature words rich in key topic information, this thesis proposes a method based on the K-shell decomposition algorithm. Under this method, the weighted semantic network of a text is divided into several layers according to the centrality of its nodes; the higher the layer, the closer its nodes are to the core of the network. The top n keywords are then selected in descending order of layer.

Finally, to validate the feasibility and effectiveness of the proposed method, many comparative experiments between it and traditional methods were carried out. The experimental results show that the proposed feature extraction method is highly stable across different feature dimensions. It also outperforms the traditional methods on all three evaluation metrics: recall, precision, and F1 value. The proposed method is thus shown to be effective.
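The pipeline described above (word network, K-shell layering, top-n selection) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names are hypothetical, the `relatedness` argument is a stand-in for the Wikipedia-based semantic relatedness measure, the shell index is computed with classic degree-based pruning, and ties within a shell are broken by summed edge weight, which is an assumption made here for illustration only.

```python
def build_network(sentences, relatedness, max_dist=2):
    """Build a weighted word network from tokenized sentences.

    Words at most `max_dist` positions apart in the same sentence are
    linked. `relatedness(u, v)` is a hypothetical stand-in for a
    Wikipedia-based semantic relatedness score.
    """
    graph = {}
    for words in sentences:
        for i, u in enumerate(words):
            graph.setdefault(u, {})
            last = min(i + max_dist, len(words) - 1)
            for j in range(i + 1, last + 1):
                v = words[j]
                if u == v:
                    continue  # no self-loops
                w = relatedness(u, v)
                graph[u][v] = w
                graph.setdefault(v, {})[u] = w
    return graph


def k_shell(graph):
    """Return {node: shell index} using classic degree-based pruning."""
    adj = {u: set(nbrs) for u, nbrs in graph.items()}
    shell = {}
    k = 0
    while adj:
        # Nodes whose remaining degree is at most k belong to shell k.
        peel = [u for u, nbrs in adj.items() if len(nbrs) <= k]
        if not peel:
            k += 1
            continue
        for u in peel:
            shell[u] = k
            for v in adj.pop(u):
                adj[v].discard(u)
    return shell


def top_keywords(graph, n):
    """Rank words by shell index, then by summed edge weight (assumed
    tie-break), then alphabetically; return the first n."""
    shell = k_shell(graph)
    strength = {u: sum(nbrs.values()) for u, nbrs in graph.items()}
    ranked = sorted(graph, key=lambda u: (-shell[u], -strength[u], u))
    return ranked[:n]


# Toy usage: already-segmented sentences and a constant relatedness score.
toy_sentences = [["a", "b", "c", "d"], ["b", "c", "d"], ["d", "e"]]
toy_graph = build_network(toy_sentences, lambda u, v: 1.0)
toy_keywords = top_keywords(toy_graph, 3)
```

In practice the input sentences would first be segmented and stripped of stop words, and the constant score would be replaced by a real relatedness measure; a weighted variant of K-shell decomposition could also replace the degree-based pruning used here.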
Keywords/Search Tags:feature extraction, text semantic network, k-shell, semantic relevance