Research On Chinese Text Feature Extraction Method Based On Semantics

Posted on:2018-05-08

Degree:Master

Type:Thesis

Country:China

Candidate:Q Yu

Full Text:PDF

GTID:2348330542990825

Subject:Engineering

Abstract/Summary:

With the development of technology, the amount of information available to people is growing geometrically, and most of it circulates in the form of text. In such an age of information explosion, it is imperative to capture target information quickly and efficiently from these massive amounts of data. As an effective text mining method, text classification is significant because it sorts texts clearly by topic and improves the timeliness with which target information is obtained. Feature extraction is the critical part of text classification: its main function is to reduce the dimension of the text feature space by selecting the feature words that carry the most information about the topic of the text. The selected set of feature words then serves as the basis for determining the category of a text. Traditional feature extraction methods are mostly based on simple mathematical statistics and assume that keywords are independent of one another. They therefore ignore the importance of text structure and semantics in keyword selection, so semantic factors play no role in feature extraction and the accuracy of text classification suffers. To address this lack of semantics in traditional methods, this paper presents a semantics-based feature extraction method for Chinese text. First, the set of feature words obtained after preprocessing is represented as a weighted semantic network in which each word is a node, and words in the same sentence whose distance is less than or equal to 2 are connected. The weight of each edge is the semantic relatedness of the two words, computed from the Wikipedia knowledge base. Second, to effectively extract the feature words that carry abundant key topic information, this paper proposes a method based on the k-shell decomposition algorithm. With this method, the weighted semantic network of a text is divided into several layers according to the centrality of the nodes; the higher the layer, the closer its nodes are to the core of the network. The first n keywords are then chosen in descending order of layer. Finally, to validate the feasibility and effectiveness of the proposed method, many experiments comparing it with traditional methods were carried out. The experimental results show that the proposed feature extraction method is highly stable under different feature dimensions. At the same time, it outperforms the traditional methods on all three evaluation indexes: recall, precision and F1 value. The proposed method is thus shown to be effective.
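The pipeline described in the abstract (word network, k-shell layering, top-n selection) can be illustrated with a minimal Python sketch. This is not the thesis's implementation; it rests on several assumptions. The function names (`build_network`, `k_shell`, `top_keywords`, `constant_relatedness`) are hypothetical. The Wikipedia-based relatedness measure is not specified in the abstract, so a constant placeholder stands in for it. The peeling step uses the classic unweighted k-shell decomposition, and edge weights are used only to break ties inside a shell, which is an assumption rather than something the abstract states.

```python
from collections import defaultdict


def constant_relatedness(u, v):
    # Hypothetical placeholder: the abstract derives edge weights from a
    # Wikipedia-based relatedness measure that is not specified here.
    return 1.0


def build_network(sentences, relatedness=constant_relatedness, window=2):
    """Connect words of the same sentence that are at most `window` positions apart."""
    graph = defaultdict(dict)
    for words in sentences:
        for i, u in enumerate(words):
            for v in words[i + 1 : i + 1 + window]:
                if u != v:
                    weight = relatedness(u, v)
                    graph[u][v] = weight
                    graph[v][u] = weight
    return graph


def k_shell(graph):
    """Classic unweighted k-shell decomposition; returns {node: shell index}."""
    degree = {node: len(nbrs) for node, nbrs in graph.items()}
    remaining = set(graph)
    shell = {}
    k = 0
    while remaining:
        # Move to the next shell: the smallest degree still present.
        k = max(k, min(degree[node] for node in remaining))
        queue = [node for node in remaining if degree[node] <= k]
        while queue:
            node = queue.pop()
            if node not in remaining:
                continue  # already peeled via another neighbour
            remaining.discard(node)
            shell[node] = k
            for nbr in graph[node]:
                if nbr in remaining:
                    degree[nbr] -= 1
                    if degree[nbr] <= k:
                        queue.append(nbr)
    return shell


def top_keywords(graph, n):
    """Rank by shell index, then by summed edge weight (assumed tie-break), then by word."""
    shell = k_shell(graph)
    strength = {u: sum(nbrs.values()) for u, nbrs in graph.items()}
    ranked = sorted(graph, key=lambda u: (-shell[u], -strength[u], u))
    return ranked[:n]
```

For example, given the segmented sentences `[["a", "b", "c", "d"], ["c", "e"]]`, the word `e` falls in the outermost shell because it has a single neighbour, while the other four words form the inner shell; a real relatedness function would replace the constant placeholder.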

Keywords/Search Tags:

feature extraction, text semantic network, k-shell, semantic relevance

Related items

1	Semantic Feature Extraction Algorithm, The Contents Of Text Classification
2	Text Understanding Based On Semantic Relevance Under Internet Environment
3	Research On Methods Of Text Semantic Analysis Oriented Towards Different Views
4	Research On Ontology-Based Semantic Text Categorization
5	Research On Feature Selection Method Based On Text Category Relevance Degree And Latent Semantic Analysis
6	Research On The Svm-based Video Semantic Extraction And Relevance Feedback
7	Research On The Svm-Based Video Semantic Extraction And Relevance Feedback
8	Research On Text Semantic Relevance Calculation And Its Application
9	Creating a biomedical ontology indexed search engine to improve the semantic relevance of retrieved medical text
10	Research Of Chinese Text Preprocessing Based On Semantic