Deep Learning Based Domain Concept Extraction Algorithm

Posted on: 2015-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: J Hong
Full Text: PDF
GTID: 2268330431962842
Subject: Computer application technology
Abstract/Summary:
A domain concept is a form of domain knowledge: an abstract description that people form of a particular object during cognition. Domain concepts reflect the development of the domain to which they belong. Research on domain concept extraction studies how to acquire domain concepts from domain material automatically or semi-automatically by computer. Its results have been widely used in information retrieval, text classification, machine translation, and other areas of natural language processing.

Domain concepts can be divided into single-word and compound types. A compound concept consists of two or more words, and compounds make up the larger share of domain concepts. A literature survey shows that research on compound extraction is comparatively abundant, and the methods generally combine statistical measures with linguistic rules. Research on single-word extraction, by contrast, is relatively rare. For single-word concepts, existing methods mainly consider 'termhood': they define a statistic that quantifies how strongly a candidate correlates with the domain, rank the candidates by their termhood values, and finally filter them with a threshold. Feature selection in these methods is relatively simple, they are not robust to noise, and their feature weights and thresholds lack a principled basis, which makes them susceptible to subjective factors and leaves accuracy to be improved. Because single words are important components of compounds, improving single-word extraction also helps compound extraction. For these reasons, this thesis takes single-word domain concept extraction as its research object.

Machine learning has been successfully applied in many fields, including natural language processing, since its inception.
Artificial neural networks (ANNs) are a mature machine learning method that simulates the structure and function of human neural networks. Their learning ability, robustness, and adaptability make them suitable for modeling the complex mapping between the feature data and the category label of a domain concept. Deep Learning is an emerging machine learning method that mainly addresses the training of artificial neural networks with multiple hidden layers. Such deep neural network models simulate the human brain more closely and have shown stronger learning ability.

Given that current research is insufficient, and to exploit the strengths of artificial neural networks and Deep Learning in complex pattern classification, this thesis constructs a deep neural network model to improve the recognition of single-word domain concepts. The main work of this thesis includes:

1) We propose a feature extraction method for single-word domain concepts. Based on how domain concepts are distributed in the domain document set and in other document sets, we choose term frequency, document frequency, inverse document frequency, word length, term variance, and domain consistency as features, which improves the discriminative power of the feature vector.

2) We propose using neural networks to model single-word domain concepts. The network effectively represents the complex mapping between the multi-dimensional feature vector and the class label of a single-word domain concept, guards against the impact of noise on the extraction algorithm, and avoids manually set feature weights and thresholds.

3) We propose a Deep Learning based single-word domain concept extraction algorithm, constructing a deep neural network model with multiple hidden layers to identify domain concepts and to fully exploit the combinations of the original features.
First, we use deep belief networks to learn more reasonable initial network parameters in an unsupervised manner, which reduces the risk of the network converging to a local optimum. Second, we use the back-propagation algorithm to fine-tune the network through supervised training.

The experiments use the text classification corpus provided by Sogou Laboratory. We select 100 texts from the military domain as the training set and 30 texts as the test set. The experimental results show that the proposed method is effective and feasible: the deep neural network model achieves a precision of 74.27%, a recall of 51.80%, and an F-value of 61.03%. On the same data set, the existing KNN and SVM models achieve F-values of 52.63% and 58.50%, respectively. The deep neural network model obtains a higher F-value than the shallow models, balances precision and recall, and gives better overall results.
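
As a quick consistency check on the reported figures, the following minimal Python sketch recomputes the F-value from the stated precision and recall. It assumes the F-value is the balanced F1 measure (the harmonic mean of precision and recall), which the abstract does not state explicitly. The function name `f_measure` and its `beta` parameter are illustrative and not taken from the thesis.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1 gives the harmonic mean (F1).

    Inputs may be fractions or percentages, as long as both use the
    same scale; the result is on that scale too.
    """
    # Guard against division by zero when both inputs are zero.
    if precision <= 0.0 and recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall reported for the deep neural network model (percent).
dnn_f = f_measure(74.27, 51.80)
print(round(dnn_f, 2))
```

Under the F1 assumption, the result rounds to 61.03, which matches the F-value given in the abstract.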
Keywords/Search Tags: Domain concepts, automatic domain concepts extraction, artificial neural networks, Deep Learning, deep belief nets