
Research And Implementation Of Text Categorization Algorithm In Knowledge Management System

Posted on: 2019-02-07 | Degree: Master | Type: Thesis
Country: China | Candidate: W Zhang | Full Text: PDF
GTID: 2348330542998634 | Subject: Software engineering
Abstract/Summary:
With the rapid development of the mobile Internet, knowledge is updated faster and faster, and the Internet is full of all kinds of knowledge data. In addition, with the development of information retrieval and search technology, the Internet has become the most common way for people to obtain information. Most knowledge on the Internet exists in the form of text. To manage text effectively and retrieve the target information more easily, text-based information retrieval and data mining have become areas of great interest in recent years. Text classification is the basis of both. Its main task is to learn the mapping between text content and text category from labeled examples, train a classification model on that data, and then use the model to predict the category of previously unseen text. Text categorization is widely used in natural language processing (NLP), information retrieval, information filtering, digital libraries, news classification, and other fields.

A knowledge management system is a distributed system that stores and manages large-scale collections of knowledge documents. To improve its efficiency and usability, the system needs to classify documents by field, which improves the speed and accuracy of retrieval. Because the number of texts is huge, manual classification is not feasible, so an efficient, highly accurate method is needed to classify texts automatically.

This thesis studies the characteristics of the texts in a knowledge management system. Features are difficult to extract because these texts are characterized by high dimensionality, sparsity, and polysemy. Two problems follow. First, some traditional machine learning methods use only keywords as features, which loses contextual semantic information and leads to low classification accuracy. Second, improving accuracy requires designing complex feature engineering for each specific classification domain, which increases the design cost of the model. In short, current text classification methods still need improvement. To address these two problems, this thesis proposes representing text with word embeddings, using deep learning algorithms to extract text features and perform classification automatically, and improving the classification model through multi-model fusion. Experimental results show that the deep-learning-based text classification method achieves good classification performance and that fusing multiple models does improve accuracy.
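The abstract does not say which fusion scheme the thesis uses. As one common possibility, the sketch below averages the per-class probabilities produced by several classifiers (often called soft voting) and picks the class with the highest fused score. The function names, the three models, the category labels, and the probability values are all hypothetical and serve only to illustrate the idea.

```python
from typing import Dict, List


def fuse_predictions(model_probs: List[Dict[str, float]]) -> Dict[str, float]:
    """Average per-class probabilities from several models (soft voting).

    Assumes every model reports a probability for the same set of labels.
    """
    if not model_probs:
        raise ValueError("need the output of at least one model")
    labels = list(model_probs[0].keys())
    n_models = len(model_probs)
    return {
        label: sum(probs[label] for probs in model_probs) / n_models
        for label in labels
    }


def predict_label(model_probs: List[Dict[str, float]]) -> str:
    """Return the label with the highest fused probability."""
    fused = fuse_predictions(model_probs)
    return max(fused, key=fused.get)


# Hypothetical outputs of three classifiers for a single document.
model_a = {"finance": 0.50, "sports": 0.30, "technology": 0.20}
model_b = {"finance": 0.20, "sports": 0.30, "technology": 0.50}
model_c = {"finance": 0.25, "sports": 0.15, "technology": 0.60}

fused = fuse_predictions([model_a, model_b, model_c])
label = predict_label([model_a, model_b, model_c])
print(fused)
print(label)
```

In this toy case one model prefers a different category than the other two, and averaging lets the two agreeing models outweigh it. Weighted averaging or majority voting over hard labels would be equally plausible readings of "multi-model fusion".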
Keywords/Search Tags: knowledge management system, natural language processing, text categorization, deep learning