
Research On Text Classification Algorithm Based On Knowledge Graph

Posted on: 2019-10-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Pan
Full Text: PDF
GTID: 2428330545997429
Subject: Software engineering
Abstract/Summary:
With the continuous advancement of information technology, the rapid growth of information resources has produced vast amounts of complex data, of which text accounts for about 70%. Retrieving the required information efficiently and accurately from this volume of text is a problem that urgently needs solving. This dissertation therefore focuses on classifying large amounts of text data effectively and accurately, so as to improve the efficiency of information retrieval. Many factors influence the quality of text classification, including word segmentation, feature extraction, and the classification step itself. This dissertation introduces knowledge graph technology to study and improve these three processes:

1. Word segmentation. Many traditional word segmentation tools exist, and many of them perform well. However, different segmentation methods give different results on the same text data set, and it is difficult to determine which one is best. This dissertation uses a knowledge graph to calculate, for each candidate segmentation result, the sum of the similarity distances between its word vectors and the other word vectors, and takes the result with the smallest distance as the best segmentation, improving the accuracy of the segmentation results.

2. Feature extraction. Although common feature extraction methods work well, they often ignore feature words that play an important role in one segment but are not important in the full text, so the extracted features are to some extent incomplete. This dissertation uses a knowledge graph to calculate the semantic similarity between feature words and then selects the top K feature words with the largest semantic similarity scores as feature vectors, which improves both the comprehensiveness and the accuracy of feature selection.

3. Classification. Many classification algorithms exist, and their results differ widely. This dissertation uses a knowledge graph to calculate the concept distance between word vectors in order to improve the KNN classification algorithm, mainly by improving how similarity distance is measured. A concept distance between texts is added on top of the original distance, which better reflects the similarity between word vectors and improves the text classification results.

The improved algorithm is tested on Chinese and English data sets. To compare the experimental results of different methods, the dissertation uses a unified set of indicators: precision (P), recall (R), and F-measure (F). The experiments demonstrate the effectiveness of the improved algorithm and show that it improves significantly on the other algorithms.
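
The KNN improvement described in the abstract, adding a concept distance to the ordinary vector distance, can be illustrated with a minimal Python sketch. The abstract does not give the actual formula, so everything here is an assumption made for illustration: the toy concept graph, the use of BFS hop count as the concept distance, the `unreachable` penalty, and the blending weight `alpha` are all hypothetical and are not taken from the dissertation.

```python
import math
from collections import Counter, deque

# Toy undirected concept graph standing in for a real knowledge graph (hypothetical data).
TOY_GRAPH = {
    "sport": ["football", "basketball"],
    "football": ["sport"],
    "basketball": ["sport"],
    "finance": ["stock", "bank"],
    "stock": ["finance"],
    "bank": ["finance"],
}


def hop_distance(graph, src, dst, unreachable=10):
    """Shortest-path hop count between two concepts (BFS).

    Returns `unreachable` (an assumed penalty) when no path exists.
    """
    if src == dst:
        return 0
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, depth = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt == dst:
                return depth + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return unreachable


def euclidean(a, b):
    """Ordinary Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def combined_distance(doc_a, doc_b, graph, alpha=0.5):
    """Blend vector distance with concept distance; alpha is an assumed weight."""
    vec_a, concept_a = doc_a
    vec_b, concept_b = doc_b
    return (alpha * euclidean(vec_a, vec_b)
            + (1 - alpha) * hop_distance(graph, concept_a, concept_b))


def knn_predict(train, query, graph, k=3, alpha=0.5):
    """Majority vote over the k training documents closest to the query.

    `train` is a list of ((vector, concept), label) pairs.
    """
    ranked = sorted(train, key=lambda item: combined_distance(item[0], query, graph, alpha))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


train = [
    (([1.0, 0.0], "football"), "sports"),
    (([0.9, 0.1], "basketball"), "sports"),
    (([0.0, 1.0], "stock"), "finance"),
    (([0.1, 0.9], "bank"), "finance"),
]
# The query vector sits midway between the two classes; its concept breaks the tie.
query = ([0.5, 0.5], "football")
prediction = knn_predict(train, query, TOY_GRAPH, k=3, alpha=0.5)
print(prediction)
```

In this toy setup the query vector is equally far from both classes, so the concept term decides the neighbor ranking. A real system would replace the hop count with whatever concept distance the knowledge graph provides and would tune `alpha` and `k` on held-out data.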
Keywords/Search Tags: Word Segmentation, Feature Extraction, Text Classification