Research On Text Classification Algorithm Based On Knowledge Graph

Posted on:2019-10-23

Degree:Master

Type:Thesis

Country:China

Candidate:Y B Pan

Full Text:PDF

GTID:2428330545997429

Subject:Software engineering

Abstract/Summary:

With the continuous advancement of information technology, the rapid growth of information resources has produced vast amounts of complex data, of which text accounts for about 70%. Retrieving the required information from this huge volume of text efficiently and accurately is therefore an urgent problem. This dissertation focuses on how to classify large amounts of textual data effectively and accurately, thereby improving the efficiency of information retrieval. Many factors influence text classification performance, including word segmentation, feature extraction, and the classification step itself. This dissertation introduces knowledge graph technology to study and improve each of these three processes:

1. Word segmentation. Many traditional word segmentation tools exist, and many of them segment well. However, different segmentation methods produce different results for the same text set, and it is difficult to determine which one is best. This dissertation uses a knowledge graph to compute, for each candidate segmentation result, the sum of the similarity distances between its word vectors. It then takes the result with the smallest sum as the best segmentation, improving segmentation accuracy.

2. Feature extraction. Common feature extraction methods work well, but they often ignore feature words that play an important role within a particular segment yet are not important across the full text. As a result, the extracted features are to some extent incomplete. This dissertation uses a knowledge graph to compute the semantic similarity between feature words. It then selects the top K feature words with the largest semantic similarity scores as the feature vector, making feature selection both more comprehensive and more accurate.

3. Classification. There are many classification algorithms, and their effectiveness varies widely. This dissertation improves the KNN classification algorithm by using a knowledge graph to compute the concept distance between word vectors, mainly to improve how similarity distance is measured. Adding a conceptual distance between texts on top of the original distance better reflects the similarity between word vectors and improves classification performance.

The dissertation evaluates the improved algorithms on both Chinese and English data sets. The experimental results of the different methods are compared using three unified indicators: precision (P), recall (R), and F-measure (F). The experiments demonstrate the effectiveness of the improved algorithms and show a significant improvement over the other algorithms compared.
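The segmentation-selection and feature-selection steps described in items 1 and 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the dissertation's implementation. It makes three assumptions. First, the vectors in the hypothetical `EMBEDDINGS` table stand in for knowledge-graph embeddings. Second, cosine distance serves as the "similarity distance". Third, a candidate feature word's score is the sum of its cosine similarities to the other candidates. The helper names (`segmentation_cost`, `best_segmentation`, `top_k_features`) are likewise hypothetical.

```python
import math
from itertools import combinations

# Hypothetical toy word vectors standing in for knowledge-graph embeddings.
EMBEDDINGS = {
    "南京市": [0.9, 0.1, 0.0],
    "长江大桥": [0.8, 0.2, 0.1],
    "南京": [0.9, 0.0, 0.1],
    "市长": [0.1, 0.9, 0.2],
    "江大桥": [0.0, 0.3, 0.9],
}


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cosine_distance(u, v):
    """Assumed 'similarity distance': 1 minus cosine similarity."""
    return 1.0 - cosine_similarity(u, v)


def segmentation_cost(tokens, embeddings):
    """Sum of pairwise cosine distances between the word vectors of one segmentation.

    Tokens without a vector are skipped (an assumption; the abstract does not say).
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    return sum(cosine_distance(u, v) for u, v in combinations(vectors, 2))


def best_segmentation(candidates, embeddings):
    """Pick the candidate segmentation whose summed distance is smallest."""
    return min(candidates, key=lambda tokens: segmentation_cost(tokens, embeddings))


def top_k_features(words, embeddings, k):
    """Keep the k words with the highest summed similarity to the other candidates."""
    scores = {
        w: sum(cosine_similarity(embeddings[w], embeddings[o]) for o in words if o != w)
        for w in words
    }
    return sorted(words, key=lambda w: scores[w], reverse=True)[:k]
```

With these toy vectors, `best_segmentation([["南京", "市长", "江大桥"], ["南京市", "长江大桥"]], EMBEDDINGS)` selects `["南京市", "长江大桥"]`. Its two word vectors lie close together, while the alternative split scatters them. A raw sum of pairwise distances also grows with the number of tokens, so a practical version may need normalization. The abstract does not say whether the dissertation applies one.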

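The improved KNN step in item 3 and the P/R/F indicators used in the evaluation can be illustrated with the sketch below. It is not the dissertation's method. The abstract does not specify the actual concept-distance formula or how it is weighted. The sketch makes three hypothetical assumptions:

- Each document is represented by a vector plus one dominant concept node in a small concept graph (`CONCEPT_EDGES`).
- Concept distance is the shortest-path hop count squashed into [0, 1), or 1.0 when the concepts are unreachable.
- That distance is added to the Euclidean distance with a hypothetical weight `beta`.

```python
import math
from collections import Counter, deque

# Hypothetical concept graph standing in for a knowledge graph (undirected edges).
CONCEPT_EDGES = [
    ("sports", "football"),
    ("sports", "basketball"),
    ("economy", "stock_market"),
    ("economy", "banking"),
]


def build_graph(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


GRAPH = build_graph(CONCEPT_EDGES)


def concept_hops(graph, src, dst):
    """Breadth-first shortest-path length between two concepts; math.inf if unreachable."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return math.inf


def concept_distance(graph, c1, c2):
    """Map hop count h to h / (1 + h) in [0, 1); unreachable concepts get 1.0."""
    hops = concept_hops(graph, c1, c2)
    return 1.0 if math.isinf(hops) else hops / (1.0 + hops)


def combined_distance(doc_a, doc_b, graph, beta=0.5):
    """Euclidean distance between vectors plus a weighted concept-distance term."""
    (vec_a, concept_a), (vec_b, concept_b) = doc_a, doc_b
    return math.dist(vec_a, vec_b) + beta * concept_distance(graph, concept_a, concept_b)


def knn_predict(train, query, graph, k=3, beta=0.5):
    """Majority vote over the k training documents nearest under combined_distance."""
    ranked = sorted(train, key=lambda item: combined_distance(item[0], query, graph, beta))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def precision_recall_f1(y_true, y_pred, positive):
    """Precision (P), recall (R) and F-measure (F) for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical training documents: ((vector, concept), label).
TRAIN = [
    (([0.1, 0.2], "football"), "sports"),
    (([0.2, 0.1], "basketball"), "sports"),
    (([0.9, 0.8], "stock_market"), "economy"),
    (([0.8, 0.9], "banking"), "economy"),
]
```

In this toy data all four training vectors lie at Euclidean distance 0.5 from the query `([0.5, 0.5], "football")`, so plain KNN cannot separate them. The concept term ranks the "football" and "basketball" documents nearest, and `knn_predict(TRAIN, ([0.5, 0.5], "football"), GRAPH, k=3)` returns `"sports"`.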
Keywords/Search Tags:

Word Segmentation, Feature Extraction, Text Classification

Related items

1	The Research Of Chinese Web Text Orientation Classification
2	Research On Feature Extraction And Classification Algorithm In Text Categorization
3	Design And Implementation Of Text Classification Model Based On The Improved TF-IDF Feature Extraction
4	Research On Text Classification In The Application Of Customer Complaint Prediction Of Operator
5	Design And Implementation Of Short Message Classification System Based On Naive Bayesian
6	Chinese Text Feature Extraction And Classification Based On The Semantics Association
7	Research On Topic Feature Extraction And Text Classification In Social Internet Community
8	Research And Application Of Talent Job Online Matching Based On Text Feature Extraction Technology
9	A Research On Feature Extraction Applied For Text Classification
10	Research On Some Problems In Text Classification