Research On Fast And Precise Classification Algorithm Of Long Text Based On FastText

Posted on:2019-05-02

Degree:Master

Type:Thesis

Country:China

Candidate:Z L Li

Full Text:PDF

GTID:2428330548479767

Subject:Computer technology

Abstract/Summary:

As times change, people's lifestyles have changed dramatically, and digital reading is becoming more and more popular. However, the vast majority of e-books are categorized by their authors, so their classification is highly subjective. An automatic method for classifying long texts is therefore needed. Most traditional classification methods are based on the bag-of-words (BoW) model and TF-IDF features. These methods work well on some data sets, but both consider only the words themselves and their frequencies, ignoring semantics, grammar, and word order. As a result, BoW and TF-IDF lose a great deal of useful information and cannot achieve ideal classification results. This thesis therefore studies text representation and text classification methods and proposes a fast and precise classification algorithm for long text based on FastText. The specific research contents are as follows:

(1) We study text preprocessing methods, the main text representations, and several important traditional machine learning algorithms. We extract feature vectors based on the BM25 algorithm, conduct the related experiments, and analyze the results. The experiments show that the SVM-based classification model performs better than the other trained models.

(2) We propose a method for extracting key statement blocks. Long texts are relatively large, so model training and prediction tend to take more time; we therefore compress the long text. Because extracting only keywords from a long text would lose much of its semantic information, we instead extract key statements and then merge and prune them. A long e-book is thus divided into multiple short blocks, which eliminates redundant and useless information and improves the model's classification speed.

(3) We propose a FastText algorithm based on key statement blocks. The model uses key blocks in both the training and prediction phases, classifies samples quickly in a highly concurrent manner, and obtains the final result by voting. This greatly improves classification speed. Experiments verify the algorithm: it is faster than the original FastText algorithm, and its classification accuracy is about 76%, which is better than most traditional machine learning algorithms.

(4) Finally, we propose a fast and precise classification model based on FastText. The model builds on (3) and introduces naive Bayes-based error correction and a key-block weighting module. Experiments show that its classification accuracy is about 80.4%, so it achieves better classification results.
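
The block-level voting idea in (3) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the fixed-size sentence grouping in `split_into_blocks` is a hypothetical stand-in for key statement block extraction, `toy_block_classifier` is a hypothetical keyword rule standing in for a trained FastText model, and classifying blocks in a thread pool is only one assumed reading of "highly concurrent".

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def split_into_blocks(text: str, sentences_per_block: int = 3) -> List[str]:
    """Group sentences into fixed-size blocks.

    Hypothetical stand-in for key statement block extraction; the thesis's
    actual extraction, merging, and pruning rules are not given in the abstract.
    """
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [
        ". ".join(sentences[i:i + sentences_per_block])
        for i in range(0, len(sentences), sentences_per_block)
    ]


def classify_by_voting(blocks: List[str],
                       classify_block: Callable[[str], str]) -> str:
    """Classify each block concurrently and return the majority label.

    Ties go to the label that was seen first among the tied labels.
    """
    if not blocks:
        raise ValueError("at least one block is required")
    with ThreadPoolExecutor() as pool:
        labels = list(pool.map(classify_block, blocks))
    return Counter(labels).most_common(1)[0][0]


def toy_block_classifier(block: str) -> str:
    """Hypothetical per-block classifier (keyword rule, not FastText)."""
    return "science" if "experiment" in block.lower() else "fiction"
```

In practice, `classify_block` would wrap whatever per-block model is trained; the voting step itself does not depend on the model.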

Keywords/Search Tags:

FastText, Long text, Naive Bayes, NLP

Related items

1	Text Categorization Based On Naive Bayes Method
2	A Text Classifier About High Blood Pressure Based On Naive Bayes
3	Research On Text Classification Algorithm Based On Naive Bayes Method
4	The Study Of Naive Bayes Text Classification System Based On Artificial Intelligence
5	Data Mining Systems And Their Applications - Improve The Performance Of The Naive Bayes Text Classifier, Associated Characteristics
6	Research On Spam Text Classification Based On Improved Naive Bayes Algorithm
7	Research On Long Text Abstract Generation Based On Deep Learning
8	Correlation Between The Text Classification. Word
9	Text Classification Algorithm Research Based On Naive Bayes
10	Incremental Learning Of Naive Bayes Chinese Classification System