Lifestyles have changed greatly in recent years, and digital reading is becoming increasingly popular. However, most e-books are classified by their authors, so this classification can be highly subjective. We therefore need an automatic method for classifying long texts. Most traditional classification methods are based on the bag-of-words (BoW) model and TF-IDF features. These methods work well on some data sets, but they consider only individual words and their frequencies and ignore semantics, grammar, and word order. As a result, BoW and TF-IDF lose a lot of useful information and cannot achieve ideal classification performance. We therefore study text representation and text classification methods, and propose a fast and precise long-text classification algorithm based on FastText. The specific research contents are as follows:

(1) We study text preprocessing methods, the main text representation methods, and several important traditional machine learning algorithms. In this paper, we extract feature vectors based on the BM25 algorithm, conduct related experiments, and analyse the resulting data. The experiments show that the classification model based on the SVM algorithm performs better than the models trained with the other algorithms.

(2) We propose a method for extracting key statement blocks. Long texts are relatively large, so training and prediction tend to take a long time; we therefore compress each long text. Extracting only keywords would lose a lot of semantic information, so instead we extract key statements, then merge them and eliminate redundant ones. A long e-book is thus divided into multiple short blocks, which removes redundant and useless information and improves the model's classification speed.

(3) We propose a FastText algorithm based on key statement blocks. The model uses key blocks in both the training and prediction phases, classifies the blocks of a sample concurrently in a short time, and obtains the final result by voting. This greatly improves classification speed. Our experiments verify the algorithm: it is faster than the original FastText algorithm, and its classification accuracy is about 76%, which is better than most traditional machine learning algorithms.

(4) Finally, we propose a fast and precise classification model based on FastText. The model builds on the algorithm in (3) and adds a naive Bayes-based error-correction module and a key-block weighting module. Experiments show that the classification accuracy of this model is about 80.4%, so it achieves better classification performance.
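Contribution (1) extracts feature vectors with BM25. The abstract does not give the exact BM25 variant or its parameters, so the following is only a minimal sketch. It assumes the common Okapi BM25 term weight with a smoothed IDF and the usual defaults `k1 = 1.5`, `b = 0.75` (all assumptions). Each tokenized document becomes a sparse term-to-weight map, and the function name `bm25_vectors` is hypothetical.

```python
import math
from collections import Counter


def bm25_vectors(docs, k1=1.5, b=0.75):
    """Return one {term: BM25 weight} dict per tokenized document.

    Assumes `docs` is a non-empty list of token lists.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # average document length
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    # Smoothed IDF; the "+ 1" inside the log keeps every weight positive.
    idf = {t: math.log((n_docs - n + 0.5) / (n + 0.5) + 1) for t, n in df.items()}
    vectors = []
    for d in docs:
        tf = Counter(d)
        # Length normalisation: longer-than-average documents are damped.
        norm = k1 * (1 - b + b * len(d) / avgdl)
        vectors.append({t: idf[t] * f * (k1 + 1) / (f + norm) for t, f in tf.items()})
    return vectors


docs = [["fast", "text", "classifier"], ["long", "text", "book"], ["book", "genre", "book"]]
for vec in bm25_vectors(docs):
    print(vec)
```

The resulting dicts could be turned into a sparse matrix with scikit-learn's `DictVectorizer` and passed to an SVM such as `LinearSVC`. The abstract does not say which SVM implementation or kernel the experiments used.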
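Contribution (2) compresses a long text into key statement blocks by extracting key sentences, merging them, and removing redundant ones. The abstract does not say how sentences are scored or how redundancy is detected, so the sketch below uses stand-in choices that are assumptions, not the thesis's method:

* Sentences are scored by the average document-level frequency of their words.
* The top `keep_ratio` fraction is kept in original order.
* Near-duplicates are dropped by Jaccard word overlap.
* The survivors are packed into blocks of `block_size` sentences.

All function and parameter names here are hypothetical.

```python
import re
from collections import Counter


def _jaccard(a, b):
    # Word-set overlap used to detect near-duplicate sentences.
    return len(a & b) / len(a | b) if a | b else 1.0


def key_statement_blocks(text, keep_ratio=0.3, sim_threshold=0.8, block_size=3):
    """Hypothetical key-statement block extraction (illustrative only)."""
    # Naive sentence split after . ! ? and full-width 。！？ (also splits "3.5").
    sentences = [s.strip() for s in re.split(r"(?<=[.!?。！？])\s*", text) if s.strip()]
    # Lower-cased word tokens; assumes whitespace-delimited text
    # (Chinese text would need a word segmenter first).
    token_lists = [re.findall(r"\w+", s.lower()) for s in sentences]
    doc_tf = Counter(t for toks in token_lists for t in toks)
    # Score each sentence by the mean document-level frequency of its words.
    scores = [sum(doc_tf[t] for t in toks) / len(toks) if toks else 0.0
              for toks in token_lists]
    n_keep = max(1, round(len(sentences) * keep_ratio))
    # Highest-scoring sentences first (stable for ties), then restore text order.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    top = sorted(ranked[:n_keep])
    # Drop any sentence that nearly duplicates one already kept.
    kept = []
    for i in top:
        if all(_jaccard(set(token_lists[i]), set(token_lists[j])) < sim_threshold
               for j in kept):
            kept.append(i)
    # Pack the surviving sentences into short blocks.
    return [" ".join(sentences[i] for i in kept[k:k + block_size])
            for k in range(0, len(kept), block_size)]


text = "Cats eat fish. Cats eat fish. Dogs chase cats. Birds sing songs. The sky is blue."
print(key_statement_blocks(text, keep_ratio=0.6))
```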
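Contribution (3) classifies the key blocks of a sample concurrently and combines the block predictions by voting. Below is a minimal sketch of that prediction step. It assumes simple majority voting and a thread pool for concurrency. `classify_block` is a placeholder for any per-block classifier. For a model trained with the `fasttext` Python package, it could wrap `model.predict`, which returns a tuple of labels and probabilities. The toy classifier in the example is purely illustrative.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def classify_by_voting(blocks, classify_block, max_workers=8):
    """Classify each key block concurrently, then return the majority label."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        labels = list(pool.map(classify_block, blocks))
    # Majority vote; most_common orders equal counts by first occurrence.
    return Counter(labels).most_common(1)[0][0]


def toy_classifier(block):
    # Hypothetical stand-in for a trained FastText model.
    return "sci-fi" if "robot" in block else "romance"


print(classify_by_voting(["robot wars", "a quiet kiss", "robot uprising"], toy_classifier))
```

Whether threads actually run blocks in parallel depends on whether the underlying classifier releases Python's GIL. For CPU-bound pure-Python classifiers, `ProcessPoolExecutor` is the drop-in alternative.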
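Contribution (4) adds naive Bayes error correction and key-block weighting, but the abstract does not say how the two are combined. The sketch below shows one possible reading, offered purely as an illustrative assumption:

* Each block's vote is weighted by the classifier's confidence times a block weight.
* When the winning margin is small, the prediction falls back to a document-level multinomial naive Bayes classifier with add-one smoothing.

The names `weighted_vote_with_nb` and `min_margin`, and the `(label, confidence, weight)` input format, are hypothetical.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Minimal multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, y in zip(docs, labels):
            self.word_counts[y].update(tokens)
        self.vocab = {t for c in self.word_counts.values() for t in c}
        self.n_docs = len(labels)
        return self

    def predict(self, tokens):
        best, best_lp = None, -math.inf
        for y, n_y in self.class_counts.items():
            total = sum(self.word_counts[y].values())
            lp = math.log(n_y / self.n_docs)  # log prior
            for t in tokens:  # add-one smoothed log likelihoods
                lp += math.log((self.word_counts[y][t] + 1) / (total + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = y, lp
        return best


def weighted_vote_with_nb(block_preds, doc_tokens, nb, min_margin=0.2):
    """block_preds: list of (label, confidence, block_weight) tuples."""
    scores = Counter()
    for label, conf, weight in block_preds:
        scores[label] += conf * weight
    total = sum(scores.values())
    ranked = scores.most_common(2)
    top_label, top_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    # If the weighted vote is too close to call, defer to naive Bayes.
    if total == 0 or (top_score - runner_up) / total < min_margin:
        return nb.predict(doc_tokens)
    return top_label


nb = MultinomialNB().fit([["robot", "laser"], ["kiss", "love"]], ["sci-fi", "romance"])
print(weighted_vote_with_nb([("sci-fi", 0.5, 1.0), ("romance", 0.5, 1.0)], ["kiss", "love"], nb))
```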