
Study Of The Natural Language Processing Based On Machine Learning Algorithms

Posted on: 2021-10-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Saqib Alam
GTID: 1488306314498984
Subject: Computer & Application Technology

Abstract/Summary:
Text mining (TM) refers to the process of exploring and analyzing large amounts of unstructured text data. In TM, various machine learning techniques can be used to identify potentially useful concepts, patterns, topics, and keywords in the data. In our first work, we focused on how lexicons change across centuries, which helps identify the age of a specific text, that is, the century to which it belongs. We proposed a Term Pace Calculating Model (TPCM) that calculates the velocity of a lexicon along with the structural variations in a dictionary. TPCM is composed of four modules: Term Relevance Counting (TRC), Inverse Average Fragment (IAF), Fragment Lexicon Weighted Comparison (FLWC), and Term Pace (TeP). The first module, TRC, extracts the most common words; the second, IAF, determines whether a term is credible or non-credible. TeP identifies rapidly changing words and the century in which those changes occur. FLWC detects lexicon changes from period to period, namely across the 15th, 16th, 17th, 18th, and 19th centuries.

Moreover, we proposed a model that identifies the age or century of a specific text, based on an unsupervised algorithm, the Lifelong Text Extractor (LTE). LTE analyzes textual data and ascertains the lifetime of the predicted text. It assigns topics to each input text based on word frequencies, and it also distinguishes how similar different documents are. Unlike other frameworks, the proposed framework works across multiple datasets; it is not bound to training on a specific dataset.

Besides the first and second works, we proposed fusion complex classification algorithms (FCCAs) for sentiment analysis to improve classification accuracy. FCCAs are hybrid combinations of state-of-the-art machine learning algorithms, i.e., Naive Bayes (NB), Maximum Entropy (MaxE), and Support Vector Machines (SVM). We proposed Composite NB (CNB), Composite MaxE (CMaxE), and the Hybrid Huberized Support Vector Machine (HHSVM). In this study, we observed that FCCAs increased the accuracy of sentiment classification.

Furthermore, deep learning (DL) has become increasingly important in natural language processing (NLP). DL techniques such as Graph Convolutional Networks are now used in initial TM steps, such as text classification, to obtain better results. Text classification, or text categorization, is an essential and classic problem in NLP. A series of studies have applied convolutional neural networks to classification (convolution on regular grids, such as sequences). However, only a limited number of studies have explored the more flexible graph convolutional neural networks (convolution on non-grid structures, such as arbitrary graphs). The proposed work uses graph convolutional networks to classify text. We built a unique graph representation of the corpus based on the relationships between words and sentences in a document and proposed the Sentence Graph Convolutional Network (SentGCN). In addition to SentGCN, we proposed the Pointwise Mutual Information of Sentence (PMIS) and Term Frequency and Inverse Document Frequency of a Word (TF-IDF_{s,d}) algorithms for this study. SentGCN is initialized with one-hot representations for sentences and is supervised by the known class labels of the documents. Experimental results on multiple benchmark datasets demonstrate that SentGCN, without any external word embeddings or knowledge, outperforms state-of-the-art methods for sentence classification.
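The abstract does not define PMIS or TF-IDF_{s,d}, so the following is only a minimal sketch of the standard quantities they are named after: textbook TF-IDF and window-based pointwise mutual information, of the kind commonly used to weight edges in a text graph. The function names `tf_idf` and `pmi`, the `window` parameter, and the choice to keep only positive PMI pairs are assumptions made for illustration, not the thesis's method.

```python
import math
from collections import Counter
from itertools import combinations


def tf_idf(docs):
    """Standard TF-IDF. docs is a list of token lists.

    Returns {(doc_index, term): weight}. Hypothetical helper, not from the thesis.
    """
    n_docs = len(docs)
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = {}
    for i, doc in enumerate(docs):
        counts = Counter(doc)
        for term, count in counts.items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[term])
            weights[(i, term)] = tf * idf
    return weights


def pmi(docs, window=3):
    """Window-based PMI between term pairs; keeps only pairs with PMI > 0.

    Returns {(term_a, term_b): score} with term_a < term_b alphabetically.
    """
    windows = []
    for doc in docs:
        if not doc:
            continue
        if len(doc) <= window:
            windows.append(doc)
        else:
            windows.extend(doc[k:k + window] for k in range(len(doc) - window + 1))
    n = len(windows)
    single = Counter()  # windows containing a term
    pair = Counter()    # windows containing both terms of a pair
    for w in windows:
        terms = sorted(set(w))
        single.update(terms)
        pair.update(combinations(terms, 2))
    scores = {}
    for (a, b), c in pair.items():
        # PMI = log( p(a, b) / (p(a) * p(b)) ), with probabilities as window fractions.
        value = math.log(c * n / (single[a] * single[b]))
        if value > 0:
            scores[(a, b)] = value
    return scores
```

In a graph-based classifier, weights of this kind would typically label the edges between text units before the graph is passed to a convolutional network; how SentGCN combines them is not stated in the abstract.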
Keywords/Search Tags: Text Mining, Lifelong Text Extractor, Fusion Complex Classification Algorithms, Graph Convolutional Networks