Humans have been exchanging news since they were able to communicate. While news is expected to be presented accurately and objectively, fake news is sometimes present in news broadcasting. People who carry news may intentionally add or subtract information, and some spread fake news for harmful purposes. Fake news often consists of provocative titles, humorous or dramatic stories, or exaggerated content designed to attract readers. People should nonetheless be aware of fake news because of its many adverse effects: some fake news results in human deaths or harms national stability. Hence, this dissertation focuses on proposing artificial intelligence approaches to detect fake news.

This dissertation applied pre-training, deep learning, text augmentation, and topic modeling techniques to fake news detection tasks. Artificial intelligence was utilized to capture the content's writing style, based on lexicon, syntax, semantics, and discourse, to distinguish fake news from true news. The following approaches were used to enhance classification performance: 1) text augmentation and graph neural networks over word-word and word-document nodes; 2) transfer learning to examine the relationships between words; 3) a model based on multi-grained tokenization and shortening-upsampling to enhance performance and reduce resource usage; and 4) a system that arranges the training dataset using a topic-based model and then generates counterfeit text.

First, this dissertation proposed augmentation with heterogeneous graph neural networks (GNNs): the graph convolutional network (GCN), the graph attention network (GAT), and GraphSAGE (SAmple and aggreGatE). The input stage applied Easy Data Augmentation (EDA) techniques (random deletion (RD), random insertion (RI), random swap (RS), and synonym replacement (SR)) to generate more training data, with some limitations. Varying the words in the training text enriches the dataset's vocabulary and produces more diverse vectors. The GNN models use token connectivity as the data source for the graph's nodes and edges, covering both word-word and word-document edges. GCN, GAT, and GraphSAGE are three different approaches to learning graph-structured data: GCN applies convolutional operations on graphs to learn representations of the nodes and edges, GAT uses attention mechanisms to learn node representations, and GraphSAGE combines pooling and message passing to learn node representations.

Second, a hybrid neural network-based model was proposed. It consists of three parts: a bidirectional GRU (BiGRU), an attention (CNN) layer, and a BiGRU-CRF layer. This hybrid aims to enhance the capture capability of a single model by optimizing pre-training output (BERT and GPT2) and utilizing an attention layer in the model. BERT and GPT2 are transformer-based language models and are highly competitive; in this experiment, models using BERT yielded better results than those using GPT2.

Third, this dissertation proposed a multi-grained tokenizer for preprocessing and constructed a model with shortening-upsampling layers. Multi-grained tokenization provides multiple perspectives on the input so that features, or vectors, can be extracted more thoroughly. It combines a fine-grained tokenizer, taken directly from the BERT tokenizer, with a coarse-grained tokenizer built from Named Entity Recognition (NER) followed by the BERT tokenizer. The attentive features are obtained by max-pooling the coarse-grained and fine-grained outputs. The model's shortening-upsampling layers extract the important parts into smaller layers and then propagate context information into dense layers. These approaches were tested on four fake news datasets, and the results showed that the proposed method improved classification performance.

Fourth, topic-based techniques were proposed. The methods select training data by topic and generate counterfeit text using Latent Dirichlet Allocation (LDA), the Bidirectional and Auto-Regressive Transformer (BART), and cosine document similarity. This approach is named the Topic-Based BART Counterfeit Generator (TB-BCG). Its purposes are to reduce resource usage during training, select training datasets, and generate different words to improve model performance. The models selected training datasets by topic to obtain the most impactful data and then generated counterfeit text using BART and cosine similarity to enlarge the training data. The proposed method performed well with LSTM, CNN, BiGRU-Attention-CapsNet-(BiGRU-CRF), and BERT.

All approaches were compared on the Covid-19 fake news detection dataset using accuracy and F1-score. TB-BCG achieved the highest accuracy and F1-scores; it successfully augmented the training data to boost model performance. The second-best technique was the multi-grained model with shortening-upsampling layers, which makes the model input more varied through its two tokenizers and can extract essential text features and reconstruct them in a deeper form. The method with the lowest accuracy was GPT2-BiGRU-Attention-CapsNet-(BiGRU-CRF); although it is a hybrid model, it still cannot compete with the other methods.
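The four EDA operations used in the first study (random deletion, random insertion, random swap, and synonym replacement) can be sketched as follows. This is an illustrative sketch, not the dissertation's implementation; the tiny `SYNONYMS` table is a stand-in for the WordNet lookup EDA normally relies on.

```python
import random

# Toy synonym table; a stand-in for a WordNet lookup (assumption).
SYNONYMS = {"fake": ["false", "bogus"], "news": ["reports", "stories"]}

def random_deletion(words, p=0.1):
    # Drop each word with probability p; always keep at least one word.
    kept = [w for w in words if random.random() > p]
    return kept if kept else [random.choice(words)]

def random_swap(words, n=1):
    # Swap two randomly chosen positions, n times.
    words = words[:]
    for _ in range(n):
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def synonym_replacement(words, n=1):
    # Replace up to n words that have known synonyms.
    words = words[:]
    candidates = [i for i, w in enumerate(words) if w in SYNONYMS]
    for i in random.sample(candidates, min(n, len(candidates))):
        words[i] = random.choice(SYNONYMS[words[i]])
    return words

def random_insertion(words, n=1):
    # Insert a synonym of some word at a random position, n times.
    words = words[:]
    for _ in range(n):
        candidates = [w for w in words if w in SYNONYMS]
        if not candidates:
            break
        syn = random.choice(SYNONYMS[random.choice(candidates)])
        words.insert(random.randrange(len(words) + 1), syn)
    return words
```

Applying these perturbations to each training sentence yields the enlarged, more lexically diverse training set described above.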
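The convolutional operation GCN applies to graph-structured data can be illustrated with a minimal NumPy sketch of a single layer, using the standard propagation rule H' = ReLU(D^-1/2 (A+I) D^-1/2 H W). The function name and shapes are illustrative, not taken from the dissertation.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution step on adjacency A, node features H, weights W."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # symmetric degree normalisation
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)  # ReLU
```

In the heterogeneous setting described above, A would cover both word-word edges and word-document edges, so document nodes aggregate features from the words they contain.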
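One way to picture the coarse-grained side of the multi-grained tokenizer is merging each NER entity span into a single unit before the BERT tokenizer runs, so that multi-word entities are seen as one token. The sketch below is a simplified assumption of that step; `coarse_grained` and its `(start, end)` span format are invented for illustration.

```python
def coarse_grained(tokens, entity_spans):
    # Merge tokens covered by an NER span into one coarse unit;
    # tokens outside any span pass through unchanged.
    # entity_spans: list of (start, end) index pairs, end exclusive.
    spans = sorted(entity_spans)
    merged, i = [], 0
    while i < len(tokens):
        span = next((s for s in spans if s[0] == i), None)
        if span:
            merged.append(" ".join(tokens[span[0]:span[1]]))
            i = span[1]
        else:
            merged.append(tokens[i])
            i += 1
    return merged
```

The fine-grained stream tokenizes the raw text directly; max-pooling the two feature streams then yields the attentive features described above.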
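The topic-based selection stage of TB-BCG can be sketched as ranking document vectors by cosine similarity to a topic vector (for example, an LDA topic centroid) and keeping the top-k as the reduced training set. The LDA and BART stages are omitted here, and `select_by_topic` is a hypothetical helper, not the dissertation's code.

```python
import numpy as np

def cosine_similarity(u, v):
    # Standard cosine similarity between two vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def select_by_topic(doc_vecs, topic_vec, k):
    # Rank documents by similarity to the topic vector; keep the top-k indices.
    scores = [cosine_similarity(d, topic_vec) for d in doc_vecs]
    order = np.argsort(scores)[::-1]
    return [int(i) for i in order[:k]]
```

The selected documents would then seed the BART generation step, with cosine similarity reused to keep only counterfeit texts close to the original topic.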