Abstract:
The popularity and usage of online media platforms are increasing day by day, and the volume of data being disseminated is growing rapidly. The rise of social networks has accelerated the spread of rumors, satire, and false information, leading to an increase in the distribution of fake news. Identifying such news as real or fake is therefore an important task in digital life.
Fake news may appear in different domains such as politics, entertainment, and sports. Various studies applying machine learning and deep learning algorithms to this problem are found in the literature. Generalizing a learning model by identifying patterns in text helps to differentiate fake news from real news. Fake news detection using BERT and LSTM techniques is currently an active and competitive area of research. A model is proposed using BERT and DistilBERT to detect fake news across multiple domains, and its performance is compared with Naive Bayes, Decision Tree, Random Forest, Logistic Regression, and SVM classifiers. It is evaluated on four datasets: the Twitter dataset, the ISOT dataset, the LIAR dataset, and the Kaggle dataset.
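A minimal sketch of how such a classical baseline comparison could be set up is given below; the TF-IDF features, file name, column names, and train/test split are illustrative assumptions, not the exact configuration used in this study.

```python
# Sketch of the classical baseline comparison (assumed TF-IDF features;
# the dataset file, column names, and split ratio are illustrative only).
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

df = pd.read_csv("news.csv")   # hypothetical file with "text" and "label" columns
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42)

vectorizer = TfidfVectorizer(max_features=50_000, stop_words="english")
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

baselines = {
    "Naive Bayes": MultinomialNB(),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": LinearSVC(),
}
for name, clf in baselines.items():
    clf.fit(X_train_vec, y_train)
    preds = clf.predict(X_test_vec)
    print(f"{name}: accuracy = {accuracy_score(y_test, preds):.4f}")
```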
BERT is a widely used pre-trained transformer model for various Natural Language Processing applications. Pre-training and fine-tuning are the two stages of training BERT. Pre-training consists of two tasks, Masked Language Modeling (MLM) and Next Sentence Prediction (NSP), which are trained simultaneously. These pre-training tasks improve the performance of the BERT model.
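The sketch below illustrates how inputs for these two pre-training tasks can be prepared with the Hugging Face transformers library; the example sentences, the bert-base-uncased checkpoint, and the 15% masking ratio are library defaults used here as assumptions rather than details taken from this study.

```python
# Illustration of the two pre-training inputs (sentences and masking ratio are illustrative).
from transformers import BertTokenizer, DataCollatorForLanguageModeling

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# NSP-style input: two segments packed as [CLS] A [SEP] B [SEP];
# the model predicts whether segment B actually follows segment A.
encoded = tokenizer("The match ended late last night.",
                    "Fans celebrated the victory downtown.",
                    return_tensors="pt")

# MLM-style input: the collator randomly replaces ~15% of tokens with [MASK],
# and the model is trained to recover the original tokens.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
batch = collator([{"input_ids": encoded["input_ids"][0]}])
print(tokenizer.decode(batch["input_ids"][0]))
```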
BERT is a stack of encoders, so its outputs are vectors. The output vectors are fed to a fully connected layer whose number of neurons equals the number of tokens in the vocabulary, and a softmax activation converts each output vector into a probability distribution over the vocabulary.
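A minimal PyTorch sketch of this output head is shown below; the hidden size and vocabulary size correspond to bert-base-uncased, and the random tensor stands in for actual encoder outputs.

```python
# Minimal sketch of the output head described above: encoder output vectors
# are projected to vocabulary size and turned into a distribution with softmax.
import torch
import torch.nn as nn

hidden_size = 768    # dimensionality of each BERT-base output vector
vocab_size = 30522   # number of tokens in the BERT-base vocabulary

head = nn.Linear(hidden_size, vocab_size)          # one neuron per vocabulary token

encoder_outputs = torch.randn(1, 12, hidden_size)  # stand-in for encoder outputs (batch, seq_len, hidden)
logits = head(encoder_outputs)
probs = torch.softmax(logits, dim=-1)              # distribution over the vocabulary at each position
print(probs.shape)                                 # torch.Size([1, 12, 30522])
```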
DistilBERT is a distilled version of BERT that is used to reduce training time and memory footprint. The BERT model obtained
an accuracy of 94.8%, 100%, and 99.89% on the Twitter, ISOT, and Kaggle datasets, respectively; DistilBERT obtained an accuracy of 78.68% on the LIAR dataset.
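As a rough illustration of the fine-tuning setup described above, the sketch below fine-tunes DistilBERT as a binary fake/real classifier using the Hugging Face transformers library; the checkpoint, hyperparameters, and placeholder texts are assumptions and do not reproduce the reported results.

```python
# Sketch of fine-tuning DistilBERT for fake/real news classification
# (checkpoint, learning rate, epoch count, and texts are illustrative only).
import torch
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification

tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)   # 0 = real, 1 = fake

texts = ["example real headline", "example fake headline"]   # placeholder training texts
labels = torch.tensor([0, 1])

enc = tokenizer(texts, truncation=True, padding=True, max_length=128, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):                     # illustrative epoch count
    outputs = model(**enc, labels=labels)  # cross-entropy loss computed internally
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"epoch {epoch}: loss = {outputs.loss.item():.4f}")
```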