A MULTI-DOMAIN FAKE NEWS DETECTION MODEL USING BERT AND DistilBERT


dc.contributor.author Sreethu, P R
dc.contributor.author Sumod, Sundar
dc.date.accessioned 2022-10-14T09:49:37Z
dc.date.available 2022-10-14T09:49:37Z
dc.date.issued 2022-07
dc.identifier.uri http://210.212.227.212:8080/xmlui/handle/123456789/220
dc.description.abstract The popularity and usage of online media platforms are increasing daily, and data is disseminated more rapidly than ever. The rise of social networks has accelerated the spread of rumors, satire, and false information, leading to a growing distribution of fake news. Identifying such news as real or fake is therefore an important task in digital life. Fake news may appear in different domains, such as politics, entertainment, and sports. Various studies applying machine learning and deep learning algorithms are found in the literature; generalizing a learning model by identifying patterns in text helps differentiate fake news from real news. Fake news detection using BERT and LSTM techniques is currently among the most active research directions. A model using BERT and DistilBERT is proposed to detect fake news across multiple domains, and its performance is compared with Naive Bayes, Decision Tree, Random Forest, Logistic Regression, and SVM classifiers. It is evaluated on four datasets: the Twitter dataset, the ISOT dataset, the LIAR dataset, and a Kaggle dataset. BERT is a widely used pre-trained transformer model for various Natural Language Processing applications. BERT involves two stages: pre-training and fine-tuning. Pre-training consists of Masked Language Modeling (MLM) and Next Sentence Prediction (NSP), which are trained simultaneously; this pre-training improves the performance of the BERT model. Because BERT is an encoder stack, its outputs are vectors. These output vectors are fed to a fully connected layer whose number of neurons equals the number of tokens in the vocabulary, and a softmax activation converts each word vector into a probability distribution. DistilBERT is a distilled version of BERT that reduces training time and memory size. The BERT model obtained accuracies of 94.8%, 100%, and 99.89% on the Twitter, ISOT, and Kaggle datasets respectively; DistilBERT obtained an accuracy of 78.68% on the LIAR dataset. en_US
dc.language.iso en en_US
dc.relation.ispartofseries ;TKM20MEAI13
dc.title A MULTI-DOMAIN FAKE NEWS DETECTION MODEL USING BERT AND DistilBERT en_US
dc.type Technical Report en_US
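
As an illustration of the approach described in the abstract, the sketch below shows how DistilBERT can be fine-tuned for binary fake/real news classification with the Hugging Face transformers library. This is a minimal sketch, not the report's actual code: the dataset wrapper, example texts, and hyperparameters are assumptions for illustration only.

# Minimal sketch (illustrative, not the authors' implementation) of fine-tuning
# DistilBERT for binary fake/real news classification with Hugging Face transformers.
import torch
from torch.utils.data import Dataset
from transformers import (DistilBertTokenizerFast,
                          DistilBertForSequenceClassification,
                          Trainer, TrainingArguments)

class NewsDataset(Dataset):
    """Wraps tokenised news texts with labels (0 = real, 1 = fake)."""
    def __init__(self, texts, labels, tokenizer, max_len=256):
        self.enc = tokenizer(texts, truncation=True,
                             padding="max_length", max_length=max_len)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)  # two classes: fake vs. real

# Placeholder data; in practice these would come from the Twitter/ISOT/LIAR/Kaggle files.
train_texts = ["example genuine headline", "example fabricated headline"]
train_labels = [0, 1]
train_ds = NewsDataset(train_texts, train_labels, tokenizer)

args = TrainingArguments(output_dir="out", num_train_epochs=3,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=train_ds).train()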

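The classical baselines named in the abstract (Naive Bayes, Decision Tree, Random Forest, Logistic Regression, SVM) are typically built on bag-of-words or TF-IDF features. The sketch below, again an assumption rather than the report's code, shows one common scikit-learn setup for such a comparison.

# Minimal sketch of the classical baseline comparison using TF-IDF features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

# Placeholder data; real experiments would use the datasets cited in the abstract.
texts = ["example genuine headline", "example fabricated headline"]
labels = [0, 1]  # 0 = real, 1 = fake

baselines = {
    "Naive Bayes": MultinomialNB(),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": LinearSVC(),
}
for name, clf in baselines.items():
    pipe = make_pipeline(TfidfVectorizer(), clf)  # TF-IDF features + classifier
    pipe.fit(texts, labels)
    print(name, accuracy_score(labels, pipe.predict(texts)))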
