EXMULF is a fake news detection system that analyzes both the textual content of a post and its associated image. It combines explainable AI techniques with multimodal data analysis to assess veracity. This repository provides the source code and resources to replicate the experiments presented in the EXMULF paper.
Key features:
- Multimodal Analysis: Combines text and image analysis using Latent Dirichlet Allocation (LDA) for topic modeling and Vision-and-Language BERT (VilBERT) for extracting contextual relationships.
- Explainability: Uses Local Interpretable Model-agnostic Explanations (LIME) to provide users with explanations for the system’s decision-making process.
- Datasets: Tested on two real-world multimodal datasets (i.e., Weibo and Twitter) to ensure accuracy and effectiveness.
If you use this system in your research, please cite our paper:
Amri, S., Sallami, D., Aïmeur, E. (2022). EXMULF: An Explainable Multimodal Content-Based Fake News Detection System. In: Aïmeur, E., Laurent, M., Yaich, R., Dupont, B., Garcia-Alfaro, J. (eds) Foundations and Practice of Security. FPS 2021. Lecture Notes in Computer Science, vol 13291. Springer, Cham. https://doi.org/10.1007/978-3-031-08147-7_12.
- LIME_Text.ipynb - Contains the code for explainable text-based fake news detection using the LIME library.
- LIME_Image.ipynb - Contains the code for explainable image-based fake news detection using the LIME library.
- Topic_Modeling_for_Text.ipynb - Demonstrates text topic modeling using Latent Dirichlet Allocation (LDA).
- Topic_Modeling_for_Images.ipynb - Demonstrates image topic modeling for fake news detection.
- VilBERT.ipynb - Implements the VilBERT model for vision-and-language-based analysis.
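To give a feel for what the LIME notebooks produce, here is a simplified LIME-style word-importance sketch. It does not use the `lime` library itself; it reimplements the core idea (perturb the input, query the classifier, fit a linear surrogate) with scikit-learn, and the toy classifier is a stand-in for a real model:

```python
# Simplified LIME-style explanation: mask words at random, query the model on
# the perturbed texts, and fit a linear surrogate whose coefficients
# approximate each word's contribution to the prediction.
import numpy as np
from sklearn.linear_model import Ridge

def explain_text(predict_proba, text, n_samples=200, seed=0):
    rng = np.random.default_rng(seed)
    words = text.split()
    # Random binary masks: which words are kept in each perturbed sample
    masks = rng.integers(0, 2, size=(n_samples, len(words)))
    masks[0] = 1  # include the unperturbed text
    texts = [" ".join(w for w, m in zip(words, row) if m) for row in masks]
    probs = predict_proba(texts)  # probability of the "fake" class
    surrogate = Ridge(alpha=1.0).fit(masks, probs)
    return dict(zip(words, surrogate.coef_))

# Toy stand-in classifier: flags texts containing "shocking" as likely fake
def toy_predict(texts):
    return np.array([0.9 if "shocking" in t else 0.1 for t in texts])

weights = explain_text(toy_predict, "shocking secret cure revealed")
# "shocking" receives the largest weight, since it drives the prediction
```

The real notebooks use the LIME library's text and image explainers, which follow the same perturb-and-fit principle.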
- Python 3.x
- Jupyter Notebook
- Required Libraries:
- LIME
- PyTorch
- scikit-learn
- nltk
- OpenCV
- Vision-and-Language BERT (VilBERT)
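The libraries above can be installed with pip (package names assumed from their usual PyPI names; VilBERT is not on PyPI and must be obtained from its own repository, e.g. facebookresearch/vilbert-multi-task):

```shell
pip install lime torch scikit-learn nltk opencv-python jupyter
```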
To run the experiments, clone the repository and execute the relevant Jupyter notebooks. Each notebook corresponds to a different aspect of the system (text-based, image-based, multimodal, or explainability). For a complete fake news detection pipeline:
- Text Analysis: Run `LIME_Text.ipynb` and `Topic_Modeling_for_Text.ipynb` to perform explainable text-based fake news detection.
- Image Analysis: Run `LIME_Image.ipynb` and `Topic_Modeling_for_Images.ipynb` to perform explainable image-based fake news detection.
- Multimodal Analysis: Use `VilBERT.ipynb` to combine both text and image analysis for a comprehensive fake news detection approach.
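The repository does not spell out how the per-modality scores are merged, so as a hypothetical illustration, here is a simple late-fusion baseline that averages the text and image probabilities (VilBERT itself encodes both modalities jointly; this is only a sketch of the general idea):

```python
# Hypothetical late-fusion baseline: weighted average of the per-modality
# fake-news probabilities, followed by a 0.5 decision threshold.
def late_fusion(text_prob: float, image_prob: float, w_text: float = 0.5) -> float:
    """Combine text and image fake-news probabilities (illustrative rule only)."""
    return w_text * text_prob + (1 - w_text) * image_prob

p = late_fusion(0.8, 0.6)  # 0.5 * 0.8 + 0.5 * 0.6 = 0.7
label = "fake" if p >= 0.5 else "real"
print(label)  # prints "fake"
```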
- Clone the Repository:

  ```shell
  git clone https://github.com/dorsafsallami/EXMULF.git
  cd EXMULF
  ```

- Launch Jupyter Notebook:

  ```shell
  jupyter notebook
  ```

  Open the notebook of your choice (`LIME_Text.ipynb`, `LIME_Image.ipynb`, etc.) and execute the cells.

- Input Data: Ensure the dataset is placed in the correct folder. For text analysis, you can modify the dataset path in the corresponding notebook; for image analysis, do the same in `LIME_Image.ipynb`.

- Run the Notebooks: Execute all cells sequentially in each notebook to obtain the results and explanations for fake news detection.
The EXMULF system was evaluated on two real-world datasets: Twitter and Weibo, for both text-based and image-based fake news detection. It demonstrated superior performance in terms of accuracy, precision, recall, and F1-score when compared with state-of-the-art models. The results highlight the effectiveness of the multimodal approach integrating both text and image analysis with explainability provided by LIME.
Below are the classification results for fake news detection across different models for both datasets:
Twitter dataset:

| Model | Accuracy | Fake Precision | Fake Recall | Fake F1 | Real Precision | Real Recall |
|---|---|---|---|---|---|---|
| **Text Only** | | | | | | |
| BERT<sub>ft</sub> | 0.572 | 0.602 | 0.586 | 0.597 | 0.543 | 0.553 |
| BERT<sub>ft+TT</sub> | 0.577 | 0.612 | 0.574 | 0.598 | 0.551 | 0.564 |
| **Image Only** | | | | | | |
| ResNet-34 | 0.624 | 0.712 | 0.567 | 0.600 | 0.558 | 0.720 |
| VGG-19 | 0.596 | 0.698 | 0.522 | 0.593 | 0.531 | 0.698 |
| **Multimodal** | | | | | | |
| Fusion | 0.7695 | 0.820 | 0.726 | 0.779 | 0.719 | 0.798 |
| SpotFake | 0.7777 | 0.751 | 0.900 | 0.820 | 0.832 | 0.606 |
| AMFB | 0.883 | 0.890 | 0.950 | 0.920 | 0.870 | 0.760 |
| HMCAN | 0.897 | 0.971 | 0.810 | 0.926 | 0.853 | 0.979 |
| BDANN | 0.830 | 0.810 | 0.580 | 0.710 | 0.830 | 0.930 |
| VilBERT | 0.898 | 0.934 | 0.920 | 0.926 | 0.859 | 0.880 |
Weibo dataset:

| Model | Accuracy | Fake Precision | Fake Recall | Fake F1 | Real Precision | Real Recall |
|---|---|---|---|---|---|---|
| **Text Only** | | | | | | |
| BERT<sub>ft</sub> | 0.680 | 0.731 | 0.712 | 0.709 | 0.667 | 0.646 |
| BERT<sub>ft+TT</sub> | 0.682 | 0.739 | 0.720 | 0.711 | 0.672 | 0.684 |
| **Image Only** | | | | | | |
| ResNet-34 | 0.694 | 0.701 | 0.604 | 0.698 | 0.671 | 0.711 |
| VGG-19 | 0.633 | 0.640 | 0.635 | 0.637 | 0.637 | 0.641 |
| **Multimodal** | | | | | | |
| Fusion | 0.8152 | 0.865 | 0.774 | 0.880 | 0.764 | 0.889 |
| SpotFake | 0.8293 | 0.802 | 0.964 | 0.932 | 0.847 | 0.876 |
| AMFB | 0.832 | 0.820 | 0.860 | 0.840 | 0.850 | 0.810 |
| FND-SCTI | 0.829 | 0.863 | 0.780 | 0.815 | 0.920 | 0.835 |
| HMCAN | 0.885 | 0.902 | 0.875 | 0.926 | 0.856 | 0.926 |
| BDANN | 0.482 | 0.820 | 0.670 | 0.850 | 0.800 | 0.850 |
| VilBERT | 0.9204 | 0.946 | 0.948 | 0.946 | 0.879 | 0.893 |
- Best Overall Model: VilBERT showed the best performance across both datasets for both fake and real news detection, especially in terms of precision and recall.
- Multimodal Performance: Multimodal approaches consistently outperformed text-only and image-only approaches.
These results indicate the robustness of the EXMULF system and its explainability-enhanced detection capabilities.
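For reference, the metrics reported in the tables above can be computed from model predictions with scikit-learn; the labels below are toy values (1 = fake, 0 = real), not EXMULF outputs:

```python
# Sketch of computing accuracy and per-class precision/recall/F1, as reported
# in the tables above. y_true / y_pred are toy labels for illustration.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]

acc = accuracy_score(y_true, y_pred)
# labels=[1, 0] orders the per-class scores as (fake, real)
prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred, labels=[1, 0])
print(f"accuracy={acc:.3f}")
print(f"fake: P={prec[0]:.3f} R={rec[0]:.3f} F1={f1[0]:.3f}")
print(f"real: P={prec[1]:.3f} R={rec[1]:.3f} F1={f1[1]:.3f}")
```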