Commit
Update the readme to be up to date with the repo content (#19)
Co-authored-by: Rens van de schoot <[email protected]> Co-authored-by: Jonathan de Bruin <[email protected]>
1 parent 5b26bbd, commit 1094184
Showing 1 changed file with 56 additions and 2 deletions.
@@ -1,4 +1,58 @@
# Semantic Clusters
Experimental repository aimed at using transformers (such as CovidBERT) and Deep Learning techniques to retrieve and visualize semantic clusters underlying the CORD-19 database.
# ASReview Semantic Clustering
This repository contains the Semantic Clustering plugin for
[ASReview](https://github.com/asreview/asreview). It applies several techniques
(SciBERT, PCA, t-SNE, KMeans, and a custom cluster optimizer) to an [ASReview data
object](https://asreview.readthedocs.io/en/latest/API/generated/asreview.data.ASReviewData.html#asreview.data.ASReviewData),
in order to cluster records by their semantic similarity. The end result is an
interactive dashboard:

![Interactive dashboard showing semantic clusters](/docs/cord19_semantic_clusters.gif)

## Usage
The semantic clustering app is run through the `main.py` file. The following
commands are available:

### Processing
```console
python asreviewcontrib\semantic_clustering\main.py -f <url or local file>
python asreviewcontrib\semantic_clustering\main.py --filepath <url or local file>
```

The `filepath` argument starts the processing of a file for clustering. The
result is saved to the `data` folder once processing is done. For example:

```console
python asreviewcontrib\semantic_clustering\main.py -f "https://raw.githubusercontent.com/asreview/systematic-review-datasets/master/datasets/van_de_Schoot_2017/output/van_de_Schoot_2017.csv"
```
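
The `-f` argument accepts either a URL or a local file. How `main.py` tells the two apart is not shown in this README; the following is only a minimal sketch, assuming a hypothetical helper `is_url` (not part of the plugin) built on the standard library:

```python
from urllib.parse import urlparse


def is_url(source: str) -> bool:
    """Return True if `source` looks like an http(s) URL rather than a local path.

    Hypothetical helper for illustration only; not part of the plugin.
    """
    parsed = urlparse(source)
    # Require both a web scheme and a host, so that Windows paths such as
    # "C:\\data\\file.csv" (parsed with scheme "c") count as local files.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(is_url("https://example.org/dataset.csv"))  # a URL
print(is_url("data/my_dataset.csv"))              # a local path
```

A check like this would let one code path download the dataset and another read it straight from disk.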

### Processing testfile
```console
python asreviewcontrib\semantic_clustering\main.py -t
python asreviewcontrib\semantic_clustering\main.py --testfile
```

This argument starts processing with the [`van_de_Schoot_2017`
dataset](https://asreview.readthedocs.io/en/latest/intro/datasets.html?highlight=ptsd#featured-datasets)
and can be used as a functionality test.

### Interactive app
```console
python asreviewcontrib\semantic_clustering\main.py -a
python asreviewcontrib\semantic_clustering\main.py --app
```

After processing has finished, with either a new file or the test file, a
file called `kmeans_df.csv` appears in the `data` folder. The interactive app
uses this file. Once the server has been started with either command above,
the app is available at [`http://127.0.0.1:8050/`](http://127.0.0.1:8050/) in
your browser.
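
The three options described in this section (`-f`/`--filepath`, `-t`/`--testfile`, `-a`/`--app`) could be wired up with `argparse` roughly as follows. This is a sketch, not the actual contents of `main.py`: the function name `build_parser`, the help texts, and the choice to make the options mutually exclusive are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three options listed in this README (illustrative)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Semantic clustering for ASReview (illustrative sketch).",
    )
    # Assumption: exactly one mode is chosen per run.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument(
        "-f", "--filepath",
        metavar="SOURCE",
        help="URL or local file to process for clustering",
    )
    group.add_argument(
        "-t", "--testfile",
        action="store_true",
        help="process the van_de_Schoot_2017 dataset as a functionality test",
    )
    group.add_argument(
        "-a", "--app",
        action="store_true",
        help="start the interactive dashboard",
    )
    return parser


# Parse an explicit argument list instead of sys.argv, for demonstration.
args = build_parser().parse_args(["-f", "my_dataset.csv"])
print(args.filepath, args.testfile, args.app)
```

Grouping the options this way makes it an error to pass, for example, `-t` and `-a` together. Whether the real script enforces that is not stated here.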

## License

MIT license

## Contact
Got ideas for improvement? For any questions or remarks, please send an email to
[[email protected]](mailto:[email protected]).
