Commit
Update the readme to be up to date with the repo content (#19)
Co-authored-by: Rens van de schoot <[email protected]>
Co-authored-by: Jonathan de Bruin <[email protected]>
3 people authored Nov 11, 2021
1 parent 5b26bbd commit 1094184
Showing 1 changed file with 56 additions and 2 deletions.
58 changes: 56 additions & 2 deletions README.md
@@ -1,4 +1,58 @@
# Semantic Clusters
Experimental repository aimed at using transformers (such as CovidBERT) and Deep Learning techniques to retrieve and visualize semantic clusters underlying the CORD-19 database.
# ASReview Semantic Clustering
This repository contains the Semantic Clustering plugin for
[ASReview](https://github.com/asreview/asreview). It applies several techniques
(SciBERT, PCA, t-SNE, KMeans, and a custom Cluster Optimizer) to an [ASReview data
object](https://asreview.readthedocs.io/en/latest/API/generated/asreview.data.ASReviewData.html#asreview.data.ASReviewData)
to cluster records based on semantic differences. The end result is an
interactive dashboard:

![Alt Text](/docs/cord19_semantic_clusters.gif)

## Usage
The semantic clustering app is run through the `main.py` file. The following
commands are available:

### Processing
```console
python asreviewcontrib\semantic_clustering\main.py -f <url or local file>
python asreviewcontrib\semantic_clustering\main.py --filepath <url or local file>
```

The `--filepath` argument starts processing a file for clustering. The processed
file is saved to the `data` folder once processing is done. For example:

```console
python asreviewcontrib\semantic_clustering\main.py -f "https://raw.githubusercontent.com/asreview/systematic-review-datasets/master/datasets/van_de_Schoot_2017/output/van_de_Schoot_2017.csv"
```
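Because the `--filepath` argument accepts either a URL or a local file, a caller may want to tell the two apart before loading. The sketch below shows one way to do that with the standard library. The helper name `is_url` is hypothetical and is not taken from this plugin's code.

```python
from urllib.parse import urlparse


def is_url(filepath: str) -> bool:
    """Return True if `filepath` looks like an http(s) URL, False otherwise.

    Hypothetical helper for illustration; the plugin may handle this differently.
    """
    parsed = urlparse(filepath)
    # Requiring an http/https scheme keeps Windows drive letters such as
    # "C:\..." from being mistaken for a URL scheme.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

Anything for which `is_url` returns `False` would then be treated as a path on the local file system.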

### Processing testfile
```console
python asreviewcontrib\semantic_clustering\main.py -t
python asreviewcontrib\semantic_clustering\main.py --testfile
```

This argument starts processing with the [`van_de_Schoot_2017`
dataset](https://asreview.readthedocs.io/en/latest/intro/datasets.html?highlight=ptsd#featured-datasets)
and can be used as a functionality test.

### Interactive app
```console
python asreviewcontrib\semantic_clustering\main.py -a
python asreviewcontrib\semantic_clustering\main.py --app
```

Once processing has finished, with either a new file or the test file, a
file called `kmeans_df.csv` appears in the `data` folder. The interactive app
uses this file. After the server has been started with one of the commands
above, the app is available at [`http://127.0.0.1:8050/`](http://127.0.0.1:8050/)
in your browser.
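
For reference, the three flag pairs documented above (`-f`/`--filepath`, `-t`/`--testfile`, `-a`/`--app`) could be declared with `argparse` roughly as follows. This is a minimal sketch and not the actual contents of `main.py`. The function name `build_parser`, the help texts, and the choice to make the flags mutually exclusive are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Semantic clustering for ASReview (illustrative sketch)."
    )
    # Assumption: exactly one mode is chosen per invocation.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument(
        "-f", "--filepath",
        metavar="FILE",
        help="URL or local file to process for clustering",
    )
    mode.add_argument(
        "-t", "--testfile",
        action="store_true",
        help="process the van_de_Schoot_2017 test dataset",
    )
    mode.add_argument(
        "-a", "--app",
        action="store_true",
        help="start the interactive dashboard",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--testfile"])
print(args.filepath, args.testfile, args.app)
```

Grouping the flags as mutually exclusive means a combination such as `-t -a` is rejected with a usage error. Whether the real script enforces this is not stated in this README.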

## License

MIT license

## Contact
Got ideas for improvement? For any questions or remarks, please send an email to
[[email protected]](mailto:[email protected]).
