Improve data handling API and various other minor changes (#30)
jteijema authored Nov 22, 2021
1 parent fddacdd commit 799d676
Showing 6 changed files with 87 additions and 350 deletions.
24 changes: 5 additions & 19 deletions README.md
@@ -35,13 +35,8 @@ asreview semantic_clustering --help
Other options are:

```shell
asreview semantic_clustering -f <input.csv or url> -o <output.csv>
asreview semantic_clustering --filepath <input.csv or url> --output <output.csv>
```

```shell
asreview semantic_clustering -t -o <output.csv>
asreview semantic_clustering --testfile --output <output.csv>
asreview semantic_clustering -f <input> -o <output.csv>
asreview semantic_clustering --filepath <input> --output <output.csv>
```

```shell
@@ -78,32 +73,23 @@ Using `-f` will process a file and store the results in the file specified in

Semantic_clustering uses an [`ASReviewData`
object](https://asreview.readthedocs.io/en/latest/API/generated/asreview.data.ASReviewData.html#asreview.data.ASReviewData),
and can handle either a file or url:
and can handle files, urls and benchmark sets:

```shell
asreview semantic_clustering -f "https://raw.githubusercontent.com/asreview/systematic-review-datasets/master/datasets/van_de_Schoot_2017/output/van_de_Schoot_2017.csv" -o output.csv
asreview semantic_clustering -f benchmark:van_de_schoot_2017 -o output.csv
asreview semantic_clustering -f van_de_Schoot_2017.csv -o output.csv
```

If an output file is not specified, `output.csv` is used as output file name.
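The `-f` option accepts files, URLs and benchmark sets. The sketch below shows one way such an argument could be told apart before it is loaded. It assumes that a `benchmark:` prefix marks a benchmark set and that an `http`/`https` scheme marks a URL; `classify_input` is a hypothetical helper for illustration, not part of the plugin or of ASReview.

```python
from urllib.parse import urlparse


def classify_input(source: str) -> str:
    """Hypothetical helper: label an -f argument as 'benchmark', 'url' or 'file'."""
    # Assumption: benchmark sets are addressed as "benchmark:<name>".
    if source.startswith("benchmark:"):
        return "benchmark"
    # Anything with an http(s) scheme is treated as a remote URL.
    if urlparse(source).scheme in ("http", "https"):
        return "url"
    # Everything else is assumed to be a local file path.
    return "file"


for arg in [
    "benchmark:van_de_schoot_2017",
    "https://raw.githubusercontent.com/asreview/systematic-review-datasets/master/datasets/van_de_Schoot_2017/output/van_de_Schoot_2017.csv",
    "van_de_Schoot_2017.csv",
]:
    print(classify_input(arg))  # prints benchmark, url, file (one per line)
```

A Windows path such as `C:/data.csv` parses with scheme `c`, so it still falls through to `"file"`.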

### Test file
```shell
asreview semantic_clustering -t -o <output_file.csv>
```

Using `-t` instead of `-f` uses the
[`van_de_Schoot_2017`](https://asreview.readthedocs.io/en/latest/intro/datasets.html#featured-datasets)
dataset as input file. This way, the plugin can easily be tested.

### Transformer
Semantic Clustering uses the
[`allenai/scibert_scivocab_uncased`](https://github.com/allenai/scibert)
transformer model as default setting. Using the `--transformer <model>` option,
another model can be selected for use instead:

```shell
asreview semantic_clustering -t -o <output_file.csv> --transformer bert-base-uncased
asreview semantic_clustering -f benchmark:van_de_schoot_2017 -o <output_file.csv> --transformer bert-base-uncased
```

Any pretrained model will work.
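The options documented above (`-f`/`--filepath`, `-o`/`--output` with `output.csv` as fallback, and `--transformer` defaulting to `allenai/scibert_scivocab_uncased`) can be summarized as an `argparse` parser. This is a hypothetical reconstruction from the README text, not the plugin's actual parser, which may define its options differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented CLI options;
    # the real semantic_clustering parser may differ.
    parser = argparse.ArgumentParser(prog="asreview semantic_clustering")
    parser.add_argument("-f", "--filepath",
                        help="Input file, URL, or benchmark:<name> dataset.")
    parser.add_argument("-o", "--output", default="output.csv",
                        help="Output CSV file (default: output.csv).")
    parser.add_argument("--transformer",
                        default="allenai/scibert_scivocab_uncased",
                        help="Name of the pretrained transformer model.")
    return parser


args = build_parser().parse_args(
    ["-f", "benchmark:van_de_schoot_2017", "--transformer", "bert-base-uncased"])
print(args.filepath, args.output, args.transformer)
# prints: benchmark:van_de_schoot_2017 output.csv bert-base-uncased
```

Keeping the defaults in the parser means the fallback output name and the default model are documented in `--help` as well as applied at runtime.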
82 changes: 0 additions & 82 deletions asreviewcontrib/semantic_clustering/clustering.py

This file was deleted.

154 changes: 0 additions & 154 deletions asreviewcontrib/semantic_clustering/dim_reduct.py

This file was deleted.

37 changes: 30 additions & 7 deletions asreviewcontrib/semantic_clustering/interactive.py
@@ -20,12 +20,24 @@ def run_app(filepath):
# Read as STR for discrete colormap
df['cluster_id'] = df['cluster_id'].astype(str)

# Set 'cluster_id' to 'included' if 'included' == 1
for row in df.itertuples():
    if row.included == 1:
        df.at[row.Index, 'cluster_id'] = 'included'

# Show main figure
fig = px.scatter(df, x="x", y="y", color="cluster_id",
color_discrete_sequence=px.colors.qualitative.Set1)
fig = px.scatter(df,
x="x",
y="y",
color="cluster_id",
color_discrete_sequence=px.colors.qualitative.Light24)
fig.update_layout(dragmode="pan")
fig.update_layout(legend={'traceorder': 'normal'},
plot_bgcolor='rgba(0,0,0,0.35)',
height=400,)
fig.update_layout(xaxis=dict(showticklabels=False, title=""),
yaxis=dict(showticklabels=False, ticks="", title=""))

config = dict(
{'scrollZoom': True,
'displayModeBar': False,
@@ -46,8 +58,11 @@ def run_app(filepath):

# Main semantic cluster graph
html.Div([
dcc.Graph(figure=fig, id="cluster-div", config=config)
], className="six columns"),
dcc.Graph(figure=fig, id="cluster-div", config=config,
style={'width': '100%',
'height': '100%'
},)
], className="six columns", style={'height': '80%'}),

# Div for abstract window
html.Div([
@@ -56,13 +71,21 @@ def run_app(filepath):
readOnly=True,
placeholder='Enter a value...',
value='This is a TextArea component',
style={'width': '100%', 'height': '300px'},
style={'width': '98%', 'height': '389px'},
id="abstract-div"
)
], className="six columns"),

], className="row"),
])
], className="row", style={'height': '100%'}),
], style={'backgroundColor': 'rgba(0,0,0,0.1)',
'position': 'fixed',
'width': '100%',
'height': '100%',
'top': '0',
'left': '0',
'z-index': '10',
'padding': '10px'
})

# Allow global css - use chriddyp's time-tested external css
app.css.config.serve_locally = False
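The loop added in `run_app()` relabels every record with `included == 1` one row at a time. A minimal sketch of the same step as a single vectorized pandas assignment, run on a small made-up DataFrame that only assumes the `x`, `y`, `cluster_id` and `included` columns used in `run_app()` (the values are illustrative, not real clustering output):

```python
import pandas as pd

# Toy stand-in for the clustering output; values are made up.
df = pd.DataFrame({
    "x": [0.1, 0.4, 0.9],
    "y": [1.2, 0.3, 0.8],
    "cluster_id": [0, 1, 1],
    "included": [0, 1, 0],
})

# Read as str so the cluster ids are treated as discrete categories.
df["cluster_id"] = df["cluster_id"].astype(str)

# Vectorized equivalent of the itertuples() loop: relabel included records.
df.loc[df["included"] == 1, "cluster_id"] = "included"

print(df["cluster_id"].tolist())  # ['0', 'included', '1']
```

The boolean mask in `df.loc` selects all matching rows at once, which avoids a Python-level loop over the DataFrame.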