Add authentication layer to ASReview (asreview#1054)
Co-authored-by: Yongchao Terry Ma <[email protected]>
Co-authored-by: cskaandorp <[email protected]>
Co-authored-by: PeterLombaers <[email protected]>
4 people authored Apr 13, 2023
1 parent d606288 commit da3042e
Showing 125 changed files with 10,610 additions and 2,202 deletions.
6 changes: 6 additions & 0 deletions .gitignore
@@ -121,6 +121,9 @@ pythonenv*
.spyderproject
.spyproject

# VS Code
.vscode

# Rope project settings
.ropeproject

@@ -160,3 +163,6 @@ bower_components
psd
thumb
sketch
notes.md

flask_config.json
110 changes: 106 additions & 4 deletions DEVELOPMENT.md
@@ -29,12 +29,12 @@ Install the ASReview package

Start the Python API server with the Flask development environment

export FLASK_ENV=development
export FLASK_DEBUG=1
asreview lab

For Windows, use

set FLASK_ENV=development
set FLASK_DEBUG=1
asreview lab

### Front end
@@ -43,12 +43,12 @@ Install both [npm][1] and Python

Start the Python API server with the Flask development environment. Before the front end development can be started, the back end has to run as well

export FLASK_ENV=development
export FLASK_DEBUG=1
asreview lab

For Windows, use

set FLASK_ENV=development
set FLASK_DEBUG=1
asreview lab

Navigate to `asreview/webapp` and install the front end application with npm
@@ -76,6 +76,108 @@ npx prettier --write .
[1]: https://www.npmjs.com/get-npm
[2]: https://reactjs.org/

## Authentication

It is possible to run ASReview with authentication, enabling multiple users to run their
projects in their own separate workspaces. Authentication requires storing user
accounts and linking these accounts to projects. Currently we are using a small SQLite
database (asreview.development.sqlite or asreview.production.sqlite) in the ASReview
folder to store that information.

### Bare bones authentication

Using authentication requires additional configuration. Let's start with running a bare bones
authenticated version of the application from the CLI:
```
$ python3 -m asreview lab --enable-auth --secret-key=<secret key> --salt=<salt>
```
where `--enable-auth` forces the application to run in an authenticated mode,
`<secret key>` is a string that is used to sign cookies and `<salt>` is
a string that is used to hash passwords.
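
Both values should be long random strings. The following is a minimal sketch of one way to generate them, assuming Python's standard `secrets` module is acceptable for this purpose. The helper name `generate_auth_secrets` and the 32-byte length are illustrative choices, not ASReview requirements.

```python
import secrets


def generate_auth_secrets(n_bytes=32):
    """Return a (secret_key, salt) pair of random hex strings.

    Hypothetical helper, not part of ASReview; n_bytes is an assumed default.
    """
    secret_key = secrets.token_hex(n_bytes)
    salt = secrets.token_hex(n_bytes)
    return secret_key, salt


if __name__ == "__main__":
    key, salt = generate_auth_secrets()
    # Print the values in the form expected by the CLI call shown above.
    print(f"--secret-key={key} --salt={salt}")
```

Keep the generated values out of version control; changing the secret key later will most likely invalidate existing sessions.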

This bare bones application only allows an administrator to create user accounts by
editing the database without the use of the ASReview application! To facilitate this,
one could use the User model that can be found in `/asreview/webapp/authentication/models.py`. Note that with this simple configuration it is not possible for a user to change forgotten passwords without the assistance of the administrator.

### Full configuration

To configure the authentication in more detail we need to create a JSON file
that contains all authentication parameters. The keys in that JSON file will override any parameter that was passed in the CLI. Here's an example:
```
{
"DEBUG": true,
"AUTHENTICATION_ENABLED": true,
"SECRET_KEY": "<secret key>",
"SECURITY_PASSWORD_SALT": "<salt>",
"SESSION_COOKIE_SECURE": true,
"REMEMBER_COOKIE_SECURE": true,
"SESSION_COOKIE_SAMESITE": "Lax",
"SQLALCHEMY_TRACK_MODIFICATIONS": true,
"ALLOW_ACCOUNT_CREATION": true,
"EMAIL_VERIFICATION": true,
"EMAIL_CONFIG": {
"SERVER": "<smtp-server>",
"PORT": <smtp-server-port>,
"USERNAME": "<smtp-server-username>",
"PASSWORD": "<smtp-server-password>",
"USE_TLS": false,
"USE_SSL": true,
"REPLY_ADDRESS": "<preferred reply email address>"
},
"OAUTH": {
"GitHub": {
"AUTHORIZATION_URL": "https://github.com/login/oauth/authorize",
"TOKEN_URL": "https://github.com/login/oauth/access_token",
"CLIENT_ID": "<GitHub client ID>",
"CLIENT_SECRET": "<GitHub client secret>",
"SCOPE": ""
},
"Orcid": {
"AUTHORIZATION_URL": "https://sandbox.orcid.org/oauth/authorize",
"TOKEN_URL": "https://sandbox.orcid.org/oauth/token",
"CLIENT_ID": "<Orcid client ID>",
"CLIENT_SECRET": "<Orcid client secret>",
"SCOPE": "/authenticate"
},
"Google": {
"AUTHORIZATION_URL": "https://accounts.google.com/o/oauth2/auth",
"TOKEN_URL": "https://oauth2.googleapis.com/token",
"CLIENT_ID": "<Google client ID>",
"CLIENT_SECRET": "<Google client secret>",
"SCOPE": "profile email"
}
}
}
```
Store the JSON file on the server and start the ASReview application from the CLI with the
`--flask-configfile` parameter:
```
$ python3 -m asreview lab --flask-configfile=<path-to-JSON-config-file>
```
A number of the keys in the JSON file are standard Flask parameters. The keys that are specific to ASReview authentication are summarised below:
* AUTHENTICATION_ENABLED: if set to `true` the application will start with authentication enabled. If the SQLite database does not exist, one will be created during startup.
* SECRET_KEY: the secret key is a string that is used to sign cookies and is mandatory if authentication is required.
* SECURITY_PASSWORD_SALT: another string used to hash passwords, also mandatory if authentication is required.
* ALLOW_ACCOUNT_CREATION: enables account creation by users, either through the front end or the back end.
* EMAIL_VERIFICATION: used in conjunction with ALLOW_ACCOUNT_CREATION. If set to `true` the system sends a verification email after account creation. Only relevant if the account is __not__ created by OAuth. This parameter can be omitted if you don't want verification.
* EMAIL_CONFIG: configuration of the SMTP email server that is used for email verification. It also allows users to retrieve a new password after forgetting it. Don't forget to enter the reply address (REPLY_ADDRESS) of your system emails. Omit this parameter if system emails for verification and password retrieval are unwanted.
* OAUTH: an authenticated ASReview application may integrate with the OAuth functionality of GitHub, Orcid and Google. Provide the necessary OAuth login credentials (for [GitHub](https://docs.github.com/en/apps/oauth-apps/building-oauth-apps/creating-an-oauth-app), [Orcid](https://info.orcid.org/documentation/api-tutorials/api-tutorial-get-and-authenticated-orcid-id/) and [Google](https://support.google.com/cloud/answer/6158849?hl=en)). Please note that the AUTHORIZATION_URL and TOKEN_URL of the Orcid entry are sandbox URLs, and thus not to be used in production. Omit this parameter if OAuth is unwanted.
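
Before starting the server it can help to check the config file against the rules listed above. The following is a minimal sketch, assuming the file has already been loaded into a dictionary. The helper `check_auth_config` is hypothetical and only mirrors the key descriptions in this section, not ASReview's own validation. The rule that EMAIL_VERIFICATION needs EMAIL_CONFIG is an inference from those descriptions.

```python
import json


def check_auth_config(config):
    """Return a list of problems found in an ASReview Flask config dict.

    Hypothetical helper: the rules mirror the key descriptions above,
    not ASReview's own validation code.
    """
    problems = []
    if not config.get("AUTHENTICATION_ENABLED", False):
        return problems  # nothing to check when authentication is off

    # Both strings are described as mandatory when authentication is on.
    for key in ("SECRET_KEY", "SECURITY_PASSWORD_SALT"):
        if not config.get(key):
            problems.append(f"{key} is mandatory when authentication is enabled")

    # Assumption: verification emails cannot be sent without an SMTP config.
    if config.get("EMAIL_VERIFICATION") and "EMAIL_CONFIG" not in config:
        problems.append("EMAIL_VERIFICATION is set but EMAIL_CONFIG is missing")

    return problems


if __name__ == "__main__":
    example = json.loads('{"AUTHENTICATION_ENABLED": true, "SECRET_KEY": "abc"}')
    for problem in check_auth_config(example):
        print(problem)
```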

### Converting an unauthenticated application into an authenticated one

At the moment there is a very basic tool to convert your unauthenticated ASReview application into an authenticated one. The following steps sketch a possible approach for the conversion:

1. In the ASReview folder (by default `~/.asreview`) you can find all projects that were created by users in the unauthenticated version. Every sub-folder contains a single project. Make sure you can link those projects to a certain user. In other words: make sure you know which project should be linked to which user.
2. Start the application, preferably using the config JSON file with ALLOW_ACCOUNT_CREATION set to `true`.
3. Use the backend to create user accounts (done with a POST request to `/auth/signup`, see `/asreview/webapp/api/auth.py`). Make sure a full name is provided for every user account. Once done, one could restart the application with ALLOW_ACCOUNT_CREATION set to `false` if account creation by users is undesired.
4. Run the `auth_conversion.py` script (in the root folder) and follow the instructions. The script iterates over all project folders in the ASReview folder and asks which user account has to be associated with each of them. The script will establish the connection in the SQLite database and rename the project folders accordingly.
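
For step 1, it may help to list the project folders first so that each one can be matched to a user before the conversion script is run. The following is a minimal sketch, assuming that every sub-folder of the ASReview folder is a project, as described above. The helper `list_project_folders` is hypothetical and independent of `auth_conversion.py`.

```python
from pathlib import Path


def list_project_folders(asreview_dir):
    """Return the sorted names of all sub-folders of the ASReview folder.

    Hypothetical helper, independent of auth_conversion.py. Files such as
    the SQLite database are skipped because only folders hold projects.
    """
    root = Path(asreview_dir).expanduser()
    if not root.is_dir():
        return []  # no ASReview folder yet, so no projects to convert
    return sorted(p.name for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    # ~/.asreview is the default ASReview folder mentioned in step 1.
    for name in list_project_folders("~/.asreview"):
        print(f"{name}: <user account to link>")
```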

TODO@Jonathan @Peter: I have verified this approach. It worked for me but
obviously needs more testing. I don't think it has to grow into a bombproof solution, but should be used as a stepping stone for an admin
with a little bit of Python knowledge who wants to upgrade to an authenticated version. Anyhow: give it a spin: create a couple of projects, rename the folders in the original project_ids and remove from the projects folder. The script should restore all information.

## Documentation

### Sphinx docs
48 changes: 26 additions & 22 deletions asreview/data/base.py
@@ -62,8 +62,7 @@ def load_data(name, *args, **kwargs):
pass

# Could not find dataset, return None.
raise FileNotFoundError(
f"File, URL, or dataset does not exist: '{name}'")
raise FileNotFoundError(f"File, URL, or dataset does not exist: '{name}'")


def _get_filename_from_url(url):
@@ -129,9 +128,7 @@ class ASReviewData():
"""

def __init__(self,
df=None,
column_spec=None):
def __init__(self, df=None, column_spec=None):
self.df = df
self.prior_idx = np.array([], dtype=int)

@@ -163,13 +160,15 @@ def hash(self):
str:
SHA1 hash, computed from the titles/abstracts of the dataframe.
"""
if ((len(self.df.index) < 1000 and self.bodies is not None) or
self.texts is None):
if (
len(self.df.index) < 1000 and self.bodies is not None
) or self.texts is None:
texts = " ".join(self.bodies)
else:
texts = " ".join(self.texts)
return hashlib.sha1(" ".join(texts).encode(
encoding='UTF-8', errors='ignore')).hexdigest()
return hashlib.sha1(
" ".join(texts).encode(encoding="UTF-8", errors="ignore")
).hexdigest()

@classmethod
def from_file(cls, fp, reader=None):
@@ -232,16 +231,19 @@ def record(self, i, by_index=True):

if by_index:
records = [
PaperRecord(**self.df.iloc[j],
column_spec=self.column_spec,
record_id=self.df.index.values[j])
PaperRecord(
**self.df.iloc[j],
column_spec=self.column_spec,
record_id=self.df.index.values[j],
)
for j in index_list
]
else:
records = [
PaperRecord(**self.df.loc[j, :],
record_id=j,
column_spec=self.column_spec) for j in index_list
PaperRecord(
**self.df.loc[j, :], record_id=j, column_spec=self.column_spec
)
for j in index_list
]

if is_iterable(i):
@@ -259,9 +261,10 @@ def texts(self):
if self.abstract is None:
return self.title

cur_texts = np.array([
self.title[i] + " " + self.abstract[i] for i in range(len(self))
], dtype=object)
cur_texts = np.array(
[self.title[i] + " " + self.abstract[i] for i in range(len(self))],
dtype=object,
)
return cur_texts

@property
@@ -296,8 +299,7 @@ def notes(self):
@property
def keywords(self):
try:
return self.df[self.column_spec["keywords"]].apply(
convert_keywords).values
return self.df[self.column_spec["keywords"]].apply(convert_keywords).values
except KeyError:
return None

@@ -420,8 +422,10 @@ def to_file(self, fp, labels=None, ranking=None, writer=None):
best_suffix = suffix

if best_suffix is None:
raise BadFileFormatError(f"Error exporting file {fp}, no capabilities "
"for exporting such a file.")
raise BadFileFormatError(
f"Error exporting file {fp}, no capabilities "
"for exporting such a file."
)

writer = entry_points[best_suffix].load()
writer.write_data(df, fp, labels=labels, ranking=ranking)
9 changes: 5 additions & 4 deletions asreview/entry_points/algorithms.py
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@ def _format_algorithm(values, name, description):

class AlgorithmsEntryPoint(BaseEntryPoint):
"""Entry point to list available algorithms in ASReview LAB."""

description = "Available active learning algorithms for ASReview."

def execute(self, argv):
@@ -51,28 +52,28 @@ def execute(self, argv):
s += _format_algorithm(
values=list_feature_extraction(),
name="feature_extraction",
description="feature extraction algorithms"
description="feature extraction algorithms",
)

# classifiers
s += _format_algorithm(
values=list_classifiers(),
name="classifiers",
description="classification algorithms"
description="classification algorithms",
)

# query_strategies
s += _format_algorithm(
values=list_query_strategies(),
name="query_strategies",
description="query strategies"
description="query strategies",
)

# balance_strategies
s += _format_algorithm(
values=list_balance_strategies(),
name="balance_strategies",
description="balance strategies"
description="balance strategies",
)

print(s)
8 changes: 3 additions & 5 deletions asreview/entry_points/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -59,15 +59,13 @@ def _base_parser(prog=None, description=None):

# parse arguments if available
parser = argparse.ArgumentParser(
prog=prog,
description=description,
formatter_class=RawTextHelpFormatter
prog=prog, description=description, formatter_class=RawTextHelpFormatter
)
parser.add_argument(
"--embedding",
type=str,
default=None,
dest='embedding_fp',
help="File path of embedding matrix. Required for LSTM models."
dest="embedding_fp",
help="File path of embedding matrix. Required for LSTM models.",
)
return parser