Globus Endpoint Tutorial #139

Open
wants to merge 10 commits into
base: main
from
1 change: 1 addition & 0 deletions docs/services/gpuservice/index.md
@@ -82,6 +82,7 @@ This tutorial teaches users how to submit tasks to the EIDF GPU Service, but it
| [Getting started with Kubernetes](training/L1_getting_started.md) | a. What is Kubernetes?<br>b. How to send a task to a GPU node.<br>c. How to define the GPU resources needed. |
| [Requesting persistent volumes with Kubernetes](training/L2_requesting_persistent_volumes.md) | a. What is a persistent volume? <br>b. How to request a PV resource. |
| [Running a PyTorch task](training/L3_running_a_pytorch_task.md) | a. Accessing a Pytorch container.<br>b. Submitting a PyTorch task to the cluster.<br>c. Inspecting the results. |
| [Setting up a Globus endpoint](training/L4_setting_up_a_globus_endpoint.md) | a. Creating the Service, Volume and Container. <br>b. Creating/Configuring the Endpoint. <br>c. Making the Setup Permanent using PVC. <br>d. Accessing Data on the Persistent Storage. |

## Further Reading and Help

218 changes: 218 additions & 0 deletions docs/services/gpuservice/training/L4_setting_up_a_globus_endpoint.md
@@ -0,0 +1,218 @@
# Setting up a Globus Endpoint

Before beginning this setup, it is a good idea to make sure you have a [Globus account](https://app.globus.org) set up.

## Creating the Service, Volume and Container

1. Log into your VM, either through the [VDI Portal](https://eidf-vdi.epcc.ed.ac.uk/vdi) or through [SSH](https://epcced.github.io/eidf-docs/access/ssh/).

2. Clone the [globus-connect project from Gitlab](https://gitlab.nrp-nautilus.io/prp/globus-connect.git).

3. Open the globus-connect-volume.yaml file in your text editor of choice, and change the storageClassName from “rook-ceph-block” to “csi-rbd-sc”. The file should now read as follows:-

``` yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: globus-connect-data
spec:
  storageClassName: csi-rbd-sc
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
```

4. Run the following commands, replacing “your_namespace” with your namespace on the machine you’re using:-

``` bash
kubectl create -f globus-connect-service.yaml -n your_namespace
kubectl create -f globus-connect-volume.yaml -n your_namespace
kubectl create -f globus-connect.yaml -n your_namespace
```
You can then check that your pod is running:-

``` bash
kubectl get pods -n your_namespace
```

5. Connect to the pod that you just created:-

``` bash
kubectl exec -it globus-connect-0 -n your_namespace -- bash
```
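
The three `kubectl create` commands and the readiness check above can be wrapped in one small helper. This is only a sketch: the function name `deploy_globus_connect` and the `KUBECTL` override variable are hypothetical, and it assumes the StatefulSet is called globus-connect (the default name used later in this guide). Setting `KUBECTL=echo` prints the commands instead of running them.

```bash
# Hypothetical helper, not part of the globus-connect project.
# KUBECTL can be overridden (e.g. KUBECTL=echo) to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

deploy_globus_connect() {
    # $1 = namespace to deploy into
    local namespace="$1"
    local manifest
    # Create the service, the volume claim and the StatefulSet, in that order.
    for manifest in globus-connect-service.yaml globus-connect-volume.yaml globus-connect.yaml; do
        "$KUBECTL" create -f "$manifest" -n "$namespace" || return 1
    done
    # Block until the StatefulSet reports its pod as ready (or 5 minutes pass).
    "$KUBECTL" rollout status statefulset/globus-connect -n "$namespace" --timeout=300s
}

# Usage (run from the cloned globus-connect directory):
#   deploy_globus_connect your_namespace
```

Once the helper returns successfully, the `kubectl exec` command from step 5 should connect straight away.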

## Creating/Configuring the Endpoint

1. Change from superuser to the gridftp user:-

``` bash
su - gridftp
```

2. Log into your Globus account;
- Enter the following command:-
``` bash
globus login --no-local-server
```
This will give you an output similar to the following:-

``` bash
Please authenticate with Globus here:
------------------------------------
https://auth.globus.org/v2/oauth2/authorize?prompt=login&access_type=offline&state=_default&redirect_uri=https …
… &scope=openid+profile+email+uuview_identity_set+urn%3Aglobus%3Aauth%3Ascope%3Atransfer.api.globus.org%3Aall
------------------------------------
Enter the resulting Authorization Code here:
```

- Copy and paste the URL into a browser window. You will be prompted to log into your Globus account. Do this, then work through the setup until you are presented with an authorisation code. Paste this code into the terminal and press Enter.

- You should now be logged in. You can confirm this by entering:-
``` bash
globus whoami
```
This should return your Globus username.

3. Create your endpoint by entering the following line:-

``` bash
globus endpoint create --personal <your endpoint name> | tee endpoint-info
```

If successful, this should return the following:-
``` bash
Endpoint created successfully
Endpoint ID: <Your unique endpoint ID>
Setup Key: <Your unique setup key>
```

4. Save the Endpoint ID and Setup Key as variables. This is necessary for later steps:-

``` bash
export ep=<Your unique endpoint ID>
export epkey=<Your unique setup key>
```

5. Navigate to the globusconnectpersonal folder, and then run globusconnectpersonal. The folder name ends with the current version number, so it's a good idea to run ```ls``` to confirm the folder's name:-

``` bash
cd globusconnectpersonal-3.x.x/
./globusconnectpersonal -setup $epkey
```

You can now verify that the endpoint was set up correctly:-

``` bash
globus endpoint search --filter-scope my-endpoints

ID                | Owner                  | Display Name
----------------- | ---------------------- | ---------------------
087ecee8-d7cd-... | <your Globus username> | <your endpoint name>
```

6. Start the endpoint:-

``` bash
./globusconnectpersonal -start
```

You should be able to confirm that this worked correctly by navigating to your [Globus account's Collections](https://app.globus.org/file-manager/collections) and checking that your endpoint is listed.
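
Steps 4 and 5 involve copying values by hand and checking a version number, which is easy to get wrong. The sketch below shows one way to read the ID and key back from the endpoint-info file written by `tee`, and to locate the versioned folder without typing its name. Both function names are hypothetical, and the parsing assumes endpoint-info has exactly the "Endpoint ID: ..." and "Setup Key: ..." layout shown in step 3.

```bash
# Hypothetical helpers; they assume the endpoint-info layout shown in step 3.

# Print the value that follows "<label>: " in a file.
endpoint_field() {
    # $1 = label (e.g. "Endpoint ID"), $2 = file to read
    awk -F': *' -v label="$1" '$1 == label { print $2; exit }' "$2"
}

# Print the first globusconnectpersonal-* directory found under $1 (default: .).
find_gcp_dir() {
    local base="${1:-.}"
    local dir
    for dir in "$base"/globusconnectpersonal-*/; do
        # An unmatched glob stays literal, so check that the directory exists.
        [ -d "$dir" ] && { printf '%s\n' "${dir%/}"; return 0; }
    done
    return 1
}

# Usage, as the gridftp user in its home directory:
#   export ep="$(endpoint_field 'Endpoint ID' endpoint-info)"
#   export epkey="$(endpoint_field 'Setup Key' endpoint-info)"
#   cd "$(find_gcp_dir .)" && ./globusconnectpersonal -setup "$epkey"
```

If more than one globusconnectpersonal folder is present, this picks the first match in glob order, so it is still worth checking with ```ls```.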


## Making the Setup Permanent using PVC

Currently, if the container were restarted, the endpoint would need to be configured all over again. We can avoid this by saving the endpoint setup in a PVC attached to the container.

1. Switch back to the superuser. You can do this by just entering ```exit```.

2. Enter the following lines. These will create the folder that will store the saved setup, give it its proper user ownership, and copy over the setup files:-

``` bash
mkdir -p /data/gridftp-save
chown gridftp:gridftp /data/gridftp-save
cd ~gridftp/
cp -p -r .globus* /data/gridftp-save/
cp -p endpoint-info /data/gridftp-save/
```

3. When you first configure the Globus personal endpoint, the only accessible path is ~/, as defined in the gridftp user's ~/.globusonline/lta/config-paths file. To persist your data, you must transfer it to /data/gridftp-save/, which lives on the PVC. To make /data/gridftp-save/ available in Globus, run this command:-

``` bash
echo "/data/gridftp-save/,0,1" >> ~gridftp/.globusonline/lta/config-paths
```
You can confirm that this worked by entering the following command:-

``` bash
cat ~gridftp/.globusonline/lta/config-paths
```

This should return:-

``` bash
~/,0,1
/data/gridftp-save/,0,1
```


4. Re-run the following command to copy your changes to the /data/gridftp-save/ directory:-

``` bash
cp -p -r ~gridftp/.globus* /data/gridftp-save/
```

At this point, it is a good idea to confirm that changes made to the endpoint directory in the Globus webapp are reflected in the endpoint itself. A simple way to check this is to select your endpoint from your [Globus account's Collections](https://app.globus.org/file-manager/collections), navigate to /data/gridftp-save/, and add a new folder. You should then be able to ```cd``` into the same directory in the endpoint itself and see the new folder there as well.
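
Because step 3 appends with `>>`, running it twice leaves a duplicate line in config-paths. A minimal sketch of a guard against that is shown below; the function name `add_config_path` is hypothetical, and it simply reuses the `,0,1` suffix from the command above without interpreting it.

```bash
# Hypothetical helper: append "<dir>,0,1" to a config-paths file
# only if that exact line is not already present.
add_config_path() {
    # $1 = directory to expose, $2 = config-paths file
    local dir="$1"
    local file="$2"
    local entry="${dir},0,1"
    # -x matches whole lines, -F treats the entry as a fixed string.
    grep -qxF -- "$entry" "$file" 2>/dev/null || printf '%s\n' "$entry" >> "$file"
}

# Usage:
#   add_config_path /data/gridftp-save/ ~gridftp/.globusonline/lta/config-paths
```

Running the helper repeatedly leaves the file unchanged after the first call, so it is safe to include in a setup script.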

## Accessing Data on the Persistent Storage

You can now use a second pod to retrieve transferred data from the PVC.

1. Delete the statefulset that you created at the start of the guide (this will by default be called globus-connect):-

``` bash
kubectl delete statefulset globus-connect -n your_namespace
```

2. Create a new .yaml file (in this case called accessPod.yaml), and enter the following:-

``` yaml
apiVersion: v1
kind: Pod
metadata:
  name: globus-connect-pod
spec:
  containers:
  - name: globus-connect-pod
    image: ubuntu:latest
    resources:
      limits:
        memory: 100Mi
        cpu: 100m
      requests:
        memory: 100Mi
        cpu: 100m
    imagePullPolicy: Always
    args: ["/bin/bash", "-c", "sleep infinity"]
    volumeMounts:
    - mountPath: /data
      name: globus-connect-data
  volumes:
  - name: globus-connect-data
    persistentVolumeClaim:
      claimName: globus-connect-data
```

3. Schedule this new pod, ensuring that it is in the same namespace as the Globus endpoint:-

``` bash
kubectl create -f accessPod.yaml -n your_namespace
```

You will now be able to connect to this pod and access the data in your endpoint.

4. You can then re-create the globus-connect statefulset by re-entering the following command from the initial setup:-

``` bash
kubectl create -f globus-connect.yaml -n your_namespace
```
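
Steps 1 to 3 can be sketched as a pair of helpers, together with one way of listing the transferred data once the access pod is up. The function names and the `KUBECTL` override variable are hypothetical; the pod, StatefulSet and file names are the ones used in this guide. Setting `KUBECTL=echo` prints the commands rather than running them.

```bash
# Hypothetical helpers; KUBECTL=echo previews the commands without running them.
KUBECTL="${KUBECTL:-kubectl}"

# Swap the Globus endpoint for the access pod (steps 1 to 3).
open_access_pod() {
    # $1 = namespace that holds the Globus endpoint
    local namespace="$1"
    "$KUBECTL" delete statefulset globus-connect -n "$namespace" || return 1
    "$KUBECTL" create -f accessPod.yaml -n "$namespace" || return 1
    # Wait for the access pod to become ready (up to 2 minutes).
    "$KUBECTL" wait --for=condition=Ready pod/globus-connect-pod -n "$namespace" --timeout=120s
}

# List the contents of the persistent directory from inside the access pod.
list_transferred_data() {
    local namespace="$1"
    "$KUBECTL" exec globus-connect-pod -n "$namespace" -- ls -l /data/gridftp-save
}

# Usage:
#   open_access_pod your_namespace
#   list_transferred_data your_namespace
```

For an interactive shell instead of a listing, `kubectl exec -it globus-connect-pod -n your_namespace -- bash` follows the same pattern as the connection command used earlier in this guide.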
1 change: 0 additions & 1 deletion docs/services/gpuservice/training/L4_template_workflow.md

This file was deleted.

1 change: 1 addition & 0 deletions docs/services/gpuservice/training/L5_template_workflow.md
@@ -0,0 +1 @@
# Template workflow
2 changes: 1 addition & 1 deletion docs/services/jhub/docs.md
@@ -1,3 +1,3 @@
# Service Documentation

## Online support
## Online support
9 changes: 5 additions & 4 deletions mkdocs.yml
@@ -50,10 +50,10 @@ nav:
# - "Using the MFT": services/mft/using-the-mft.md
# - "SFTP": services/mft/sftp.md
- "Policies": services/virtualmachines/policies.md
- "Managed JupyterHub":
- "QuickStart": services/jhub/quickstart.md
- "Tutorial": services/jhub/tutorial.md
- "Documentation": services/jhub/docs.md
#- "Managed JupyterHub":
# - "QuickStart": services/jhub/quickstart.md
# - "Tutorial": services/jhub/tutorial.md
# - "Documentation": services/jhub/docs.md
- "Cerebras CS-2":
- "Get Access": services/cs2/access.md
- "Running codes": services/cs2/run.md
@@ -68,6 +68,7 @@ nav:
- "Getting Started": services/gpuservice/training/L1_getting_started.md
- "Persistent Volumes": services/gpuservice/training/L2_requesting_persistent_volumes.md
- "Running a Pytorch Pod": services/gpuservice/training/L3_running_a_pytorch_task.md
- "Setting up a Globus Endpoint": services/gpuservice/training/L4_setting_up_a_globus_endpoint.md
- "GPU Service FAQ": services/gpuservice/faq.md
- "Graphcore Bow Pod64":
- "Overview": services/graphcore/index.md