diff --git a/detail_material/nodes/index.html b/detail_material/nodes/index.html
index 5f5eb07..d1bdf7a 100644
--- a/detail_material/nodes/index.html
+++ b/detail_material/nodes/index.html
@@ -70,6 +70,10 @@
CPU Nodes:
+
Login Node:
@@ -155,6 +159,13 @@ dl-srv-04:
(More to be added in the future)
CPU Nodes:
+cpu-srv-01:
+
+- Type: SuperMicro Server
+- CPU: AMD EPYC 9354P (32C/64T)
+- RAM: 768GB (DDR5)
+- Temporal local storage: None at the moment
+
(More to be added in the future)
Login Node:
diff --git a/index.html b/index.html
index 08aa7e0..adff519 100644
--- a/index.html
+++ b/index.html
@@ -193,5 +193,5 @@ Contents
diff --git a/search/search_index.json b/search/search_index.json
index dfd95aa..272d4c1 100644
--- a/search/search_index.json
+++ b/search/search_index.json
@@ -1 +1 @@
-{"config":{"indexing":"full","lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"Welcome to the Pleiades documentation This documentation is designed to provide users with essential information and instructions to effectively utilize the resources at IEETA. Whether you're new to HPC or an experienced user, this guide aims to help you understand how to access, manage jobs, and leverage the available resources in the most efficient way. Currently we are under a testing phase! Feel free to apply, see How to Access Contents About : This section provides an overview of Pleiades (IEETA's HPC cluster), including its purpose, capabilities, and the team responsible for its supports. How to Access : Detailed instructions on how to gain access to Pleiades Quick start : A concise guide for quickly getting started with the Pleiades. This includes basic steps on logging in, running simple jobs, and understanding the file system. For users seeking more profound insights or for those new to this domain, we have curated additional detailed material that delves into various aspects of utilizing this cluster. Detailed Material : Cluster Nodes Software packages TODO Job management TODO Account management TODO Furthermore, to aid in acclimatization with the HPC stack, we have also included some practical examples: Deep Learning (GPU) Examples Transformers Cuda example MNIST Example TODO Scientific Computing (CPU) Examples DNA Sequencing Example TODO R Example TODO","title":"Home"},{"location":"#welcome-to-the-pleiades-documentation","text":"This documentation is designed to provide users with essential information and instructions to effectively utilize the resources at IEETA. Whether you're new to HPC or an experienced user, this guide aims to help you understand how to access, manage jobs, and leverage the available resources in the most efficient way.","title":"Welcome to the Pleiades documentation"},{"location":"#currently-we-are-under-a-testing-phase-feel-free-to-apply-see-how-to-access","text":"","title":"Currently we are under a testing phase! Feel free to apply, see How to Access"},{"location":"#contents","text":"About : This section provides an overview of Pleiades (IEETA's HPC cluster), including its purpose, capabilities, and the team responsible for its supports. How to Access : Detailed instructions on how to gain access to Pleiades Quick start : A concise guide for quickly getting started with the Pleiades. This includes basic steps on logging in, running simple jobs, and understanding the file system. For users seeking more profound insights or for those new to this domain, we have curated additional detailed material that delves into various aspects of utilizing this cluster. Detailed Material : Cluster Nodes Software packages TODO Job management TODO Account management TODO Furthermore, to aid in acclimatization with the HPC stack, we have also included some practical examples: Deep Learning (GPU) Examples Transformers Cuda example MNIST Example TODO Scientific Computing (CPU) Examples DNA Sequencing Example TODO R Example TODO","title":"Contents"},{"location":"about/","text":"About Pleiades Pleiades is the name of IEETA's High Performance Computing (HPC) cluster. In this page, you will learn about its purpose, capabilities, and the unique opportunities it offers for research and computation-intensive projects. 
Objectives Support advanced research: The primary objective of Pleiades is to enable and accelerate research that requires substantial computational resources. Offer an open and unified computation platform: By aggregating heterogenous computation devices under the same cluster, we can facilitate in a unified way fair access to all of our reshearchers. Rely on proven open technologies: By using open-source, widely accepted technologies, such as SLURM, we hope to ensure system longevity and vendor-independence, as well as provide valuable skills to our users. Capabilities Pleiades offers a range of computing resources, including traditional CPU and GPU-enabled nodes. Why this name In the real-world, Pleiades is a star cluster located within the constellation of Taurus. A collective noun seemed appropriate to describe the plural, heterogeneous nature of a computing cluster, and Pleiades is both a beatiful word and a brilliant object! Support Team Currently the IEETA HPC is managed by the \"Pelouro de Infraestrutra Computacional\" team: Jo\u00e3o Rodrigues (jmr@ua.pt) Eurico Pedrosa (efp@live.ua.pt) Tiago Almeida (tiagomeloalmeida@ua.pt)","title":"About"},{"location":"about/#about-pleiades","text":"Pleiades is the name of IEETA's High Performance Computing (HPC) cluster. In this page, you will learn about its purpose, capabilities, and the unique opportunities it offers for research and computation-intensive projects.","title":"About Pleiades"},{"location":"about/#objectives","text":"Support advanced research: The primary objective of Pleiades is to enable and accelerate research that requires substantial computational resources. Offer an open and unified computation platform: By aggregating heterogenous computation devices under the same cluster, we can facilitate in a unified way fair access to all of our reshearchers. Rely on proven open technologies: By using open-source, widely accepted technologies, such as SLURM, we hope to ensure system longevity and vendor-independence, as well as provide valuable skills to our users.","title":"Objectives"},{"location":"about/#capabilities","text":"Pleiades offers a range of computing resources, including traditional CPU and GPU-enabled nodes.","title":"Capabilities"},{"location":"about/#why-this-name","text":"In the real-world, Pleiades is a star cluster located within the constellation of Taurus. A collective noun seemed appropriate to describe the plural, heterogeneous nature of a computing cluster, and Pleiades is both a beatiful word and a brilliant object!","title":"Why this name"},{"location":"about/#support-team","text":"Currently the IEETA HPC is managed by the \"Pelouro de Infraestrutra Computacional\" team: Jo\u00e3o Rodrigues (jmr@ua.pt) Eurico Pedrosa (efp@live.ua.pt) Tiago Almeida (tiagomeloalmeida@ua.pt)","title":"Support Team"},{"location":"how_to_access/","text":"How to Access Overview Pleiades primarily serves researchers at IEETA and students of the University of Aveiro. This guide outlines the process for obtaining access and connecting to the cluster. Eligibility and Requesting Access Who Can Access Researchers at IEETA: Faculty members, postdoctoral researchers, and research staff affiliated with IEETA are eligible to access IEETA resources. Students at the University of Aveiro: Undergraduate and graduate students enrolled at the University of Aveiro with IEETA supervisors are eligible to use these computational facility for academic and research purposes. 
How to Request Access Currently, to request access to Pleiades, eligible users should fill the following form https://forms.gle/WvRmjL1krNykzLnX7 Connecting to the HPC Once your access is granted, you can connect to Pleiades using SSH (Secure Shell) with your user credentials. Note: Access to the Pleiades (pleiades.ieeta.pt) is restricted to the University of Aveiro network, including Eduroam. Ensure you are connected to this network for successful access. SSH Connection Steps Open a Terminal: On your local machine, open a terminal window. Initiate SSH Connection: Use the following SSH command, replacing username with the username provided to you by email: ssh username@pleiades.ieeta.pt Enter Your Password: When prompted, enter the password provided to you. Further Assistance If you encounter any issues or have queries regarding the access process, please feel free to reach out to the Pleiades support team for assistance.","title":"How to Access"},{"location":"how_to_access/#how-to-access","text":"","title":"How to Access"},{"location":"how_to_access/#overview","text":"Pleiades primarily serves researchers at IEETA and students of the University of Aveiro. This guide outlines the process for obtaining access and connecting to the cluster.","title":"Overview"},{"location":"how_to_access/#eligibility-and-requesting-access","text":"","title":"Eligibility and Requesting Access"},{"location":"how_to_access/#who-can-access","text":"Researchers at IEETA: Faculty members, postdoctoral researchers, and research staff affiliated with IEETA are eligible to access IEETA resources. Students at the University of Aveiro: Undergraduate and graduate students enrolled at the University of Aveiro with IEETA supervisors are eligible to use these computational facility for academic and research purposes.","title":"Who Can Access"},{"location":"how_to_access/#how-to-request-access","text":"Currently, to request access to Pleiades, eligible users should fill the following form https://forms.gle/WvRmjL1krNykzLnX7","title":"How to Request Access"},{"location":"how_to_access/#connecting-to-the-hpc","text":"Once your access is granted, you can connect to Pleiades using SSH (Secure Shell) with your user credentials. Note: Access to the Pleiades (pleiades.ieeta.pt) is restricted to the University of Aveiro network, including Eduroam. Ensure you are connected to this network for successful access.","title":"Connecting to the HPC"},{"location":"how_to_access/#ssh-connection-steps","text":"Open a Terminal: On your local machine, open a terminal window. Initiate SSH Connection: Use the following SSH command, replacing username with the username provided to you by email: ssh username@pleiades.ieeta.pt Enter Your Password: When prompted, enter the password provided to you.","title":"SSH Connection Steps"},{"location":"how_to_access/#further-assistance","text":"If you encounter any issues or have queries regarding the access process, please feel free to reach out to the Pleiades support team for assistance.","title":"Further Assistance"},{"location":"quick_start/","text":"Quick start In this page you can find a quick start guide on how to use IEETA cluster (Pleiades). 1. Access IEETA cluster (Pleiades) Access the cluster via SSH using the credentials provided to you by email. If you do not have access yet, please refer to the How to Access page. $ ssh user@pleiades.ieeta.pt By default, upon logging in, you will land on our login node in your home directory, which is located at /data/home . 
This is a network storage partition visible to all cluster nodes. The login node is where you should prepare your code in order to submit jobs to run on the worker nodes of the cluster. The worker nodes are equipped with powerful resources. Currently, we have: CPU nodes : Nodes with a high amount of RAM and faster CPUs. Currently not added to the cluster yet GPU nodes : Nodes equipped with GPUs and more modest CPU/RAM configurations. For more information about each node check the nodes page . 2. Prepare your software environment The next step is to prepare your environment to run/build your application. We recommend using a virtual environment so that you can install any package locally. First, load the Python module. $ module load python Then create and activate your virtual environment. $ python -m venv virtual-venv $ source virtual-venv/bin/activate You can then install your package dependencies with pip. (virtual-venv)$ pip install --upgrade pip (virtual-venv)$ pip install torch transformers 3. Create your SLURM job script After setting up your runtime environment, you should create a SLURM job script to submit your job. For example: #!/bin/bash #SBATCH --job-name=trainer # create a short name for your job #SBATCH --output=\"trainer-%j.out\" # %j will be replaced by the slurm jobID #SBATCH --nodes=1 # node count #SBATCH --ntasks=1 # total number of tasks across all nodes #SBATCH --cpus-per-task=2 # cpu-cores per task (>1 if multi-threaded tasks) #SBATCH --gres=gpu:1 # number of gpus per node #SBATCH --mem=4G # Total amount of RAM requested source /virtual-venv/bin/activate # If you have your venv activated when you submit the job, then you do not need to activate/deactivate python your_trainer_script.py deactivate The script is made of two parts: 1. Specification of the resources needed and some job information; 2. Comands that will be executed on the destination node. As an example, in the first part of the script, we define the job name, the output file and the requested resources (1 GPU, 2 CPUs and 4GB RAM). Then, in the second part, we define the tasks of the job. By default since no partition was defined the job will run under the default partitaion that in this cluster is the gpu partition, you can check which partitions and nodes are available with: $ sinfo 4. Submit the job To submit the job, you should run the following command: $ sbatch script_trainer.sh Submitted batch job 144 You can check the job status using the following command: $ squeue","title":"Quick Start"},{"location":"quick_start/#quick-start","text":"In this page you can find a quick start guide on how to use IEETA cluster (Pleiades).","title":"Quick start"},{"location":"quick_start/#1-access-ieeta-cluster-pleiades","text":"Access the cluster via SSH using the credentials provided to you by email. If you do not have access yet, please refer to the How to Access page. $ ssh user@pleiades.ieeta.pt By default, upon logging in, you will land on our login node in your home directory, which is located at /data/home . This is a network storage partition visible to all cluster nodes. The login node is where you should prepare your code in order to submit jobs to run on the worker nodes of the cluster. The worker nodes are equipped with powerful resources. Currently, we have: CPU nodes : Nodes with a high amount of RAM and faster CPUs. Currently not added to the cluster yet GPU nodes : Nodes equipped with GPUs and more modest CPU/RAM configurations. For more information about each node check the nodes page .","title":"1. 
Access IEETA cluster (Pleiades)"},{"location":"quick_start/#2-prepare-your-software-environment","text":"The next step is to prepare your environment to run/build your application. We recommend using a virtual environment so that you can install any package locally. First, load the Python module. $ module load python Then create and activate your virtual environment. $ python -m venv virtual-venv $ source virtual-venv/bin/activate You can then install your package dependencies with pip. (virtual-venv)$ pip install --upgrade pip (virtual-venv)$ pip install torch transformers","title":"2. Prepare your software environment"},{"location":"quick_start/#3-create-your-slurm-job-script","text":"After setting up your runtime environment, you should create a SLURM job script to submit your job. For example: #!/bin/bash #SBATCH --job-name=trainer # create a short name for your job #SBATCH --output=\"trainer-%j.out\" # %j will be replaced by the slurm jobID #SBATCH --nodes=1 # node count #SBATCH --ntasks=1 # total number of tasks across all nodes #SBATCH --cpus-per-task=2 # cpu-cores per task (>1 if multi-threaded tasks) #SBATCH --gres=gpu:1 # number of gpus per node #SBATCH --mem=4G # Total amount of RAM requested source /virtual-venv/bin/activate # If you have your venv activated when you submit the job, then you do not need to activate/deactivate python your_trainer_script.py deactivate The script is made of two parts: 1. Specification of the resources needed and some job information; 2. Comands that will be executed on the destination node. As an example, in the first part of the script, we define the job name, the output file and the requested resources (1 GPU, 2 CPUs and 4GB RAM). Then, in the second part, we define the tasks of the job. By default since no partition was defined the job will run under the default partitaion that in this cluster is the gpu partition, you can check which partitions and nodes are available with: $ sinfo","title":"3. Create your SLURM job script"},{"location":"quick_start/#4-submit-the-job","text":"To submit the job, you should run the following command: $ sbatch script_trainer.sh Submitted batch job 144 You can check the job status using the following command: $ squeue","title":"4. Submit the job"},{"location":"detail_material/account_management/","text":"","title":"Account management"},{"location":"detail_material/cuda/","text":"Cuda example This example shows how you can compile and run a cuda program in one of the clusters GPU nodes. We expected that you already have access and know how to login. 0. Git clone the guides repo with the examples To facilitate the demonstration we already prepared the code and scripts necessary, your job is to first run and then understand by looking into more detail. $ git clone https://github.com/ieeta-pt/HPC-guides.git $ cd HPC-guides/examples/cuda 1. Preprare the environment First step is the preparation of the development enviroment which in this case would be to load the gcc compiler and CUDA libraries. $ module load gcc/11 $ module load cuda Here we load gcc 11 and not 12, since the currently installed CUDA version (11.8) advises to run with gcc 11. 2. Compile the cuda program For compiling the cuda program just call the nvcc $ nvcc vector_addition.cu -o vector_addition 3. Submit the job lunch_cuda.sh already contains the code to lunch the slurm job while requesting a gpu. 
$ sbatch lunch_cuda.sh Check your directly for the output file and cat: $ ll total 808 drwxr-xr-x 2 tiagoalmeida students 4096 Jul 5 15:49 ./ drwxr-xr-x 3 tiagoalmeida students 4096 Jul 5 15:48 ../ -rw-r--r-- 1 tiagoalmeida students 248 Jul 5 15:49 Cuda-93.out -rw-r--r-- 1 tiagoalmeida students 504 Jul 5 15:48 lunch_cuda.sh -rwxr-xr-x 1 tiagoalmeida students 803936 Jul 5 15:48 vector_addition* -rw-r--r-- 1 tiagoalmeida students 2051 Jul 5 15:48 vector_addition.cu $ cat Cuda-93.out Job Information for Job ID: 93 from tiagoalmeida ------------ ------------ Account: students CPUs per Node: 2 GPU: NVIDIA RTX A2000 Partition: gpu QOS: normal Start Time: 2024-07-05 14:49:32 UTC Running On Node: dl-srv-02 ------------ ------------ --------------------------- __SUCCESS__ --------------------------- N = 1048576 Threads Per Block = 256 Blocks In Grid = 4096 ---------------------------","title":"Cuda example"},{"location":"detail_material/cuda/#cuda-example","text":"This example shows how you can compile and run a cuda program in one of the clusters GPU nodes. We expected that you already have access and know how to login.","title":"Cuda example"},{"location":"detail_material/cuda/#0-git-clone-the-guides-repo-with-the-examples","text":"To facilitate the demonstration we already prepared the code and scripts necessary, your job is to first run and then understand by looking into more detail. $ git clone https://github.com/ieeta-pt/HPC-guides.git $ cd HPC-guides/examples/cuda","title":"0. Git clone the guides repo with the examples"},{"location":"detail_material/cuda/#1-preprare-the-environment","text":"First step is the preparation of the development enviroment which in this case would be to load the gcc compiler and CUDA libraries. $ module load gcc/11 $ module load cuda Here we load gcc 11 and not 12, since the currently installed CUDA version (11.8) advises to run with gcc 11.","title":"1. Preprare the environment"},{"location":"detail_material/cuda/#2-compile-the-cuda-program","text":"For compiling the cuda program just call the nvcc $ nvcc vector_addition.cu -o vector_addition","title":"2. Compile the cuda program"},{"location":"detail_material/cuda/#3-submit-the-job","text":"lunch_cuda.sh already contains the code to lunch the slurm job while requesting a gpu. $ sbatch lunch_cuda.sh Check your directly for the output file and cat: $ ll total 808 drwxr-xr-x 2 tiagoalmeida students 4096 Jul 5 15:49 ./ drwxr-xr-x 3 tiagoalmeida students 4096 Jul 5 15:48 ../ -rw-r--r-- 1 tiagoalmeida students 248 Jul 5 15:49 Cuda-93.out -rw-r--r-- 1 tiagoalmeida students 504 Jul 5 15:48 lunch_cuda.sh -rwxr-xr-x 1 tiagoalmeida students 803936 Jul 5 15:48 vector_addition* -rw-r--r-- 1 tiagoalmeida students 2051 Jul 5 15:48 vector_addition.cu $ cat Cuda-93.out Job Information for Job ID: 93 from tiagoalmeida ------------ ------------ Account: students CPUs per Node: 2 GPU: NVIDIA RTX A2000 Partition: gpu QOS: normal Start Time: 2024-07-05 14:49:32 UTC Running On Node: dl-srv-02 ------------ ------------ --------------------------- __SUCCESS__ --------------------------- N = 1048576 Threads Per Block = 256 Blocks In Grid = 4096 ---------------------------","title":"3. Submit the job"},{"location":"detail_material/job_management/","text":"","title":"Job management"},{"location":"detail_material/nodes/","text":"Cluster Nodes We are currently in a testing phase, and nodes will be gradually added or migrated to the cluster. 
GPU Nodes: dl-srv-02: Type: HP-workstation CPU: i7-12700 (12C) RAM: 16GB GPU-0: A2000 Temporal local storage: 512GB nvme (/tmp/your-job) Slurm GPU resource name: nvidia-rtx-a2000 dl-srv-03: Type: Asus Server CPU: EPYC 7543 (32C/64T) RAM: 256GB GPU-0: A6000 GPU-1: A6000 Temporal local storage: 512GB nvme (/tmp/your-job) Slurm GPU resource name: nvidia-rtx-a6000 dl-srv-04: Type: AlianWare workstation CPU: Ryzen 9 5900 (12C/24T) RAM: 96GB GPU-0: RTX 4070 Temporal local storage: NONE Slurm GPU resource name: nvidia-rtx-4070 (More to be added in the future) CPU Nodes: (More to be added in the future) Login Node: Type: VM CPU: 22 Cores RAM: 66GB","title":"Cluster Nodes"},{"location":"detail_material/nodes/#cluster-nodes","text":"We are currently in a testing phase, and nodes will be gradually added or migrated to the cluster.","title":"Cluster Nodes"},{"location":"detail_material/nodes/#gpu-nodes","text":"","title":"GPU Nodes:"},{"location":"detail_material/nodes/#dl-srv-02","text":"Type: HP-workstation CPU: i7-12700 (12C) RAM: 16GB GPU-0: A2000 Temporal local storage: 512GB nvme (/tmp/your-job) Slurm GPU resource name: nvidia-rtx-a2000","title":"dl-srv-02:"},{"location":"detail_material/nodes/#dl-srv-03","text":"Type: Asus Server CPU: EPYC 7543 (32C/64T) RAM: 256GB GPU-0: A6000 GPU-1: A6000 Temporal local storage: 512GB nvme (/tmp/your-job) Slurm GPU resource name: nvidia-rtx-a6000","title":"dl-srv-03:"},{"location":"detail_material/nodes/#dl-srv-04","text":"Type: AlianWare workstation CPU: Ryzen 9 5900 (12C/24T) RAM: 96GB GPU-0: RTX 4070 Temporal local storage: NONE Slurm GPU resource name: nvidia-rtx-4070 (More to be added in the future)","title":"dl-srv-04:"},{"location":"detail_material/nodes/#cpu-nodes","text":"(More to be added in the future)","title":"CPU Nodes:"},{"location":"detail_material/nodes/#login-node","text":"Type: VM CPU: 22 Cores RAM: 66GB","title":"Login Node:"},{"location":"detail_material/software_packages/","text":"","title":"Software packages"},{"location":"examples/dl/cuda/","text":"Cuda example This example demonstrates how to compile and execute a CUDA program on one of the cluster's GPU nodes. It is assumed that you already have access and know how to log in. 0. Git clone the guides repo with the examples To facilitate the demonstration, we have pre-prepared the necessary code and scripts in a repo. Your just need to execute the code and then explore it in further detail. $ git clone https://github.com/ieeta-pt/HPC-guides.git $ cd HPC-guides/examples/cuda 1. Preprare the environment The initial step involves setting up the development environment, which in this case means loading the GCC compiler and CUDA libraries. $ module load gcc $ module load cuda Currently there are two versions of CUDA installed (12.1 and 11.8). By default the latest one is always loaded when a version is not specified. Note that if you want to run CUDA 11.8 you aldo need to use gcc 11 due to compatibility issues from CUDA. 2. Compile the cuda program To compile the CUDA program, simply use the NVCC compiler: $ nvcc vector_addition.cu -o vector_addition 3. Submit the job The launch_cuda.sh script contains the necessary code to submit the Slurm job while requesting a GPU. 
$ sbatch launch_cuda.sh Submitted batch job 93 Check your directory for the output file and view its contents: $ ll total 808 drwxr-xr-x 2 tiagoalmeida students 4096 Jul 5 15:49 ./ drwxr-xr-x 3 tiagoalmeida students 4096 Jul 5 15:48 ../ -rw-r--r-- 1 tiagoalmeida students 248 Jul 5 15:49 Cuda-93.out -rw-r--r-- 1 tiagoalmeida students 504 Jul 5 15:48 launch_cuda.sh -rwxr-xr-x 1 tiagoalmeida students 803936 Jul 5 15:48 vector_addition* -rw-r--r-- 1 tiagoalmeida students 2051 Jul 5 15:48 vector_addition.cu $ $ cat Cuda-93.out Job Information for Job ID: 93 from tiagoalmeida ------------ ------------ Account: students CPUs per Node: 2 GPU: NVIDIA RTX A2000 Partition: gpu QOS: normal Start Time: 2024-07-05 14:49:32 UTC Running On Node: dl-srv-02 ------------ ------------ --------------------------- __SUCCESS__ --------------------------- N = 1048576 Threads Per Block = 256 Blocks In Grid = 4096 ---------------------------","title":"Cuda example"},{"location":"examples/dl/cuda/#cuda-example","text":"This example demonstrates how to compile and execute a CUDA program on one of the cluster's GPU nodes. It is assumed that you already have access and know how to log in.","title":"Cuda example"},{"location":"examples/dl/cuda/#0-git-clone-the-guides-repo-with-the-examples","text":"To facilitate the demonstration, we have pre-prepared the necessary code and scripts in a repo. Your just need to execute the code and then explore it in further detail. $ git clone https://github.com/ieeta-pt/HPC-guides.git $ cd HPC-guides/examples/cuda","title":"0. Git clone the guides repo with the examples"},{"location":"examples/dl/cuda/#1-preprare-the-environment","text":"The initial step involves setting up the development environment, which in this case means loading the GCC compiler and CUDA libraries. $ module load gcc $ module load cuda Currently there are two versions of CUDA installed (12.1 and 11.8). By default the latest one is always loaded when a version is not specified. Note that if you want to run CUDA 11.8 you aldo need to use gcc 11 due to compatibility issues from CUDA.","title":"1. Preprare the environment"},{"location":"examples/dl/cuda/#2-compile-the-cuda-program","text":"To compile the CUDA program, simply use the NVCC compiler: $ nvcc vector_addition.cu -o vector_addition","title":"2. Compile the cuda program"},{"location":"examples/dl/cuda/#3-submit-the-job","text":"The launch_cuda.sh script contains the necessary code to submit the Slurm job while requesting a GPU. $ sbatch launch_cuda.sh Submitted batch job 93 Check your directory for the output file and view its contents: $ ll total 808 drwxr-xr-x 2 tiagoalmeida students 4096 Jul 5 15:49 ./ drwxr-xr-x 3 tiagoalmeida students 4096 Jul 5 15:48 ../ -rw-r--r-- 1 tiagoalmeida students 248 Jul 5 15:49 Cuda-93.out -rw-r--r-- 1 tiagoalmeida students 504 Jul 5 15:48 launch_cuda.sh -rwxr-xr-x 1 tiagoalmeida students 803936 Jul 5 15:48 vector_addition* -rw-r--r-- 1 tiagoalmeida students 2051 Jul 5 15:48 vector_addition.cu $ $ cat Cuda-93.out Job Information for Job ID: 93 from tiagoalmeida ------------ ------------ Account: students CPUs per Node: 2 GPU: NVIDIA RTX A2000 Partition: gpu QOS: normal Start Time: 2024-07-05 14:49:32 UTC Running On Node: dl-srv-02 ------------ ------------ --------------------------- __SUCCESS__ --------------------------- N = 1048576 Threads Per Block = 256 Blocks In Grid = 4096 ---------------------------","title":"3. 
Submit the job"},{"location":"examples/dl/mnist/","text":"","title":"MNIST example"},{"location":"examples/dl/transformers/","text":"Transformers example This example demonstrates how to execute a standard deep learning training pipeline using the transformers library on one of the GPU nodes in the cluster. It is assumed that you already have access and know how to log in. 0. Git clone the guides repo with the examples To facilitate the demonstration, we have prepared the necessary code and scripts in a repository. Your task is to run the code and then delve into it to understand the details more thoroughly. $ git clone https://github.com/ieeta-pt/HPC-guides.git $ cd HPC-guides/examples/transformers 1. Preprare the environment The first step is to prepare the development environment. This involves loading Python, creating a virtual environment, and installing the dependencies. $ module load python $ python -m venv virtual-venv $ source virtual-venv/bin/activate (virtual-venv)$ pip install --upgrade pip (virtual-venv)$ pip install transformers accelerate evaluate datasets scikit-learn 2. Submit the job The hf_classification_trainer.py file contains the essential code needed to train a BERT base model for a classification task using the Yelp dataset. hf_trainer.sh is the launch script that includes SBATCH directives for acquiring resources. Specifically, we are requesting one A6000 GPU (--gres=gpu:nvidia-rtx-a6000:1). (virtual-venv)$ sbatch hf_trainer.sh Submitted batch job 95 After submitting the job, you can check its status in the queue by running squeue: $ squeue JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) 94 gpu hf_train tiagoalm R 0:02 1 dl-srv-03 Here you can see that your job (94) is running (ST = R) and has been allocated to the node dl-srv-03, which contains the A6000 GPU. To monitor the progress of your training, inspect the output file specified in the launch script: $ cat hf_trainer-94.out Job Information for Job ID: 94 from tiagoalmeida ------------ ------------ Account: students CPUs per Node: 4 GPU: NVIDIA RTX A6000 Partition: gpu QOS: normal Start Time: 2024-07-05 15:12:29 UTC Running On Node: dl-srv-03 ------------ ------------ /var/lib/slurm-llnl/slurmd/job00094/slurm_script: line 9: virtual-venv/bin/activate: No such file or directory User can use the local tmp dir /tmp/slurm-tiagoalmeida-94 Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. amout of GPU memory 51033931776 BATCH SIZE 60 33%|\u2588\u2588\u2588\u258e | 17/51 [00:26<00:34, 1.01s/i{'eval_loss': 1.6087651252746582, 'eval_accuracy': 0.257, 'eval_runtime': 6.7322, 'eval_samples_per_second': 148.539, 'eval_steps_per_second': 1.337, 'epoch': 1.0} 43%|\u2588\u2588\u2588\u2588\u258e | 22/51 [00:51<01:27, 3.02s/it]","title":"Transformers example"},{"location":"examples/dl/transformers/#transformers-example","text":"This example demonstrates how to execute a standard deep learning training pipeline using the transformers library on one of the GPU nodes in the cluster. It is assumed that you already have access and know how to log in.","title":"Transformers example"},{"location":"examples/dl/transformers/#0-git-clone-the-guides-repo-with-the-examples","text":"To facilitate the demonstration, we have prepared the necessary code and scripts in a repository. 
Your task is to run the code and then delve into it to understand the details more thoroughly. $ git clone https://github.com/ieeta-pt/HPC-guides.git $ cd HPC-guides/examples/transformers","title":"0. Git clone the guides repo with the examples"},{"location":"examples/dl/transformers/#1-preprare-the-environment","text":"The first step is to prepare the development environment. This involves loading Python, creating a virtual environment, and installing the dependencies. $ module load python $ python -m venv virtual-venv $ source virtual-venv/bin/activate (virtual-venv)$ pip install --upgrade pip (virtual-venv)$ pip install transformers accelerate evaluate datasets scikit-learn","title":"1. Preprare the environment"},{"location":"examples/dl/transformers/#2-submit-the-job","text":"The hf_classification_trainer.py file contains the essential code needed to train a BERT base model for a classification task using the Yelp dataset. hf_trainer.sh is the launch script that includes SBATCH directives for acquiring resources. Specifically, we are requesting one A6000 GPU (--gres=gpu:nvidia-rtx-a6000:1). (virtual-venv)$ sbatch hf_trainer.sh Submitted batch job 95 After submitting the job, you can check its status in the queue by running squeue: $ squeue JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) 94 gpu hf_train tiagoalm R 0:02 1 dl-srv-03 Here you can see that your job (94) is running (ST = R) and has been allocated to the node dl-srv-03, which contains the A6000 GPU. To monitor the progress of your training, inspect the output file specified in the launch script: $ cat hf_trainer-94.out Job Information for Job ID: 94 from tiagoalmeida ------------ ------------ Account: students CPUs per Node: 4 GPU: NVIDIA RTX A6000 Partition: gpu QOS: normal Start Time: 2024-07-05 15:12:29 UTC Running On Node: dl-srv-03 ------------ ------------ /var/lib/slurm-llnl/slurmd/job00094/slurm_script: line 9: virtual-venv/bin/activate: No such file or directory User can use the local tmp dir /tmp/slurm-tiagoalmeida-94 Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. amout of GPU memory 51033931776 BATCH SIZE 60 33%|\u2588\u2588\u2588\u258e | 17/51 [00:26<00:34, 1.01s/i{'eval_loss': 1.6087651252746582, 'eval_accuracy': 0.257, 'eval_runtime': 6.7322, 'eval_samples_per_second': 148.539, 'eval_steps_per_second': 1.337, 'epoch': 1.0} 43%|\u2588\u2588\u2588\u2588\u258e | 22/51 [00:51<01:27, 3.02s/it]","title":"2. Submit the job"},{"location":"examples/sc/dna/","text":"","title":"DNA sequencing"},{"location":"examples/sc/r/","text":"","title":"R"}]}
\ No newline at end of file
+{"config":{"indexing":"full","lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"Welcome to the Pleiades documentation This documentation is designed to provide users with essential information and instructions to effectively utilize the resources at IEETA. Whether you're new to HPC or an experienced user, this guide aims to help you understand how to access, manage jobs, and leverage the available resources in the most efficient way. Currently we are under a testing phase! Feel free to apply, see How to Access Contents About : This section provides an overview of Pleiades (IEETA's HPC cluster), including its purpose, capabilities, and the team responsible for its supports. How to Access : Detailed instructions on how to gain access to Pleiades Quick start : A concise guide for quickly getting started with the Pleiades. This includes basic steps on logging in, running simple jobs, and understanding the file system. For users seeking more profound insights or for those new to this domain, we have curated additional detailed material that delves into various aspects of utilizing this cluster. Detailed Material : Cluster Nodes Software packages TODO Job management TODO Account management TODO Furthermore, to aid in acclimatization with the HPC stack, we have also included some practical examples: Deep Learning (GPU) Examples Transformers Cuda example MNIST Example TODO Scientific Computing (CPU) Examples DNA Sequencing Example TODO R Example TODO","title":"Home"},{"location":"#welcome-to-the-pleiades-documentation","text":"This documentation is designed to provide users with essential information and instructions to effectively utilize the resources at IEETA. Whether you're new to HPC or an experienced user, this guide aims to help you understand how to access, manage jobs, and leverage the available resources in the most efficient way.","title":"Welcome to the Pleiades documentation"},{"location":"#currently-we-are-under-a-testing-phase-feel-free-to-apply-see-how-to-access","text":"","title":"Currently we are under a testing phase! Feel free to apply, see How to Access"},{"location":"#contents","text":"About : This section provides an overview of Pleiades (IEETA's HPC cluster), including its purpose, capabilities, and the team responsible for its supports. How to Access : Detailed instructions on how to gain access to Pleiades Quick start : A concise guide for quickly getting started with the Pleiades. This includes basic steps on logging in, running simple jobs, and understanding the file system. For users seeking more profound insights or for those new to this domain, we have curated additional detailed material that delves into various aspects of utilizing this cluster. Detailed Material : Cluster Nodes Software packages TODO Job management TODO Account management TODO Furthermore, to aid in acclimatization with the HPC stack, we have also included some practical examples: Deep Learning (GPU) Examples Transformers Cuda example MNIST Example TODO Scientific Computing (CPU) Examples DNA Sequencing Example TODO R Example TODO","title":"Contents"},{"location":"about/","text":"About Pleiades Pleiades is the name of IEETA's High Performance Computing (HPC) cluster. In this page, you will learn about its purpose, capabilities, and the unique opportunities it offers for research and computation-intensive projects. 
Objectives Support advanced research: The primary objective of Pleiades is to enable and accelerate research that requires substantial computational resources. Offer an open and unified computation platform: By aggregating heterogeneous computation devices under the same cluster, we can provide fair, unified access to all of our researchers. Rely on proven open technologies: By using open-source, widely accepted technologies, such as SLURM, we hope to ensure system longevity and vendor-independence, as well as provide valuable skills to our users. Capabilities Pleiades offers a range of computing resources, including traditional CPU and GPU-enabled nodes. Why this name In the real world, Pleiades is a star cluster located within the constellation of Taurus. A collective noun seemed appropriate to describe the plural, heterogeneous nature of a computing cluster, and Pleiades is both a beautiful word and a brilliant object! Support Team Currently the IEETA HPC is managed by the \"Pelouro de Infraestrutura Computacional\" team: Jo\u00e3o Rodrigues (jmr@ua.pt) Eurico Pedrosa (efp@live.ua.pt) Tiago Almeida (tiagomeloalmeida@ua.pt)","title":"About"},{"location":"about/#about-pleiades","text":"Pleiades is the name of IEETA's High Performance Computing (HPC) cluster. On this page, you will learn about its purpose, capabilities, and the unique opportunities it offers for research and computation-intensive projects.","title":"About Pleiades"},{"location":"about/#objectives","text":"Support advanced research: The primary objective of Pleiades is to enable and accelerate research that requires substantial computational resources. Offer an open and unified computation platform: By aggregating heterogeneous computation devices under the same cluster, we can provide fair, unified access to all of our researchers. Rely on proven open technologies: By using open-source, widely accepted technologies, such as SLURM, we hope to ensure system longevity and vendor-independence, as well as provide valuable skills to our users.","title":"Objectives"},{"location":"about/#capabilities","text":"Pleiades offers a range of computing resources, including traditional CPU and GPU-enabled nodes.","title":"Capabilities"},{"location":"about/#why-this-name","text":"In the real world, Pleiades is a star cluster located within the constellation of Taurus. A collective noun seemed appropriate to describe the plural, heterogeneous nature of a computing cluster, and Pleiades is both a beautiful word and a brilliant object!","title":"Why this name"},{"location":"about/#support-team","text":"Currently the IEETA HPC is managed by the \"Pelouro de Infraestrutura Computacional\" team: Jo\u00e3o Rodrigues (jmr@ua.pt) Eurico Pedrosa (efp@live.ua.pt) Tiago Almeida (tiagomeloalmeida@ua.pt)","title":"Support Team"},{"location":"how_to_access/","text":"How to Access Overview Pleiades primarily serves researchers at IEETA and students of the University of Aveiro. This guide outlines the process for obtaining access and connecting to the cluster. Eligibility and Requesting Access Who Can Access Researchers at IEETA: Faculty members, postdoctoral researchers, and research staff affiliated with IEETA are eligible to access IEETA resources. Students at the University of Aveiro: Undergraduate and graduate students enrolled at the University of Aveiro with IEETA supervisors are eligible to use this computational facility for academic and research purposes. 
How to Request Access Currently, to request access to Pleiades, eligible users should fill in the following form https://forms.gle/WvRmjL1krNykzLnX7 Connecting to the HPC Once your access is granted, you can connect to Pleiades using SSH (Secure Shell) with your user credentials. Note: Access to the Pleiades (pleiades.ieeta.pt) is restricted to the University of Aveiro network, including Eduroam. Ensure you are connected to this network for successful access. SSH Connection Steps Open a Terminal: On your local machine, open a terminal window. Initiate SSH Connection: Use the following SSH command, replacing username with the username provided to you by email: ssh username@pleiades.ieeta.pt Enter Your Password: When prompted, enter the password provided to you. Further Assistance If you encounter any issues or have queries regarding the access process, please feel free to reach out to the Pleiades support team for assistance.","title":"How to Access"},{"location":"how_to_access/#how-to-access","text":"","title":"How to Access"},{"location":"how_to_access/#overview","text":"Pleiades primarily serves researchers at IEETA and students of the University of Aveiro. This guide outlines the process for obtaining access and connecting to the cluster.","title":"Overview"},{"location":"how_to_access/#eligibility-and-requesting-access","text":"","title":"Eligibility and Requesting Access"},{"location":"how_to_access/#who-can-access","text":"Researchers at IEETA: Faculty members, postdoctoral researchers, and research staff affiliated with IEETA are eligible to access IEETA resources. Students at the University of Aveiro: Undergraduate and graduate students enrolled at the University of Aveiro with IEETA supervisors are eligible to use this computational facility for academic and research purposes.","title":"Who Can Access"},{"location":"how_to_access/#how-to-request-access","text":"Currently, to request access to Pleiades, eligible users should fill in the following form https://forms.gle/WvRmjL1krNykzLnX7","title":"How to Request Access"},{"location":"how_to_access/#connecting-to-the-hpc","text":"Once your access is granted, you can connect to Pleiades using SSH (Secure Shell) with your user credentials. Note: Access to the Pleiades (pleiades.ieeta.pt) is restricted to the University of Aveiro network, including Eduroam. Ensure you are connected to this network for successful access.","title":"Connecting to the HPC"},{"location":"how_to_access/#ssh-connection-steps","text":"Open a Terminal: On your local machine, open a terminal window. Initiate SSH Connection: Use the following SSH command, replacing username with the username provided to you by email: ssh username@pleiades.ieeta.pt Enter Your Password: When prompted, enter the password provided to you.","title":"SSH Connection Steps"},{"location":"how_to_access/#further-assistance","text":"If you encounter any issues or have queries regarding the access process, please feel free to reach out to the Pleiades support team for assistance.","title":"Further Assistance"},{"location":"quick_start/","text":"Quick start On this page you can find a quick start guide on how to use the IEETA cluster (Pleiades). 1. Access IEETA cluster (Pleiades) Access the cluster via SSH using the credentials provided to you by email. If you do not have access yet, please refer to the How to Access page. $ ssh user@pleiades.ieeta.pt By default, upon logging in, you will land on our login node in your home directory, which is located at /data/home . 
This is a network storage partition visible to all cluster nodes. The login node is where you should prepare your code in order to submit jobs to run on the worker nodes of the cluster. The worker nodes are equipped with powerful resources. Currently, we have: CPU nodes : Nodes with a high amount of RAM and faster CPUs. Currently not added to the cluster yet GPU nodes : Nodes equipped with GPUs and more modest CPU/RAM configurations. For more information about each node check the nodes page . 2. Prepare your software environment The next step is to prepare your environment to run/build your application. We recommend using a virtual environment so that you can install any package locally. First, load the Python module. $ module load python Then create and activate your virtual environment. $ python -m venv virtual-venv $ source virtual-venv/bin/activate You can then install your package dependencies with pip. (virtual-venv)$ pip install --upgrade pip (virtual-venv)$ pip install torch transformers 3. Create your SLURM job script After setting up your runtime environment, you should create a SLURM job script to submit your job. For example: #!/bin/bash #SBATCH --job-name=trainer # create a short name for your job #SBATCH --output=\"trainer-%j.out\" # %j will be replaced by the slurm jobID #SBATCH --nodes=1 # node count #SBATCH --ntasks=1 # total number of tasks across all nodes #SBATCH --cpus-per-task=2 # cpu-cores per task (>1 if multi-threaded tasks) #SBATCH --gres=gpu:1 # number of gpus per node #SBATCH --mem=4G # Total amount of RAM requested source /virtual-venv/bin/activate # If you have your venv activated when you submit the job, then you do not need to activate/deactivate python your_trainer_script.py deactivate The script is made of two parts: 1. Specification of the resources needed and some job information; 2. Commands that will be executed on the destination node. As an example, in the first part of the script, we define the job name, the output file and the requested resources (1 GPU, 2 CPUs and 4GB RAM). Then, in the second part, we define the tasks of the job. By default, since no partition was defined, the job will run under the default partition, which in this cluster is the gpu partition. You can check which partitions and nodes are available with: $ sinfo 4. Submit the job To submit the job, you should run the following command: $ sbatch script_trainer.sh Submitted batch job 144 You can check the job status using the following command: $ squeue","title":"Quick Start"},{"location":"quick_start/#quick-start","text":"On this page you can find a quick start guide on how to use the IEETA cluster (Pleiades).","title":"Quick start"},{"location":"quick_start/#1-access-ieeta-cluster-pleiades","text":"Access the cluster via SSH using the credentials provided to you by email. If you do not have access yet, please refer to the How to Access page. $ ssh user@pleiades.ieeta.pt By default, upon logging in, you will land on our login node in your home directory, which is located at /data/home . This is a network storage partition visible to all cluster nodes. The login node is where you should prepare your code in order to submit jobs to run on the worker nodes of the cluster. The worker nodes are equipped with powerful resources. Currently, we have: CPU nodes : Nodes with a high amount of RAM and faster CPUs. Currently not added to the cluster yet GPU nodes : Nodes equipped with GPUs and more modest CPU/RAM configurations. For more information about each node check the nodes page .","title":"1. 
Access IEETA cluster (Pleiades)"},{"location":"quick_start/#2-prepare-your-software-environment","text":"The next step is to prepare your environment to run/build your application. We recommend using a virtual environment so that you can install any package locally. First, load the Python module. $ module load python Then create and activate your virtual environment. $ python -m venv virtual-venv $ source virtual-venv/bin/activate You can then install your package dependencies with pip. (virtual-venv)$ pip install --upgrade pip (virtual-venv)$ pip install torch transformers","title":"2. Prepare your software environment"},{"location":"quick_start/#3-create-your-slurm-job-script","text":"After setting up your runtime environment, you should create a SLURM job script to submit your job. For example: #!/bin/bash #SBATCH --job-name=trainer # create a short name for your job #SBATCH --output=\"trainer-%j.out\" # %j will be replaced by the slurm jobID #SBATCH --nodes=1 # node count #SBATCH --ntasks=1 # total number of tasks across all nodes #SBATCH --cpus-per-task=2 # cpu-cores per task (>1 if multi-threaded tasks) #SBATCH --gres=gpu:1 # number of gpus per node #SBATCH --mem=4G # Total amount of RAM requested source /virtual-venv/bin/activate # If you have your venv activated when you submit the job, then you do not need to activate/deactivate python your_trainer_script.py deactivate The script is made of two parts: 1. Specification of the resources needed and some job information; 2. Comands that will be executed on the destination node. As an example, in the first part of the script, we define the job name, the output file and the requested resources (1 GPU, 2 CPUs and 4GB RAM). Then, in the second part, we define the tasks of the job. By default since no partition was defined the job will run under the default partitaion that in this cluster is the gpu partition, you can check which partitions and nodes are available with: $ sinfo","title":"3. Create your SLURM job script"},{"location":"quick_start/#4-submit-the-job","text":"To submit the job, you should run the following command: $ sbatch script_trainer.sh Submitted batch job 144 You can check the job status using the following command: $ squeue","title":"4. Submit the job"},{"location":"detail_material/account_management/","text":"","title":"Account management"},{"location":"detail_material/cuda/","text":"Cuda example This example shows how you can compile and run a cuda program in one of the clusters GPU nodes. We expected that you already have access and know how to login. 0. Git clone the guides repo with the examples To facilitate the demonstration we already prepared the code and scripts necessary, your job is to first run and then understand by looking into more detail. $ git clone https://github.com/ieeta-pt/HPC-guides.git $ cd HPC-guides/examples/cuda 1. Preprare the environment First step is the preparation of the development enviroment which in this case would be to load the gcc compiler and CUDA libraries. $ module load gcc/11 $ module load cuda Here we load gcc 11 and not 12, since the currently installed CUDA version (11.8) advises to run with gcc 11. 2. Compile the cuda program For compiling the cuda program just call the nvcc $ nvcc vector_addition.cu -o vector_addition 3. Submit the job lunch_cuda.sh already contains the code to lunch the slurm job while requesting a gpu. 
$ sbatch lunch_cuda.sh Check your directory for the output file and cat it: $ ll total 808 drwxr-xr-x 2 tiagoalmeida students 4096 Jul 5 15:49 ./ drwxr-xr-x 3 tiagoalmeida students 4096 Jul 5 15:48 ../ -rw-r--r-- 1 tiagoalmeida students 248 Jul 5 15:49 Cuda-93.out -rw-r--r-- 1 tiagoalmeida students 504 Jul 5 15:48 lunch_cuda.sh -rwxr-xr-x 1 tiagoalmeida students 803936 Jul 5 15:48 vector_addition* -rw-r--r-- 1 tiagoalmeida students 2051 Jul 5 15:48 vector_addition.cu $ cat Cuda-93.out Job Information for Job ID: 93 from tiagoalmeida ------------ ------------ Account: students CPUs per Node: 2 GPU: NVIDIA RTX A2000 Partition: gpu QOS: normal Start Time: 2024-07-05 14:49:32 UTC Running On Node: dl-srv-02 ------------ ------------ --------------------------- __SUCCESS__ --------------------------- N = 1048576 Threads Per Block = 256 Blocks In Grid = 4096 ---------------------------","title":"Cuda example"},{"location":"detail_material/cuda/#cuda-example","text":"This example shows how you can compile and run a cuda program in one of the cluster's GPU nodes. We expect that you already have access and know how to log in.","title":"Cuda example"},{"location":"detail_material/cuda/#0-git-clone-the-guides-repo-with-the-examples","text":"To facilitate the demonstration, we have already prepared the necessary code and scripts; your job is to first run them and then study them in more detail. $ git clone https://github.com/ieeta-pt/HPC-guides.git $ cd HPC-guides/examples/cuda","title":"0. Git clone the guides repo with the examples"},{"location":"detail_material/cuda/#1-preprare-the-environment","text":"The first step is the preparation of the development environment, which in this case means loading the gcc compiler and CUDA libraries. $ module load gcc/11 $ module load cuda Here we load gcc 11 and not 12, since the currently installed CUDA version (11.8) advises to run with gcc 11.","title":"1. Prepare the environment"},{"location":"detail_material/cuda/#2-compile-the-cuda-program","text":"For compiling the cuda program just call nvcc: $ nvcc vector_addition.cu -o vector_addition","title":"2. Compile the cuda program"},{"location":"detail_material/cuda/#3-submit-the-job","text":"lunch_cuda.sh already contains the code to launch the slurm job while requesting a GPU. $ sbatch lunch_cuda.sh Check your directory for the output file and cat it: $ ll total 808 drwxr-xr-x 2 tiagoalmeida students 4096 Jul 5 15:49 ./ drwxr-xr-x 3 tiagoalmeida students 4096 Jul 5 15:48 ../ -rw-r--r-- 1 tiagoalmeida students 248 Jul 5 15:49 Cuda-93.out -rw-r--r-- 1 tiagoalmeida students 504 Jul 5 15:48 lunch_cuda.sh -rwxr-xr-x 1 tiagoalmeida students 803936 Jul 5 15:48 vector_addition* -rw-r--r-- 1 tiagoalmeida students 2051 Jul 5 15:48 vector_addition.cu $ cat Cuda-93.out Job Information for Job ID: 93 from tiagoalmeida ------------ ------------ Account: students CPUs per Node: 2 GPU: NVIDIA RTX A2000 Partition: gpu QOS: normal Start Time: 2024-07-05 14:49:32 UTC Running On Node: dl-srv-02 ------------ ------------ --------------------------- __SUCCESS__ --------------------------- N = 1048576 Threads Per Block = 256 Blocks In Grid = 4096 ---------------------------","title":"3. Submit the job"},{"location":"detail_material/job_management/","text":"","title":"Job management"},{"location":"detail_material/nodes/","text":"Cluster Nodes We are currently in a testing phase, and nodes will be gradually added or migrated to the cluster. 
GPU Nodes: dl-srv-02: Type: HP-workstation CPU: i7-12700 (12C) RAM: 16GB GPU-0: A2000 Temporal local storage: 512GB nvme (/tmp/your-job) Slurm GPU resource name: nvidia-rtx-a2000 dl-srv-03: Type: Asus Server CPU: EPYC 7543 (32C/64T) RAM: 256GB GPU-0: A6000 GPU-1: A6000 Temporal local storage: 512GB nvme (/tmp/your-job) Slurm GPU resource name: nvidia-rtx-a6000 dl-srv-04: Type: Alienware workstation CPU: Ryzen 9 5900 (12C/24T) RAM: 96GB GPU-0: RTX 4070 Temporal local storage: NONE Slurm GPU resource name: nvidia-rtx-4070 (More to be added in the future) CPU Nodes: cpu-srv-01: Type: SuperMicro Server CPU: AMD EPYC 9354P (32C/64T) RAM: 768GB (DDR5) Temporal local storage: None at the moment (More to be added in the future) Login Node: Type: VM CPU: 22 Cores RAM: 66GB","title":"Cluster Nodes"},{"location":"detail_material/nodes/#cluster-nodes","text":"We are currently in a testing phase, and nodes will be gradually added or migrated to the cluster.","title":"Cluster Nodes"},{"location":"detail_material/nodes/#gpu-nodes","text":"","title":"GPU Nodes:"},{"location":"detail_material/nodes/#dl-srv-02","text":"Type: HP-workstation CPU: i7-12700 (12C) RAM: 16GB GPU-0: A2000 Temporal local storage: 512GB nvme (/tmp/your-job) Slurm GPU resource name: nvidia-rtx-a2000","title":"dl-srv-02:"},{"location":"detail_material/nodes/#dl-srv-03","text":"Type: Asus Server CPU: EPYC 7543 (32C/64T) RAM: 256GB GPU-0: A6000 GPU-1: A6000 Temporal local storage: 512GB nvme (/tmp/your-job) Slurm GPU resource name: nvidia-rtx-a6000","title":"dl-srv-03:"},{"location":"detail_material/nodes/#dl-srv-04","text":"Type: Alienware workstation CPU: Ryzen 9 5900 (12C/24T) RAM: 96GB GPU-0: RTX 4070 Temporal local storage: NONE Slurm GPU resource name: nvidia-rtx-4070 (More to be added in the future)","title":"dl-srv-04:"},{"location":"detail_material/nodes/#cpu-nodes","text":"","title":"CPU Nodes:"},{"location":"detail_material/nodes/#cpu-srv-01","text":"Type: SuperMicro Server CPU: AMD EPYC 9354P (32C/64T) RAM: 768GB (DDR5) Temporal local storage: None at the moment (More to be added in the future)","title":"cpu-srv-01:"},{"location":"detail_material/nodes/#login-node","text":"Type: VM CPU: 22 Cores RAM: 66GB","title":"Login Node:"},{"location":"detail_material/software_packages/","text":"","title":"Software packages"},{"location":"examples/dl/cuda/","text":"Cuda example This example demonstrates how to compile and execute a CUDA program on one of the cluster's GPU nodes. It is assumed that you already have access and know how to log in. 0. Git clone the guides repo with the examples To facilitate the demonstration, we have pre-prepared the necessary code and scripts in a repo. You just need to execute the code and then explore it in further detail. $ git clone https://github.com/ieeta-pt/HPC-guides.git $ cd HPC-guides/examples/cuda 1. Prepare the environment The initial step involves setting up the development environment, which in this case means loading the GCC compiler and CUDA libraries. $ module load gcc $ module load cuda Currently there are two versions of CUDA installed (12.1 and 11.8). By default the latest one is always loaded when a version is not specified. Note that if you want to run CUDA 11.8 you also need to use gcc 11 due to compatibility issues from CUDA. 2. Compile the cuda program To compile the CUDA program, simply use the NVCC compiler: $ nvcc vector_addition.cu -o vector_addition 3. Submit the job The launch_cuda.sh script contains the necessary code to submit the Slurm job while requesting a GPU. 
$ sbatch launch_cuda.sh Submitted batch job 93 Check your directory for the output file and view its contents: $ ll total 808 drwxr-xr-x 2 tiagoalmeida students 4096 Jul 5 15:49 ./ drwxr-xr-x 3 tiagoalmeida students 4096 Jul 5 15:48 ../ -rw-r--r-- 1 tiagoalmeida students 248 Jul 5 15:49 Cuda-93.out -rw-r--r-- 1 tiagoalmeida students 504 Jul 5 15:48 launch_cuda.sh -rwxr-xr-x 1 tiagoalmeida students 803936 Jul 5 15:48 vector_addition* -rw-r--r-- 1 tiagoalmeida students 2051 Jul 5 15:48 vector_addition.cu $ $ cat Cuda-93.out Job Information for Job ID: 93 from tiagoalmeida ------------ ------------ Account: students CPUs per Node: 2 GPU: NVIDIA RTX A2000 Partition: gpu QOS: normal Start Time: 2024-07-05 14:49:32 UTC Running On Node: dl-srv-02 ------------ ------------ --------------------------- __SUCCESS__ --------------------------- N = 1048576 Threads Per Block = 256 Blocks In Grid = 4096 ---------------------------","title":"Cuda example"},{"location":"examples/dl/cuda/#cuda-example","text":"This example demonstrates how to compile and execute a CUDA program on one of the cluster's GPU nodes. It is assumed that you already have access and know how to log in.","title":"Cuda example"},{"location":"examples/dl/cuda/#0-git-clone-the-guides-repo-with-the-examples","text":"To facilitate the demonstration, we have pre-prepared the necessary code and scripts in a repo. You just need to execute the code and then explore it in further detail. $ git clone https://github.com/ieeta-pt/HPC-guides.git $ cd HPC-guides/examples/cuda","title":"0. Git clone the guides repo with the examples"},{"location":"examples/dl/cuda/#1-preprare-the-environment","text":"The initial step involves setting up the development environment, which in this case means loading the GCC compiler and CUDA libraries. $ module load gcc $ module load cuda Currently there are two versions of CUDA installed (12.1 and 11.8). By default the latest one is always loaded when a version is not specified. Note that if you want to run CUDA 11.8 you also need to use gcc 11 due to compatibility issues with CUDA.","title":"1. Prepare the environment"},{"location":"examples/dl/cuda/#2-compile-the-cuda-program","text":"To compile the CUDA program, simply use the NVCC compiler: $ nvcc vector_addition.cu -o vector_addition","title":"2. Compile the cuda program"},{"location":"examples/dl/cuda/#3-submit-the-job","text":"The launch_cuda.sh script contains the necessary code to submit the Slurm job while requesting a GPU. $ sbatch launch_cuda.sh Submitted batch job 93 Check your directory for the output file and view its contents: $ ll total 808 drwxr-xr-x 2 tiagoalmeida students 4096 Jul 5 15:49 ./ drwxr-xr-x 3 tiagoalmeida students 4096 Jul 5 15:48 ../ -rw-r--r-- 1 tiagoalmeida students 248 Jul 5 15:49 Cuda-93.out -rw-r--r-- 1 tiagoalmeida students 504 Jul 5 15:48 launch_cuda.sh -rwxr-xr-x 1 tiagoalmeida students 803936 Jul 5 15:48 vector_addition* -rw-r--r-- 1 tiagoalmeida students 2051 Jul 5 15:48 vector_addition.cu $ $ cat Cuda-93.out Job Information for Job ID: 93 from tiagoalmeida ------------ ------------ Account: students CPUs per Node: 2 GPU: NVIDIA RTX A2000 Partition: gpu QOS: normal Start Time: 2024-07-05 14:49:32 UTC Running On Node: dl-srv-02 ------------ ------------ --------------------------- __SUCCESS__ --------------------------- N = 1048576 Threads Per Block = 256 Blocks In Grid = 4096 ---------------------------","title":"3. 
Submit the job"},{"location":"examples/dl/mnist/","text":"","title":"MNIST example"},{"location":"examples/dl/transformers/","text":"Transformers example This example demonstrates how to execute a standard deep learning training pipeline using the transformers library on one of the GPU nodes in the cluster. It is assumed that you already have access and know how to log in. 0. Git clone the guides repo with the examples To facilitate the demonstration, we have prepared the necessary code and scripts in a repository. Your task is to run the code and then delve into it to understand the details more thoroughly. $ git clone https://github.com/ieeta-pt/HPC-guides.git $ cd HPC-guides/examples/transformers 1. Prepare the environment The first step is to prepare the development environment. This involves loading Python, creating a virtual environment, and installing the dependencies. $ module load python $ python -m venv virtual-venv $ source virtual-venv/bin/activate (virtual-venv)$ pip install --upgrade pip (virtual-venv)$ pip install transformers accelerate evaluate datasets scikit-learn 2. Submit the job The hf_classification_trainer.py file contains the essential code needed to train a BERT base model for a classification task using the Yelp dataset. hf_trainer.sh is the launch script that includes SBATCH directives for acquiring resources. Specifically, we are requesting one A6000 GPU (--gres=gpu:nvidia-rtx-a6000:1). (virtual-venv)$ sbatch hf_trainer.sh Submitted batch job 95 After submitting the job, you can check its status in the queue by running squeue: $ squeue JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) 94 gpu hf_train tiagoalm R 0:02 1 dl-srv-03 Here you can see that your job (94) is running (ST = R) and has been allocated to the node dl-srv-03, which contains the A6000 GPU. To monitor the progress of your training, inspect the output file specified in the launch script: $ cat hf_trainer-94.out Job Information for Job ID: 94 from tiagoalmeida ------------ ------------ Account: students CPUs per Node: 4 GPU: NVIDIA RTX A6000 Partition: gpu QOS: normal Start Time: 2024-07-05 15:12:29 UTC Running On Node: dl-srv-03 ------------ ------------ /var/lib/slurm-llnl/slurmd/job00094/slurm_script: line 9: virtual-venv/bin/activate: No such file or directory User can use the local tmp dir /tmp/slurm-tiagoalmeida-94 Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. amout of GPU memory 51033931776 BATCH SIZE 60 33%|\u2588\u2588\u2588\u258e | 17/51 [00:26<00:34, 1.01s/i{'eval_loss': 1.6087651252746582, 'eval_accuracy': 0.257, 'eval_runtime': 6.7322, 'eval_samples_per_second': 148.539, 'eval_steps_per_second': 1.337, 'epoch': 1.0} 43%|\u2588\u2588\u2588\u2588\u258e | 22/51 [00:51<01:27, 3.02s/it]","title":"Transformers example"},{"location":"examples/dl/transformers/#transformers-example","text":"This example demonstrates how to execute a standard deep learning training pipeline using the transformers library on one of the GPU nodes in the cluster. It is assumed that you already have access and know how to log in.","title":"Transformers example"},{"location":"examples/dl/transformers/#0-git-clone-the-guides-repo-with-the-examples","text":"To facilitate the demonstration, we have prepared the necessary code and scripts in a repository. 
Your task is to run the code and then delve into it to understand the details more thoroughly. $ git clone https://github.com/ieeta-pt/HPC-guides.git $ cd HPC-guides/examples/transformers","title":"0. Git clone the guides repo with the examples"},{"location":"examples/dl/transformers/#1-preprare-the-environment","text":"The first step is to prepare the development environment. This involves loading Python, creating a virtual environment, and installing the dependencies. $ module load python $ python -m venv virtual-venv $ source virtual-venv/bin/activate (virtual-venv)$ pip install --upgrade pip (virtual-venv)$ pip install transformers accelerate evaluate datasets scikit-learn","title":"1. Prepare the environment"},{"location":"examples/dl/transformers/#2-submit-the-job","text":"The hf_classification_trainer.py file contains the essential code needed to train a BERT base model for a classification task using the Yelp dataset. hf_trainer.sh is the launch script that includes SBATCH directives for acquiring resources. Specifically, we are requesting one A6000 GPU (--gres=gpu:nvidia-rtx-a6000:1). (virtual-venv)$ sbatch hf_trainer.sh Submitted batch job 95 After submitting the job, you can check its status in the queue by running squeue: $ squeue JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) 94 gpu hf_train tiagoalm R 0:02 1 dl-srv-03 Here you can see that your job (94) is running (ST = R) and has been allocated to the node dl-srv-03, which contains the A6000 GPU. To monitor the progress of your training, inspect the output file specified in the launch script: $ cat hf_trainer-94.out Job Information for Job ID: 94 from tiagoalmeida ------------ ------------ Account: students CPUs per Node: 4 GPU: NVIDIA RTX A6000 Partition: gpu QOS: normal Start Time: 2024-07-05 15:12:29 UTC Running On Node: dl-srv-03 ------------ ------------ /var/lib/slurm-llnl/slurmd/job00094/slurm_script: line 9: virtual-venv/bin/activate: No such file or directory User can use the local tmp dir /tmp/slurm-tiagoalmeida-94 Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. amout of GPU memory 51033931776 BATCH SIZE 60 33%|\u2588\u2588\u2588\u258e | 17/51 [00:26<00:34, 1.01s/i{'eval_loss': 1.6087651252746582, 'eval_accuracy': 0.257, 'eval_runtime': 6.7322, 'eval_samples_per_second': 148.539, 'eval_steps_per_second': 1.337, 'epoch': 1.0} 43%|\u2588\u2588\u2588\u2588\u258e | 22/51 [00:51<01:27, 3.02s/it]","title":"2. Submit the job"},{"location":"examples/sc/dna/","text":"","title":"DNA sequencing"},{"location":"examples/sc/r/","text":"","title":"R"}]}
\ No newline at end of file
diff --git a/sitemap.xml b/sitemap.xml
index 335f081..a14e8a3 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -2,58 +2,58 @@
https://ieeta-pt.github.io/HPC-guides/
- 2024-09-27
+ 2024-10-24
https://ieeta-pt.github.io/HPC-guides/about/
- 2024-09-27
+ 2024-10-24
https://ieeta-pt.github.io/HPC-guides/how_to_access/
- 2024-09-27
+ 2024-10-24
https://ieeta-pt.github.io/HPC-guides/quick_start/
- 2024-09-27
+ 2024-10-24
https://ieeta-pt.github.io/HPC-guides/detail_material/account_management/
- 2024-09-27
+ 2024-10-24
https://ieeta-pt.github.io/HPC-guides/detail_material/cuda/
- 2024-09-27
+ 2024-10-24
https://ieeta-pt.github.io/HPC-guides/detail_material/job_management/
- 2024-09-27
+ 2024-10-24
https://ieeta-pt.github.io/HPC-guides/detail_material/nodes/
- 2024-09-27
+ 2024-10-24
https://ieeta-pt.github.io/HPC-guides/detail_material/software_packages/
- 2024-09-27
+ 2024-10-24
https://ieeta-pt.github.io/HPC-guides/examples/dl/cuda/
- 2024-09-27
+ 2024-10-24
https://ieeta-pt.github.io/HPC-guides/examples/dl/mnist/
- 2024-09-27
+ 2024-10-24
https://ieeta-pt.github.io/HPC-guides/examples/dl/transformers/
- 2024-09-27
+ 2024-10-24
https://ieeta-pt.github.io/HPC-guides/examples/sc/dna/
- 2024-09-27
+ 2024-10-24
https://ieeta-pt.github.io/HPC-guides/examples/sc/r/
- 2024-09-27
+ 2024-10-24
\ No newline at end of file
diff --git a/sitemap.xml.gz b/sitemap.xml.gz
index 6cc3075..b9c8751 100644
Binary files a/sitemap.xml.gz and b/sitemap.xml.gz differ