Merge pull request #343 from kevinstratford/fix-issue-342
Update forge/ddt information
mbareford authored Jul 31, 2024
2 parents c6142eb + f177830 commit 4291782
Showing 2 changed files with 83 additions and 172 deletions.
140 changes: 5 additions & 135 deletions docs/software-tools/ddt.md
@@ -1,137 +1,7 @@
# Debugging using Linaro DDT
## Debugging using Linaro DDT

The Linaro Forge tool suite is installed on Cirrus. This includes DDT,
which is a debugging tool for scalar, multi-threaded and large-scale
parallel applications. To compile your code for debugging you will
usually want to specify the `-O0` option to turn off all code
optimisation (as this can produce a mismatch between source code line
numbers and debugging information) and `-g` to include debugging
information in the compiled executable. To use this package you will
need to log in to Cirrus with X11-forwarding enabled, load the Linaro Forge
module and execute `forge`:
Please see the [Forge page](./forge.md) for details of how to run debugging and
profiling jobs on Cirrus.

module load forge
forge

## Debugging runs on the login nodes

You can execute and debug your MPI code on the login node which is
useful for immediate development work with short, small, simple runs to
avoid having to wait in the queue. Firstly ensure you have loaded the
`mpt` module and any other dependencies of your code, then start Forge
and click *Run*. Fill in the necessary details of your code under the
*Application* pane, then tick the *MPI* tick box, specify the number of
MPI processes you wish to run and ensure the implementation is set to
*HPE MPT (2.18+)*. If this is not set correctly then you can update the
configuration by clicking the *Change* button and selecting this option
on the *MPI/UPC Implementation* field of the system pane. When you are
happy with this hit *Run* to start.

## Debugging runs on the compute nodes

This involves DDT submitting your job to the queue, and as soon as the
compute nodes start executing you will drop into the debug session and
be able to interact with your code. Start Forge and click on *Run*, then
in the *Application* pane provide the details needed for your code. Then
tick the *MPI* box -- when running on the compute nodes, you must set
the MPI implementation to *Slurm (generic)*. You must also tick the
*Submit to Queue* box. Clicking the *Configure* button in this section,
you must now choose the submission template. One is provided for you at
`/work/y07/shared/cirrus-software/forge/latest/templates/cirrus.qtf` which
you should copy and modify to suit your needs. You will need to load any
modules required for your code and perform any other necessary setup,
such as providing extra sbatch options, i.e., whatever is needed for
your code to run in a normal batch job.



!!! Note

The current Linaro Forge licence permits use on the Cirrus CPU nodes only.
The licence does not permit use of DDT/MAP for codes that run on the
Cirrus GPUs.


Back in the DDT run window, you can click on *Parameters* in the same
queue pane to set the partition and QoS to use, the account to which the
job should be charged, and the maximum walltime. You can also now look
at the *MPI* pane again and select the number of processes and nodes to
use. Finally, clicking *Submit* will place the job in the queue. A new
window will show you the queue until the job starts, at which point you can
start to debug.

## Memory debugging with DDT

If you are dynamically linking your code and debugging it on the login
node then this is fine (just ensure that the *Preload the memory
debugging library* option is ticked in the *Details* pane.) If you are
dynamically linking but intending to debug running on the compute nodes,
or statically linking then you need to include the compile option
`-Wl,--allow-multiple-definition` and explicitly link your executable
with Allinea's memory debugging library. The exact library to link
against depends on your code; `-ldmalloc` (for no threading with C),
`-ldmallocth` (for threading with C), `-ldmallocxx` (for no threading
with C++) or `-ldmallocthcxx` (for threading with C++). The library
locations are all set up when the *forge* module is loaded so these
libraries should be found without further arguments.
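
The mapping from language and threading model to link option can be sketched as a small shell helper. This is only an illustration: the function name `dmalloc_link_flag`, the `mpicc` wrapper and the file names `my_app.c`/`my_app` are placeholders, not part of the Cirrus or Forge installation.

```bash
# Hypothetical helper: map (language, threaded?) to the memory debugging
# library link option listed in the paragraph above.
dmalloc_link_flag() {
    local lang=$1 threaded=$2
    case "$lang:$threaded" in
        c:no)    echo "-ldmalloc" ;;
        c:yes)   echo "-ldmallocth" ;;
        c++:no)  echo "-ldmallocxx" ;;
        c++:yes) echo "-ldmallocthcxx" ;;
        *)       echo "usage: dmalloc_link_flag c|c++ yes|no" >&2; return 1 ;;
    esac
}

# Print an example link line for unthreaded C code.
# The compiler wrapper and file names are placeholders; adjust for your code.
echo "mpicc -g -O0 my_app.c -o my_app -Wl,--allow-multiple-definition $(dmalloc_link_flag c no)"
```

The helper only prints the option; the actual link step still needs the `forge` module loaded so that the library locations are set up.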

## Remote Client

Linaro Forge can connect to remote systems using SSH so you can run the
user interface on your desktop or laptop machine without the need for X
forwarding. Native remote clients are available for Windows, macOS and
Linux. You can download the remote clients from the [Linaro Forge
website](https://www.linaroforge.com/downloadForge/). No licence
file is required by a remote client.



!!! Note

The same versions of Linaro Forge must be installed on the local and remote
systems in order to use DDT remotely.



To configure the remote client to connect to Cirrus, start it and then
click on the *Remote Launch* drop-down box and click on *Configure*. In
the new window, click *Add* to create a new login profile. For the
hostname you should provide `username@login.cirrus.ac.uk` where
`username` is your login username. For *Remote Installation Directory*
enter `/work/y07/shared/cirrus-software/forge/latest`. To ensure your SSH
private key can be used to connect, the SSH agent on your local machine
should be configured to provide it. You can ensure this by running
`ssh-add ~/.ssh/id_rsa_cirrus` before using the Forge client where you
should replace `~/.ssh/id_rsa_cirrus` with the path to the key you
normally use to log in to Cirrus. This should persist until your local
machine is restarted -- only then should you have to re-run `ssh-add`.

If you only intend to debug jobs on the compute nodes no further
configuration is needed. If however you want to use the login nodes, you
will likely need to write a short bash script to prepare the same
environment you would use if you were running your code interactively on
the login node -- otherwise, the necessary libraries will not be found
while running. For example, if using MPT, you might create a file in
your home directory containing only one line:

module load mpt

In your local Forge client you should then edit the *Remote Script*
field in the Cirrus login details to contain the path to this script.
When you log in the script will be sourced and the software provided by
whatever modules it loads become usable.

When you start the Forge client, you will now be able to select the
Cirrus login from the Remote Launch drop-down box. After providing your
usual login password the connection to Cirrus will be established and
you will be able to start debugging.

You can find more detailed information
[here](https://docs.linaroforge.com/23.1.1/html/forge/forge/connecting_to_a_remote_system/connecting_remotely.html).

## Getting further help on DDT

- [DDT
website](https://www.linaroforge.com/linaroDdt/)
- [DDT user
guide](https://docs.linaroforge.com/23.1.1/html/forge/ddt/index.html)
You can find more detailed information in the
[Linaro documentation](https://docs.linaroforge.com/24.0.2/html/forge/forge/connecting_to_a_remote_system/connecting_remotely.html).
115 changes: 78 additions & 37 deletions docs/software-tools/forge.md
@@ -3,7 +3,8 @@

[Linaro Forge](https://www.linaroforge.com/) provides debugging and profiling tools
for MPI parallel applications, and OpenMP or pthreads multi-threaded applications
(and also hybrid MPI/OpenMP). Forge DDT is the debugger and MAP is the profiler.
(and also hybrid MPI/OpenMP). Debugging is also possible for CUDA applications
on the GPU nodes. Forge DDT is the debugger and MAP is the profiler.


### User interface
@@ -17,36 +18,6 @@ To download the remote client, see the [Forge download pages](https://www.linaro
Version 24.0 is known to work at the time of writing. A section further down this page explains how to use the remote client,
see [Connecting with the remote client](#connecting-with-the-remote-client).

### Licensing

Cirrus has a licence for up to 2080 tokens, where a token represents an MPI parallel process.
Running Forge DDT/MAP to debug/profile a code running across 16 nodes using 36 MPI ranks per
node would require 576 tokens. Alternatively, the number of tokens required would be halved
if there were 18 MPI ranks per node.

Please note, Forge licence tokens are shared by all Cirrus (and [ARCHER2](https://www.archer2.ac.uk/)) users.

To see how many tokens are in use, you can view the licence server status page by first
setting up an SSH tunnel to the node hosting the licence server.

```bash
ssh <username>@login.cirrus.ac.uk -L 4241:cirrus-ag1:4241
```

You can now view the status page from within a local browser, see [http://localhost:4241/status.html](http://localhost:4241/status.html).

!!! note
The licence status page may contain multiple licences, indicated by a row of buttons (one per licence) near the top of the page.
The details of the 12-month licence described above can be accessed by clicking on the first button in the row.
Additional buttons may appear at various times for *boosted* licences that offer more tokens. Such licences are primarily for the
benefit of [ARCHER2](https://www.archer2.ac.uk/) users. Please contact the [Service Desk](https://www.cirrus.ac.uk/support/) if you have a specific requirement that exceeds
the current Forge licence provision.

!!! note
The licence status page refers to the Arm Licence Server. Arm is the name of the company that originally developed Forge
before it was acquired by Linaro.


### One time set-up for using Forge

A preliminary step is required to set up the necessary
@@ -95,7 +66,7 @@ to reduce the optimisation to `-O0` to obtain full and consistent
information. However, this in itself can change the behaviour of bugs,
so some experimentation may be necessary.

#### Post-mortem debugging
#### Post-mortem, or offline, debugging

A non-interactive method of debugging is available which allows information
to be obtained on the state of the execution at the point of failure in a
@@ -145,6 +116,7 @@ with C++) or `-ldmallocthcxx` (for threading with C++). The library
locations are all set up when the `forge` module is loaded so these
libraries should be found without further arguments.


#### Interactive debugging: using the client to submit a batch job

You can also start the client interactively (for details of remote launch, see [Connecting with the remote client](#connecting-with-the-remote-client)).
@@ -164,7 +136,10 @@ Note:
the left-hand side;

* If the license has connected successfully, a serial number will be
shown in small text at the lower left.
shown in small text at the lower left (see image below). One can click
on the question
mark icon next to the license serial number to see current information
on the status of the license (number of processes available and so on).


![Forge window](./forge-ddt.png)
@@ -198,6 +173,18 @@ template file can then be specified in the dialog window.
There may be a short delay while the sbatch job starts. Debugging should
then proceed as described in the [Linaro Forge documentation](https://docs.linaroforge.com/24.0/html/forge/ddt/index.html).

#### GPU debugging

This proceeds in the normal way on GPU nodes. We recommend that one sets, in
the environment (e.g., via the `.qtf` template file, q.v.)
```
export TMPDIR=/tmp
```
If this is not set, the application may not start, or may fail at the point
of execution.

See the Linaro Forge documentation for further comments on GPU debugging.


### Using MAP

@@ -217,8 +204,8 @@ will need to link the MAP libraries manually by providing explicit link options.

The library paths specified in the link options will depend on the MPI library and compiler.

- `MPT 2.25 and GCC 10.2.0`: `${FORGE_DIR}/map/libs/mpt-2.25/gcc`
- `Intel MPI 20.4 and Intel 20.4`: `${FORGE_DIR}/map/libs/impi-20/intel20`
- MPT 2.25 and GCC 10.2.0: `${FORGE_DIR}/map/libs/mpt-2.25/gcc`
- Intel MPI 20.4 and Intel 20.4: `${FORGE_DIR}/map/libs/impi-20/intel20`

For example, for `MPT 2.25 and GCC 10.2.0` the additional options required at link time
are given below.
@@ -229,7 +216,7 @@ are given below.
```

The MAP libraries for other combinations of MPI library and compiler can be found under
`${FORGE_DIR}/map/libs`.
`${FORGE_DIR}/map/libs`.

#### Generating a profile

@@ -240,7 +227,8 @@ Submit a batch job in the usual way, and include the lines:
module load forge
map -n <number of MPI processes> --mpi=slurm --mpiargs="--hint=nomultithread --distribution=block:block" --profile ./my_executable
map -n <number of MPI processes> --mpi=slurm --mpiargs="--hint=nomultithread \
--distribution=block:block" --profile ./my_executable
```

Successful execution will generate a file with a `.map` extension.
@@ -290,6 +278,59 @@ Finally, note that `ssh` may need to be configured so that it picks up
the correct local public key file. This may be done, e.g., via the
local `.ssh/config` configuration file.

#### Troubleshooting

A common cause of problems in the use of the remote client is incorrect
Forge configuration in the `.forge/system.config` file, particularly in the
specification of the shared directory. This should be of the form
```
shared directory = /mnt/lustre/e1000/home/project/project/user/.forge
```
(and certainly not the home directory `~`). The full mount point of your
work directory can be obtained with, e.g., `pwd -P` (somewhat
confusingly, `/mnt/lustre/e1000/home` is `/work`).

If you submit a job to the queue via the remote client, and the job starts
(you can check using `squeue` interactively), but the client does not connect,
you may need to check this configuration setting.
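
One way to inspect the setting is sketched below. It assumes the entry appears on a single line of the form `shared directory = <path>`, as in the example above; the function names `forge_shared_dir` and `check_shared_dir` are hypothetical, and the test for an absolute path is a simple heuristic rather than anything Forge itself enforces.

```bash
# Hypothetical helper: print the "shared directory" value from a Forge
# system.config file (defaults to ~/.forge/system.config).
forge_shared_dir() {
    local config=${1:-$HOME/.forge/system.config}
    sed -n 's/^[[:space:]]*shared directory[[:space:]]*=[[:space:]]*//p' "$config"
}

# Heuristic check (an assumption, not a Forge rule): the value should be an
# absolute path, so an empty value or one starting with "~" is reported.
check_shared_dir() {
    local dir
    dir=$(forge_shared_dir "$1")
    case "$dir" in
        /*) echo "ok: $dir" ;;
        *)  echo "suspect shared directory: '$dir'" >&2; return 1 ;;
    esac
}
```

Running `check_shared_dir` with no argument on a Cirrus login node reads `~/.forge/system.config`; the printed path can then be compared with the output of `pwd -P` in your work directory.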


For hybrid applications where thread placement is critical, the remote
client does not provide good control of such placement (or any at all).
The `--offline` approach discussed above is one solution.


### Licensing

Cirrus has a licence for up to 2080 tokens, where a token represents an MPI parallel process.
Running Forge DDT/MAP to debug/profile a code running across 16 nodes using 36 MPI ranks per
node would require 576 tokens. Alternatively, the number of tokens required would be halved
if there were 18 MPI ranks per node.
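
The token arithmetic above is simply nodes multiplied by MPI ranks per node. As a quick check before submitting a job, it can be reproduced with a short shell function (the name `tokens_needed` is a hypothetical helper, not a Forge command):

```bash
# Hypothetical helper: Forge licence tokens needed = nodes x MPI ranks per node.
tokens_needed() {
    local nodes=$1 ranks_per_node=$2
    echo $(( nodes * ranks_per_node ))
}

tokens_needed 16 36   # prints 576
tokens_needed 16 18   # prints 288
```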

Please note, Forge licence tokens are shared by all Cirrus (and [ARCHER2](https://www.archer2.ac.uk/)) users.

To see how many tokens are in use, you can view the licence server status page by first
setting up an SSH tunnel to the node hosting the licence server.

```bash
ssh <username>@login.cirrus.ac.uk -L 4241:cirrus-ag1:4241
```

You can now view the status page from within a local browser, see [http://localhost:4241/status.html](http://localhost:4241/status.html).

!!! note
The licence status page may contain multiple licences, indicated by a row of buttons (one per licence) near the top of the page.
The details of the 12-month licence described above can be accessed by clicking on the first button in the row.
Additional buttons may appear at various times for *boosted* licences that offer more tokens. Such licences are primarily for the
benefit of [ARCHER2](https://www.archer2.ac.uk/) users. Please contact the [Service Desk](https://www.cirrus.ac.uk/support/) if you have a specific requirement that exceeds
the current Forge licence provision.

!!! note
The licence status page refers to the Arm Licence Server. Arm is the name of the company that originally developed Forge
before it was acquired by Linaro.



## Useful links

