This is not official documentation for the [Vienna Scientific Cluster](https://vsc.ac.at). For that, check the [VSC Wiki](https://wiki.vsc.ac.at). Instead, this is my personal cheat sheet of things that are not well documented elsewhere. While the content is focused on the VSC, most of the things mentioned here also apply to similar Slurm setups at other universities.
<!--more-->
## Basics
**Always request an interactive session when running anything using a non-trivial amount of CPU power!**
The output of `squeue` also shows the reason why a job is still queued. An explanation of the reason codes can be found [in the Slurm documentation](https://slurm.schedmd.com/squeue.html#lbAF) or the [VSC wiki](https://wiki.vsc.ac.at/doku.php?id=doku:slurm_job_reason_codes).
Details about past jobs (like maximum memory usage) can be found using [`sacct`](https://slurm.schedmd.com/sacct.html). You can manually specify the needed columns or display most of them using `--long`.
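A minimal sketch of picking the columns by hand. The job ID `12345` is a placeholder, and the selection of fields is only a suggestion (`MaxRSS` is the peak memory usage of a job step).

```bash
# Placeholder job ID: replace it with one of your own jobs
JOBID="12345"
# Columns to display
FIELDS="JobID,JobName,Partition,State,Elapsed,MaxRSS,ReqMem"

if command -v sacct >/dev/null 2>&1; then
    sacct -j "$JOBID" --format="$FIELDS"
else
    echo "sacct not found; would run: sacct -j $JOBID --format=$FIELDS"
fi
```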
Depending on access to private nodes, you might have access to many different QoS (*Quality of Service*), accounts and partitions.
On VSC you can get an overview of your accounts with `sqos` (this is also shown on login):
```bash
➜ sqos -acc # this only works on VSC
```
If you want to use a different account or QoS than your default (e.g. if you want to access private nodes or GPU nodes), you can specify them with `--qos` and `--account` in `salloc`, `sbatch` or your job script.
You can also get an overview of all available partitions with `sinfo` and specify one explicitly with `--partition`.
If you want to get a quick overview of the QoS at VSC and their current usage, you can use `sqos`.
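In a job script this could look roughly like the following. `my_account`, `my_qos` and `my_partition` are made-up names, so replace them with values that `sqos` and `sinfo` report for you.

```bash
#!/bin/bash
#SBATCH --job-name=example
# Hypothetical account, QoS and partition names:
#SBATCH --account=my_account
#SBATCH --qos=my_qos
#SBATCH --partition=my_partition
#SBATCH --nodes=1

# Slurm sets these variables inside a job; the fallbacks only apply
# when the script is run outside of Slurm.
SUMMARY="job ${SLURM_JOB_ID:-none} in partition ${SLURM_JOB_PARTITION:-none}"
echo "$SUMMARY"
```

The same long options can also be passed on the command line to `salloc` or `sbatch`.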
### Array Jobs
Sometimes you might want to submit a large number of similar jobs. This can be easily achieved using array jobs and the [`--array`](https://slurm.schedmd.com/sbatch.html#OPT_array) argument. With this, your job will be submitted multiple times, each time with a different task ID that is available in the `$SLURM_ARRAY_TASK_ID` environment variable.
Keep in mind that each individual job should not be too short (it should run for more than just a few minutes), as otherwise the overhead of scheduling and starting the job will not be worth it. In these cases, using one job that runs the program in a loop will be more efficient.
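Both ideas can be combined, as in this sketch: ten array tasks that each loop over ten inputs instead of 100 tiny jobs. The numbers and the `input_N.dat` naming are assumptions for illustration only.

```bash
#!/bin/bash
#SBATCH --job-name=array-example
#SBATCH --array=0-9
#SBATCH --nodes=1

# Outside of Slurm the variable is unset, so fall back to task 0 for testing
TASK_ID="${SLURM_ARRAY_TASK_ID:-0}"

# Assumption: 100 inputs in total, so each of the 10 tasks handles 10 of them
PER_TASK=10
START=$((TASK_ID * PER_TASK))
END=$((START + PER_TASK - 1))

i=$START
while [ "$i" -le "$END" ]; do
    # Replace echo with the real program call for this input
    echo "task $TASK_ID processes input_$i.dat"
    i=$((i + 1))
done
```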
[official docs](https://wiki.vsc.ac.at/doku.php?id=doku:vpn_ssh_access) (but we are using the more modern ProxyJump instead of agent forwarding, as this way the intermediate server never gets access to our SSH agent and the keys loaded in it)
Access to VSC is only possible from IP addresses of the partner universities. If you are from the University of Vienna and don't want to use the VPN, an SSH tunnel via `login.univie.ac.at` is an alternative.
Then you can add another entry to `~/.ssh/config` on your computer for VSC that uses `ProxyJump` to connect via the `loginUnivie` entry we just created.
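Put together, the two entries might look like the output of this sketch. `univie_user`, `vsc_user` and the alias `vsc` are placeholders, and `vsc5.vsc.ac.at` is assumed as the VSC login host. The function only prints the entries so that you can review them before adding them to your config.

```bash
# Print two ~/.ssh/config entries (all user names are placeholders)
print_vsc_ssh_config() {
    cat <<'EOF'
Host loginUnivie
    HostName login.univie.ac.at
    User univie_user

Host vsc
    HostName vsc5.vsc.ac.at
    User vsc_user
    ProxyJump loginUnivie
EOF
}

# Review the output, then append it yourself, e.g.:
#   print_vsc_ssh_config >> ~/.ssh/config
print_vsc_ssh_config
```

Afterwards `ssh vsc` should connect through `login.univie.ac.at` without any further options.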
([official docs](https://wiki.vsc.ac.at/doku.php?id=doku:spack), which this guide builds on. More useful tips can be found in the [spack documentation](https://spack.readthedocs.io/en/latest/))
Software that is needed can be loaded via modules. The easiest way to find the right module for the current processor architecture is to query `spack` directly, as it is used to provide all compiled libraries and applications. There should never be a need to run `module` directly, and doing so might accidentally pick libraries that are not intended for the current processor architecture.
If you get a long output, you can ignore everything above the `==> N installed package(s)` line as it is unrelated to your current query. In case this only returns one module that fits your requirements, you can directly replace `spack find` with `spack load` to load this module.
But most of the time, you will find multiple modules which differ in their properties (and `spack load` will fail if the query resolves to more than one package):
The most important property is the version, which is denoted with an `@` sign. Another property is the compiler the program or library was compiled with, which is specified after a `%`.
So if you want to load e.g. `cmake` version 3.x.x compiled with `gcc` version 11, you could directly search for it and subsequently load it.
This way if another minor update of cmake is released, your command will load it. If you don't like this, check the next section.
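As a sketch, the query for this example could be written like this. The guard is only there so the snippet stays harmless on a machine without spack.

```bash
# cmake, any 3.x.x version (@3), compiled with gcc 11 (%gcc@11)
SPEC="cmake@3%gcc@11"

if command -v spack >/dev/null 2>&1; then
    # List every installed package that matches the spec
    spack find "$SPEC"
    # Load it (this fails if more than one package matches)
    spack load "$SPEC"
else
    echo "spack not found; would run: spack find $SPEC && spack load $SPEC"
fi
```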
Sometimes there are also multiple variants of the same module. `spack info modulename` can give you an overview of all of them, but that doesn't mean that all combinations of variants/compilers/versions are offered at VSC. If you are for example interested in the `hdf5` library with MPI support, you can search for the following (`-v` gives you the exact properties of each module):
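A sketch of such a search, assuming the MPI variant of `hdf5` is called `+mpi` (as `spack info hdf5` should confirm):

```bash
# hdf5 with the mpi variant enabled
SPEC="hdf5+mpi"

if command -v spack >/dev/null 2>&1; then
    # -v also prints the variants of every matching package
    spack find -v "$SPEC"
else
    echo "spack not found; would run: spack find -v $SPEC"
fi
```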
`spack load` queries don't resolve to specific packages; they are just filters that describe the properties you want. If you dislike this, or prefer to specify the exact package for reproducibility, you can find the hash of a package using `spack find -l` and then use `/hash` to always refer to this exact package:
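A sketch of this two-step workflow, using `cmake` as the example package. `PINNED_HASH` is deliberately left empty, because the real hash has to be copied from the `spack find -l` output on your system.

```bash
# Paste the hash from the `spack find -l` output here
PINNED_HASH=""

if command -v spack >/dev/null 2>&1; then
    # -l prints the short hash in front of every package
    spack find -l cmake
    if [ -n "$PINNED_HASH" ]; then
        # A leading slash tells spack to match by hash
        spack load "/$PINNED_HASH"
    fi
else
    echo "spack not found; would run: spack find -l cmake"
fi
```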
Sometimes a program that just compiled without any issues (as the correct spack modules are loaded) won't run afterwards as the libraries are not found at run time.
Keep in mind that doing so might bring back [the issues](#avoiding-broken-programs-due-to-loaded-dependencies) that changing `$LD_LIBRARY_PATH` causes.
Sometimes one needs to know what `spack load somepackage` does exactly (e.g. because a library is still not found even though you loaded the module). Adding `--sh` to `spack load` prints out all commands that would be executed during the `module load` allowing you to understand what is going on.
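For example (a sketch, with `cmake` standing in for whatever package you are debugging):

```bash
PACKAGE="cmake"

if command -v spack >/dev/null 2>&1; then
    # Print the shell commands instead of executing them
    spack load --sh "$PACKAGE"
else
    echo "spack not found; would run: spack load --sh $PACKAGE"
fi
```

Nothing is loaded by this call, so it is safe to run in your current shell.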
This is a list of modules I commonly use. While it might not be directly usable for other people and will go out of date quickly, it might still serve as a good starting point.
The following sections have been removed from the main guide as they are most likely no longer valid.
### Avoiding broken programs due to loaded dependencies
{{<alert type="warning">}} Recent versions of spack don't set `$LD_LIBRARY_PATH` any more, which means that "unnecessarily" loaded spack modules should no longer affect other programs at runtime. If you manually modify `$LD_LIBRARY_PATH` you might still run into these issues now.
{{</alert>}}
Loading a spack module does not just load the specified module, but also all of its dependencies. With some modules like `openmpi`, that dependency tree can be quite large.
Loading modules like `openssl` or `ncurses` from spack means that programs that depend on the versions of those libraries provided by the base operating system will crash.
```bash
➜ spack load openmpi%gcc
➜ nano somefile.txt
Segmentation fault (core dumped)
➜ htop
Segmentation fault (core dumped)
```
One can avoid this by unloading the affected modules afterwards.
```bash
➜ spack unload ncurses
➜ spack unload openssl
```
But in many cases one doesn't need all dependency modules and is really just interested in e.g. `openmpi` itself. Therefore, one can ignore the dependencies with `--only package`.
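Applied to the `openmpi` example from above, this might look like the following sketch:

```bash
SPEC="openmpi%gcc"

if command -v spack >/dev/null 2>&1; then
    # Load only openmpi itself and skip its dependency tree
    spack load --only package "$SPEC"
else
    echo "spack not found; would run: spack load --only package $SPEC"
fi
```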