Overview of Nero GPU resources
GPU Models and Slurm Features
# of Nodes | GPUs per Node | Slurm Features |
---|---|---|
6 | 4 | GPU_GEN:PSC,GPU_BRD:TESLA,GPU_SKU:V100_PCIE,GPU_MEM:32GB,GPU_CC:7.0 |
2 | 2 | GPU_GEN:PSC,GPU_BRD:TESLA,GPU_SKU:P100_PCIE,GPU_MEM:16GB,GPU_CC:6.0,CLOUD |
GPU Slurm Feature Descriptions
Slurm Feature | Description |
---|---|
GPU_GEN | GPU generation |
GPU_BRD | GPU brand |
GPU_SKU | GPU model |
GPU_MEM | Amount of GPU memory |
GPU_CC | GPU Compute Capability |
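To see which GPU resources and features each node currently advertises, you can query Slurm directly; the output format string below is just one reasonable choice:
sinfo -p gpu -N -o "%N %G %f"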
Basic Interactive Job submission for GPU resources
The following will request an interactive session with 2 GPUs:
$ srun --pty -p gpu --gres=gpu:2 bash
The following flags are required:
Slurm flag | Description |
---|---|
--pty | allocates a pseudo-terminal (interactive console) |
-p gpu or --partition=gpu | select the GPU partition |
--gres=gpu:X | request X GPUs (1-4) |
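Once the session starts, you can confirm which GPUs were allocated. On most Slurm installations the job is restricted to the requested devices and the environment variable below is set by Slurm's GPU plugin, so inside the session a quick check is:
nvidia-smi
echo $CUDA_VISIBLE_DEVICES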
To select a specific GPU model, use the -C (--constraint) flag with a Slurm feature, for example:
srun --partition=gpu --gres=gpu:1 -C GPU_SKU:V100_PCIE --pty bash
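Multiple features can be combined into a single constraint expression with & (AND) or | (OR). For example, to request a 32 GB V100 (feature names taken from the table above), something like:
srun --partition=gpu --gres=gpu:1 -C "GPU_SKU:V100_PCIE&GPU_MEM:32GB" --pty bash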
Submitting a GPU job via a Batch Script
Note: An overview of batch script options can be found here: Slurm jobscript template.
The following script will request one GPU for two hours in the gpu partition, with the job name gputest1:
#!/bin/bash
# Give your job a name, so you can recognize it in the queue overview
#SBATCH --job-name=gputest1
# Get email notification when job finishes or fails
#SBATCH --mail-type=END,FAIL # notifications for job done & fail
#SBATCH --mail-user=<sunetid>@stanford.edu
# Define how long your job will run (d-hh:mm:ss)
#SBATCH --time 02:00:00
# GPU jobs require you to specify partition
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --mem=16G
# Number of tasks
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
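The directives above only describe the requested resources; a complete script ends with the commands you want to run. A minimal sketch (the module name and program below are placeholders, not part of the Nero documentation):
# Load your software environment and run your GPU program (placeholders)
module load cuda
srun python train.py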
To submit your job to Slurm, run the following:
sbatch gputest1.sh
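After submission, you can check the job's state and assigned node with standard Slurm commands, for example (replace <jobid> with the ID printed by sbatch):
squeue -u $USER
scontrol show job <jobid>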
You can also request a GPU Slurm feature in your script using the following:
#SBATCH -C GPU_MEM:32GB
#SBATCH -C GPU_SKU:V100_PCIE
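As with srun, if your job needs more than one feature at the same time, the features can be combined into a single constraint expression rather than listed as separate options; a sketch:
#SBATCH -C "GPU_MEM:32GB&GPU_SKU:V100_PCIE"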
Checking GPU utilization for your job
With $RUNNINGJOB set to the job ID of your running job, you can attach nvidia-smi to it:
srun --jobid=$RUNNINGJOB --pty nvidia-smi
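To watch utilization over time, one option (a sketch; attaching an extra step to a running job may depend on the site's Slurm version and settings) is to open a shell inside the job's allocation and run nvidia-smi there:
srun --jobid=$RUNNINGJOB --pty bash
# then, inside that shell on the compute node:
watch -n 5 nvidia-smi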