Don’t Panic! Running Containers in Gridware and Open Cluster Scheduler

January 6, 2026
The container landscape has evolved. Modern runtimes like Apptainer and enroot enable seamless container execution without specialized workload manager integration—no external daemons, no elevated privileges, full scheduler control. Discover how container workloads are first-class citizens in Gridware and Open Cluster Scheduler.

When Docker emerged in 2013, the research community immediately recognized its potential for HPC environments, particularly for reproducibility. However, Docker introduced a fundamental architectural challenge: it operated as an external daemon, placing container processes outside the workload manager's control. Whether started through integration methods, prolog scripts, or job scripts, these containers bypassed the essential capabilities that workload managers provide—resource accounting, limit enforcement, and environment cleanup. The requirement for elevated system privileges further complicated adoption in shared HPC environments.

In 2024, we introduced Open Cluster Scheduler and Gridware Cluster Scheduler, continuing the legacy of DQS, Sun Grid Engine, and more. A frequently asked question today is:

How do you handle container integration in GCS and OCS?

The Container Landscape Has Evolved

The answer lies in recognizing that container management has fundamentally changed. Modern runtimes like Apptainer (formerly Singularity), enroot, and daemonless podman enable seamless container execution without specialized workload manager integration. These tools integrate naturally with HPC schedulers through standard process creation and management.

Apptainer has become the de facto standard for HPC container workloads, while NVIDIA's enroot excels at GPU-accelerated computing. Both run natively on Open Cluster Scheduler and Gridware Cluster Scheduler. In an earlier blog post about RNA sequencing pipelines, we demonstrated how multiple Apptainer containers can compose a complete bioinformatics workflow—from the scheduler's perspective, these are ordinary processes subject to standard forking, resource monitoring, limitation, and cleanup mechanisms. All of them were spawned from a Nextflow pipeline using its "SGE" executor and Apptainer integration, working seamlessly with Gridware and Open Cluster Scheduler.
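
For readers curious what such a setup looks like, here is an illustrative nextflow.config sketch (directive names follow Nextflow's documented syntax; the resource values are placeholders, not taken from the original pipeline):

// nextflow.config (illustrative sketch)
process {
    executor       = 'sge'                 // submit each task through qsub
    clusterOptions = '-l h_vmem=8G'        // placeholder resource request
}
apptainer {
    enabled    = true                      // run each task inside an Apptainer container
    autoMounts = true                      // bind common host paths automatically
}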

Apptainer now supports completely unprivileged user-mode operation, enabling even container nesting. Its OCI compliance allows straightforward conversion of Docker images to the SIF format. Apptainer containers execute within job scripts exactly as they would on the command line—no external daemon required.

GPU Workloads and AI Container Runtimes

AI workloads benefit significantly from containerization. NVIDIA distributes optimized containers for various computing scenarios, all requiring GPU resource access. NVIDIA's enroot runtime integrates seamlessly with Open Cluster Scheduler, requiring only basic global configuration.

The fundamental advantage is architectural: containers launched through enroot or Apptainer remain child processes of the job, enabling complete scheduler control over resource allocation, monitoring, and cleanup.
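
A quick way to see this from inside a running job is to look at the process ancestry; the snippet below is only an illustrative check (it assumes pstree is installed and uses the standard $JOB_ID variable and qacct accounting), not anything required for container execution:

# Inside a running job script: show the ancestry of the current shell.
# Containers launched from here remain below the shepherd in this tree.
pstree -s $$
# Illustrative shape: sge_execd---sge_shepherd---bash---apptainer---python

# After the job has finished, standard accounting covers the container processes too:
qacct -j "$JOB_ID"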

Here's a simplified example of running NVIDIA's Clara Parabricks in an enroot container:


#!/bin/bash
#$ -l h_vmem=16G
#$ -l NVIDIA_GPUS=1
#$ -q gpu.q
#$ -N parabricks_job

# Import container image if not already cached on user's filesystem.
IMAGE_FILE="$HOME/parabricks4501.sqsh"
if [[ ! -f "${IMAGE_FILE}" ]]; then
  # Downloads Docker layers and converts to SquashFS format.
  enroot import -o "${IMAGE_FILE}" \
    docker://nvcr.io#nvidia/clara/clara-parabricks:4.5.0-1
fi

# Create container if it does not exist. The enroot create command takes a
# container image and unpacks its root filesystem under $ENROOT_DATA_PATH.
# Import container image, container creation, and start can be abstracted
# from the user by using a starter_method script configured in the queue.
CONTAINER_NAME="parabricks450"
if ! enroot list | grep -q "^${CONTAINER_NAME}\$"; then
  enroot create --name "${CONTAINER_NAME}" "${IMAGE_FILE}"
fi

# Run your workload inside the container. Change command below to your
# workload.
enroot start --rw "${CONTAINER_NAME}" nvidia-smi
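
Submitting and monitoring this script works like any other batch job; as an illustration (assuming the script is saved as parabricks_job.sh):

qsub parabricks_job.sh      # submit to the gpu.q queue requested in the script
qstat                       # watch the job while it is pending or running
qacct -j parabricks_job     # resource usage after completion, container processes included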

Running Apptainer Containers

Apptainer provides even simpler integration, running containers directly from Docker registries without pre-import steps. Here's a batch job example using TensorFlow from Docker Hub:


#!/bin/bash
#$ -l h_vmem=8G
#$ -N tensorflow_batch

# Run directly from Docker registry
apptainer exec docker://tensorflow/tensorflow:latest \
  python -c "import tensorflow as tf; print(tf.__version__)"

For GPU workloads, use the --nv flag to expose NVIDIA GPUs to the container:


#!/bin/bash
#$ -l h_vmem=16G
#$ -l NVIDIA_GPUS=1
#$ -q gpu.q

# GPU-accelerated PyTorch training
apptainer exec --nv docker://pytorch/pytorch:latest \
  python train_model.py

Bind additional directories with --bind to access data directories or scratch space:

#!/bin/bash
#$ -l h_vmem=8G
#$ -cwd

# Bind project data and scratch directories
apptainer exec \
  --bind /data/project:/data \
  --bind $TMPDIR:/scratch \
  docker://rocker/tidyverse:latest \
  Rscript analysis.R

For interactive workloads, submit an interactive job and run Apptainer commands directly:


# Request an interactive session
qrsh -l h_vmem=8G

# Inside the interactive session, run your containerized application
apptainer shell docker://ubuntu:22.04

# Or execute specific commands
apptainer exec docker://python:3.11-slim python --version

Apptainer automatically caches container images in ~/.apptainer/cache, so subsequent runs are faster. For production workflows, pre-build SIF images for optimal performance:


# Convert Docker image to SIF format (one-time operation)
apptainer build tensorflow.sif docker://tensorflow/tensorflow:latest

# Use the SIF in your job script for faster startup
apptainer exec tensorflow.sif python train_model.py

Simplifying Container Workflows with starter_method

Gridware Cluster Scheduler provides the starter_method queue configuration parameter to streamline container execution for users. Instead of requiring users to explicitly invoke container runtimes in their job scripts, administrators can configure queues to automatically wrap jobs in containers.

The starter_method specifies an executable that Grid Engine uses to launch jobs instead of the default shell. The starter receives the job script and its arguments, along with further information through standard Grid Engine environment variables (SGE_STARTER_SHELL_PATH, SGE_STARTER_SHELL_START_MODE, SGE_STARTER_USE_LOGIN_SHELL).

Example use case: Create a container queue where all jobs automatically run inside a standardized Apptainer container:

# Queue configuration (qconf -mq container.q)
starter_method /opt/gridware/starters/apptainer_starter.sh

Example starter script (/opt/gridware/starters/apptainer_starter.sh):

#!/bin/bash
# Automatically wrap job in Apptainer container
exec apptainer exec \
  --bind /scratch,/data \
  docker://ubuntu:22.04 \
  "${SGE_STARTER_SHELL_PATH}" "$@"

Users then submit jobs normally without container-specific syntax:


#!/bin/bash
#$ -q container.q
#$ -N my_job

# Job runs inside container automatically
python my_script.py

This approach offers several benefits:

  • Simplified user experience: Users write standard job scripts without container syntax
  • Consistent environments: Administrators control container images centrally
  • Flexible deployment: Different queues can use different container images or runtimes
  • Transparent migration: Legacy job scripts work unchanged in containerized queues

The starter_method is particularly valuable for sites standardizing on specific software stacks, implementing security policies, or transitioning legacy workflows to containers without requiring users to modify their submission scripts.
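
The pattern can also be made more flexible; the sketch below (a hypothetical variant, not a shipped example) lets users override the container image per job via a variable passed with qsub -v, falling back to a site default:

#!/bin/bash
# /opt/gridware/starters/apptainer_starter_flexible.sh (hypothetical path)
# Users may override the image: qsub -v CONTAINER_IMAGE=docker://python:3.11-slim ...
IMAGE="${CONTAINER_IMAGE:-docker://ubuntu:22.04}"
exec apptainer exec \
  --bind /scratch,/data \
  "${IMAGE}" \
  "${SGE_STARTER_SHELL_PATH}" "$@"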

Site-Specific Optimization Considerations

Starting, supervising, and terminating jobs represents only part of the HPC container equation. Site-specific optimizations depend on image storage strategies—local filesystems versus container registries. Open Cluster Scheduler provides comprehensive mechanisms for these scenarios. Container images can be represented as host resources, with load sensors reporting available images. We provide configuration guidance and custom tooling for site-specific requirements.
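
As an illustration of the images-as-resources idea, a simple load sensor could report which SIF images a node holds in a local cache. The sketch below assumes a host-level string complex named apptainer_images has been defined (for example via qconf -mc) and a hypothetical node-local cache directory:

#!/bin/bash
# Illustrative load sensor: reports locally cached SIF images as the value of the
# (assumed) host complex "apptainer_images". Follows the standard load sensor protocol:
# read a line from stdin, exit on "quit", otherwise emit a begin/end report block.
HOST=$(hostname)
SIF_DIR="/var/cache/sif"        # hypothetical node-local image cache

while read -r line; do
  [ "$line" = "quit" ] && exit 0
  images=$(ls "${SIF_DIR}"/*.sif 2>/dev/null | xargs -r -n1 basename | paste -sd, -)
  echo "begin"
  echo "${HOST}:apptainer_images:${images:-NONE}"
  echo "end"
done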

Multi-Node Container Jobs: The Technical Reality

Single-node containerized workloads present no significant challenges. Multi-node jobs and containerized MPI applications require understanding job launch mechanics. Apptainer's documentation covers this comprehensively: https://apptainer.org/docs/user/main/mpi.html

Consider a practical example: launching an enroot container interactively with allocated slots distributed across multiple nodes. The job script executes only on the master node. When MPI runs inside the container, it attempts remote execution on the allocated nodes listed in the hosts file ($PE_HOSTFILE) using ssh. The issue is that ssh reaches the remote host and tries to execute the command there, but neither MPI nor the command is found, since both are installed only inside the container.

The solution is straightforward: configure OpenMPI to wrap the remote command with the container runtime. Instead of executing ssh <host> orted <args>, execute ssh <host> enroot start <containerimage> orted <args>.

This is accomplished through the OpenMPI environment variable:

OMPI_MCA_orte_launch_agent="enroot start --rw --mount /mnt/home/${USER}/enroot/workspace:/app ${ENROOT_ROOTFS##*/} orted"

Again, this is the key part: the environment variable allows hooking a command in between ssh and the usual startup through orted. It can be configured as an enroot hook for OpenMPI. For example:


cat /etc/enroot/hooks.d/ompi.sh 
#!/bin/bash
# Hook to enable OpenMPI to launch orted inside containers on remote nodes
echo "OMPI_MCA_orte_launch_agent=enroot start --rw --mount /shared_home/public/enroot/workspace/${USER}:/workspace ${ENROOT_ROOTFS##*/} orted" >> "${ENROOT_ENVIRON}"

These lines are important because the container for the master task in the job script needs to be started exactly the same way (same directories and mounts), but with mpirun instead of orted.

Container image location also requires attention; it is typically resolved through shared filesystem access, so the container is available on all possible worker nodes. The enroot configuration requires two key settings:

ENROOT_DATA_PATH=/mnt/home/${USER}/enroot/data
ENROOT_MOUNT_HOME=yes

Practical Enroot Setup Example

An excellent resource for multi-node enroot configuration is available here (thanks to j3soon from NVIDIA for sharing it): https://github.com/j3soon/multi-node-enroot-without-pyxis-and-slurm

For those implementing enroot with NFS-based shared storage, here's an example workflow for running NVIDIA's HPC Benchmark container:

Prerequisites: NFS server configured and mounted on all nodes.


Step 1: Install enroot on all compute nodes

# Download and install enroot (check https://github.com/NVIDIA/enroot/releases)
arch=$(dpkg --print-architecture)
curl -fSsL -O https://github.com/NVIDIA/enroot/releases/download/v4.0.0/enroot_4.0.0-1_${arch}.deb
curl -fSsL -O https://github.com/NVIDIA/enroot/releases/download/v4.0.0/enroot+caps_4.0.0-1_${arch}.deb
sudo apt install -y ./enroot_4.0.0-1_${arch}.deb ./enroot+caps_4.0.0-1_${arch}.deb

Step 2: Configure enroot

Edit /etc/enroot/enroot.conf on all nodes and create the dirs on shared storage:

# Enroot configuration
ENROOT_RUNTIME_PATH=/shared_home/public/${USER}/enroot/runtime
ENROOT_CACHE_PATH=/shared_home/public/${USER}/enroot/.cache
ENROOT_DATA_PATH=/shared_home/public/${USER}/enroot/data
ENROOT_MOUNT_HOME=yes
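
The directories referenced above (and the workspace path used by the hook below) have to exist on the shared filesystem; a one-time setup per user might look like this:

# One-time directory setup on shared storage (paths match the configuration above)
mkdir -p /shared_home/public/${USER}/enroot/{runtime,.cache,data}
mkdir -p /shared_home/public/enroot/workspace/${USER}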

Add OpenMPI hook (this adds the env variable for OpenMPI to the container):

cat /etc/enroot/hooks.d/ompi.sh 
#!/bin/bash
# Hook to enable OpenMPI to launch orted inside containers on remote nodes
echo "OMPI_MCA_orte_launch_agent=enroot start --rw --mount /shared_home/public/enroot/workspace/${USER}:/workspace ${ENROOT_ROOTFS##*/} orted" >> "${ENROOT_ENVIRON}"

Step 3: Create and import the NVIDIA HPC Benchmark container on user shared home

# Create container from NVIDIA NGC
enroot import docker://nvcr.io#nvidia/hpc-benchmarks:25.04
ls
nvidia+hpc-benchmarks+25.04.sqsh
enroot create --name ${USER}-hpc-benchmarks-25-04 nvidia+hpc-benchmarks+25.04.sqsh
# test on host
enroot start ${USER}-hpc-benchmarks-25-04 mpirun hostname
=========================================================
================= NVIDIA HPC Benchmarks =================
=========================================================
NVIDIA Release 25.04
Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

WARNING: No InfiniBand devices detected.
         Multi-node communication performance may be reduced.
         Ensure /dev/infiniband is mounted to this container.

execution-0
execution-0
execution-0
execution-0
execution-0
execution-0
execution-0
execution-0

# Manually create a hosts.txt file for further validation with Gridware
enroot start --rw --mount /shared_home/public/enroot/workspace/${USER}:/workspace ${USER}-hpc-benchmarks-25-04 mpirun -np 16 --hostfile /workspace/hosts.txt hostname

=========================================================
================= NVIDIA HPC Benchmarks =================
=========================================================
NVIDIA Release 25.04
Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

WARNING: No InfiniBand devices detected.
         Multi-node communication performance may be reduced.
         Ensure /dev/infiniband is mounted to this container.


=========================================================
================= NVIDIA HPC Benchmarks =================
=========================================================
NVIDIA Release 25.04
Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

WARNING: No InfiniBand devices detected.
         Multi-node communication performance may be reduced.
         Ensure /dev/infiniband is mounted to this container.

execution-0
execution-1
execution-1
execution-0
execution-0
execution-1
execution-1
execution-1
execution-1
execution-0
execution-1
execution-1
execution-0
execution-0
execution-0
execution-0

# Create a simple job.sh script

cat job.sh
#!/bin/bash

echo "Host and slot selection:"
cat $PE_HOSTFILE

# Convert hostfile
awk '{print $1, "slots="$2}' $PE_HOSTFILE > /shared_home/public/enroot/workspace/${USER}/hosts.txt

echo slots $NSLOTS

enroot start --rw --mount /shared_home/public/enroot/workspace/${USER}:/workspace ${USER}-hpc-benchmarks-25-04 mpirun -np $NSLOTS --hostfile /workspace/hosts.txt hostname


Step 4: Submit a test job


qsub -pe enroot.pe 4 ./job.sh
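
Once the job has finished, a quick check of the standard output file (default Grid Engine naming, <jobname>.o<jobid>) should show hostnames from all allocated nodes:

qstat                     # watch the job until it leaves the queue
cat job.sh.o*             # hostnames from both execution nodes should appear here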


Security Considerations

The upcoming version of Gridware Cluster Scheduler 9.1 introduces munge authentication support. Munge provides straightforward authentication that ensures consistent UNIX user and group ID validation across cluster hosts. This proven service operates as a daemon using shared secrets, enabling Gridware Cluster Scheduler to validate user IDs correctly even within containers using user ID namespaces.

For multi-user environments running containers, transitioning to munge authentication (a GCS 9.1 feature) is highly recommended; in fact, it is a must to ensure isolation. Additionally, a simplified TLS encryption method is being introduced in Gridware Cluster Scheduler 9.1. The legacy CSP encryption mode has been deprecated due to its certificate management complexity and resulting limited adoption. The new encryption method operates without setup requirements, creating and updating certificates automatically.
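
For reference, preparing munge itself on the cluster hosts follows standard practice; the sketch below shows a generic Debian/Ubuntu setup (the GCS 9.1-specific enablement is described in the product documentation):

# On every cluster host (illustrative; the same key must be present on all hosts)
sudo apt install -y munge
# Generate one key on a single host, then copy it to /etc/munge/munge.key everywhere
sudo dd if=/dev/urandom of=/etc/munge/munge.key bs=1 count=1024
sudo chown munge:munge /etc/munge/munge.key && sudo chmod 400 /etc/munge/munge.key
sudo systemctl enable --now munge
munge -n | unmunge        # round-trip self-test; should report STATUS: Success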


Kubernetes Integration: Complementary Technologies

Kubernetes manages containerized workloads across multiple nodes, as do HPC workload managers. However, these systems serve different architectural purposes. Kubernetes' strength lies in its controller-based architecture using pods, StatefulSets, and Deployments, with extensibility for specialized workloads including MPI applications. For environments processing millions of daily batch jobs, Kubernetes overhead may be prohibitive: job submission that takes milliseconds on an HPC scheduler can require hundreds of milliseconds on Kubernetes, to name just one example.

Yet Kubernetes offers compelling advantages: ubiquitous availability, widespread adoption, container-first design, and its emerging role for AI workload orchestration. Rather than viewing Kubernetes as competition, it should be embraced as a complementary technology for HPC environments.

At HPC Gridware, we leverage Kubernetes extensively for testing Gridware Cluster Scheduler. Kubernetes overlay networks enable running large virtual Gridware clusters within smaller physical Kubernetes clusters, with many execution daemons per physical node. This provides excellent density for test workloads. For HPC deployments, we can place one execution daemon per Kubernetes node within a StatefulSet.

Gridware Cluster Scheduler containers run on Kubernetes identically to bare-metal deployments. Specialized login node containers enable job submission via browser or SSH, replicating traditional HPC cluster workflows. Specific considerations include application delivery, storage integration, and user ID mapping.

Three application integration patterns are possible:

  1. Applications on shared storage (works against the container concept)
  2. Pre-installed applications within Gridware Cluster Scheduler containers (can inflate container size; maintenance burden)
  3. Containerized applications using enroot or Apptainer inside pods (container-in-container is a newer concept)

We welcome discussions about specific deployment requirements and architectural considerations.

Summary

Modern container runtimes' capabilities—daemonless operation and unprivileged user-space execution—have eliminated the need for complex integration with external daemon processes. Container support in Gridware and Open Cluster Scheduler is comprehensive and fully supported. We provide site-specific optimization assistance and integration hooks for specialized requirements.

The diversity of container runtimes and image management strategies means no single solution addresses all use cases. This article clarifies that container workloads are first-class citizens in Open Cluster Scheduler and Gridware Cluster Scheduler, with our continued commitment to supporting evolving HPC containerization needs.


For further questions, feel free to reach out to us directly.

Author: Daniel Gruber (dgruber@hpc-gridware.com)