HPC Gridware Releases Gridware Cluster Scheduler 9.1.0 and Open Cluster Scheduler 9.1.0

March 12, 2026

Smarter Binding, Stronger Security, and Full Cluster Visibility

Managing HPC and AI workloads at scale means balancing performance, security, and operational simplicity — often across thousands of cores, multiple architectures, and diverse software stacks. With Gridware Cluster Scheduler 9.1.0, we’ve tackled all three.

This release is the most feature-rich update we’ve shipped to date. It introduces a redesigned topology-aware binding framework, new TLS-encrypted communication, optional Munge integration, native Prometheus and Grafana monitoring, a preview of our new Qontrol web UI, and much more — all while maintaining full backward compatibility with the SGE ecosystem.

Gridware Cluster Scheduler is built on top of Open Cluster Scheduler, our free and open-source workload scheduler, which is also being released as version 9.1.0 today. Many of the core enhancements in this release — including the advanced binding framework, Munge authentication, systemd integration, and numerous fixes — are available in both editions.

Here’s what’s inside.

Topology-Aware Job Placement, Reimagined

Getting the best performance out of modern hardware means placing jobs exactly where they belong — on the right cores, close to the right caches, within the right NUMA domains. Until now, binding in cluster schedulers has largely been an afterthought, applied by execution daemons only after the scheduling decision was already made.

In 9.1.0, we’ve changed that fundamentally. Binding is now a first-class scheduling concept. The scheduler itself detects, sorts, reserves, and assigns binding units — threads, cores, sockets, dies, L2/L3 caches, and NUMA domains — as consumable resources during the scheduling decision.

What does this mean in practice?

  • Deterministic placement: Jobs land exactly where the scheduler intends, not wherever the OS happens to put them.
  • Hybrid CPU support: Performance and efficiency cores (P-cores and E-cores) are distinguished and can be targeted independently.
  • Reservation-aware: Binding units participate in resource reservations and advance reservations, preventing oversubscription.
  • Highly configurable: The new -b… option family (-bamount, -bunit, -btype, -bfilter, -bsort, -bstart, -bstop, -binstance) gives users precise control over placement strategies.
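As a rough sketch, a binding-aware submission might look like the following. The option names come from this release; the argument values shown are illustrative assumptions, not documented syntax — consult the qsub man page for the actual semantics.

```shell
# Hypothetical submission using the new -b… option family.
# Option names are from the 9.1.0 release notes; the values are
# illustrative assumptions only.

# Request 4 cores per task, restricted to performance cores (P-cores),
# packed onto as few sockets as possible:
qsub -bamount 4 -bunit core -bfilter pcore -bsort pack job.sh

# Bind one task per L3 cache domain for a bandwidth-sensitive solver:
qsub -bamount 1 -bunit l3cache solver.sh
```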

For workloads where cache locality, memory bandwidth, or core affinity directly affect runtime — think computational fluid dynamics, molecular simulations, or large-scale training jobs — this is a game changer.


Security That Keeps Up with the Threat Landscape

HPC clusters are increasingly targeted infrastructure. Version 9.1.0 introduces three layers of security enhancement:

TLS Encryption: All internal communication between Gridware Cluster Scheduler components — qmaster, execution daemons, clients — can now be encrypted with TLS. Certificate generation and renewal are automatic, lifetimes are configurable, and enablement is straightforward through the installer or bootstrap configuration. This puts us on the path toward full Cyber Resilience Act (CRA) compliance.

Munge Authentication: For environments running containers or leveraging user namespaces, Munge provides lightweight, secure authentication that is easy to deploy; we recommend it as the default authentication mechanism in those scenarios.
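Verifying a working Munge setup on a host is quick, assuming the distribution's munge package is installed and the same /etc/munge/munge.key is shared across all cluster nodes:

```shell
# Start the Munge daemon (assumes the distribution's munge package):
sudo systemctl enable --now munge

# Encode and immediately decode a credential; success confirms the
# local daemon and key are functional:
munge -n | unmunge

# Cross-host check: decode on a remote node to confirm the key matches.
munge -n | ssh other-node unmunge
```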

DoS Protection: A new request-rate limiting mechanism protects the qmaster daemon from being overwhelmed by excessive GDI requests — whether from a misconfigured monitoring script or a genuine attack. Limits are fine-grained: per-source, per-user, per-host, and per-object, with clear error reporting when thresholds are exceeded.

See Everything: Prometheus & Grafana Integration

Cluster administrators have always needed better observability. With qtelemetry now entering beta, Gridware Cluster Scheduler 9.1.0 provides native metrics export to Prometheus and Grafana.

The exporter covers host-level metrics (CPU load, memory, GPU availability), job-level metrics (queued, running, errored jobs, wait times), and qmaster internals (CPU/memory usage, spooling filesystem info). A pre-configured Grafana dashboard is available for immediate use — plug it in and you have real-time cluster visibility within minutes.
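Wiring the exporter into an existing Prometheus server is a standard scrape-config entry. The job name, host, and port below are assumptions for illustration; check the qtelemetry configuration for the actual listen address:

```shell
# Scrape-config fragment to merge into prometheus.yml.
# Job name, host, and port are illustrative assumptions.
cat <<'EOF'
scrape_configs:
  - job_name: gridware-qtelemetry
    scrape_interval: 30s
    static_configs:
      - targets: ['qmaster-host:9100']   # assumed exporter endpoint
EOF
```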

For smaller environments, optional per-job metrics provide even deeper observability into individual workload behavior.

Meet Qontrol: A Modern UI for Cluster Administration

Let’s be honest: qmon served its purpose, but it belongs to a different era. That’s why 9.1.0 ships with Qontrol, our new REST-based Cluster Configuration UI.

Qontrol provides a clean, intuitive web interface for the configuration tasks administrators perform every day: managing hosts and host groups, configuring queues and parallel environments, editing user sets, projects, resource quotas, calendars, and global settings. It also introduces conveniences like clone functionality — duplicate an existing queue or PE configuration as a starting point instead of building from scratch.

Qontrol is included in 9.1.0 packages and ready to explore. We’re actively developing it and welcome feedback from early adopters.


GPU Workloads and License Management, Simplified

Enhanced qgpu: Building on the GPU tool we introduced in 9.0.2, this release deepens NVIDIA GPU support through tighter DCGM integration. qgpu serves as a load sensor, prolog/epilog handler, and per-job GPU accounting tool in one — automatically reporting GPU usage and power consumption in standard qacct -j output. If you’re running AI training or inference workloads, GPU resource management just became significantly easier.

FlexNet License Manager Integration: Commercial software licenses are expensive, and letting them sit idle while jobs queue is wasteful. The new license-manager tool automatically discovers FlexNet licenses, monitors availability in real time as a load sensor, and tracks external consumption — so the scheduler always knows exactly how many licenses are actually available. No more complex manual configuration. No more job failures from license exhaustion.

Systemd: Native Linux Service Management

Gridware Cluster Scheduler is now fully integrated with systemd. Daemons are managed as native systemd services with proper lifecycle control, and jobs can optionally run under systemd supervision — enabling cgroup-based accounting, core binding enforcement, device isolation, and clean process management.

Usage data can be collected via systemd, the built-in data collector, or a hybrid of both. The mode is configurable per cluster, giving administrators flexibility to balance detail against overhead.
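Day-to-day administration then follows standard systemd conventions. The unit names below are hypothetical (borrowed from classic SGE daemon naming) — the actual names may differ, so check with `systemctl list-units`:

```shell
# Illustrative systemd interaction; unit names are assumptions.
sudo systemctl status sge_qmaster.service    # hypothetical qmaster unit
sudo systemctl restart sge_execd.service     # hypothetical execd unit

# With jobs running under systemd supervision, per-job cgroup
# accounting becomes visible through standard tooling:
systemd-cgtop                                # live cgroup resource view
```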

More Highlights Worth Knowing

Decrease Resources of Running Jobs: A long-running job holding a license it no longer needs? The new qalter -when now option lets you release resources — licenses, memory, or any consumable — from a running job without restarting it, making them immediately available to other jobs in the queue.
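A sketch of how this might look on the command line. The option name comes from the release notes; the resource syntax shown (`-l`) follows SGE convention and is an assumption here, as are the job ID and complex name:

```shell
# Hypothetical: job 4711 no longer needs its 2 FlexNet licenses,
# so release them immediately rather than at job end.
# (-l resource syntax and the flexlm_lic complex name are assumptions.)
qalter -when now -l flexlm_lic=0 4711
```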

Department View (Experimental): In large, shared clusters, not every user needs to see every queue and every job. Department View restricts visibility and access to cluster objects by department, simplifying the interface and improving operational hygiene. Administrators can enforce it globally or users can opt in individually.

Faulty Job Loadsensor: When jobs fail, their spool files (shepherd traces, error files, usage data) are now automatically collected and archived to a configurable location — making post-mortem debugging straightforward instead of a scavenger hunt across execution hosts.

Extensible JSON Accounting: Job accounting data is now available in standard JSON format, making it easy to integrate with enterprise reporting systems, data lakes, and custom dashboards. You can even define your own resource metrics — such as per-job energy consumption — within the standard accounting framework.
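To illustrate downstream processing, here is a minimal sketch assuming accounting records are emitted as one JSON object per line; the field names (job_number, owner, wallclock) are hypothetical and the real schema may differ:

```shell
# Sample records with hypothetical field names — the actual JSON
# accounting schema may differ.
cat > accounting.jsonl <<'EOF'
{"job_number": 101, "owner": "alice", "wallclock": 120}
{"job_number": 102, "owner": "bob",   "wallclock": 300}
{"job_number": 103, "owner": "alice", "wallclock": 60}
EOF

# Total wallclock seconds per user, using only python3's stdlib:
python3 - <<'EOF'
import json, collections
totals = collections.Counter()
with open("accounting.jsonl") as f:
    for line in f:
        rec = json.loads(line)
        totals[rec["owner"]] += rec["wallclock"]
for owner, secs in sorted(totals.items()):
    print(f"{owner} {secs}")
EOF
```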

Certified on SLES 15 SP7 — and Much More

Platform support has always been a strength, and 9.1.0 extends it further. This release is certified on SUSE Linux Enterprise Server (SLES) 15 SP7 and supports a broad matrix of architectures and distributions:

  • Architectures: x86-64, ARM64, ppc64le, s390x, RISC-V64
  • Distributions: RHEL 7–10, Rocky Linux 8–10, AlmaLinux 8–10, CentOS 7–9, Ubuntu 20.04–26.04, SUSE Leap 15, SUSE SLES 15, Raspbian 11–12, FreeBSD 13–14

Whether you’re running on-premises bare metal, cloud instances on AWS, Azure, or Google Cloud, or a hybrid of the two — GCS 9.1.0 fits your infrastructure.

Drop-In SGE Compatibility and Strategic Partnerships

Gridware Cluster Scheduler integrates with the full ecosystem of SGE-compatible platforms, tools, and frameworks. If your environment runs on SGE today — including any major MPI implementation, DRMAA-based workflow, or commercial application with SGE integration — GCS is a drop-in replacement with no rework required.


Beyond broad compatibility, we’ve established strategic partnerships with HPC Box and the EF Portal to deliver jointly supported, validated integrations that streamline HPC and VDI deployment for enterprise customers.

Open Cluster Scheduler 9.1.0: Free and Open Source

Gridware Cluster Scheduler is the enterprise edition of Open Cluster Scheduler (OCS), our free and open-source HPC workload scheduler. OCS 9.1.0 is released alongside this update and shares many of the same core enhancements, including the advanced topology-aware binding framework (with limited placement support), Munge authentication, systemd integration, the new qsub -sync r option, and a comprehensive set of bug fixes and performance improvements.
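As a brief sketch of the new submission option: by analogy with the classic `qsub -sync y` (which blocks until the job finishes), `-sync r` presumably returns once the job starts running — the exact semantics are an assumption here, so check the qsub man page:

```shell
# Hypothetical: submit and block only until the job is running,
# not until it completes — useful for pipeline steps that just
# need the job launched before continuing.
qsub -sync r preprocess.sh
echo "preprocess job is running; submitting dependent steps"
```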

Open Cluster Scheduler is ideal for academic institutions, research labs, and teams looking to get started with a production-quality scheduler at no cost — with a clear upgrade path to the enterprise features in Gridware Cluster Scheduler when your needs grow.

Download Open Cluster Scheduler 9.1.0 →

Enterprise-only features such as qgpu, qtelemetry, Qontrol, FlexNet license management, DoS protection, TLS encryption, and Department View are available exclusively in Gridware Cluster Scheduler.

Try It Today

Gridware Cluster Scheduler 9.1.0 is available now and already running in production at some of the world’s most demanding computing sites. Whether you’re an existing customer looking to upgrade or evaluating GCS for the first time, we’d love to hear from you.

Questions? Reach us at sales@hpc-gridware.com.