Mastering Cgroups v2 in RHEL 10: Resource Control for System Admins
Published On: 1 May 2026
Objective
Cgroups (control groups) have been in the Linux kernel since 2008. For most of that time, they were something container runtimes and systemd used internally while admins stayed safely away from the details. That's changed. With RHEL 9 defaulting to cgroups v2, RHEL 10 building on it exclusively, and containerized workloads running alongside traditional services on the same hosts, understanding how resource control actually works has become a practical skill rather than a niche one.
The jump from cgroups v1 to v2 wasn't just a version bump. The two are architecturally different in ways that matter. Cgroups v1 let you attach a process to multiple separate hierarchies: one for CPU, one for memory, one for I/O. That sounds flexible, but it created real inconsistency problems and made it hard to reason about what resources a given process actually had access to. Cgroups v2 uses a single unified hierarchy. Every process lives in exactly one place in the tree, and all resource controllers operate on that same tree. It's a cleaner model, and RHEL 10 builds on it exclusively. This guide covers what cgroups v2 actually is under the hood, how systemd exposes it for day-to-day use, and how to apply resource limits that hold up in production without surprises.
The Architecture: What the Unified Hierarchy Actually Means
The cgroup hierarchy is a filesystem mounted at /sys/fs/cgroup. Every directory in that tree is a cgroup. Every process on the system belongs to exactly one cgroup. Resource limits you set on a cgroup apply to all processes in it and all its descendants.
# Look at the top of the cgroup hierarchy
ls /sys/fs/cgroup/
# See what controllers are available system-wide
cat /sys/fs/cgroup/cgroup.controllers
# See which controllers are enabled for child cgroups
cat /sys/fs/cgroup/cgroup.subtree_control
The controllers you'll see on RHEL 10 typically include cpu, cpuset, memory, io, pids, and rdma. Each controller manages a different class of resource. A controller has to be explicitly enabled in a cgroup's subtree_control file before child cgroups can use it. You rarely manipulate these files directly in production. Systemd sits on top of the cgroup hierarchy and manages it on your behalf. But knowing what's underneath helps a lot when things behave unexpectedly and you need to verify what limits are actually in effect.
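If you do want to check this from a script, here is a minimal sketch assuming bash, which reads cgroup.subtree_control as plain text. controller_enabled and enable_controller are hypothetical helper names, not RHEL or systemd tools, and the write in enable_controller only has an effect on a real cgroup2 mount with root privileges.

```bash
# controller_enabled DIR NAME
# Succeeds if NAME is listed in DIR/cgroup.subtree_control, meaning the
# child cgroups of DIR are allowed to use that controller.
controller_enabled() {
    local dir=$1 name=$2 ctrl
    local -a ctrls=()
    # The file holds one space-separated line, e.g. "cpu io memory pids".
    read -r -a ctrls < "$dir/cgroup.subtree_control"
    for ctrl in "${ctrls[@]}"; do
        [ "$ctrl" = "$name" ] && return 0
    done
    return 1
}

# enable_controller DIR NAME
# On a real cgroup2 mount, writing "+NAME" asks the kernel to enable NAME
# for DIR's children. NAME must already appear in DIR/cgroup.controllers and
# the write needs root. On systemd hosts, let systemd manage this instead.
enable_controller() {
    local dir=$1 name=$2
    echo "+$name" > "$dir/cgroup.subtree_control"
}
```

For example, controller_enabled /sys/fs/cgroup memory && echo "memory is available to child cgroups" tells you whether the root cgroup delegates the memory controller.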
# Verify RHEL 10 is using cgroups v2 (should show cgroup2)
mount | grep cgroup
# Or check directly
stat -f /sys/fs/cgroup | grep Type
# Type should be: cgroup2fs
How Systemd Maps to the Cgroup Tree
Systemd organizes everything into three main slices in the cgroup hierarchy:
- system.slice: system services started by systemd units
- user.slice: user sessions and user-level services
- machine.slice: virtual machines and containers
Every service unit, scope, and slice gets its own subdirectory in /sys/fs/cgroup. When you set resource limits through systemd, it writes the appropriate values into those cgroup files. When you check limits through systemd, it reads from those same files.
# See the full cgroup tree with resource usage
systemd-cgls
# See resource usage per cgroup
systemd-cgtop
# Find which cgroup a specific process belongs to
# (-o picks the oldest match, e.g. the nginx master process, so only one PID is used)
cat /proc/$(pgrep -o nginx)/cgroup
systemd-cgtop is one of those tools that earns regular use. It shows you which services and slices are consuming the most CPU, memory, and I/O in real time: the cgroup equivalent of top, but organized by service rather than by individual process.
Resource Controllers: What Each One Does
- CPU Controller
The CPU controller manages how much processor time a cgroup gets relative to others. In cgroups v2 this works through a weight system (cpu.weight) rather than the shares model from v1. The default weight is 100, and the valid range is 1 to 10000. A cgroup with weight 200 gets twice as much CPU time as a sibling with weight 100 when both are competing for CPU. When the system is idle, weights don't constrain anything: a low-weight cgroup can use all available CPU if nothing else needs it. There's also a quota system for hard limits (cpu.max): you can specify that a cgroup gets at most X microseconds of CPU time in every Y microseconds, regardless of what else is running. This is the CPU quota, and it's what you use when you need a hard ceiling, not just relative priority.
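To make the two mechanisms concrete, here is a small bash sketch that does the arithmetic. The helper names are hypothetical. cpu_max_from_quota turns a CPUQuota-style percentage into the "QUOTA PERIOD" pair stored in cpu.max, assuming the 100 ms default period, and weight_share estimates the percentage a cgroup receives when it and its siblings are all busy.

```bash
# cpu_max_from_quota PERCENT [PERIOD_US]
# Prints the "QUOTA PERIOD" pair that a CPUQuota= percentage corresponds to
# in cpu.max. PERIOD_US defaults to 100000 (100 ms). A trailing % is allowed.
cpu_max_from_quota() {
    local percent=${1%\%} period=${2:-100000}
    # PERCENT% of one CPU per period: 20% of 100000 us is 20000 us.
    echo "$(( percent * period / 100 )) $period"
}

# weight_share MY_WEIGHT SIBLING_WEIGHT...
# Prints the integer percentage of the parent's CPU time a cgroup gets when
# it and all listed siblings are busy at once. Weights only compete among
# siblings under the same parent.
weight_share() {
    local mine=$1 total=$1 w
    shift
    for w in "$@"; do
        total=$(( total + w ))
    done
    echo $(( mine * 100 / total ))
}
```

With the defaults, cpu_max_from_quota 20% prints 20000 100000, and weight_share 200 100 prints 66, which is the two-to-one split described above (rounded down).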
- Memory Controller
The memory controller sets limits on physical RAM and swap usage. The key thing to understand about memory limits in cgroups v2 is the difference between memory.max and memory.high. memory.max is the hard limit: if the cgroup hits it and the kernel can't reclaim enough memory to stay under it, the OOM killer runs. memory.high is a soft limit: hit it, and the kernel starts applying memory pressure to the cgroup, throttling allocations and forcing reclaim, but the process isn't killed. Using memory.high as the primary control and memory.max as a safety net is a more graceful approach than relying solely on the hard limit.
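A rough way to see how close a cgroup is to either limit is to compare memory.current with memory.high or memory.max. The bash function below is a hypothetical helper that does this for one limit file, treating the literal value max as "no limit".

```bash
# memory_headroom DIR LIMIT_FILE
# Prints how many bytes the cgroup in DIR can still grow before reaching
# LIMIT_FILE (memory.high or memory.max), "unlimited" if the file says "max",
# or 0 if usage is already at or above the limit.
memory_headroom() {
    local dir=$1 file=$2 current limit
    current=$(< "$dir/memory.current")
    limit=$(< "$dir/$file")
    if [ "$limit" = "max" ]; then
        echo unlimited
    elif [ "$current" -ge "$limit" ]; then
        # Usage can sit above memory.high while the kernel throttles it.
        echo 0
    else
        echo $(( limit - current ))
    fi
}
```

For example, memory_headroom /sys/fs/cgroup/system.slice/webapp.service memory.high prints the number of bytes left before the kernel starts throttling that service.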
- I/O Controller
The I/O controller manages read and write bandwidth and IOPS to block devices. You can set weights for relative priority or absolute bandwidth limits per device. This controller is more complex to configure correctly because limits are per-device: you need to know the major:minor device number of the block device you want to limit.
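One way to look up that device number, assuming GNU stat, is sketched below. stat prints the major and minor numbers in hexadecimal, so the helper converts them. Both function names are hypothetical. io_max_line formats a line for the io.max file, with bandwidth given in bytes per second.

```bash
# device_majmin PATH
# Prints the decimal "major:minor" pair of a device node. GNU stat reports
# the numbers in hex (%t and %T), so printf converts them to decimal.
# -L follows symlinks such as /dev/disk/by-uuid/... to the real node.
device_majmin() {
    local major minor
    major=$(stat -L -c '%t' "$1") || return 1
    minor=$(stat -L -c '%T' "$1") || return 1
    printf '%d:%d\n' "0x$major" "0x$minor"
}

# io_max_line MAJ:MIN READ_BPS WRITE_BPS
# Builds a line in the format io.max accepts, limiting read and write
# bandwidth (bytes per second) on one device.
io_max_line() {
    echo "$1 rbps=$2 wbps=$3"
}
```

On Linux, device_majmin /dev/null prints 1:3; running it against the disk you intend to limit gives the value io.max expects. When systemd manages the cgroup, prefer its IO* directives over writing io.max by hand.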
- PID Controller
The PID controller caps how many processes a cgroup can create. This is primarily a security and stability control: it prevents a runaway process or a fork bomb from consuming all available PID space and taking down the system. For containers and untrusted workloads, setting a reasonable PID limit is a cheap, high-value safeguard.
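Checking how close a cgroup is to its PID ceiling only takes two files, pids.current and pids.max. Here is a minimal bash sketch; the helper name is hypothetical.

```bash
# pids_remaining DIR
# Prints how many more tasks the cgroup in DIR may create before pids.max
# is reached, or "unlimited" when pids.max contains "max".
pids_remaining() {
    local dir=$1 current max
    current=$(< "$dir/pids.current")
    max=$(< "$dir/pids.max")
    if [ "$max" = "max" ]; then
        echo unlimited
    else
        # Never report a negative number.
        echo $(( max > current ? max - current : 0 ))
    fi
}
```

For example, pids_remaining /sys/fs/cgroup/system.slice/myapp.service shows how much room a service has left under its TasksMax= setting.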
Setting Resource Limits on Systemd Services
The cleanest way to apply cgroup resource limits to a service is through the unit file. Systemd has a set of resource control directives that map directly to cgroup v2 controls.
CPU limits
# /etc/systemd/system/myapp.service
[Unit]
Description=My Application Service
[Service]
ExecStart=/usr/bin/myapp
User=myapp
# CPU weight (default 100, range 1-10000)
# This service gets half the CPU priority of default
CPUWeight=50
# Hard CPU quota: 20ms of CPU per 100ms period (20% of one core)
CPUQuota=20%
# On a multi-core system, 150% means 1.5 cores max
# CPUQuota=150%
[Install]
WantedBy=multi-user.target
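After a daemon-reload and restart, you can confirm that CPUQuota= reached the kernel by reading the service's cpu.max file. The bash helper below (a hypothetical name) converts that file's QUOTA PERIOD pair back into the percentage form systemd uses.

```bash
# cpu_max_to_percent "QUOTA PERIOD"
# Converts the contents of a cpu.max file into the CPUQuota= style
# percentage (100% = one full CPU), or "unlimited" when QUOTA is "max".
cpu_max_to_percent() {
    local quota period
    read -r quota period <<< "$1"
    if [ "$quota" = "max" ]; then
        echo unlimited
    else
        echo "$(( quota * 100 / period ))%"
    fi
}
```

For the unit above, cpu_max_to_percent "$(cat /sys/fs/cgroup/system.slice/myapp.service/cpu.max)" should print 20%.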
Memory limits
[Service]
ExecStart=/usr/bin/myapp
# Soft memory limit kernel applies pressure above this
MemoryHigh=512M
# Hard memory limit OOM killer triggers above this
MemoryMax=768M
# Swap limit (in addition to RAM limit)
MemorySwapMax=256M
# Memory accounting must be enabled for limits to work
MemoryAccounting=yes
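Memory sizes in these directives use base-1024 suffixes, so 512M means 536870912 bytes, which is the number you'll see when you read memory.high back from the cgroup. Here is a small bash sketch (hypothetical helper) for doing that conversion when comparing configured and applied values. It only handles whole numbers with an optional K, M, G or T suffix, not percentages.

```bash
# size_to_bytes SIZE
# Converts a size such as 512M or 1G into bytes using base-1024 suffixes
# K, M, G and T. Plain whole numbers are returned unchanged.
size_to_bytes() {
    local size=$1 num unit
    num=${size%[KMGT]}
    unit=${size#"$num"}
    # Reject anything that isn't a whole number (e.g. "50%" or "1.5G").
    [[ $num =~ ^[0-9]+$ ]] || return 1
    case $unit in
        "") echo "$num" ;;
        K) echo $(( num * 1024 )) ;;
        M) echo $(( num * 1024 ** 2 )) ;;
        G) echo $(( num * 1024 ** 3 )) ;;
        T) echo $(( num * 1024 ** 4 )) ;;
    esac
}
```

For example, [ "$(cat /sys/fs/cgroup/system.slice/myapp.service/memory.high)" = "$(size_to_bytes 512M)" ] succeeds once the unit above is running with its limits applied.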
I/O limits
[Service]
ExecStart=/usr/bin/myapp
# I/O weight (default 100, range 1-10000)
IOWeight=50
# Hard bandwidth limit on a specific device
# Format: device path (or a file on the target filesystem) followed by bytes per second
IOReadBandwidthMax=/dev/sda 50M
IOWriteBandwidthMax=/dev/sda 20M
# IOPS limits
IOReadIOPSMax=/dev/sda 1000
IOWriteIOPSMax=/dev/sda 500
PID limits
[Service]
ExecStart=/usr/bin/myapp
# Maximum number of processes this service can spawn
TasksMax=256
A complete example service unit
# /etc/systemd/system/webapp.service
[Unit]
Description=Web Application
After=network-online.target postgresql.service
Wants=network-online.target
Requires=postgresql.service
[Service]
Type=simple
User=webapp
Group=webapp
ExecStart=/opt/webapp/bin/webapp --config /etc/webapp/config.yml
Restart=on-failure
RestartSec=5
# Resource limits
CPUWeight=100
CPUQuota=200%
MemoryHigh=1G
MemoryMax=1536M
MemorySwapMax=512M
MemoryAccounting=yes
IOWeight=100
TasksMax=512
[Install]
WantedBy=multi-user.target
# After editing the unit file
sudo systemctl daemon-reload
sudo systemctl restart webapp.service
# Verify limits are applied
sudo systemctl status webapp.service
sudo systemd-cgls /system.slice/webapp.service
Applying Limits Without Editing Unit Files
Sometimes you need to apply a resource limit to a running service without restarting it or editing unit files permanently. Systemd provides two ways to do this.
- Runtime changes with systemctl set-property
Changes made with set-property take effect immediately on the running service. By default they also persist across reboots: systemd writes them to drop-in files under /etc/systemd/system.control/.
# Apply a memory limit to a running service immediately
sudo systemctl set-property httpd.service MemoryMax=512M
# Apply without persisting (runtime only, lost on reboot)
sudo systemctl set-property --runtime httpd.service MemoryMax=512M
# Apply a CPU quota immediately
sudo systemctl set-property httpd.service CPUQuota=50%
# Remove a property (set it to empty string)
sudo systemctl set-property httpd.service MemoryMax=
# Verify the change
sudo systemctl show httpd.service | grep -i memory
- Drop-in files
Drop-in files let you override specific parts of a unit file without modifying the original. This is the right approach for limits you want to persist on a service defined by a package: you don't touch the package's unit file, and your overrides survive package updates.
# Create a drop-in directory for an existing service
sudo mkdir -p /etc/systemd/system/nginx.service.d/
# Create the drop-in file
sudo vi /etc/systemd/system/nginx.service.d/limits.conf
# /etc/systemd/system/nginx.service.d/limits.conf
[Service]
CPUWeight=150
MemoryHigh=256M
MemoryMax=384M
TasksMax=512
sudo systemctl daemon-reload
sudo systemctl restart nginx.service
# Confirm the drop-in is loaded
sudo systemctl cat nginx.service
The systemctl cat output shows you the base unit file followed by any drop-ins, with clear separation between them. Use this to verify what's actually in effect before and after making changes.
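If you apply the same limits across many hosts, the drop-in can also be generated from a script. The bash sketch below uses a hypothetical helper that writes BASE/UNIT.d/limits.conf with a [Service] section. A slice would need a [Slice] section instead, and systemctl daemon-reload is still required afterwards.

```bash
# write_limits_dropin BASE UNIT KEY=VALUE...
# Writes BASE/UNIT.d/limits.conf containing a [Service] section with the
# given settings, one per line. BASE is normally /etc/systemd/system.
write_limits_dropin() {
    local base=$1 unit=$2 setting
    shift 2
    mkdir -p "$base/$unit.d" || return 1
    {
        echo "[Service]"
        for setting in "$@"; do
            echo "$setting"
        done
    } > "$base/$unit.d/limits.conf"
}
```

Run as root, write_limits_dropin /etc/systemd/system nginx.service CPUWeight=150 MemoryHigh=256M MemoryMax=384M TasksMax=512 reproduces the drop-in above.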
Working With Slices
Slices are cgroup containers for organizing related services. The limits you set on a slice apply to the combined resource usage of everything inside it. This is how you cap the total resource consumption of a group of services rather than managing each one individually. Systemd creates system.slice, user.slice, and machine.slice automatically. You can create custom slices and assign services to them.
# /etc/systemd/system/webservices.slice
[Unit]
Description=Web Services Slice
Before=slices.target
[Slice]
# All services in this slice share at most 4 cores total
CPUQuota=400%
# Total memory ceiling for everything in this slice
MemoryMax=8G
MemoryHigh=6G
# /etc/systemd/system/nginx.service
[Unit]
Description=Nginx Web Server
[Service]
# Assign this service to the custom slice
Slice=webservices.slice
ExecStart=/usr/sbin/nginx -g 'daemon off;'
CPUWeight=200
MemoryMax=1G
sudo systemctl daemon-reload
sudo systemctl start webservices.slice
sudo systemctl start nginx.service
# See the slice and its contents
systemd-cgls /webservices.slice
The slice-level limit and the service-level limit work together. If Nginx tries to use more than 1G of memory, the service-level MemoryMax triggers. If all services in the slice combined try to exceed 8G, the slice-level limit triggers. The more restrictive limit wins.
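Because a cgroup's memory usage is charged to every ancestor, the tightest hard ceiling on a service is the smallest memory.max between it and the root. The bash sketch below (hypothetical helper) walks up the tree and reports that value. It ignores memory.high and does not account for memory already used by sibling services in the same slice.

```bash
# effective_memory_max DIR [ROOT]
# Walks from cgroup DIR up to ROOT (default /sys/fs/cgroup) and prints the
# smallest memory.max found, i.e. the tightest hard ceiling on DIR, or "max".
effective_memory_max() {
    local dir=${1%/} root=${2:-/sys/fs/cgroup} best=max value
    root=${root%/}
    while :; do
        if [ -r "$dir/memory.max" ]; then
            value=$(< "$dir/memory.max")
            if [ "$value" != max ] && { [ "$best" = max ] || [ "$value" -lt "$best" ]; }; then
                best=$value
            fi
        fi
        [ "$dir" = "$root" ] && break
        case $dir in
            "$root"/*) dir=${dir%/*} ;;  # step up one level
            *) return 1 ;;               # DIR is not below ROOT
        esac
    done
    echo "$best"
}
```

For the Nginx example, effective_memory_max /sys/fs/cgroup/webservices.slice/nginx.service should report 1073741824, the 1G service limit, because it is lower than the 8G slice limit.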
Running One-Off Commands With Resource Limits
For batch jobs, maintenance scripts, or any command you want to run with controlled resource usage without creating a service unit, systemd-run is the tool. It creates a transient systemd unit, runs your command inside it with the limits you specify, and cleans up when the command finishes.
# Run a command with memory and CPU limits
sudo systemd-run --scope \
-p MemoryMax=512M \
-p CPUQuota=50% \
/usr/bin/my-intensive-script.sh
# Run a database backup with limited I/O impact
sudo systemd-run --scope \
-p IOWeight=10 \
-p CPUWeight=20 \
--unit=db-backup \
/usr/local/bin/backup-postgres.sh
# Run an interactive shell in a limited environment
# (a scope runs in the current terminal, so --pty isn't needed)
sudo systemd-run --scope \
-p MemoryMax=1G \
-p CPUQuota=100% \
-p TasksMax=64 \
/bin/bash
The --scope flag creates a scope unit (for processes you start directly) rather than a service unit (for processes systemd manages). For interactive commands and scripts you're running yourself, a scope is usually what you want. For something that needs to be managed like a background service, omit --scope and systemd-run creates a service unit instead.
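If you launch limited jobs often, a thin wrapper keeps the flags consistent. This is a minimal bash sketch with a hypothetical function name. SYSTEMD_RUN is a hypothetical variable that lets you substitute echo to preview the command line instead of running it.

```bash
# run_limited MEMORY_MAX CPU_QUOTA COMMAND [ARGS...]
# Runs COMMAND in a transient scope with a memory ceiling and CPU quota.
# Set SYSTEMD_RUN=echo to print the arguments instead of executing them.
run_limited() {
    local mem=$1 cpu=$2
    shift 2
    "${SYSTEMD_RUN:-systemd-run}" --scope \
        -p "MemoryMax=$mem" \
        -p "CPUQuota=$cpu" \
        "$@"
}
```

SYSTEMD_RUN=echo run_limited 512M 50% /usr/bin/my-intensive-script.sh prints the systemd-run arguments without running anything. Drop the variable and call the function from a root shell (sudo can't run shell functions directly) to execute the script in a scope.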
Monitoring Resource Usage
Setting limits is only half the job. You need to know whether services are approaching those limits, hitting them, and whether the limits are calibrated correctly.
# Real-time resource usage by cgroup
systemd-cgtop
# Detailed resource stats for a specific service
sudo systemctl status webapp.service
# Show all resource properties for a service
sudo systemctl show webapp.service | grep -E 'Memory|CPU|IO|Tasks'
# Check current memory usage directly from cgroup files
cat /sys/fs/cgroup/system.slice/webapp.service/memory.current
cat /sys/fs/cgroup/system.slice/webapp.service/memory.max
# Check CPU usage stats
cat /sys/fs/cgroup/system.slice/webapp.service/cpu.stat
# Check if OOM kill has occurred for a service
cat /sys/fs/cgroup/system.slice/webapp.service/memory.events
That last command is worth paying attention to. The memory.events file tracks how many times different memory events have occurred for a cgroup, including OOM kills. If a service is intermittently dying and you suspect memory is the cause, checking this file gives you a quick answer.
# Check memory events; look for the oom_kill count
cat /sys/fs/cgroup/system.slice/webapp.service/memory.events
# Sample output:
# low 0
# high 145
# max 12
# oom 3
# oom_kill 3
A non-zero oom_kill count means processes in this cgroup have been killed by the OOM killer. A non-zero high count means the service has been hitting its soft memory limit and the kernel has been applying pressure. A large high count alongside zero OOM kills suggests the soft limit is set appropriately: the kernel is managing memory pressure without needing to kill anything.
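For scripted checks, a single counter can be pulled out of memory.events. Here is a minimal bash sketch with a hypothetical helper name. It prints 0 for keys the file doesn't contain, since the set of counters depends on the kernel version.

```bash
# memory_event FILE KEY
# Prints the counter for KEY (low, high, max, oom, oom_kill, ...) from a
# memory.events file, or 0 if the key is not present.
memory_event() {
    local file=$1 key=$2 name value
    while read -r name value; do
        if [ "$name" = "$key" ]; then
            echo "$value"
            return 0
        fi
    done < "$file"
    echo 0
}
```

For example, memory_event /sys/fs/cgroup/system.slice/webapp.service/memory.events oom_kill would print 3 for the sample output above, which makes it easy to alert on any value greater than zero.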
Resource Limits for Containers
Podman uses cgroups v2 directly for container resource limits on RHEL 10. When you set resource limits through Podman, it writes them to the container's cgroup. Rootful container cgroups live under machine.slice in the hierarchy; rootless containers run under the invoking user's user@UID.service in user.slice.
# Run a container with CPU and memory limits
podman run -d \
--name webapp \
--cpus=2.0 \
--memory=1g \
--memory-swap=1536m \
docker.io/library/nginx:latest
# Limit CPU shares (weight equivalent)
podman run -d \
--name backend \
--cpu-shares=512 \
--memory=2g \
--pids-limit=256 \
myapp:latest
# Update limits on a running container
# --memory-swap is memory plus swap, so raise it along with --memory
podman update --memory=2g --memory-swap=3g --cpus=4.0 webapp
# Check container cgroup resource usage
podman stats webapp
# Show the cgroup path Podman created for the container
podman inspect --format '{{.State.CgroupPath}}' webapp
When you're running containers through Podman Quadlet as systemd services, you can apply resource limits at the systemd service level (which affects the container process from systemd's perspective) or at the Podman level inside the unit file using PodmanArgs:
# In a Quadlet .container file
[Container]
Image=docker.io/library/nginx:latest
PodmanArgs=--cpus=2.0 --memory=1g
[Service]
# systemd-level limits that wrap the entire container process
CPUWeight=100
MemoryMax=1536M
Diagnosing Resource Limit Problems
Service keeps getting OOM killed
# Check OOM kill events
cat /sys/fs/cgroup/system.slice/myservice.service/memory.events
# Check kernel OOM messages
sudo journalctl -k | grep -i "oom\|killed process"
# See current memory usage vs limit
systemctl show myservice.service | grep -E 'MemoryCurrent|MemoryMax|MemoryHigh'
Service is slow but not hitting limits
# Check CPU throttling statistics
cat /sys/fs/cgroup/system.slice/myservice.service/cpu.stat
# Look at throttled_usec: time spent throttled due to CPU quota
# Check I/O pressure
cat /sys/fs/cgroup/system.slice/myservice.service/io.pressure
The cpu.stat file's throttled_usec value tells you how many microseconds this cgroup has spent throttled by its CPU quota. If this number is large and growing, your CPUQuota is too restrictive for the workload. Either raise the quota or investigate whether the service is doing more work than expected.
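Because throttled_usec is cumulative, a ratio is often easier to read. When a quota is in effect, cpu.stat also reports nr_periods (enforcement periods elapsed) and nr_throttled (periods in which the cgroup hit its quota). The bash sketch below, a hypothetical helper, turns them into a throttling percentage.

```bash
# throttle_percent FILE
# Reads a cgroup v2 cpu.stat file and prints the integer percentage of
# enforcement periods in which the cgroup was throttled
# (nr_throttled / nr_periods). Prints 0 when no periods have elapsed.
throttle_percent() {
    local key value periods=0 throttled=0
    while read -r key value; do
        case $key in
            nr_periods) periods=$value ;;
            nr_throttled) throttled=$value ;;
        esac
    done < "$1"
    if [ "$periods" -eq 0 ]; then
        echo 0
    else
        echo $(( throttled * 100 / periods ))
    fi
}
```

A value that stays high while the service is busy points the same way as a growing throttled_usec: the quota is tight for the workload.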
Finding the cgroup path for any process
# Find cgroup for a process by PID
cat /proc/12345/cgroup
# Find cgroup for a process by name (-o picks the oldest matching PID)
cat /proc/$(pgrep -o -f myapp)/cgroup
# Navigate to the cgroup directory
cd /sys/fs/cgroup$(cut -d: -f3- /proc/$(pgrep -o -f myapp)/cgroup)
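On a cgroup v2 host, /proc/PID/cgroup holds a single line of the form 0::/path, where the path is relative to the /sys/fs/cgroup mount. Here is the same lookup as a bash function (hypothetical name) that reads the file on stdin.

```bash
# cgroup_dir_from_proc
# Reads /proc/PID/cgroup content on stdin and prints the matching directory
# under /sys/fs/cgroup, using the cgroup v2 "0::/path" entry.
# Fails if no such entry exists (e.g. a pure cgroup v1 host).
cgroup_dir_from_proc() {
    local line
    while IFS= read -r line; do
        case $line in
            0::*) echo "/sys/fs/cgroup${line#0::}"; return 0 ;;
        esac
    done
    return 1
}
```

For example, cd "$(cgroup_dir_from_proc < /proc/$(pgrep -o -f myapp)/cgroup)" drops you into the process's cgroup directory.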
Things That Catch People Out
A few patterns that come up often enough to call out explicitly.
- MemoryAccounting defaults: Memory accounting has overhead and isn't always enabled by default for every service. If you set MemoryMax on a service and it doesn't seem to take effect, verify that MemoryAccounting=yes is set. Without accounting enabled, the memory controller can't track usage for that cgroup.
- CPU quota on multi-core systems: CPUQuota=100% means one full core. On a 16-core system, if you want a service to be able to use up to 4 cores, set CPUQuota=400%. This surprises people coming from Docker's --cpus flag, which takes a decimal core count (4.0). Systemd expresses the quota as a percentage of a single CPU, not as a core count.
- Slice limits aren't hard per-service caps: A slice limit is a ceiling for the combined usage of everything in the slice. Individual services can use more than their proportional share as long as the total stays under the slice limit. If you need a hard per-service limit, set it at the service level, not just the slice level.
- I/O limits require device identifiers: The I/O controller applies limits per block device using major:minor notation or the device path. A limit on /dev/sda doesn't automatically apply to /dev/sdb. If a service writes to multiple devices, you need separate limits for each one you want to control.
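Converting between the two conventions is simple arithmetic: multiply the core count by 100. Here is a minimal bash sketch (hypothetical helper name) that rounds to the nearest whole percent.

```bash
# cores_to_cpuquota CORES
# Converts a Docker/Podman style --cpus value (e.g. 4.0 or 1.5) into the
# percentage form systemd's CPUQuota= expects (100% = one core).
cores_to_cpuquota() {
    # Adding 0.5 before %d truncation rounds to the nearest integer.
    awk -v cores="$1" 'BEGIN { printf "%d%%\n", cores * 100 + 0.5 }'
}
```

cores_to_cpuquota 4.0 prints 400%, the value the 4-core example calls for.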
Conclusion
Cgroups v2 and systemd's resource control directives give RHEL 10 admins a genuinely powerful toolkit for managing what workloads are allowed to consume. The unified hierarchy makes it much easier to reason about than cgroups v1, and the systemd integration means you don't need to manipulate cgroup files directly for normal operations. The practical starting point for most environments is memory limits on services that have known memory leaks or unbounded growth patterns, CPU quotas for batch jobs and background processes that should yield to interactive workloads, and PID limits as a cheap safety net on any service running untrusted code or handling external input.
Get comfortable with systemd-cgtop and memory.events as your regular monitoring tools. They tell you what's actually happening at the resource level in a way that top and free can't. Once you're watching those regularly, the calibration of limits becomes much less of a guessing game.