---
title: "Control Group version 2 by hand"
slug: "cgroup-v2-by-hand"
description: null
date: 2021-07-17T17:05:00+02:00
type: posts
draft: false
tags:
  - CGroup
  - Linux
toc: true
---
Control Group v2 has been available since 2016, but I had trouble finding good documentation on how to use it. Most tutorials and blog posts only cover v1 or are specific to systemd[1]. The kernel documentation is a great reference and the basis for this post, but it is not always easy to follow. I will give you a few short examples of how to use cgroup v2 by hand. I will not explain everything, but hopefully enough to give you an idea and help you understand the reference better.
Your interface to cgroups is a special file system. Most distributions have cgroup v1 mounted at `/sys/fs/cgroup` and cgroup v2 at `/sys/fs/cgroup/unified`. Some distributions removed v1 support by default and have v2 mounted at `/sys/fs/cgroup`. You can find out where cgroup v2 is mounted with `mount | grep cgroup2`. If it is not mounted, you can do it yourself with `mount -t cgroup2 none /sys/fs/cgroup/unified`. You can theoretically mount it anywhere you like, but tools expect it in one of the paths mentioned above. Going forward I will assume you are in a terminal in the cgroup v2 directory.
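As a small sketch, the lookup can also be scripted by reading the kernel's mount table instead of parsing the output of `mount`. The function name `find_cgroup2_mount` and its optional file argument are my own additions for illustration; they are not part of any existing tool.

```shell
# Print the mount point of the first cgroup2 file system in a mount table.
# $1: mount table to read (assumption: defaults to /proc/self/mounts).
find_cgroup2_mount() {
    # Fields in the mount table: device, mount point, file system type, ...
    awk '$3 == "cgroup2" { print $2; exit }' "${1:-/proc/self/mounts}"
}
```

Running `cd "$(find_cgroup2_mount)"` would then put you into the cgroup v2 directory, provided it is mounted.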
Linux distributions should have all cgroup options compiled in. If you built the kernel yourself, or you are missing files in `/sys/fs/cgroup`, you can check with `zgrep CGROUP /proc/config.gz | grep -Ev 'DEBUG|=y'` if you are missing anything important.
Note: All examples on this page are tested with kernel 5.10.
Enabling controllers
There are currently 8 controllers[2]: cpu, memory, io, pids, cpuset, rdma, hugetlb and perf_event. You can find out which are available with `cat cgroup.controllers`. perf_event is automatically enabled; all others have to be enabled explicitly, with `echo "+cpu +memory" > cgroup.subtree_control` for example. You can disable a controller by using a `-` instead of a `+`.
Resources are distributed top-down and a cgroup can further distribute a resource only if the resource has been distributed to it from the parent. This means that all non-root “cgroup.subtree_control” files can only contain controllers which are enabled in the parent’s “cgroup.subtree_control” file. A controller can be enabled only if the parent has the controller enabled and a controller can’t be disabled if one or more children have it enabled. […] Non-root cgroups can distribute domain resources to their children only when they don’t have any processes of their own. In other words, only domain cgroups which don’t contain any processes can have domain controllers enabled in their “cgroup.subtree_control” files.[3]
We will keep it simple by only setting controllers globally in our root cgroup.
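To see at a glance which controllers still need a `+name` written to cgroup.subtree_control, the two files can be compared. This is only a sketch; the helper name `disabled_controllers` and its directory argument are hypothetical.

```shell
# List controllers that are available in a cgroup but not enabled for its children.
# $1: cgroup directory (assumption: defaults to the current directory).
disabled_controllers() {
    dir="${1:-.}"
    enabled=" $(cat "$dir/cgroup.subtree_control") "
    for controller in $(cat "$dir/cgroup.controllers"); do
        case "$enabled" in
            *" $controller "*) ;;        # already enabled for children
            *) echo "$controller" ;;     # available, but not enabled yet
        esac
    done
}
```

Called without arguments in the cgroup v2 directory, it prints one controller name per line.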
Controlling CPU usage
This control group will use the cpu controller[4]. Every process in this group will be deprioritized, and all processes together can use at most the power of 2 CPU cores.
echo "+cpu" > cgroup.subtree_control
mkdir testgroup
cd testgroup
echo "50" > cpu.weight
echo "200000 100000" > cpu.max
Try adding your current shell to the group with `echo "$$" > cgroup.procs`. All processes you start from this shell are now in the same cgroup. But what does the example do, exactly?
- cpu.weight is the relative share of CPU cycles the cgroup gets under load. The CPU cycles are distributed by adding up the weights of all active children and giving each the fraction matching the ratio of its weight against the sum.[5] It has a range from 1 to 10,000 and defaults to 100. If one cgroup has a weight of 3,000 and its only other active sibling has a weight of 7,000, the former will get 30% and the latter 70% of the CPU cycles.
- cpu.max sets the “maximum bandwidth limit”. We told the kernel that the processes in the group may use at most 200,000 µs of CPU time in every period of 100,000 µs, meaning they can use the power of up to 2 cores.
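The arithmetic behind both settings can be sketched in a few lines of shell. The function names `weight_share` and `cpu_max_cores` are made up for this illustration, and the share is rounded down to a whole percent.

```shell
# weight_share MINE OTHER...: percentage of CPU cycles a cgroup with weight
# MINE gets under full load, given the weights of its other active siblings.
weight_share() {
    mine=$1
    total=0
    for weight in "$@"; do       # "$@" includes MINE itself
        total=$((total + weight))
    done
    echo $((100 * mine / total))
}

# cpu_max_cores "QUOTA PERIOD": number of cores a cpu.max value corresponds to.
cpu_max_cores() {
    echo "$1" | awk '$1 == "max" { print "unlimited"; next }
                     { printf "%.2f\n", $1 / $2 }'
}
```

For the values used above, `weight_share 50 100` prints 33 (testgroup competing against a single sibling with the default weight) and `cpu_max_cores "200000 100000"` prints 2.00.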
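Try running `for process in $(seq 1 4); do (cat /dev/urandom > /dev/null &); done`. You will see that the CPU usage of each process hovers at around 50% instead of 100%, because the four processes share the 2 cores allowed by cpu.max. Since they were started from a shell inside the cgroup, the processes were added to cgroup.procs automatically.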
Tip: You can add a cgroup column to htop by pressing F2 and then navigating to “Columns”. Select “CGROUP” in “Available Columns” and press Enter.
Tip: cpu.weight.nice is an alternate interface to cpu.weight that uses the same values used by nice and has a range from -20 to 19.
Controlling CPU core usage
This control group will use the cpuset controller[6] to restrict the processes to the CPU cores 0 and 3.
echo "+cpuset" > cgroup.subtree_control
mkdir testgroup
cd testgroup
echo "0,3" > cpuset.cpus
echo "$$" > cgroup.procs
cpuset.cpus takes comma-separated numbers or ranges. For example: “0-4,6,8-10”.
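To double-check which cores a list like this covers, it can be expanded into individual CPU numbers. A minimal sketch, assuming `seq` is available; the name `expand_cpu_list` is hypothetical.

```shell
# Expand a cpuset list such as "0-4,6,8-10" into one CPU number per line.
expand_cpu_list() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        # A single number has no upper bound, so it is its own range.
        seq "$first" "${last:-$first}"
    done
}
```

`expand_cpu_list "0-4,6,8-10"` prints the numbers 0 to 4, then 6, then 8 to 10, one per line.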
Tip: You can add a CPU column to htop by pressing F2 and then navigating to “Columns”. Select “PROCESSOR” in “Available Columns” and press Enter.
Controlling memory usage
This control group will use the memory controller[7]. All processes together can use at most 1 GiB of memory.
echo "+memory" > cgroup.subtree_control
mkdir testgroup
cd testgroup
echo "512M" > memory.high
echo "1G" > memory.max
echo "$$" > cgroup.procs
- memory.high is the throttle limit. If the memory usage of a cgroup goes over memory.high, the kernel throttles allocations by forcing them into direct reclaim to work off the excess.
- memory.max is the hard limit. If the cgroup reaches that limit and the memory usage cannot be reduced, the OOM killer is invoked in the cgroup.
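To watch how close a group is to either limit, its current usage in memory.current can be compared with both files. The following is a rough sketch; `memory_headroom` is a hypothetical helper, and the result is rounded to whole MiB.

```shell
# Show how much memory is left before memory.high and memory.max are reached.
# $1: cgroup directory (assumption: defaults to the current directory).
memory_headroom() {
    dir="${1:-.}"
    current=$(cat "$dir/memory.current")
    for limit in high max; do
        value=$(cat "$dir/memory.$limit")
        if [ "$value" = "max" ]; then
            echo "memory.$limit: unlimited"      # no limit configured
        else
            echo "memory.$limit: $(( (value - current) / 1024 / 1024 )) MiB left"
        fi
    done
}
```

A negative number for memory.high means the group is currently above the throttle limit.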
Controlling Input/Output usage
This control group will increase the IO priority and limit the write speed to 2 MiB a second using the io controller[8]. IO limits are set per device. You need to specify the major and minor device numbers of the device (not partition) you want to limit (in my case it is “8:0” for `/dev/sda`). Run `lsblk` or `cat /proc/partitions` to find them out.
echo "+io" > cgroup.subtree_control
mkdir testgroup
cd testgroup
echo "default 500" > io.weight
echo "8:0 wbps=$((2 * 1024 * 1024)) rbps=max" > io.max
echo "$$" > cgroup.procs
- io.weight specifies the relative amount of IO time the cgroup can use in relation to its siblings and has a range from 1 to 10,000.[5] The weight can be overridden for individual devices with the major:minor syntax, like “8:0 90”. The default value is 100.
- io.max limits bytes per second (rbps/wbps) and/or IO operations per second (riops/wiops).
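Reading io.max back shows the limits in bytes per second, which is not easy on the eyes. The sketch below converts the write limit into MiB a second. It assumes that each line of the file looks like `8:0 rbps=max wbps=2097152 riops=max wiops=max`; the name `io_write_limits` is made up.

```shell
# Print the write bandwidth limit of every device listed in an io.max file.
# $1: file to read (assumption: defaults to io.max in the current directory).
io_write_limits() {
    awk '{
        for (i = 2; i <= NF; i++) {
            split($i, kv, "=")               # e.g. "wbps=2097152"
            if (kv[1] != "wbps") continue
            if (kv[2] == "max") print $1, "unlimited"
            else printf "%s %.1f MiB/s\n", $1, kv[2] / 1048576
        }
    }' "${1:-io.max}"
}
```

With the limit from the example, the line for device 8:0 should read `8:0 2.0 MiB/s`.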
Try running `dd if=/dev/zero bs=1M count=100 of=test.img oflag=direct`[9]. You will see that the speed is around 2 MiB a second.
Controlling process numbers
This control group will limit the number of processes to 10 using the process number controller[12].
echo "+pids" > cgroup.subtree_control
mkdir testgroup
cd testgroup
echo 10 > pids.max
echo "$$" > cgroup.procs
Try running `for process in $(seq 1 10); do ( (sleep 2 && echo ${process}) & ); done`. You will get error messages from your shell that it cannot fork another process.
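Before hitting the limit, you can check how many more processes the group may create by comparing pids.current with pids.max. This is a minimal sketch; the helper name `pids_left` is hypothetical.

```shell
# Print how many more processes may be created in a cgroup.
# $1: cgroup directory (assumption: defaults to the current directory).
pids_left() {
    dir="${1:-.}"
    max=$(cat "$dir/pids.max")
    if [ "$max" = "max" ]; then
        echo "unlimited"                     # no limit configured
    else
        echo $((max - $(cat "$dir/pids.current")))
    fi
}
```

Run inside testgroup after adding your shell, it should print a number below 10, since the shell itself already counts against the limit.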