---
title: "Control Group version 2 by hand"
slug: "cgroup-v2-by-hand"
description: null
date: 2021-07-17T17:05:00+02:00
type: posts
draft: false
tags:
  - cgroup
  - Linux
toc: true
---

We have had Control Group v2 since 2016, but I had trouble finding good documentation on how to use it. Most tutorials and blog posts only cover v1 or are specific to systemd[1]. The kernel documentation is a great reference but not always easy to follow. I will give you a few short examples of how to use it. I will not explain everything, but hopefully enough to give you an idea and help you understand the reference better.

Your interface to cgroups is a special file-system. Most distributions have cgroup v1 mounted at /sys/fs/cgroup and cgroup v2 at /sys/fs/cgroup/unified. Some distributions removed v1 support by default and have v2 mounted at /sys/fs/cgroup. You can find out where cgroup v2 is mounted with mount | grep cgroup2. If it is not mounted, you can do it yourself with mount -t cgroup2 none /sys/fs/cgroup/unified. You can theoretically mount it anywhere you like, but tools expect it in the path mentioned above. Going forward I will assume you are in a terminal in the cgroup v2 directory.
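If you want to locate the mount point from a script, the mount table can be parsed directly. A minimal sketch (the helper name cgroup2_mountpoint is my own):

```shell
# Print the first mount point whose filesystem type is cgroup2.
# In /proc/self/mounts the fields are: device, mount point, fstype, options, ...
cgroup2_mountpoint() {
  awk '$3 == "cgroup2" { print $2; exit }'
}

# Usage on a live system:
#   cgroup2_mountpoint < /proc/self/mounts
```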

Linux distributions should have all cgroup options compiled in. If you built the kernel yourself, or you are missing files in /sys/fs/cgroup, you can check with zgrep CGROUP /proc/config.gz | grep -Ev 'DEBUG|=y' if you are missing anything important.
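To illustrate what that pipeline reports, here it is wrapped in a small function and fed a made-up config snippet (the function name and the sample lines are mine):

```shell
# List CONFIG_CGROUP options that are not built in (=y), ignoring DEBUG ones.
missing_cgroup_opts() {
  grep CGROUP | grep -Ev 'DEBUG|=y'
}

# Sample input: the built-in option is filtered out, the unset one remains.
printf 'CONFIG_CGROUPS=y\n# CONFIG_CGROUP_RDMA is not set\n' | missing_cgroup_opts
# prints: # CONFIG_CGROUP_RDMA is not set
```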

Note
All examples on this page are tested with kernel 5.10.

Enabling controllers

There are currently 8 controllers[2]: cpu, memory, io, pids, cpuset, rdma, hugetlb and perf_event. You can find out which are available with cat cgroup.controllers. perf_event is enabled automatically; all others have to be enabled explicitly, for example with echo "+cpu +memory" > cgroup.subtree_control. You can disable a controller by using a "-" instead of a "+".

Resources are distributed top-down and a cgroup can further distribute a resource only if the resource has been distributed to it from the parent. This means that all non-root “cgroup.subtree_control” files can only contain controllers which are enabled in the parent’s “cgroup.subtree_control” file. A controller can be enabled only if the parent has the controller enabled and a controller can’t be disabled if one or more children have it enabled. […] Non-root cgroups can distribute domain resources to their children only when they don’t have any processes of their own. In other words, only domain cgroups which don’t contain any processes can have domain controllers enabled in their “cgroup.subtree_control” files.[3]

Kernel documentation on Control Group v2

We will keep it simple by only setting controllers globally in our root cgroup.

Controlling CPU usage

This control group will use the cpu controller[2]. Every process in this group will be deprioritized, and all processes together can use at most the power of 2 CPU cores.

echo "+cpu" > cgroup.subtree_control
mkdir testgroup
cd testgroup
echo "10" > cpu.weight.nice
echo "200000 100000" > cpu.max

Try adding your current shell to the group with echo "$$" > cgroup.procs. All processes you start from this shell are now in the same cgroup. But what does the example do, exactly?

  • cpu.weight.nice works like the nice command and has a range from -20 to 19. It is an alternate interface to cpu.weight which has a range from 1 to 10,000.

  • cpu.max sets the “maximum bandwidth limit”. We told the kernel that the processes should use at most 200,000 µs every 100,000 µs, meaning they can use the power of up to 2 cores.
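The arithmetic behind cpu.max can be sketched in one line: the quota divided by the period gives the number of cores' worth of CPU time the group may use (the helper name is mine):

```shell
# cpu.max holds "<quota> <period>" in µs; quota / period = usable cores.
effective_cores() {
  echo $(( $1 / $2 ))
}

effective_cores 200000 100000   # prints 2
```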

Try running for process in $(seq 1 4); do (cat /dev/urandom > /dev/null &); done. You will see that the CPU usage of each process hovers at around 50% instead of 100%. The processes were added to cgroup.procs automatically, because child processes inherit the cgroup of their parent.

Tip
You can add a cgroup column to htop by pressing F2 and then navigating to “Columns”. Select “CGROUP” in “Available Columns” and press Enter.

Controlling CPU core usage

This control group will use the cpuset controller to restrict the processes to the CPU cores 0 and 3.

echo "+cpuset" > cgroup.subtree_control
mkdir testgroup
cd testgroup
echo "0,3" > cpuset.cpus
echo "$$" > cgroup.procs

cpuset.cpus takes comma-separated numbers or ranges. For example: “0-4,6,8-10”.
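To see which CPUs such a list expands to, here is a small sketch that unfolds the syntax (expand_cpuset is a name I made up; it assumes seq and xargs are available):

```shell
# Expand a cpuset list like "0-4,6" into the individual CPU numbers.
expand_cpuset() {
  echo "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
    seq "$lo" "${hi:-$lo}"   # a bare number has no "hi" part; reuse "lo"
  done | xargs
}

expand_cpuset "0-4,6,8-10"   # prints: 0 1 2 3 4 6 8 9 10
```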

Tip
You can add a CPU column to htop by pressing F2 and then navigating to “Columns”. Select “PROCESSOR” in “Available Columns” and press Enter.

Controlling memory usage

This control group will use the memory controller[2]. All processes together can use at most 1 GiB of memory.

echo "+memory" > cgroup.subtree_control
mkdir testgroup
cd testgroup
echo "512M" > memory.high
echo "1G" > memory.max
echo "$$" > cgroup.procs

  • If the memory usage of a cgroup goes over memory.high, the kernel throttles allocations by forcing the processes into direct reclaim to work off the excess.

  • memory.max is the hard limit. If the cgroup reaches that limit and the memory usage cannot be reduced, the OOM killer is invoked inside the cgroup.
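These files accept plain byte counts as well as suffixed values. A quick sketch of what the suffixes in the example come out to (the converter is my own; binary units assumed, which is how the kernel parses these suffixes):

```shell
# Convert a K/M/G-suffixed value into bytes (binary units: 1K = 1024).
to_bytes() {
  case $1 in
    *K) echo $(( ${1%K} * 1024 )) ;;
    *M) echo $(( ${1%M} * 1024 * 1024 )) ;;
    *G) echo $(( ${1%G} * 1024 * 1024 * 1024 )) ;;
    *)  echo "$1" ;;   # no suffix: already bytes
  esac
}

to_bytes 512M   # prints 536870912
to_bytes 1G     # prints 1073741824
```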

Controlling Input/Output usage

This control group will limit the write speed to 2 MiB a second using the io controller[2]. IO limits are set per device. You need to specify the major and minor device numbers of the device (not partition) you want to limit (in my case it is “8:0” for /dev/sda). Run lsblk or cat /proc/partitions to find them out.
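If you want to script the lookup, the numbers can be pulled straight out of /proc/partitions. A sketch (the helper name is mine):

```shell
# Print "major:minor" for a device name, reading /proc/partitions on stdin.
# The columns there are: major, minor, #blocks, name.
dev_numbers() {
  awk -v name="$1" '$4 == name { print $1 ":" $2 }'
}

# Usage on a live system:
#   dev_numbers sda < /proc/partitions
```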

echo "+io" > cgroup.subtree_control
mkdir testgroup
cd testgroup
echo "8:0 wbps=$((2 * 1024 * 1024)) rbps=max" > io.max
echo "$$" > cgroup.procs

io.max limits bytes per second (rbps/wbps) and/or IO operations per second (riops/wiops).

Try running dd if=/dev/zero bs=1M count=100 of=test.img oflag=direct[4]. You will see that the speed is around 2 MiB a second.

Tip
Kernel 5.14 introduced blkio.prio.class[5], which controls the IO priority. It seems to work like ionice. I could not test it yet, since I run kernel 5.10.

Controlling process numbers

This control group will limit the number of processes to 10 using the process number controller[2].

echo "+pids" > cgroup.subtree_control
mkdir testgroup
cd testgroup
echo 10 > pids.max
echo "$$" > cgroup.procs
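Before testing the limit, you can check the group's bookkeeping: pids.current reports how many processes are in the group, and counting the entries in cgroup.procs gives the same number. A sketch with made-up PIDs:

```shell
# Each line of cgroup.procs is one PID, so counting lines counts processes.
# On a live system: wc -l < cgroup.procs
printf '1204\n1311\n1312\n' | wc -l
```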

Try running for process in $(seq 1 10); do ((sleep 2 && echo ${process}) &); done. You will get error messages from your shell saying that it can't fork another process.