tastytea 2021-07-17 16:41:58 +02:00
---
title: "Control Group version 2 by hand"
slug: "cgroup-v2-by-hand"
description: null
date: 2021-07-17T17:05:00+02:00
type: posts
draft: false
tags:
- cgroup
- Linux
toc: true
---
:source-highlighter: pygments
:idprefix:
:experimental: true
:toc:
:toclevels: 2
:url-openrc: https://wiki.gentoo.org/wiki/OpenRC/CGroups
:url-kernel-doc: https://www.kernel.org/doc/html/v5.10/admin-guide/cgroup-v2.html
:url-kernel-doc-14: https://www.kernel.org/doc/html/v5.14-rc1/admin-guide/cgroup-v2.html
:url-nice: https://manpages.debian.org/buster/coreutils/nice.1.en.html
:url-htop: https://htop.dev/
:url-ionice: https://manpages.debian.org/buster/util-linux/ionice.1.en.html
We have had Control Group v2 since 2016, but I had trouble finding good
documentation on how to use it. Most tutorials and blog posts only cover v1 or
are specific to systemdfootnote:[If you are looking for OpenRC-specific
documentation, take a look at the link:{url-openrc}[article in the Gentoo
Wiki]]. The link:{url-kernel-doc}[kernel documentation] is a great reference but
not always easy to follow. I will give you a few short examples of how to use
it. I will not explain everything, but hopefully enough to get an idea and
understand the reference better.
Your interface to cgroups is a special file-system. Most distributions have
cgroup v1 mounted at `/sys/fs/cgroup` and cgroup v2 at
`/sys/fs/cgroup/unified`. Some distributions removed v1 support by default and
have v2 mounted at `/sys/fs/cgroup`. You can find out where cgroup v2 is mounted
with `mount | grep cgroup2`. If it is not mounted, you can do it yourself with
`mount -t cgroup2 none /sys/fs/cgroup/unified`. You can theoretically mount it
anywhere you like, but tools expect it at the path mentioned above. Going
forward, I will assume you are in a terminal in the cgroup v2 directory.
Linux distributions should have all cgroup options compiled into their kernels.
If you built the kernel yourself, or files are missing in `/sys/fs/cgroup`, you
can check whether anything important is missing with
`zgrep CGROUP /proc/config.gz | grep -Ev 'DEBUG|=y'`.
[NOTE]
All examples on this page are tested with kernel 5.10.
== Enabling controllers
There are currently 8
controllersfootnote:controllers[link:{url-kernel-doc}#controllers[Kernel
documentation on Control Group v2, section “Controllers”]]: cpu, memory, io,
pids, cpuset, rdma, hugetlb and perf_event. You can find out which are
available with `cat cgroup.controllers`. perf_event is automatically enabled,
all others have to be enabled explicitly, with `echo "+cpu +memory" >
cgroup.subtree_control` for example. You can disable a controller by using a `-`
instead of a `+`.
[quote, citetitle = "Kernel documentation on Control Group v2"]
________________________________________________________________________________
Resources are distributed top-down and a cgroup can further distribute a
resource only if the resource has been distributed to it from the parent. This
means that all non-root “cgroup.subtree_control” files can only contain
controllers which are enabled in the parent's “cgroup.subtree_control” file. A
controller can be enabled only if the parent has the controller enabled and a
controller can't be disabled if one or more children have it enabled. […]
Non-root cgroups can distribute domain resources to their children only when
they don't have any processes of their own. In other words, __only domain
cgroups which don't contain any processes can have domain controllers enabled in
their “cgroup.subtree_control”
files.__footnote:[link:{url-kernel-doc}#top-down-constraint[Kernel documentation
on Control Group v2, sections “Top-down Constraint” and “No Internal Process
Constraint”]]
________________________________________________________________________________
We will keep it simple by only setting controllers globally in our root cgroup.
== Controlling CPU usage
This control group will use the cpu controllerfootnote:controllers[]. Every
process in this group will be deprioritized, and all processes together can only
use the power of 2 CPU cores.
[source,shell]
--------------------------------------------------------------------------------
echo "+cpu" > cgroup.subtree_control
mkdir testgroup
cd testgroup
echo "10" > cpu.weight.nice
echo "200000 100000" > cpu.max
--------------------------------------------------------------------------------
Try adding your current shell to the group with `echo "$$" > cgroup.procs`. All
processes you start from this shell are now in the same cgroup. But what does
the example do, exactly?
- *cpu.weight.nice* works like the link:{url-nice}[nice] command and has a range
from -20 to 19. It is an alternate interface to *cpu.weight* which has a range
from 1 to 10,000.
- *cpu.max* sets the “maximum bandwidth limit”. We told the kernel that the
processes should use at most 200,000 µs every 100,000 µs, meaning they can use
the power of up to 2 cores.
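The quota is simply the number of cores times the period. A quick sketch of the
arithmetic (100,000 µs is the default period; the variable names are mine):

```shell
# cpu.max takes "<quota> <period>", both in microseconds.
# To allow the power of N cores, set quota = N * period.
period=100000   # 100 ms, the default period
cores=2
echo "$((cores * period)) ${period}"   # the "200000 100000" we wrote above
# Half a core would be:
echo "$((period / 2)) ${period}"
```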
Try running `for process in $(seq 1 4); do (cat /dev/urandom > /dev/null &);
done`. You will see that the CPU usage of each process hovers at around 50%
instead of 100%. The processes were automatically added to *cgroup.procs*,
because they inherit the cgroup of the shell they were started from.
[TIP]
You can add a cgroup column to link:{url-htop}[htop] by pressing kbd:[F2] and
then navigating to “Columns”. Select “CGROUP” in “Available Columns” and press
kbd:[Enter].
=== Controlling CPU core usage
This control group will use the cpuset controller to restrict the processes to
the CPU cores 0 and 3.
[source,shell]
--------------------------------------------------------------------------------
echo "+cpuset" > cgroup.subtree_control
mkdir testgroup
cd testgroup
echo "0,3" > cpuset.cpus
echo "$$" > cgroup.procs
--------------------------------------------------------------------------------
*cpuset.cpus* takes comma-separated numbers or ranges. For example:
“0-4,6,8-10”.
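If you want to double-check which CPUs such a string covers, here is a small
helper of my own (`expand_cpuset` is not part of any cgroup tooling) that
expands it the way the kernel reads it:

```shell
# Expand a cpuset list like "0-4,6,8-10" into individual CPU numbers.
expand_cpuset() {
  for part in $(echo "$1" | tr ',' ' '); do
    case "$part" in
      *-*) seq "${part%-*}" "${part#*-}" ;;  # a range, e.g. "0-4"
      *)   echo "$part" ;;                   # a single CPU, e.g. "6"
    esac
  done
}
expand_cpuset "0-4,6,8-10" | tr '\n' ' '
```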
[TIP]
You can add a CPU column to link:{url-htop}[htop] by pressing kbd:[F2] and then
navigating to “Columns”. Select “PROCESSOR” in “Available Columns” and press
kbd:[Enter].
== Controlling memory usage
This control group will use the memory controllerfootnote:controllers[]. All
processes together can use at most 1
pass:[<abbr title="Gibibyte, 1024 Mebibyte">GiB</abbr>] of memory.
[source,shell]
--------------------------------------------------------------------------------
echo "+memory" > cgroup.subtree_control
mkdir testgroup
cd testgroup
echo "512M" > memory.high
echo "1G" > memory.max
echo "$$" > cgroup.procs
--------------------------------------------------------------------------------
- If the memory usage of a cgroup goes over *memory.high*, the kernel throttles
allocations by forcing them into direct reclaim to work off the excess.
- *memory.max* is the hard limit. If the cgroup reaches that limit and the
memory usage cannot be reduced, the
pass:[<abbr title="Out Of Memory">OOM</abbr>] killer is invoked in the
cgroup.
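The memory files accept the K, M and G suffixes as binary multiples and report
plain bytes when read back. A sketch of the conversion (the helper name
`to_bytes` is mine):

```shell
# Convert a size with a binary suffix, as accepted by memory.high and
# memory.max, into the byte value the kernel will report back.
to_bytes() {
  case "$1" in
    *K) echo $(( ${1%K} * 1024 )) ;;
    *M) echo $(( ${1%M} * 1024 * 1024 )) ;;
    *G) echo $(( ${1%G} * 1024 * 1024 * 1024 )) ;;
    *)  echo "$1" ;;   # already plain bytes
  esac
}
to_bytes 512M   # what reading memory.high back will show
to_bytes 1G     # what reading memory.max back will show
```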
== Controlling Input/Output usage
This control group will limit the write speed to 2
pass:[<abbr title="Mebibyte, 1024 Kibibyte">MiB</abbr>] a second using the io
controllerfootnote:controllers[]. IO limits are set per device. You need to
specify the major and minor device numbers of the _device_ (not partition) you
want to limit (in my case it is “8:0” for `/dev/sda`). Run `lsblk` or `cat
/proc/partitions` to find them out.
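The major and minor numbers are the first two columns of `/proc/partitions`,
so a one-liner like this prints them in the “major:minor” form that `io.max`
expects (the output of course depends on your machine):

```shell
# /proc/partitions has the columns "major minor #blocks name"; the data
# starts after a header line and a blank line.
awk 'NR > 2 { print $1 ":" $2, $4 }' /proc/partitions
```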
[source,shell]
--------------------------------------------------------------------------------
echo "+io" > cgroup.subtree_control
mkdir testgroup
cd testgroup
echo "8:0 wbps=$((2 * 1024 * 1024)) rbps=max" > io.max
echo "$$" > cgroup.procs
--------------------------------------------------------------------------------
*io.max* limits bytes per second (_rbps/wbps_) and/or IO operations per
second (_riops/wiops_).
Try running ``dd if=/dev/zero bs=1M count=100 of=test.img
oflag=direct``footnote:[`oflag=direct` opens the file with the `O_DIRECT` flag,
bypassing caches.]. You will see that the speed is around 2 MiB a second.
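As a quick sanity check: the `dd` command writes 100 MiB, so with the limit set
above it should take roughly

```shell
# 100 MiB written at 2 MiB per second:
echo "$(( 100 / 2 )) seconds"   # → 50 seconds
```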
[TIP]
Kernel 5.14 introduced
**blkio.prio.class**footnote:[link:{url-kernel-doc-14}#io-priority[Kernel
documentation on Control Group v2, section “IO Priority”]] that controls the IO
priority. It seems to work like link:{url-ionice}[ionice]. I could not test it
yet, since I run kernel 5.10.
== Controlling process numbers
This control group will limit the number of processes to 10 using the process
number controllerfootnote:controllers[].
[source,shell]
--------------------------------------------------------------------------------
echo "+pids" > cgroup.subtree_control
mkdir testgroup
cd testgroup
echo 10 > pids.max
echo "$$" > cgroup.procs
--------------------------------------------------------------------------------
Try running `for process in $(seq 1 10); do ((sleep 2 && echo ${process}) &);
done`. You will get error messages from your shell saying that it can't fork
another process.
// LocalWords: cgroups cgroup cpuset