From 07260be96c45aee609c7ecb7d77e5e35ec10dcf4 Mon Sep 17 00:00:00 2001
From: tastytea
Date: Sat, 17 Jul 2021 16:41:58 +0200
Subject: [PATCH] =?UTF-8?q?Add=20=E2=80=9CControl=20Group=20version=202=20?=
 =?UTF-8?q?by=20hand=E2=80=9D.?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 content/posts/cgroup v2 by hand.adoc | 214 +++++++++++++++++++++++++++
 1 file changed, 214 insertions(+)
 create mode 100644 content/posts/cgroup v2 by hand.adoc

diff --git a/content/posts/cgroup v2 by hand.adoc b/content/posts/cgroup v2 by hand.adoc
new file mode 100644
index 0000000..19ad616
--- /dev/null
+++ b/content/posts/cgroup v2 by hand.adoc
@@ -0,0 +1,214 @@
---
title: "Control Group version 2 by hand"
slug: "cgroup-v2-by-hand"
description: null
date: 2021-07-17T17:05:00+02:00
type: posts
draft: false
tags:
- cgroup
- Linux
toc: true
---

:source-highlighter: pygments
:idprefix:
:experimental: true
:toc:
:toclevels: 2

:url-openrc: https://wiki.gentoo.org/wiki/OpenRC/CGroups
:url-kernel-doc: https://www.kernel.org/doc/html/v5.10/admin-guide/cgroup-v2.html
:url-kernel-doc-14: https://www.kernel.org/doc/html/v5.14-rc1/admin-guide/cgroup-v2.html
:url-nice: https://manpages.debian.org/buster/coreutils/nice.1.en.html
:url-htop: https://htop.dev/
:url-ionice: https://manpages.debian.org/buster/util-linux/ionice.1.en.html

We have had Control Group v2 since 2016, but I had trouble finding good
documentation on how to use it. Most tutorials and blog posts only cover v1 or
are specific to systemdfootnote:[If you are looking for OpenRC-specific
documentation, take a look at the link:{url-openrc}[article in the Gentoo
Wiki]]. The link:{url-kernel-doc}[kernel documentation] is a great reference but
not always easy to follow. I will give you a few short examples of how to use
it. I will not explain everything, but hopefully enough to give you an idea and
help you understand the reference better.
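Before touching anything, you can make sure your kernel knows about cgroup v2 at
all. A minimal sanity check (assuming a kernel of 4.5 or later, where the
`cgroup2` file system type was introduced):

[source,shell]
--------------------------------------------------------------------------------
# "cgroup2" shows up in the kernel's list of supported file systems
# when Control Group v2 support is compiled in.
grep cgroup2 /proc/filesystems
--------------------------------------------------------------------------------

If this prints nothing, the rest of this article will not work on your kernel.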
Your interface to cgroups is a special file system. Most distributions have
cgroup v1 mounted at `/sys/fs/cgroup` and cgroup v2 at
`/sys/fs/cgroup/unified`. Some distributions have removed v1 support by default
and mount v2 at `/sys/fs/cgroup`. You can find out where cgroup v2 is mounted
with `mount | grep cgroup2`. If it is not mounted, you can do it yourself with
`mount -t cgroup2 none /sys/fs/cgroup/unified`. You can theoretically mount it
anywhere you like, but tools expect it at the paths mentioned above. Going
forward I will assume you are in a terminal in the cgroup v2 directory.

Linux distributions should have all cgroup options compiled in. If you built the
kernel yourself, or if you are missing files in `/sys/fs/cgroup`, you can check
with `zgrep CGROUP /proc/config.gz | grep -Ev 'DEBUG|=y'` whether you are
missing anything important.

[NOTE]
All examples on this page are tested with kernel 5.10.

== Enabling controllers

There are currently 8
controllersfootnote:controllers[link:{url-kernel-doc}#controllers[Kernel
documentation on Control Group v2, section “Controllers”]]: cpu, memory, io,
pids, cpuset, rdma, hugetlb and perf_event. You can find out which ones are
available with `cat cgroup.controllers`. perf_event is enabled automatically;
all others have to be enabled explicitly, for example with `echo "+cpu +memory"
> cgroup.subtree_control`. You can disable a controller by using a `-` instead
of a `+`.

[quote, citetitle = "Kernel documentation on Control Group v2"]
________________________________________________________________________________
Resources are distributed top-down and a cgroup can further distribute a
resource only if the resource has been distributed to it from the parent. This
means that all non-root “cgroup.subtree_control” files can only contain
controllers which are enabled in the parent’s “cgroup.subtree_control” file.
A controller can be enabled only if the parent has the controller enabled and a
controller can’t be disabled if one or more children have it enabled. […]
Non-root cgroups can distribute domain resources to their children only when
they don’t have any processes of their own. In other words, __only domain
cgroups which don’t contain any processes can have domain controllers enabled in
their “cgroup.subtree_control”
files.__footnote:[link:{url-kernel-doc}#top-down-constraint[Kernel documentation
on Control Group v2, sections “Top-down Constraint” and “No Internal Process
Constraint”]]
________________________________________________________________________________

We will keep it simple by setting controllers only globally, in our root cgroup.

== Controlling CPU usage

This control group will use the cpu controllerfootnote:controllers[]. Every
process in this group will be deprioritized, and all processes together can use
at most the power of 2 CPU cores.

[source,shell]
--------------------------------------------------------------------------------
echo "+cpu" > cgroup.subtree_control
mkdir testgroup
cd testgroup
echo "10" > cpu.weight.nice
echo "200000 100000" > cpu.max
--------------------------------------------------------------------------------

Try adding your current shell to the group with `echo "$$" > cgroup.procs`. All
processes you start from this shell are now in the same cgroup. But what does
the example do, exactly?

- *cpu.weight.nice* works like the link:{url-nice}[nice] command and has a range
  from -20 to 19. It is an alternative interface to *cpu.weight*, which has a
  range from 1 to 10,000.
- *cpu.max* sets the “maximum bandwidth limit”. We told the kernel that the
  processes may use at most 200,000 µs of CPU time every 100,000 µs, meaning
  they can use the power of up to 2 cores.

Try running `for process in $(seq 1 4); do (cat /dev/urandom > /dev/null &);
done`.
You will see that the CPU usage of each process hovers at around 50% instead of
100%. The processes were added to *cgroup.procs* automatically, because child
processes start out in the cgroup of their parent.

[TIP]
You can add a cgroup column to link:{url-htop}[htop] by pressing kbd:[F2] and
then navigating to “Columns”. Select “CGROUP” in “Available Columns” and press
kbd:[Enter].

=== Controlling CPU core usage

This control group will use the cpuset controller to restrict the processes to
the CPU cores 0 and 3.

[source,shell]
--------------------------------------------------------------------------------
echo "+cpuset" > cgroup.subtree_control
mkdir testgroup
cd testgroup
echo "0,3" > cpuset.cpus
echo "$$" > cgroup.procs
--------------------------------------------------------------------------------

*cpuset.cpus* takes comma-separated numbers or ranges. For example:
“0-4,6,8-10”.

[TIP]
You can add a CPU column to link:{url-htop}[htop] by pressing kbd:[F2] and then
navigating to “Columns”. Select “PROCESSOR” in “Available Columns” and press
kbd:[Enter].

== Controlling memory usage

This control group will use the memory controllerfootnote:controllers[]. All
processes together can use at most 1 pass:[GiB] of memory.

[source,shell]
--------------------------------------------------------------------------------
echo "+memory" > cgroup.subtree_control
mkdir testgroup
cd testgroup
echo "512M" > memory.high
echo "1G" > memory.max
echo "$$" > cgroup.procs
--------------------------------------------------------------------------------

- If the memory usage of a cgroup goes over *memory.high*, the kernel throttles
  its allocations by forcing the processes into direct reclaim to work off the
  excess.
- *memory.max* is the hard limit. If the cgroup reaches that limit and the
  memory usage cannot be reduced, the pass:[OOM] killer is invoked inside the
  cgroup.
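The kernel parses the `K`/`M`/`G` suffixes when you write them; reading the
files back should give plain bytes. The values written in the example above
translate like this (plain shell arithmetic, no cgroup needed to follow along):

[source,shell]
--------------------------------------------------------------------------------
# What the kernel stores for the values written above:
echo "$((512 * 1024 * 1024))"    # "512M" in memory.high
echo "$((1024 * 1024 * 1024))"   # "1G" in memory.max
--------------------------------------------------------------------------------

This prints 536870912 and 1073741824, which is what `cat memory.high` and `cat
memory.max` should show after the example.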
== Controlling Input/Output usage

This control group will limit the write speed to 2 pass:[MiB] per second using
the io controllerfootnote:controllers[]. IO limits are set per device. You need
to specify the major and minor device numbers of the _device_ (not partition)
you want to limit (in my case it is “8:0” for `/dev/sda`). Run `lsblk` or `cat
/proc/partitions` to find them.

[source,shell]
--------------------------------------------------------------------------------
echo "+io" > cgroup.subtree_control
mkdir testgroup
cd testgroup
echo "8:0 wbps=$((2 * 1024 * 1024)) rbps=max" > io.max
echo "$$" > cgroup.procs
--------------------------------------------------------------------------------

*io.max* limits bytes per second (_rbps/wbps_) and/or IO operations per second
(_riops/wiops_).

Try running ``dd if=/dev/zero bs=1M count=100 of=test.img
oflag=direct``footnote:[`oflag=direct` opens the file with the `O_DIRECT` flag,
bypassing caches.]. You will see that the speed is around 2 MiB per second.

[TIP]
Kernel 5.14 introduced
**blkio.prio.class**footnote:[link:{url-kernel-doc-14}#io-priority[Kernel
documentation on Control Group v2, section “IO Priority”]], which controls the
IO priority. It seems to work like link:{url-ionice}[ionice]. I could not test
it yet, since I am running kernel 5.10.

== Controlling process numbers

This control group will limit the number of processes to 10 using the process
number controllerfootnote:controllers[].

[source,shell]
--------------------------------------------------------------------------------
echo "+pids" > cgroup.subtree_control
mkdir testgroup
cd testgroup
echo 10 > pids.max
echo "$$" > cgroup.procs
--------------------------------------------------------------------------------

Try running `for process in $(seq 1 10); do ((sleep 2 && echo ${process}) &);
done`. You will get error messages from your shell saying that it cannot fork
another process.
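== Cleaning up

When you are done experimenting, you can dissolve a group again: move every
process back to the parent, then remove the directory with plain `rmdir` (the
kernel refuses to remove a cgroup that still contains processes). The sketch
below rehearses the sequence against a throwaway directory so it can be tried
without root; for the real thing, set `root` to your cgroup v2 mount point.

[source,shell]
--------------------------------------------------------------------------------
# Stand-in for the cgroup v2 mount point so this runs without root;
# use root=/sys/fs/cgroup/unified (or your mount point) for real.
root=$(mktemp -d)
mkdir "$root/testgroup"
: > "$root/cgroup.procs"           # the parent's process list

# The cleanup sequence: move the shell out, then remove the group.
echo "$$" > "$root/cgroup.procs"   # on a real cgroupfs this migrates the shell
rmdir "$root/testgroup"            # succeeds only once the group is empty
--------------------------------------------------------------------------------

On a real cgroupfs the `echo` moves the shell (and nothing else) into the parent
group; every process has to be migrated out before the `rmdir` succeeds.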


// LocalWords:  cgroups cgroup cpuset