## systemd cgroup driver

By default, runc creates cgroups and sets cgroup limits on its own (this mode
is known as the fs cgroup driver). When the `--systemd-cgroup` global option is
given (as in e.g. `runc --systemd-cgroup run ...`), runc switches to the systemd
cgroup driver. This document describes its features and peculiarities.

### systemd unit name and placement

When creating a container, runc asks systemd (over D-Bus) to create
a transient unit for the container and place it into a specified slice.

The name of the unit and the containing slice are derived from the container
runtime spec in the following way:

1. If `Linux.CgroupsPath` is set, it is expected to be in the form
   `[slice]:[prefix]:[name]`.

   Here `slice` is a systemd slice under which the container is placed.
   If empty, it defaults to `system.slice`, except when cgroup v2 is
   used and a rootless container is created, in which case it defaults
   to `user.slice`.

   Note that `slice` can contain dashes to denote a sub-slice
   (e.g. `user-1000.slice` is a correct notation, meaning a sub-slice
   of `user.slice`), but it must not contain slashes (e.g.
   `user.slice/user-1000.slice` is invalid).

   A `slice` of `-` represents the root slice.

   Next, `prefix` and `name` are used to compose the unit name, which
   is `<prefix>-<name>.scope`, unless `name` has a `.slice` suffix, in
   which case `prefix` is ignored and `name` is used as is.

2. If `Linux.CgroupsPath` is not set or is empty, runc behaves as if it
   were set to `:runc:<container-id>`. See the description above for
   what that transforms to.
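The naming rules above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this document, not runc's actual implementation (runc is written in Go); the function name `parse_cgroups_path` and the `rootless_v2` flag are hypothetical.

```python
def parse_cgroups_path(cgroups_path, container_id, rootless_v2=False):
    """Return (slice, unit_name) for a Linux.CgroupsPath value.

    Hypothetical helper that mirrors the rules described above.
    """
    if not cgroups_path:
        # An unset or empty path is treated like ":runc:<container-id>".
        cgroups_path = ":runc:" + container_id

    parts = cgroups_path.split(":")
    if len(parts) != 3:
        raise ValueError("expected [slice]:[prefix]:[name], got %r" % cgroups_path)
    slice_name, prefix, name = parts

    # Sub-slices are written with dashes, never with slashes.
    if "/" in slice_name:
        raise ValueError("slice must not contain slashes: %r" % slice_name)

    if not slice_name:
        # Default slice; user.slice only for rootless containers on cgroup v2.
        slice_name = "user.slice" if rootless_v2 else "system.slice"

    if name.endswith(".slice"):
        # The name is used as is and the prefix is ignored.
        unit_name = name
    else:
        unit_name = "%s-%s.scope" % (prefix, name)
    return slice_name, unit_name
```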

As described above, a unit being created can either be a scope or a slice.
For a scope, runc specifies its parent slice via a _Slice=_ systemd property,
and also sets _Delegate=true_. For a slice, runc specifies a weak dependency on
the parent slice via a _Wants=_ property.
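As a rough illustration of the scope/slice distinction, the sketch below picks the placement properties from the unit name suffix. The function name `placement_properties` is hypothetical, and the returned dictionary is a plain Python stand-in for the properties named above, not the D-Bus representation runc sends to systemd.

```python
def placement_properties(unit_name, parent_slice):
    """Return the placement-related properties for a transient unit.

    Hypothetical helper; property names are taken from the text above.
    """
    if unit_name.endswith(".scope"):
        # A scope is placed into its parent slice and gets delegation.
        return {"Slice": parent_slice, "Delegate": True}
    if unit_name.endswith(".slice"):
        # A slice only gets a weak dependency on its parent slice.
        # Dependency properties take a list of unit names.
        return {"Wants": [parent_slice]}
    raise ValueError("unit must be a scope or a slice: %r" % unit_name)
```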

### Resource limits

runc always enables accounting for all controllers, regardless of any limits
being set. This means it unconditionally sets the following properties for the
systemd unit being created:

 * _CPUAccounting=true_
 * _IOAccounting=true_ (_BlockIOAccounting_ for cgroup v1)
 * _MemoryAccounting=true_
 * _TasksAccounting=true_
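The only version-dependent entry in this list is the I/O accounting property. A minimal sketch of that selection, using a hypothetical helper named `accounting_properties`:

```python
def accounting_properties(cgroup_v2):
    """Return the accounting properties that are always enabled.

    Hypothetical helper; cgroup v1 uses BlockIOAccounting where
    cgroup v2 uses IOAccounting.
    """
    io_key = "IOAccounting" if cgroup_v2 else "BlockIOAccounting"
    return {
        "CPUAccounting": True,
        io_key: True,
        "MemoryAccounting": True,
        "TasksAccounting": True,
    }
```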

The resource limits of the systemd unit are set by runc by translating the
runtime spec resources to systemd unit properties.

Such translation is by no means complete, as there are some cgroup properties
that cannot be set via systemd. Therefore, the runc systemd cgroup driver is
backed by the fs driver (in other words, cgroup limits are first set via systemd
unit properties, and then by writing to cgroupfs files).

The set of runtime spec resources which is translated by runc to systemd unit
properties depends on the kernel cgroup version being used (v1 or v2), and on
the systemd version being run. If an older systemd version (which does not
support some resources) is used, runc does not set those resources.

The following tables summarize which properties are translated.

#### cgroup v1

| runtime spec resource | systemd property name | min systemd version |
|-----------------------|-----------------------|---------------------|
| memory.limit          | MemoryLimit           |                     |
| cpu.shares            | CPUShares             |                     |
| blockIO.weight        | BlockIOWeight         |                     |
| pids.limit            | TasksMax              |                     |
| cpu.cpus              | AllowedCPUs           | v244                |
| cpu.mems              | AllowedMemoryNodes    | v244                |

#### cgroup v2

| runtime spec resource   | systemd property name       | min systemd version |
|-------------------------|-----------------------------|---------------------|
| memory.limit            | MemoryMax                   |                     |
| memory.reservation      | MemoryLow                   |                     |
| memory.swap             | MemorySwapMax               |                     |
| cpu.shares              | CPUWeight                   |                     |
| pids.limit              | TasksMax                    |                     |
| cpu.cpus                | AllowedCPUs                 | v244                |
| cpu.mems                | AllowedMemoryNodes          | v244                |
| unified.cpu.max         | CPUQuota, CPUQuotaPeriodSec | v242                |
| unified.cpu.weight      | CPUWeight                   |                     |
| unified.cpuset.cpus     | AllowedCPUs                 | v244                |
| unified.cpuset.mems     | AllowedMemoryNodes          | v244                |
| unified.memory.high     | MemoryHigh                  |                     |
| unified.memory.low      | MemoryLow                   |                     |
| unified.memory.min      | MemoryMin                   |                     |
| unified.memory.max      | MemoryMax                   |                     |
| unified.memory.swap.max | MemorySwapMax               |                     |
| unified.pids.max        | TasksMax                    |                     |

For documentation on systemd unit resource properties, see the
`systemd.resource-control(5)` man page.
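One way to read the tables is as a lookup keyed by runtime spec resource, gated by the running systemd version. The sketch below encodes only a few rows of the cgroup v2 table and is not how runc itself stores this mapping; the names `CGROUP_V2_ROWS` and `split_by_systemd_support` are hypothetical, and a minimum version of `0` stands for an empty "min systemd version" cell.

```python
# A subset of the cgroup v2 table: resource -> (properties, min systemd version).
CGROUP_V2_ROWS = {
    "memory.limit": (("MemoryMax",), 0),
    "memory.reservation": (("MemoryLow",), 0),
    "pids.limit": (("TasksMax",), 0),
    "cpu.cpus": (("AllowedCPUs",), 244),
    "cpu.mems": (("AllowedMemoryNodes",), 244),
    "unified.cpu.max": (("CPUQuota", "CPUQuotaPeriodSec"), 242),
}


def split_by_systemd_support(resources, systemd_version):
    """Split resources into those translated to unit properties and the rest.

    Returns (translated, not_translated), where translated maps each
    resource to its systemd property names and not_translated lists
    resources that are unknown to the table or need a newer systemd.
    """
    translated = {}
    not_translated = []
    for resource in resources:
        row = CGROUP_V2_ROWS.get(resource)
        if row is not None and systemd_version >= row[1]:
            translated[resource] = row[0]
        else:
            not_translated.append(resource)
    return translated, not_translated
```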

### Auxiliary properties

Auxiliary properties of a systemd unit (as shown by `systemctl show
<unit-name>` after the container is created) can be set (or overwritten) by
adding annotations to the container runtime spec (`config.json`).

For example:

```json
        "annotations": {
                "org.systemd.property.TimeoutStopUSec": "uint64 123456789",
                "org.systemd.property.CollectMode": "'inactive-or-failed'"
        },
```

The above will set the following properties:

* `TimeoutStopSec` to approximately 2 minutes and 3 seconds (123456789
  microseconds);
* `CollectMode` to "inactive-or-failed".

The values must be in the GVariant text format, as described in the
[GVariant documentation](https://docs.gtk.org/glib/gvariant-text.html).

To find out which type systemd expects for a particular parameter, please
consult the systemd sources.
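To make the annotation convention concrete, the following sketch picks the systemd properties out of an annotations map and checks the arithmetic for the timeout value. It is an illustration only: the helper names are hypothetical, and it does not parse the GVariant text format beyond splitting the type-prefixed integer shown in the example.

```python
PREFIX = "org.systemd.property."


def systemd_annotations(annotations):
    """Return {property name: GVariant text} for prefixed annotations."""
    return {
        key[len(PREFIX):]: value
        for key, value in annotations.items()
        if key.startswith(PREFIX)
    }


def usec_to_min_sec(usec):
    """Break a microsecond count into (minutes, seconds, microseconds)."""
    total_seconds, micros = divmod(usec, 1_000_000)
    minutes, seconds = divmod(total_seconds, 60)
    return minutes, seconds, micros


annotations = {
    "org.systemd.property.TimeoutStopUSec": "uint64 123456789",
    "org.systemd.property.CollectMode": "'inactive-or-failed'",
    "unrelated.annotation": "ignored",
}
props = systemd_annotations(annotations)

# Naive split of "uint64 123456789"; real GVariant text needs a real parser.
timeout_usec = int(props["TimeoutStopUSec"].split()[1])
print(usec_to_min_sec(timeout_usec))  # (2, 3, 456789)
```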
