# Terminals and Standard IO #

*Note that the default configuration of `runc` (foreground, new terminal) is
generally the best option for most users. This document exists to help explain
what the purpose of the different modes is, and to try to steer users away from
common mistakes and misunderstandings.*

In general, most processes on Unix (and Unix-like) operating systems have 3
standard file descriptors provided at the start, collectively referred to as
"standard IO" (`stdio`):

* `0`: standard-in (`stdin`), the input stream into the process
* `1`: standard-out (`stdout`), the output stream from the process
* `2`: standard-error (`stderr`), the error stream from the process

When creating and running a container via `runc`, it is important to take care
to structure the `stdio` the new container's process receives. In some ways
containers are just regular processes, while in other ways they're an isolated
sub-partition of your machine (in a similar sense to a VM). This means that the
structure of IO is not as simple as with ordinary programs (which generally
just use the file descriptors you give them).

## Other File Descriptors ##

Before we continue, it is important to note that processes can have more file
descriptors than just `stdio`. By default in `runc` no other file descriptors
will be passed to the spawned container process. If you wish to explicitly pass
file descriptors to the container you have to use the `--preserve-fds` option.
These ancillary file descriptors don't have any of the strange semantics
discussed further in this document (those only apply to `stdio`) -- they are
passed untouched by `runc`.

It should be noted that `--preserve-fds` does not take individual file
descriptors to preserve. Instead, it takes how many file descriptors (not
including `stdio` or `LISTEN_FDS`) should be passed to the container. In the
following example:

```
% runc run --preserve-fds 5 <container>
```

`runc` will pass the first `5` file descriptors (`3`, `4`, `5`, `6`, and `7` --
assuming that `LISTEN_FDS` has not been configured) to the container.

In addition to `--preserve-fds`, `LISTEN_FDS` file descriptors are passed
automatically to allow for `systemd`-style socket activation. To extend the
above example:

```
% LISTEN_PID=$pid_of_runc LISTEN_FDS=3 runc run --preserve-fds 5 <container>
```

`runc` will now pass the first `8` file descriptors (and it will also pass
`LISTEN_FDS=3` and `LISTEN_PID=1` to the container). The first `3` (`3`, `4`,
and `5`) were passed due to `LISTEN_FDS` and the other `5` (`6`, `7`, `8`, `9`,
and `10`) were passed due to `--preserve-fds`. You should keep this in mind if
you use `runc` directly in something like a `systemd` unit file. To disable
this `LISTEN_FDS`-style passing just unset `LISTEN_FDS`.
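
The numbering in the two examples above can be worked out mechanically. The
following is only an illustrative sketch of that arithmetic: `passed_fds` is a
hypothetical helper, not part of `runc`, and it assumes file descriptors are
numbered contiguously starting at `3` as described above.

```python
def passed_fds(preserve_fds, listen_fds=0):
    """Return (listen, preserved) fd numbers for the given counts.

    Hypothetical helper: mirrors the numbering described in the text,
    where LISTEN_FDS descriptors come first and --preserve-fds follow.
    """
    first = 3  # 0, 1 and 2 are stdio and are not counted
    listen = list(range(first, first + listen_fds))
    start = first + listen_fds
    preserved = list(range(start, start + preserve_fds))
    return listen, preserved


# --preserve-fds 5 without LISTEN_FDS
print(passed_fds(5))
# --preserve-fds 5 with LISTEN_FDS=3
print(passed_fds(5, listen_fds=3))
```

The second call shows why a `systemd` unit with socket activation shifts the
descriptors covered by `--preserve-fds` upwards.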

**Be very careful when passing file descriptors to a container process.** Due
to some Linux kernel (mis)features, a container with access to certain types of
file descriptors (such as `O_PATH` descriptors) outside of the container's root
file system can use these to break out of the container's pivoted mount
namespace. [This has resulted in CVEs in the past.][CVE-2016-9962]

[CVE-2016-9962]: https://nvd.nist.gov/vuln/detail/CVE-2016-9962

## <a name="terminal-modes" /> Terminal Modes ##

`runc` supports two distinct methods for passing `stdio` to the container's
primary process:

* [new terminal](#new-terminal) (`terminal: true`)
* [pass-through](#pass-through) (`terminal: false`)

When first using `runc` these two modes will look incredibly similar, but this
can be quite deceptive as these different modes have quite different
characteristics.

By default, `runc spec` will create a configuration that will create a new
terminal (`terminal: true`). However, if the `terminal: ...` line is not
present in `config.json` then pass-through is the default.
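
As a rough illustration of that default, the sketch below reads a
`config.json` document and reports which terminal mode it selects. It assumes
the setting lives under `process.terminal` (as in the OCI runtime
specification's `config.json` layout); `terminal_mode` is a hypothetical
helper and not something `runc` provides.

```python
import json


def terminal_mode(config_text):
    """Report the terminal mode a config.json document would select.

    Hypothetical helper: treats a missing `terminal` key as false,
    matching the pass-through default described above.
    """
    config = json.loads(config_text)
    terminal = config.get("process", {}).get("terminal", False)
    return "new terminal" if terminal else "pass-through"


print(terminal_mode('{"process": {"terminal": true}}'))
print(terminal_mode('{"process": {"args": ["sh"]}}'))
```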

*In general we recommend using new terminal, because it means that tools like
`sudo` will work inside your container. But pass-through can be useful if you
know what you're doing, or if you're using `runc` as part of a non-interactive
pipeline.*

### <a name="new-terminal" /> New Terminal ###

In new terminal mode, `runc` will create a brand-new "console" (or more
precisely, a new pseudo-terminal using the container's namespaced
`/dev/pts/ptmx`) for your contained process to use as its `stdio`.

When you start a process in new terminal mode, `runc` will do the following:

1. Create a new pseudo-terminal.
2. Pass the slave end to the container's primary process as its `stdio`.
3. Send the master end to a process to interact with the `stdio` for the
   container's primary process ([details below](#runc-modes)).

It should be noted that since a new pseudo-terminal is being used for
communication with the container, some strange properties of pseudo-terminals
might surprise you. For instance, by default, all new pseudo-terminals
translate the byte `'\n'` to the sequence `'\r\n'` on both `stdout` and
`stderr`. In addition there are [a whole range of `ioctl(2)` requests that can
only interact with pseudo-terminal `stdio`][tty_ioctl(4)].
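
The newline translation is easy to observe outside of `runc`. The sketch below
opens a plain pseudo-terminal with Python's `pty` module (it does not involve a
container at all), writes to the slave end, and reads what arrives on the
master end. It assumes a Linux host where new pseudo-terminals start with the
usual default output processing; `slave_to_master` is a hypothetical helper
name.

```python
import os
import pty


def slave_to_master(data):
    """Write `data` to a fresh pty's slave end and return what the master sees."""
    master, slave = pty.openpty()
    try:
        os.write(slave, data)
        return os.read(master, 1024)
    finally:
        os.close(slave)
        os.close(master)


# With default terminal settings the trailing newline gains a carriage return.
print(slave_to_master(b"hello\n"))
```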

> **NOTE**: In new terminal mode, all three `stdio` file descriptors are the
> same underlying file. The reason for this is to match how a shell's `stdio`
> looks to a process (as well as remove race condition issues with having to
> deal with multiple master pseudo-terminal file descriptors). However this
> means that it is not really possible to uniquely distinguish between `stdout`
> and `stderr` from the caller's perspective.

#### Issues ####

If you see an error like

```
open /dev/tty: no such device or address
```

from `runc`, it means it can't open a terminal (because there isn't one). This
can happen when `stdin` (and possibly also `stdout` and `stderr`) are
redirected, or in some environments that lack a tty (such as GitHub Actions
runners).
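
A wrapper script can probe for this situation before invoking `runc`. The
following is a minimal sketch under the assumption that trying to open
`/dev/tty` is a reasonable proxy for "a terminal is available"; both function
names are hypothetical and neither is part of `runc`.

```python
import os


def can_open_terminal(path="/dev/tty"):
    """Return True if `path` can be opened read-write, False otherwise."""
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError:
        # Typical in CI runners or when the process has no controlling tty.
        return False
    os.close(fd)
    return True


def stdin_is_terminal():
    """Return True if file descriptor 0 refers to a terminal."""
    return os.isatty(0)


print("terminal available:", can_open_terminal())
print("stdin is a terminal:", stdin_is_terminal())
```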

The solution to this is to *not* use a terminal for the container, i.e. have
`terminal: false` in `config.json`. If the container really needs a terminal
(some programs require one), you can provide one, using one of the following
methods.

One way is to use `ssh` with the `-tt` flag. The second `t` forces a terminal
allocation even if there's no local one -- and so it is required when `stdin`
is not a terminal (some `ssh` implementations only look for a terminal on
`stdin`).

Another way is to run `runc` under the `script` utility, like this:

```console
$ script -e -c 'runc run <container>'
```

[tty_ioctl(4)]: https://linux.die.net/man/4/tty_ioctl

### <a name="pass-through" /> Pass-Through ###

If you have already set up some file handles that you wish your contained
process to use as its `stdio`, then you can ask `runc` to pass them through to
the contained process (this is not necessarily the same as `--preserve-fds`'s
passing of file descriptors -- [details below](#runc-modes)). As an example
(assuming that `terminal: false` is set in `config.json`):

```
% echo input | runc run some_container > /tmp/log.out 2> /tmp/log.err
```

Here the container's various `stdio` file descriptors will be substituted with
the following:

* `stdin` will be sourced from the `echo input` pipeline.
* `stdout` will be output into `/tmp/log.out` on the host.
* `stderr` will be output into `/tmp/log.err` on the host.
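
The same wiring can be set up programmatically. The sketch below reproduces
the shape of the pipeline above using an ordinary Python child process as a
stand-in for `runc run some_container` (so it runs without a container); the
helper name, the child program, and the temporary file locations are all
assumptions made for illustration.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for the container process: echoes stdin to stdout and stderr.
CHILD = (
    "import sys\n"
    "data = sys.stdin.read()\n"
    "sys.stdout.write('out: ' + data)\n"
    "sys.stderr.write('err: ' + data)\n"
)


def run_with_redirected_stdio(input_bytes, out_path, err_path):
    """Run the stand-in child with stdin from bytes and stdout/stderr to files."""
    with open(out_path, "wb") as out, open(err_path, "wb") as err:
        proc = subprocess.run(
            [sys.executable, "-c", CHILD],
            input=input_bytes,
            stdout=out,
            stderr=err,
        )
    return proc.returncode


workdir = tempfile.mkdtemp()
out_path = os.path.join(workdir, "log.out")
err_path = os.path.join(workdir, "log.err")
returncode = run_with_redirected_stdio(b"input\n", out_path, err_path)
print(returncode, open(out_path).read(), open(err_path).read())
```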

It should be noted that the actual file handles seen inside the container may
be different [based on the mode `runc` is being used in](#runc-modes) (for
instance, the file referenced by `1` could be `/tmp/log.out` directly or a pipe
which `runc` is using to buffer output, based on the mode). However the net
result will be the same in either case. In principle you could use the [new
terminal mode](#new-terminal) in a pipeline, but the difference will become
more clear when you are introduced to [`runc`'s detached mode](#runc-modes).

## <a name="runc-modes" /> `runc` Modes ##

`runc` itself runs in two modes:

* [foreground](#foreground)
* [detached](#detached)

You can use either [terminal mode](#terminal-modes) with either `runc` mode.
However, there are considerations that may indicate preference for one mode
over another. It should be noted that while the two types of modes (terminal
and `runc`) are conceptually independent from each other, you should be aware
of the intricacies of which combination you are using.

*In general we recommend using foreground because it's the most
straightforward to use, with the only downside being that you will have a
long-running `runc` process. Detached mode is difficult to get right and
generally requires having your own `stdio` management.*

### Foreground ###

The default (and most straightforward) mode of `runc`. In this mode, your
`runc` command remains in the foreground with the container process as a child.
All `stdio` is buffered through the foreground `runc` process (irrespective of
which terminal mode you are using). This is conceptually quite similar to
running a normal process interactively in a shell (and if you are using `runc`
in a shell interactively, this is what you should use).

Because the `stdio` will be buffered in this mode, some very important
peculiarities of this mode should be kept in mind:

* With [new terminal mode](#new-terminal), the container will see a
  pseudo-terminal as its `stdio` (as you might expect). However, the `stdio` of
  the foreground `runc` process will remain the `stdio` that the process was
  started with -- and `runc` will copy all `stdio` between its `stdio` and the
  container's `stdio`. This means that while a new pseudo-terminal has been
  created, the foreground `runc` process manages it over the lifetime of the
  container.

* With [pass-through mode](#pass-through), the foreground `runc`'s `stdio` is
  **not** passed to the container. Instead, the container's `stdio` is a set of
  pipes which are used to copy data between `runc`'s `stdio` and the
  container's `stdio`. This means that the container never has direct access to
  host file descriptors (aside from the pipes created by the container runtime,
  but that shouldn't be an issue).
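
To make the "set of pipes" arrangement concrete, the sketch below starts an
ordinary child process with pipes for all three `stdio` descriptors and has the
child report what kind of file each one is. This is only an analogy for what
foreground pass-through does (the parent here is Python, not `runc`, and the
child is not a container); `describe_child_stdio` is a hypothetical name.

```python
import subprocess
import sys

# The child inspects its own stdio and reports whether each fd is a pipe.
CHILD = (
    "import os, stat\n"
    "for fd in (0, 1, 2):\n"
    "    mode = os.fstat(fd).st_mode\n"
    "    print(fd, 'pipe' if stat.S_ISFIFO(mode) else 'other')\n"
)


def describe_child_stdio():
    """Spawn a child whose stdio are parent-created pipes and collect its report."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # The parent plays the copying role: it owns the other end of each pipe.
    out, _err = proc.communicate(b"")
    return out.decode().splitlines()


print(describe_child_stdio())
```

Whatever the parent's own `stdio` happens to be (a terminal, a file, another
pipe), the child only ever sees the intermediate pipes.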

The main drawback of the foreground mode of operation is that it requires a
long-running foreground `runc` process. If you kill the foreground `runc`
process then you will no longer have access to the `stdio` of the container
(and in most cases this will result in the container dying abnormally due to
`SIGPIPE` or some other error). By extension this means that any bug in the
long-running foreground `runc` process (such as a memory leak) or a stray
OOM-kill sweep could result in your container being killed **through no fault
of the user**. In addition, there is no way in foreground mode of passing a
file descriptor directly to the container process as its `stdio` (like
`--preserve-fds` does).
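
The `SIGPIPE` failure mode can be reproduced with two ordinary processes. In
the sketch below the parent stands in for the foreground `runc` process and the
child stands in for the container; once the parent closes its end of the pipe,
the child's next write kills it. The child explicitly restores the default
`SIGPIPE` disposition because Python normally ignores that signal; all names
are illustrative assumptions.

```python
import signal
import subprocess
import sys

# Stand-in for a container process that keeps writing to its stdout.
CHILD = (
    "import signal, sys\n"
    "signal.signal(signal.SIGPIPE, signal.SIG_DFL)\n"
    "while True:\n"
    "    sys.stdout.write('x' * 4096)\n"
    "    sys.stdout.flush()\n"
)


def writer_status_after_reader_leaves():
    """Return the child's wait status after its pipe reader goes away."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD], stdout=subprocess.PIPE)
    proc.stdout.read(4096)  # confirm the child is up and writing
    proc.stdout.close()     # the "foreground process" stops reading
    return proc.wait()      # negative value means killed by that signal


status = writer_status_after_reader_leaves()
print(status == -signal.SIGPIPE)
```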

These shortcomings are obviously sub-optimal and are the reason that `runc` has
an additional mode called "detached mode".

### Detached ###

In contrast to foreground mode, in detached mode there is no long-running
foreground `runc` process once the container has started. In fact, there is no
long-running `runc` process at all. However, this means that it is up to the
caller to handle the `stdio` after `runc` has set it up for you. In a shell
this means that the `runc` command will exit and control will return to the
shell, after the container has been set up.

You can run `runc` in detached mode in one of the following ways:

* `runc run -d ...` which operates similarly to `runc run` but is detached.
* `runc create` followed by `runc start` which is the standard container
  lifecycle defined by the OCI runtime specification (`runc create` sets up the
  container completely, waiting for `runc start` to begin execution of user
  code).

The main use-case of detached mode is for higher-level tools that want to be
wrappers around `runc`. By running `runc` in detached mode, those tools have
far more control over the container's `stdio` without `runc` getting in the
way (most wrappers around `runc` like `cri-o` or `containerd` use detached mode
for this reason).

Unfortunately using detached mode is a bit more complicated and requires more
care than the foreground mode -- mainly because it is now up to the caller to
handle the `stdio` of the container.

Another complication is that the parent process is responsible for acting as
the subreaper for the container. In short, you need to call
`prctl(PR_SET_CHILD_SUBREAPER, 1, ...)` in the parent process and correctly
handle the implications of being a subreaper. Failing to do so may result in
zombie processes being accumulated on your host.
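
For a monitor written in Python, the `prctl(2)` call can be made through
`ctypes`. The sketch below is Linux-only and makes several assumptions: the
constant values are taken from `<linux/prctl.h>`, the helper names are
hypothetical, and a real monitor would call `reap_children` from a `SIGCHLD`
handler or event loop rather than once.

```python
import ctypes
import os

# Values from <linux/prctl.h>.
PR_SET_CHILD_SUBREAPER = 36
PR_GET_CHILD_SUBREAPER = 37

_libc = ctypes.CDLL(None, use_errno=True)
_ZERO = ctypes.c_ulong(0)


def set_child_subreaper(enabled=True):
    """Mark (or unmark) the calling process as a child subreaper."""
    flag = ctypes.c_ulong(1 if enabled else 0)
    if _libc.prctl(PR_SET_CHILD_SUBREAPER, flag, _ZERO, _ZERO, _ZERO) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def is_child_subreaper():
    """Query the subreaper flag of the calling process."""
    flag = ctypes.c_int(0)
    if _libc.prctl(PR_GET_CHILD_SUBREAPER, ctypes.byref(flag), _ZERO, _ZERO, _ZERO) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return flag.value == 1


def reap_children():
    """Collect every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children at all
        if pid == 0:
            break  # children exist, but none have exited yet
        reaped.append((pid, status))
    return reaped


set_child_subreaper()
print(is_child_subreaper())
```

Once the flag is set, orphaned descendants are re-parented to this process
instead of to `init`, which is why the reaping loop becomes this process's
responsibility.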

These tasks are usually performed by a dedicated (and minimal) monitor process
per-container. For the sake of comparison, other runtimes such as LXC do not
have an equivalent detached mode and instead integrate this monitor process
into the container runtime itself -- this has several tradeoffs, and `runc` has
opted to support delegating the monitoring responsibility to the parent process
through this detached mode.

#### Detached Pass-Through ####

In detached mode, pass-through actually does what it says on the tin -- the
`stdio` file descriptors of the `runc` process are passed through (untouched)
to the container's `stdio`. The purpose of this option is to allow a user to
set up `stdio` for a container themselves and then force `runc` to just use
their pre-prepared `stdio` (without any pseudo-terminal funny business). *If
you don't see why this would be useful, don't use this option.*

**You must be incredibly careful when using detached pass-through (especially
in a shell).** The reason for this is that by using detached pass-through you
are passing host file descriptors to the container. In the case of a shell,
usually your `stdio` is going to be a pseudo-terminal (on your host). A
malicious container could take advantage of TTY-specific `ioctls` like
`TIOCSTI` to fake input into the **host** shell (remember that in detached
mode, control is returned to your shell and so the terminal you've given the
container is being read by a shell prompt).
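
A wrapper can guard against this by refusing to use detached pass-through when
any of the descriptors it is about to hand over is a terminal. The sketch
below shows one possible check; `tty_backed` is a hypothetical helper, and the
pseudo-terminal and pipe are created only to demonstrate the difference.

```python
import os
import pty


def tty_backed(fds=(0, 1, 2)):
    """Return the subset of `fds` that refer to a terminal device."""
    return [fd for fd in fds if os.isatty(fd)]


# Demonstration: a pty slave is flagged as risky, a pipe end is not.
master, slave = pty.openpty()
read_end, write_end = os.pipe()
risky = tty_backed((slave, write_end))
print(risky == [slave])

for fd in (master, slave, read_end, write_end):
    os.close(fd)
```

An empty result from `tty_backed()` on the wrapper's own `stdio` is a
reasonable (though not sufficient) precondition before passing those
descriptors to a container.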

There are also several other issues with running non-malicious containers in a
shell with detached pass-through (where you pass your shell's `stdio` to the
container):

* Output from the container will be interleaved with output from your shell (in
  a non-deterministic way), without any real way of distinguishing where a
  particular piece of output came from.

* Any input to `stdin` will be non-deterministically split and given to either
  the container or the shell (because both are blocked on a `read(2)` of the
  same FIFO-style file descriptor).

They are all related to the fact that there is going to be a race when either
your host or the container tries to read from (or write to) `stdio`. This
problem is especially obvious when in a shell, where usually the terminal has
been put into raw mode (where each individual key-press should cause `read(2)`
to return).

> **NOTE**: There is also currently a [known problem][issue-1721] where using
> detached pass-through will result in the container hanging if the `stdout` or
> `stderr` is a pipe (though this should be a temporary issue).

[issue-1721]: https://github.com/opencontainers/runc/issues/1721

#### Detached New Terminal ####

When creating a new pseudo-terminal in detached mode, a fairly obvious
problem appears -- how do we use the new terminal that `runc` created? Unlike
in pass-through, `runc` has created a new set of file descriptors that need to
be used by *something* in order for container communication to work.

The way this problem is resolved is through the use of Unix domain sockets.
There is a feature of Unix sockets called `SCM_RIGHTS` which allows a file
descriptor to be sent through a Unix socket to a completely separate process
(which can then use that file descriptor as though they opened it). When using
`runc` in detached new terminal mode, this is how a user gets access to the
pseudo-terminal's master file descriptor.

To this end, there is a new option (which is required if you want to use `runc`
in detached new terminal mode): `--console-socket`. This option takes the path
to a Unix domain socket which `runc` will connect to and send the
pseudo-terminal master file descriptor down. The general process for getting
the pseudo-terminal master is as follows:

1. Create a Unix domain socket at some path, `$socket_path`.
2. Call `runc run` or `runc create` with the argument `--console-socket
   $socket_path`.
3. Using `recvmsg(2)` retrieve the file descriptor sent using `SCM_RIGHTS` by
   `runc`.
4. Now the manager can interact with the `stdio` of the container, using the
   retrieved pseudo-terminal master.
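
The receiving side of these steps can be sketched in Python (3.9 or newer, for
`socket.recv_fds`). To keep the example runnable without a container, step 2 is
replaced by `fake_runtime`, a stand-in thread that opens a pseudo-terminal and
sends its master end the way the text describes; it is **not** `runc`, and the
message payload it sends is an arbitrary placeholder. All function names and
the socket location are assumptions for illustration.

```python
import os
import pty
import socket
import tempfile
import threading


def listen_console_socket(path):
    """Step 1: create a Unix domain socket at `path` and start listening."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(1)
    return server


def receive_master(server):
    """Step 3: accept one connection and take one fd from SCM_RIGHTS."""
    conn, _ = server.accept()
    with conn:
        _data, fds, _flags, _addr = socket.recv_fds(conn, 4096, 1)
    if not fds:
        raise RuntimeError("no file descriptor received")
    return fds[0]


def fake_runtime(path, slaves):
    """Stand-in for step 2 (not runc): send a pty master over the socket."""
    master, slave = pty.openpty()
    slaves.append(slave)
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        socket.send_fds(client, [b"console"], [master])
    os.close(master)  # the receiver now holds the only master copy


socket_path = os.path.join(tempfile.mkdtemp(), "console.sock")
server = listen_console_socket(socket_path)
slaves = []
sender = threading.Thread(target=fake_runtime, args=(socket_path, slaves))
sender.start()
master = receive_master(server)
sender.join()

# Step 4: the received master talks to whatever holds the slave end.
os.write(slaves[0], b"ping\n")
echoed = os.read(master, 1024)
print(echoed)

os.close(master)
os.close(slaves[0])
server.close()
os.unlink(socket_path)
```

In a real manager the listening socket would be created before invoking
`runc run -d --console-socket $socket_path ...`, and the received descriptor
would be handed to an IO loop instead of a single `read`.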

After `runc` exits, the only process with a copy of the pseudo-terminal master
file descriptor is whoever read the file descriptor from the socket.

> **NOTE**: Currently `runc` doesn't support abstract socket addresses (due to
> it not being possible to pass an `argv` with a null-byte as the first
> character). In the future this may change, but currently you must use a valid
> path name.

In order to help users make use of detached new terminal mode, we have provided
a [Go implementation in the `go-runc` bindings][containerd/go-runc.Socket], as
well as [a simple client][recvtty].

[containerd/go-runc.Socket]: https://godoc.org/github.com/containerd/go-runc#Socket
[recvtty]: /contrib/cmd/recvtty