# Image Layer Filesystem Changeset

This document describes how to serialize a filesystem and filesystem changes like removed files into a blob called a layer.
One or more layers are applied on top of each other to create a complete filesystem.
This document will use a concrete example to illustrate how to create and consume these filesystem layers.

This section defines the `application/vnd.oci.image.layer.v1.tar`, `application/vnd.oci.image.layer.v1.tar+gzip`, `application/vnd.oci.image.layer.v1.tar+zstd`, `application/vnd.oci.image.layer.nondistributable.v1.tar`, `application/vnd.oci.image.layer.nondistributable.v1.tar+gzip`, and `application/vnd.oci.image.layer.nondistributable.v1.tar+zstd` [media types](media-types.md).

## `+gzip` Media Types

- The media type `application/vnd.oci.image.layer.v1.tar+gzip` represents an `application/vnd.oci.image.layer.v1.tar` payload which has been compressed with [gzip][rfc1952_2].
- The media type `application/vnd.oci.image.layer.nondistributable.v1.tar+gzip` represents an `application/vnd.oci.image.layer.nondistributable.v1.tar` payload which has been compressed with [gzip][rfc1952_2].

## `+zstd` Media Types

- The media type `application/vnd.oci.image.layer.v1.tar+zstd` represents an `application/vnd.oci.image.layer.v1.tar` payload which has been compressed with [zstd][rfc8478].
- The media type `application/vnd.oci.image.layer.nondistributable.v1.tar+zstd` represents an `application/vnd.oci.image.layer.nondistributable.v1.tar` payload which has been compressed with [zstd][rfc8478].

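The relationship between the six media types and their compression can be sketched as a small lookup. This is an illustrative helper only: the function name `layer_compression` and the returned labels (`"none"`, `"gzip"`, `"zstd"`) are assumptions made for this example and are not defined by this document.

```python
# Base (uncompressed) layer media types named in this document.
_BASE_TYPES = (
    "application/vnd.oci.image.layer.v1.tar",
    "application/vnd.oci.image.layer.nondistributable.v1.tar",
)

# Suffix -> compression label. The labels are hypothetical names for this sketch.
_SUFFIXES = {"": "none", "+gzip": "gzip", "+zstd": "zstd"}


def layer_compression(media_type: str) -> str:
    """Return the compression implied by a layer media type."""
    for base in _BASE_TYPES:
        if media_type.startswith(base):
            suffix = media_type[len(base):]
            if suffix in _SUFFIXES:
                return _SUFFIXES[suffix]
    raise ValueError(f"not a layer media type: {media_type!r}")
```

A consumer could use the returned label to pick a decompressor before reading the tar payload.
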
## Distributable Format

- Layer Changesets for the [media type](media-types.md) `application/vnd.oci.image.layer.v1.tar` MUST be packaged in [tar archive][tar-archive].
- Layer Changesets for the [media type](media-types.md) `application/vnd.oci.image.layer.v1.tar` MUST NOT include duplicate entries for file paths in the resulting [tar archive][tar-archive].

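A producer or validator might check the no-duplicates rule along the following lines. This is a minimal sketch using Python's standard `tarfile` module. The normalization step, which treats `./etc/x`, `etc/x` and `etc/x/` as the same path, is an assumption of this example rather than something this document specifies, and `build_tar` is a hypothetical helper that exists only to make the example self-contained.

```python
import io
import tarfile


def find_duplicate_paths(tar_bytes: bytes) -> list[str]:
    """Return entry paths that occur more than once in an uncompressed tar."""
    seen: set[str] = set()
    duplicates: list[str] = []
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as archive:
        for member in archive:
            # Assumption: a leading "./" and a trailing "/" do not make a path distinct.
            key = member.name.removeprefix("./").rstrip("/") or "."
            if key in seen:
                duplicates.append(key)
            seen.add(key)
    return duplicates


def build_tar(names: list[str]) -> bytes:
    """Hypothetical helper: build an in-memory tar of empty regular files."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name in names:
            archive.addfile(tarfile.TarInfo(name))
    return buffer.getvalue()
```

An empty result from `find_duplicate_paths` means that no path is repeated in the archive.
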
## Change Types

Types of changes that can occur in a changeset are:

- Additions
- Modifications
- Removals

Additions and Modifications are represented the same in the changeset tar archive.

Removals are represented using "[whiteout](#whiteouts)" file entries (See [Representing Changes](#representing-changes)).

### File Types

Throughout this document, the words "files" and "entries" include the following, where supported:

- regular files
- directories
- sockets
- symbolic links
- block devices
- character devices
- FIFOs

### File Attributes

Where supported, the file attributes recorded for Additions and Modifications MUST include:

- Modification Time (`mtime`)
- User ID (`uid`)
    - User Name (`uname`) *secondary to `uid`*
- Group ID (`gid`)
    - Group Name (`gname`) *secondary to `gid`*
- Mode (`mode`)
- Extended Attributes (`xattrs`)
- Symlink reference (`linkname` + symbolic link type)
- [Hardlink](#hardlinks) reference (`linkname`)

[Sparse files](https://en.wikipedia.org/wiki/Sparse_file) SHOULD NOT be used because they lack consistent support across tar implementations.

#### Hardlinks

- Hardlinks are a [POSIX concept](https://pubs.opengroup.org/onlinepubs/9699919799/functions/link.html) for having one or more directory entries for the same file on the same device.
- Not all filesystems support hardlinks (e.g. [FAT](https://en.wikipedia.org/wiki/File_Allocation_Table)).
- Hardlinks are possible with all [file types](#file-types) except `directories`.
- Non-directory files are considered "hardlinked" when their link count is greater than 1.
- Hardlinked files are on the same device (i.e. their Major:Minor pairs match) and have the same inode.
- The other files that share the link (those contributing to a link count greater than 1) may be outside the directory that the changeset is being produced from, in which case the `linkname` is not recorded in the changeset.
- Hardlinks are stored in a tar archive with a type flag of `1`, per the [GNU Basic Tar Format][gnu-tar-standard] and [libarchive tar(5)][libarchive-tar].
- While approaches to deriving new or changed hardlinks may vary, a possible approach is:

```text
SET LinkMap to map[< Major:Minor String >]map[< inode integer >]< path string >
SET LinkNames to map[< src path string >]< dest path string >
FOR each path in root path
  IF path type is directory
    CONTINUE
  ENDIF
  SET filestat to stat(path)
  IF filestat num of links == 1
    CONTINUE
  ENDIF
  IF LinkMap[filestat device][filestat inode] is not empty
    SET LinkNames[path] to LinkMap[filestat device][filestat inode]
  ELSE
    SET LinkMap[filestat device][filestat inode] to path
  ENDIF
END FOR
```

With this approach, the link map and link names of one directory can be compared against those of another directory to derive additions and changes to hardlinks.

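The pseudocode above could be translated to Python roughly as follows. This is a sketch under some assumptions: it targets a POSIX system, the function name `find_hardlinks` is hypothetical, paths are reported relative to the root, and entries are visited in sorted order so that the "first" path for an inode is deterministic. None of these choices come from this document.

```python
import os


def find_hardlinks(root: str) -> dict[str, str]:
    """Map each additional hardlinked path to the first path seen for its inode."""
    link_map: dict[tuple[int, int], str] = {}  # (device, inode) -> first path seen
    link_names: dict[str, str] = {}            # later path -> first path seen
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # assumption: sorted traversal for deterministic results
        # Note: os.walk lists symlinks to directories under dirnames,
        # so this sketch does not examine them.
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            info = os.lstat(path)
            if info.st_nlink == 1:
                continue  # not hardlinked
            key = (info.st_dev, info.st_ino)
            relative = os.path.relpath(path, root)
            if key in link_map:
                link_names[relative] = link_map[key]
            else:
                link_map[key] = relative
    return link_names
```

Running the function on two directories and comparing the returned mappings gives one way to detect added or changed hardlinks.
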
#### Platform-specific attributes

Implementations on Windows MUST support these additional attributes, encoded in [PAX vendor extensions](https://github.com/libarchive/libarchive/wiki/ManPageTar5#pax-interchange-format) as follows:

- [Windows file attributes](https://msdn.microsoft.com/en-us/library/windows/desktop/gg258117(v=vs.85).aspx) (`MSWINDOWS.fileattr`)
- [Security descriptor](https://msdn.microsoft.com/en-us/library/cc230366.aspx) (`MSWINDOWS.rawsd`): base64-encoded self-relative binary security descriptor
- Mount points (`MSWINDOWS.mountpoint`): if present on a directory symbolic link, then the link should be created as a [directory junction](https://en.wikipedia.org/wiki/NTFS_junction_point)
- Creation time (`LIBARCHIVE.creationtime`)

## Creating

### Initial Root Filesystem

The initial root filesystem is the base or parent layer.

For this example, an image root filesystem has an initial state as an empty directory.
The name of the directory is not relevant to the layer itself, only for the purpose of producing comparisons.

Here is an initial empty directory structure for a changeset, with a unique directory name `rootfs-c9d-v1`.

```text
rootfs-c9d-v1/
```

### Populate Initial Filesystem

Files and directories are then created:

```text
rootfs-c9d-v1/
    etc/
        my-app-config
    bin/
        my-app-binary
        my-app-tools
```

The `rootfs-c9d-v1` directory is then archived as a plain [tar archive][tar-archive] with paths relative to `rootfs-c9d-v1`.
The archive contains entries for the following files:

```text
./
./etc/
./etc/my-app-config
./bin/
./bin/my-app-binary
./bin/my-app-tools
```

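As an illustration, the archiving step could be done with Python's standard `tarfile` module. This is a sketch only: the function names `archive_directory` and `entry_names` are hypothetical, and the archive is built in memory for simplicity.

```python
import io
import tarfile


def archive_directory(root: str) -> bytes:
    """Archive `root` as an uncompressed tar whose paths are relative to `root`."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        # arcname="." stores entries relative to the root, e.g. ./etc/my-app-config,
        # so the name of the root directory itself does not appear in the archive.
        archive.add(root, arcname=".")
    return buffer.getvalue()


def entry_names(tar_bytes: bytes) -> list[str]:
    """Return the entry names stored in an uncompressed tar."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as archive:
        return archive.getnames()
```

Because only relative paths are stored, the same layer is produced regardless of what the root directory is called.
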
### Populate a Comparison Filesystem

Create a new directory and initialize it with a copy or snapshot of the prior root filesystem.
Example commands that can preserve [file attributes](#file-attributes) to make this copy are:

- [cp(1)](https://linux.die.net/man/1/cp): `cp -a rootfs-c9d-v1/ rootfs-c9d-v1.s1/`
- [rsync(1)](https://linux.die.net/man/1/rsync): `rsync -aHAX rootfs-c9d-v1/ rootfs-c9d-v1.s1/`
- [tar(1)](https://linux.die.net/man/1/tar): `mkdir rootfs-c9d-v1.s1 && tar --acls --xattrs -C rootfs-c9d-v1/ -c . | tar -C rootfs-c9d-v1.s1/ --acls --xattrs -x` (including `--selinux` where supported)

Any [changes](#change-types) to the snapshot MUST NOT change or affect the directory it was copied from.

For example, `rootfs-c9d-v1.s1` is an identical snapshot of `rootfs-c9d-v1`.
In this way `rootfs-c9d-v1.s1` is prepared for updates and alterations.

**Implementor's Note**: *a copy-on-write or union filesystem can efficiently make directory snapshots*

Initial layout of the snapshot:

```text
rootfs-c9d-v1.s1/
    etc/
        my-app-config
    bin/
        my-app-binary
        my-app-tools
```

See [Change Types](#change-types) for more details on changes.

For example, add a directory at `/etc/my-app.d` containing a default config file, and remove the existing config file.
Also change `./bin/my-app-tools` (in attributes or file content) so that the binary handles the new config layout.

Following these changes, the `rootfs-c9d-v1.s1` directory looks like this:

```text
rootfs-c9d-v1.s1/
    etc/
        my-app.d/
            default.cfg
    bin/
        my-app-binary
        my-app-tools
```

### Determining Changes

When two directories are compared, the relative root is the top-level directory.
The directories are compared, looking for files that have been [added, modified, or removed](#change-types).

For this example, `rootfs-c9d-v1/` and `rootfs-c9d-v1.s1/` are recursively compared, each as relative root path.

The following changeset is found:

```text
Added:      /etc/my-app.d/
Added:      /etc/my-app.d/default.cfg
Modified:   /bin/my-app-tools
Deleted:    /etc/my-app-config
```

This reflects the removal of `/etc/my-app-config` and the creation of the directory `/etc/my-app.d/` containing the file `default.cfg`.
`/bin/my-app-tools` has also been replaced with an updated version.

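One possible way to derive such a changeset is sketched below. It is deliberately simplified and rests on several assumptions: it runs on a POSIX system, it detects modifications by comparing file contents only (a real implementation would also compare the [file attributes](#file-attributes)), and it lists every path under a removed directory rather than just the directory itself. The names `list_entries` and `diff_trees` are hypothetical.

```python
import filecmp
import os


def list_entries(root: str) -> set[str]:
    """Return every path under `root`, relative to it; directories end in '/'."""
    entries: set[str] = set()
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        prefix = "" if rel == "." else rel + "/"
        entries.update(prefix + d + "/" for d in dirnames)
        entries.update(prefix + f for f in filenames)
    return entries


def diff_trees(old_root: str, new_root: str) -> dict[str, list[str]]:
    """Classify paths as added, modified or deleted between two directory trees."""
    old, new = list_entries(old_root), list_entries(new_root)
    modified = [
        path
        for path in old & new
        if not path.endswith("/")  # directories are not content-compared here
        and not filecmp.cmp(
            os.path.join(old_root, path), os.path.join(new_root, path), shallow=False
        )
    ]
    return {
        "added": sorted(new - old),
        "modified": sorted(modified),
        "deleted": sorted(old - new),
    }
```

Applied to the two example directories, `diff_trees("rootfs-c9d-v1", "rootfs-c9d-v1.s1")` would be expected to report the same four changes listed above, with paths given relative to the root.
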
### Representing Changes

A [tar archive][tar-archive] is then created which contains _only_ this changeset:

- Added and modified files and directories in their entirety
- Deleted files or directories marked with a [whiteout file](#whiteouts)

The resulting tar archive for `rootfs-c9d-v1.s1` has the following entries:

```text
./etc/my-app.d/
./etc/my-app.d/default.cfg
./bin/my-app-tools
./etc/.wh.my-app-config
```

To signify that the resource `./etc/my-app-config` MUST be removed when the changeset is applied, the basename of the entry is prefixed with `.wh.`.

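The naming rule can be sketched in a few lines. The function names `whiteout_path` and `changeset_entries` are hypothetical, and the ordering of the returned list simply mirrors the example above rather than any requirement stated here.

```python
import posixpath

WHITEOUT_PREFIX = ".wh."


def whiteout_path(deleted_path: str) -> str:
    """Return the whiteout entry name for a deleted path."""
    parent, base = posixpath.split(deleted_path.rstrip("/"))
    return posixpath.join(parent, WHITEOUT_PREFIX + base)


def changeset_entries(
    added: list[str], modified: list[str], deleted: list[str]
) -> list[str]:
    """List the tar entry names for a changeset: changed paths, then whiteouts."""
    return added + modified + [whiteout_path(path) for path in deleted]
```

Only the basename receives the prefix, so the whiteout entry stays in the same directory as the path it removes.
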
## Applying Changesets

- Layer Changesets of [media type](media-types.md) `application/vnd.oci.image.layer.v1.tar` are _applied_, rather than simply extracted as tar archives.
- Applying a layer changeset requires special consideration for the [whiteout](#whiteouts) files.
- In the absence of any [whiteout](#whiteouts) files in a layer changeset, the archive is extracted like a regular tar archive.

### Changeset over existing files

This section specifies applying an entry from a layer changeset if the target path already exists.

If the entry and the existing path are both directories, then the existing path's attributes MUST be replaced by those of the entry in the changeset.
In all other cases, the implementation MUST do the semantic equivalent of the following:

- removing the file path (e.g. [`unlink(2)`](https://linux.die.net/man/2/unlink) on Linux systems)
- recreating the file path, based on the contents and attributes of the changeset entry

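The removal half of this rule might look like the following sketch. The function name `clear_existing` is hypothetical, and the choice to remove an existing directory recursively when a non-directory entry replaces it is an assumption of this example. Recreating the path and applying the entry's attributes are left to the caller.

```python
import os
import shutil
import tarfile


def clear_existing(path: str, entry: tarfile.TarInfo) -> bool:
    """Remove `path` unless it and `entry` are both directories.

    Returns True when something was removed and must be recreated from the entry.
    """
    if not os.path.lexists(path):
        return False  # nothing to replace
    existing_is_dir = os.path.isdir(path) and not os.path.islink(path)
    if entry.isdir() and existing_is_dir:
        # Both are directories: keep the directory and its children;
        # the caller only replaces its attributes.
        return False
    if existing_is_dir:
        shutil.rmtree(path)  # assumption: a replaced directory is removed recursively
    else:
        os.unlink(path)
    return True
```

A return value of `False` for an existing directory signals that only the attributes from the changeset entry need to be applied.
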
## Whiteouts

- A whiteout file is an empty file with a special filename that signifies a path should be deleted.
- A whiteout filename consists of the prefix `.wh.` plus the basename of the path to be deleted.
- As files prefixed with `.wh.` are special whiteout markers, it is not possible to create a filesystem which has a file or directory with a name beginning with `.wh.`.
- Once a whiteout is applied, the whiteout itself MUST also be hidden.
- Whiteout files MUST only apply to resources in lower/parent layers.
- Files that are present in the same layer as a whiteout file can only be hidden by whiteout files in subsequent layers.

The following is a base layer with several resources:

```text
a/
a/b/
a/b/c/
a/b/c/bar
```

When the next layer is created, the original `a/b` directory is deleted and recreated with `a/b/c/foo`:

```text
a/
a/.wh..wh..opq
a/b/
a/b/c/
a/b/c/foo
```

When processing the second layer, `a/.wh..wh..opq` is applied first, before creating the new version of `a/b`, regardless of the ordering in which the whiteout file was encountered.
For example, the following layer is equivalent to the layer above:

```text
a/
a/b/
a/b/c/
a/b/c/foo
a/.wh..wh..opq
```

Implementations SHOULD generate layers such that the whiteout files appear before sibling directory entries.

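The order-independence described above can be modelled on plain sets of path names. The sketch below is a simplified model, not an extraction routine: it tracks names only, ignores file contents and attributes, and the function name `apply_layer` is hypothetical. Whiteouts are evaluated against the lower-layer paths only, and the layer's own entries are added afterwards, so the position of a whiteout within the layer does not matter.

```python
import posixpath

WHITEOUT_PREFIX = ".wh."
OPAQUE_WHITEOUT = ".wh..wh..opq"


def apply_layer(lower: set[str], layer_entries: list[str]) -> set[str]:
    """Apply one layer's entry names to the set of paths visible in lower layers."""
    result = set(lower)
    created: list[str] = []
    for entry in layer_entries:
        path = entry.rstrip("/")
        parent, base = posixpath.split(path)
        if base == OPAQUE_WHITEOUT:
            # Hide every lower-layer descendant of the containing directory.
            prefix = parent + "/" if parent else ""
            result = {p for p in result if not p.startswith(prefix)}
        elif base.startswith(WHITEOUT_PREFIX):
            # Hide the named path and anything beneath it.
            target = posixpath.join(parent, base[len(WHITEOUT_PREFIX):])
            result = {
                p for p in result if p != target and not p.startswith(target + "/")
            }
        else:
            created.append(path)
    # Whiteouts never hide entries from their own layer, so add those last.
    result.update(created)
    return result
```

With the base layer above as `lower`, both orderings of the second layer produce the same set of visible paths, and the whiteout entries themselves never appear in the result.
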
### Opaque Whiteout

- In addition to expressing that a single entry should be removed from a lower layer, layers may remove all of the children using an opaque whiteout entry.
- An opaque whiteout entry is a file with the name `.wh..wh..opq` indicating that all siblings are hidden in the lower layer.

Let's take the following base layer as an example:

```text
etc/
    my-app-config
bin/
    my-app-binary
    my-app-tools
    tools/
        my-app-tool-one
```

If all children of `bin/` are removed, the next layer would have the following:

```text
bin/
    .wh..wh..opq
```

This is called _opaque whiteout_ format.
An _opaque whiteout_ file hides _all_ children of `bin/`, including sub-directories and all descendants.
Using _explicit whiteout_ files, this would be equivalent to the following:

```text
bin/
    .wh.my-app-binary
    .wh.my-app-tools
    .wh.tools
```

In this case, a unique whiteout file is generated for each entry.
If there were more children of `bin/` in the base layer, there would be an entry for each.
Note that this opaque file will apply to _all_ children, including sub-directories, other resources and all descendants.

Implementations SHOULD generate layers using _explicit whiteout_ files, but MUST accept both.

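The equivalence between the two formats can be illustrated by expanding an opaque whiteout into explicit ones, given a listing of the lower layer. This is a sketch with a hypothetical function name, `explicit_whiteouts`, operating on path names only.

```python
def explicit_whiteouts(lower_paths: list[str], directory: str) -> list[str]:
    """List the explicit whiteouts equivalent to an opaque whiteout in `directory`."""
    prefix = directory.rstrip("/") + "/"
    children: set[str] = set()
    for path in lower_paths:
        if path.startswith(prefix):
            # Only direct children need a whiteout; hiding a directory
            # also hides everything beneath it.
            children.add(path[len(prefix):].split("/", 1)[0])
    children.discard("")  # the directory entry itself is not a child
    return sorted(prefix + ".wh." + child for child in children)
```

For the base layer above and the directory `bin/`, this produces one whiteout each for `my-app-binary`, `my-app-tools` and `tools`, matching the explicit listing.
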
Any given image is likely to be composed of several of these Image Filesystem Changeset tar archives.

## Non-Distributable Layers

> **NOTE**: Non-distributable layers are deprecated, and not recommended for future use.
> Implementations SHOULD NOT produce new non-distributable layers.

Due to legal requirements, certain layers may not be regularly distributable.
Such "non-distributable" layers are typically downloaded directly from a distributor but never uploaded.

Non-distributable layers SHOULD be tagged with an alternative media type of `application/vnd.oci.image.layer.nondistributable.v1.tar`.
Implementations SHOULD NOT upload layers tagged with this media type; however, such a media type SHOULD NOT affect whether an implementation downloads the layer.

[Descriptors](descriptor.md) referencing non-distributable layers MAY include `urls` for downloading these layers directly; however, the presence of the `urls` field SHOULD NOT be used to determine whether or not a layer is non-distributable.

[libarchive-tar]: https://github.com/libarchive/libarchive/wiki/ManPageTar5#POSIX_ustar_Archives
[gnu-tar-standard]: https://www.gnu.org/software/tar/manual/html_node/Standard.html
[rfc1952_2]: https://tools.ietf.org/html/rfc1952
[tar-archive]: https://en.wikipedia.org/wiki/Tar_(computing)
[rfc8478]: https://tools.ietf.org/html/rfc8478