...

<!--
 Copyright 2018 The CUE Authors

 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

# The CUE Language Specification

## Introduction

This is a reference manual for the CUE data constraint language.
CUE, pronounced cue or Q, is a general-purpose and strongly typed
constraint-based language.
It can be used for data templating, data validation, code generation, scripting,
and many other applications involving structured data.
The CUE tooling, layered on top of CUE, provides
a general purpose scripting language for creating scripts as well as
simple servers, also expressed in CUE.

CUE was designed with cloud configuration and related systems in mind,
but is not limited to this domain.
It derives its formalism from relational programming languages.
This formalism allows for managing and reasoning over large amounts of
data in a straightforward manner.

The grammar is compact and regular, allowing for easy analysis by automatic
tools such as integrated development environments.

This document is maintained by mpvl@golang.org.
CUE has a lot of similarities with the Go language. This document draws heavily
from the Go specification as a result.

CUE draws its influence from many languages.
Its main influences were BCL/GCL (internal to Google),
LKB (LinGO), Go, and JSON.
Others are Swift, TypeScript, JavaScript, Prolog, NCL (internal to Google),
Jsonnet, HCL, Flabbergast, Nix, JSONPath, Haskell, Objective-C, and Python.


## Notation

The syntax is specified using Extended Backus-Naur Form (EBNF):

```
Production  = production_name "=" [ Expression ] "." .
Expression  = Alternative { "|" Alternative } .
Alternative = Term { Term } .
Term        = production_name | token [ "…" token ] | Group | Option | Repetition .
Group       = "(" Expression ")" .
Option      = "[" Expression "]" .
Repetition  = "{" Expression "}" .
```

Productions are expressions constructed from terms and the following operators,
in increasing precedence:

```
|   alternation
()  grouping
[]  option (0 or 1 times)
{}  repetition (0 to n times)
```

Lower-case production names are used to identify lexical tokens. Non-terminals
are in CamelCase. Lexical tokens are enclosed in double quotes `""` or back
quotes ` `` `.

The form `a … b` represents the set of characters from a through b as
alternatives. The horizontal ellipsis `…` is also used elsewhere in the spec to
informally denote various enumerations or code snippets that are not further
specified. The character `…` (as opposed to the three characters `...`) is not a
token of the CUE language.


## Source code representation

Source code is Unicode text encoded in UTF-8.
Unless otherwise noted, the text is not canonicalized, so a single
accented code point is distinct from the same character constructed from
combining an accent and a letter; those are treated as two code points.
For simplicity, this document will use the unqualified term character to refer
to a Unicode code point in the source text.

Each code point is distinct; for instance, upper and lower case letters are
different characters.
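Go string literals are not normalized either, so a short Go program can illustrate this rule. The sketch below compares a precomposed `é` (U+00E9) with `e` followed by the combining acute accent (U+0301). `codePoints` is an illustrative helper and is not part of any CUE API.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// codePoints reports how many Unicode code points s contains. No
// canonicalization (such as NFC) is applied first.
func codePoints(s string) int {
	return utf8.RuneCountInString(s)
}

func main() {
	precomposed := "\u00e9" // é as a single code point
	combining := "e\u0301"  // e followed by U+0301 COMBINING ACUTE ACCENT

	fmt.Println(precomposed == combining) // false: different code point sequences
	fmt.Println(codePoints(precomposed))  // 1
	fmt.Println(codePoints(combining))    // 2
}
```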

Implementation restriction: For compatibility with other tools, a compiler may
disallow the NUL character (U+0000) in the source text.

Implementation restriction: For compatibility with other tools, a compiler may
ignore a UTF-8-encoded byte order mark (U+FEFF) if it is the first Unicode code
point in the source text. A byte order mark may be disallowed anywhere else in
the source.


### Characters

The following terms are used to denote specific Unicode character classes:

```
newline        = /* the Unicode code point U+000A */ .
unicode_char   = /* an arbitrary Unicode code point except newline */ .
unicode_letter = /* a Unicode code point classified as "Letter" */ .
unicode_digit  = /* a Unicode code point classified as "Number, decimal digit" */ .
```

In The Unicode Standard 8.0, Section 4.5 "General Category" defines a set of
character categories.
CUE treats all characters in any of the Letter categories Lu, Ll, Lt, Lm, or Lo
as Unicode letters, and those in the Number category Nd as Unicode digits.


### Letters and digits

The underscore character `_` (U+005F) and the dollar sign `$` (U+0024) are considered letters.

```
letter        = unicode_letter | "_" | "$" .
decimal_digit = "0" … "9" .
binary_digit  = "0" … "1" .
octal_digit   = "0" … "7" .
hex_digit     = "0" … "9" | "A" … "F" | "a" … "f" .
```


## Lexical elements

### Comments

Comments serve as program documentation.
CUE supports line comments that start with the character sequence `//`
and stop at the end of the line.

A comment cannot start inside a string literal or inside a comment.
A comment acts like a newline.


### Tokens

Tokens form the vocabulary of the CUE language. There are four classes:
identifiers, keywords, operators and punctuation, and literals. White space,
formed from spaces (U+0020), horizontal tabs (U+0009), carriage returns
(U+000D), and newlines (U+000A), is ignored except as it separates tokens that
would otherwise combine into a single token. Also, a newline or end of file may
trigger the insertion of a comma. While breaking the input into tokens, the
next token is the longest sequence of characters that form a valid token.


### Commas

The formal grammar uses commas `,` as terminators in a number of productions.
CUE programs may omit most of these commas using the following rules:

When the input is broken into tokens, a comma is automatically inserted into
the token stream immediately after a line's final token if that token is:

- an identifier, keyword, or bottom
- a number or string literal, including an interpolation
- one of the characters `)`, `]`, `}`, or `?`
- an ellipsis `...`


Although commas are automatically inserted, the parser will require
explicit commas between two list elements.

<!--
TODO: remove the above exception
-->

To reflect idiomatic use, examples in this document elide commas using
these rules.
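The comma rule can be modeled in a few lines of Go. This is a simplified sketch and not the scanner in `cuelang.org/go`. The token kinds (`Ident`, `Literal`, `Close`, `Ellipsis`, `Other`) and the functions `triggersComma` and `insertCommas` are hypothetical. The requirement for explicit commas between list elements is not modeled.

```go
package main

import "fmt"

// Kind is a simplified token classification used only for this sketch.
type Kind int

const (
	Ident    Kind = iota // identifier, keyword, or bottom (_|_)
	Literal              // number or string literal, including an interpolation
	Close                // one of ')', ']', '}', or '?'
	Ellipsis             // '...'
	Other                // any other operator or punctuation
)

type Token struct {
	Kind Kind
	Text string
}

// triggersComma reports whether a line ending in t gets an automatic comma.
func triggersComma(t Token) bool {
	switch t.Kind {
	case Ident, Literal, Close, Ellipsis:
		return true
	}
	return false
}

// insertCommas joins lines of tokens into one token stream, inserting a
// comma after a line's final token when triggersComma allows it.
func insertCommas(lines [][]Token) []string {
	var out []string
	for _, line := range lines {
		for _, t := range line {
			out = append(out, t.Text)
		}
		if n := len(line); n > 0 && triggersComma(line[n-1]) {
			out = append(out, ",")
		}
	}
	return out
}

func main() {
	lines := [][]Token{
		{{Ident, "a"}, {Other, ":"}, {Literal, "1"}}, // a: 1
		{{Ident, "b"}, {Other, ":"}, {Other, "{"}},   // b: {
		{{Ident, "c"}, {Other, ":"}, {Ident, "int"}}, // c: int
		{{Close, "}"}}, // }
	}
	fmt.Println(insertCommas(lines)) // [a : 1 , b : { c : int , } ,]
}
```

No comma follows `{` on the second line, because an opening brace is not one of the triggering tokens.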


### Identifiers

Identifiers name entities such as fields and aliases.
An identifier is a sequence of one or more letters (which includes `_` and `$`)
and digits, optionally preceded by `#` or `_#`.
An identifier may not be `_` or `$` by itself.
The first character in an identifier, or after a `#` if it contains one,
must be a letter.
Identifiers starting with a `#` or `_` are reserved for definitions and hidden
fields.

<!--
TODO: allow identifiers as defined in Unicode UAX #31
(https://unicode.org/reports/tr31/).

Identifiers are normalized using the NFC normal form.
-->

```
identifier  = [ "#" | "_#" ] letter { letter | unicode_digit } .
```

```
a
_x9
fieldName
αβ
```
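The `identifier` production and the restrictions above can be transcribed into Go roughly as follows. This sketch rests on a few assumptions. `isLetter` and `isIdentifier` are illustrative names. Go's `unicode` tables track a more recent Unicode version than 8.0. Keywords, the reserved `__` prefix, and the meaning of the `#` and `_` prefixes are not checked.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// isLetter implements the letter production: a Unicode letter
// (categories Lu, Ll, Lt, Lm, Lo), '_' or '$'.
func isLetter(r rune) bool {
	return unicode.IsLetter(r) || r == '_' || r == '$'
}

// isIdentifier reports whether s matches
//
//	identifier = [ "#" | "_#" ] letter { letter | unicode_digit } .
//
// and is not one of the excluded names "_" and "$".
func isIdentifier(s string) bool {
	if s == "_" || s == "$" {
		return false
	}
	rest := s
	if strings.HasPrefix(rest, "_#") {
		rest = rest[2:]
	} else if strings.HasPrefix(rest, "#") {
		rest = rest[1:]
	}
	first, size := utf8.DecodeRuneInString(rest)
	if size == 0 || !isLetter(first) {
		return false // empty, or does not start with a letter
	}
	for _, r := range rest[size:] {
		if !isLetter(r) && !unicode.IsDigit(r) { // IsDigit means category Nd
			return false
		}
	}
	return true
}

func main() {
	for _, s := range []string{"a", "_x9", "fieldName", "αβ", "#Def", "_#hidden", "_", "9lives", "#9"} {
		fmt.Println(s, isIdentifier(s))
	}
}
```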

<!-- TODO: Allow Unicode identifiers TR 32 http://unicode.org/reports/tr31/ -->

Some identifiers are [predeclared](#predeclared-identifiers).


### Keywords

CUE has a limited set of keywords.
In addition, CUE reserves all identifiers starting with `__` (double underscores)
as keywords.
These are typically targets of predeclared identifiers.

All keywords may be used as labels (field names).
Unless noted otherwise, they can also be used as identifiers to refer to
the same name.


#### Values

The following keywords are values.

```
null         true         false
```

These can never be used to refer to a field of the same name.
This restriction is to ensure compatibility with JSON configuration files.


#### Preamble

The following keywords are used in the preamble of a CUE file.
After the preamble, they may be used as identifiers to refer to namesake fields.

```
package      import
```


#### Comprehension clauses

The following keywords are used in comprehensions.

```
for          in           if           let
```

<!--
TODO:
    reduce [to]
    order [by]
-->


### Operators and punctuation

The following character sequences represent operators and punctuation:

```
+     &&    ==    <     =     (     )
-     ||    !=    >     :     {     }
*     &     =~    <=    ?     [     ]     ,
/     |     !~    >=    !     _|_   ...   .
```
<!--
Free tokens:  ; ~ ^
// To be used:
  @   at: associative lists.

// Idea: use # instead of @ for attributes and allow then at declaration level.
// This will open up the possibility of defining #! at the start of a file
// without requiring special syntax. Although probably not quite.
 -->


### Numeric literals

There are several kinds of numeric literals.

```
int_lit     = decimal_lit | si_lit | octal_lit | binary_lit | hex_lit .
decimal_lit = "0" | ( "1" … "9" ) { [ "_" ] decimal_digit } .
decimals    = decimal_digit { [ "_" ] decimal_digit } .
si_lit      = decimals [ "." decimals ] multiplier |
              "." decimals multiplier .
binary_lit  = "0b" binary_digit { [ "_" ] binary_digit } .
hex_lit     = "0" ( "x" | "X" ) hex_digit { [ "_" ] hex_digit } .
octal_lit   = "0o" octal_digit { [ "_" ] octal_digit } .
multiplier  = ( "K" | "M" | "G" | "T" | "P" ) [ "i" ] .

float_lit   = decimals "." [ decimals ] [ exponent ] |
              decimals exponent |
              "." decimals [ exponent ] .
exponent    = ( "e" | "E" ) [ "+" | "-" ] decimals .
```

An _integer literal_ is a sequence of digits representing an integer value.
An optional prefix sets a non-decimal base: `0o` for octal,
`0x` or `0X` for hexadecimal, and `0b` for binary.
In hexadecimal literals, letters `a … f` and `A … F` represent values 10 through 15.
All integers allow interstitial underscores `_`;
these have no meaning and are solely for readability.

Integer literals may have an SI or IEC multiplier.
The SI multipliers `K`, `M`, `G`, `T`, and `P` denote powers of 1000;
the IEC multipliers `Ki`, `Mi`, `Gi`, `Ti`, and `Pi` denote powers of 1024.
Multipliers can be used with fractional numbers.
When multiplying a fraction by a multiplier, the result is truncated
towards zero if it is not an integer.

```
42
1.5G    // 1_500_000_000
1.3Ki   // 1.3 * 1024 = trunc(1331.2) = 1331
170_141_183_460_469_231_731_687_303_715_884_105_727
0xBad_Face
0o755
0b0101_0001
```
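The multiplied value has to be exact before it is truncated, so rational arithmetic is a natural fit. The Go sketch below uses the standard `math/big` package. `siValue` and its `multipliers` table are illustrative names. Apart from recognizing the suffix, the function does not check the literal against the `si_lit` production.

```go
package main

import (
	"fmt"
	"math/big"
	"strings"
)

// multipliers maps each suffix to its factor: SI suffixes are powers of
// 1000 and IEC suffixes (with a trailing "i") are powers of 1024.
var multipliers = map[string]int64{
	"K": 1e3, "M": 1e6, "G": 1e9, "T": 1e12, "P": 1e15,
	"Ki": 1 << 10, "Mi": 1 << 20, "Gi": 1 << 30, "Ti": 1 << 40, "Pi": 1 << 50,
}

// siValue returns the integer denoted by a literal such as "1.5G" or
// "1.3Ki", truncating towards zero when the product is not an integer.
// Beyond the suffix lookup it does not validate the literal.
func siValue(lit string) (*big.Int, error) {
	lit = strings.ReplaceAll(lit, "_", "") // underscores carry no meaning
	for _, n := range []int{2, 1} {       // try two-letter IEC suffixes first
		if len(lit) <= n {
			continue
		}
		m, ok := multipliers[lit[len(lit)-n:]]
		if !ok {
			continue
		}
		r, ok := new(big.Rat).SetString(lit[:len(lit)-n])
		if !ok {
			return nil, fmt.Errorf("invalid number in %q", lit)
		}
		r.Mul(r, new(big.Rat).SetInt64(m))
		// Quo truncates towards zero, as required for fractional results.
		return new(big.Int).Quo(r.Num(), r.Denom()), nil
	}
	return nil, fmt.Errorf("no multiplier in %q", lit)
}

func main() {
	for _, lit := range []string{"1.5G", "1.3Ki", "2Mi", "1_000K"} {
		v, err := siValue(lit)
		fmt.Println(lit, v, err)
	}
}
```

For the examples above, it yields 1500000000 for `1.5G` and 1331 for `1.3Ki`.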

A _decimal floating-point literal_ is a representation of
a decimal floating-point value (a _float_).
It has an integer part, a decimal point, a fractional part, and an
exponent part.
The integer and fractional part comprise decimal digits; the
exponent part is an `e` or `E` followed by an optionally signed decimal exponent.
One of the integer part or the fractional part may be elided; one of the decimal
point or the exponent may be elided.

```
0.
72.40
072.40  // == 72.40
2.71828
1.e+0
6.67428e-11
1E6
.25
.12345E+5
```
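One way to test a candidate token against `float_lit` is to transcribe the production into a regular expression. The Go sketch below does this with the standard `regexp` package. The constant and variable names are illustrative. The restriction on which tokens a float may follow is not modeled.

```go
package main

import (
	"fmt"
	"regexp"
)

// Building blocks transcribed from the grammar:
//
//	decimals = decimal_digit { [ "_" ] decimal_digit } .
//	exponent = ( "e" | "E" ) [ "+" | "-" ] decimals .
const (
	decimals = `[0-9](?:_?[0-9])*`
	exponent = `[eE][+-]?` + decimals
)

// floatLit matches the three alternatives of float_lit.
var floatLit = regexp.MustCompile(`^(?:` +
	decimals + `\.(?:` + decimals + `)?(?:` + exponent + `)?` + // decimals "." [ decimals ] [ exponent ]
	`|` + decimals + exponent + // decimals exponent
	`|\.` + decimals + `(?:` + exponent + `)?` + // "." decimals [ exponent ]
	`)$`)

func main() {
	for _, s := range []string{"0.", "72.40", "1.e+0", "6.67428e-11", "1E6", ".25", "42", "1.5G"} {
		fmt.Println(s, floatLit.MatchString(s))
	}
}
```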

<!--
TODO: consider allowing Exo (and up), if not followed by a sign
or number. Alternatively one could only allow Ei, Yi, and Zi.
-->

Neither a `float_lit` nor an `si_lit` may appear after a token that is:

- an identifier, keyword, or bottom
- a number or string literal, including an interpolation
- one of the characters `)`, `]`, `}`, `?`, or `.`.

<!--
So
`a + 3.2Ti`  -> `a`, `+`, `3.2Ti`
`a 3.2Ti`    -> `a`, `3`, `.`, `2`, `Ti`
`a + .5e3`   -> `a`, `+`, `.5e3`
`a .5e3`     -> `a`, `.`, `5`, `e3`.
-->


### String and byte sequence literals

A string literal represents a string constant obtained from concatenating a
sequence of characters.
A byte sequence literal represents a sequence of bytes.

String and byte sequence literals are character sequences between,
respectively, double and single quotes, as in `"bar"` and `'bar'`.
Within the quotes, any character may appear except newline and,
respectively, unescaped double or single quote.
String literals must be valid UTF-8.
Byte sequences may contain any sequence of bytes.

Several escape sequences allow arbitrary values to be encoded as ASCII text.
An escape sequence starts with an _escape delimiter_, which is `\` by default.
The escape delimiter may be altered to be `\` plus a fixed number of
hash symbols `#` by padding the start and end of a string or byte sequence
literal with this number of hash symbols.

<!--
TODO: move these examples further up so it's evident why #" exists.
	#"This is not an \(interpolation)"#
	#"This is an \#(interpolation)"#
	#"The sequence "\U0001F604" renders as \#U0001F604."#
-->
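The effect of hash padding on interpolation can be sketched in Go. `escapeDelimiter` and `hasInterpolation` are hypothetical helpers that work on the text between the quotes. They do not validate escapes. When an escape is not an interpolation, they skip the delimiter plus one character, which is enough to stop an escaped backslash from starting an interpolation.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeDelimiter returns the escape delimiter of a literal padded with
// n hash symbols on each side: a backslash followed by n '#' characters.
func escapeDelimiter(n int) string {
	return `\` + strings.Repeat("#", n)
}

// hasInterpolation reports whether body, the text between the quotes of a
// literal padded with n hashes, contains an interpolation: the escape
// delimiter immediately followed by '('.
func hasInterpolation(body string, n int) bool {
	delim := escapeDelimiter(n)
	for i := 0; i < len(body); {
		if !strings.HasPrefix(body[i:], delim) {
			i++
			continue
		}
		j := i + len(delim)
		if j < len(body) && body[j] == '(' {
			return true
		}
		// Some other escape: skip the delimiter and the escaped character,
		// so that an escaped backslash cannot start an interpolation.
		i = j + 1
	}
	return false
}

func main() {
	fmt.Println(hasInterpolation(`This is not an \(interpolation)`, 1)) // false
	fmt.Println(hasInterpolation(`This is an \#(interpolation)`, 1))    // true
	fmt.Println(hasInterpolation(`Hello, \( name )!`, 0))               // true
}
```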

There are four ways to represent the integer value as a numeric constant: `\x`
followed by exactly two hexadecimal digits; `\u` followed by exactly four
hexadecimal digits; `\U` followed by exactly eight hexadecimal digits, and a
plain backslash `\` followed by exactly three octal digits.
In each case the value of the literal is the value represented by the
digits in the corresponding base.
Hexadecimal and octal escapes are only allowed within byte sequences
(single quotes).

Although these representations all result in an integer, they have different
valid ranges.
Octal escapes must represent a value between 0 and 255 inclusive.
Hexadecimal escapes satisfy this condition by construction.
The escapes `\u` and `\U` represent Unicode code points so within them
some values are illegal, in particular those above `0x10FFFF`.
Surrogate halves are allowed,
but are translated into their non-surrogate equivalent internally.

The three-digit octal (`\nnn`) and two-digit hexadecimal (`\xnn`) escapes
represent individual bytes of the resulting byte sequence; all other escapes represent
the (possibly multi-byte) UTF-8 encoding of individual characters.
Thus inside a byte sequence literal `\377` and `\xFF` represent a single byte of
value `0xFF=255`, while `ÿ`, `\u00FF`, `\U000000FF` and `\xc3\xbf` represent
the two bytes `0xc3 0xbf` of the UTF-8 encoding of character `U+00FF`.
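Go's interpreted string literals follow the same byte-versus-character convention, so these byte counts can be checked with a short Go program. Unlike CUE, Go also accepts `\x` and octal escapes in double-quoted strings. In this sketch they stand in for CUE byte sequence literals.

```go
package main

import "fmt"

func main() {
	// Octal and hexadecimal escapes each denote a single raw byte.
	octal, hex := "\377", "\xFF"
	fmt.Println(len(octal), len(hex), octal == hex) // 1 1 true

	// Each of these denotes the two-byte UTF-8 encoding of U+00FF.
	for _, s := range []string{"ÿ", "\u00FF", "\U000000FF", "\xc3\xbf"} {
		fmt.Printf("% x\n", s) // c3 bf
	}
}
```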

```
\a   U+0007 alert or bell
\b   U+0008 backspace
\f   U+000C form feed
\n   U+000A line feed or newline
\r   U+000D carriage return
\t   U+0009 horizontal tab
\v   U+000B vertical tab
\/   U+002F slash (solidus)
\\   U+005C backslash
\'   U+0027 single quote  (valid escape only within single quoted literals)
\"   U+0022 double quote  (valid escape only within double quoted literals)
```

The escape `\(` is used as an escape for string interpolation.
A `\(` must be followed by a valid CUE Expression, followed by a `)`.

A backslash at the end of a line elides the line terminator that follows it.
This may not escape the final newline inside a multiline string: that
newline is already implicitly elided.

All other sequences starting with a backslash are illegal inside literals.

```
escaped_char     = `\` { `#` } ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | "/" | `\` | "'" | `"` ) .
byte_value       = octal_byte_value | hex_byte_value .
octal_byte_value = `\` { `#` } octal_digit octal_digit octal_digit .
hex_byte_value   = `\` { `#` } "x" hex_digit hex_digit .
little_u_value   = `\` { `#` } "u" hex_digit hex_digit hex_digit hex_digit .
big_u_value      = `\` { `#` } "U" hex_digit hex_digit hex_digit hex_digit
                           hex_digit hex_digit hex_digit hex_digit .
unicode_value    = unicode_char | little_u_value | big_u_value | escaped_char .
interpolation    = "\" { `#` } "(" Expression ")" .

string_lit       = simple_string_lit |
                   multiline_string_lit |
                   simple_bytes_lit |
                   multiline_bytes_lit |
                   `#` string_lit `#` .

simple_string_lit    = `"` { unicode_value | interpolation } `"` .
simple_bytes_lit     = `'` { unicode_value | interpolation | byte_value } `'` .
multiline_string_lit = `"""` newline
                             { unicode_value | interpolation | newline }
                             newline `"""` .
multiline_bytes_lit  = "'''" newline
                             { unicode_value | interpolation | byte_value | newline }
                             newline "'''" .
```

Carriage return characters (U+000D) that appear literally inside string literals
are discarded from the string value.

```
'a\000\xab'
'\007'
'\377'
'\xa'        // illegal: too few hexadecimal digits
"\n"
"\""
'Hello, world!\n'
"Hello, \( name )!"
"日本語"
"\u65e5本\U00008a9e"
'\xff\u00FF'
"\uD800"             // illegal: surrogate half (TODO: probably should allow)
"\U00110000"         // illegal: invalid Unicode code point

#"This is not an \(interpolation)"#
#"This is an \#(interpolation)"#
#"The sequence "\U0001F604" renders as \#U0001F604."#
```

These examples all represent the same string:

```
"日本語"                                 // UTF-8 input text
'日本語'                                 // UTF-8 input text as byte sequence
"\u65e5\u672c\u8a9e"                    // the explicit Unicode code points
"\U000065e5\U0000672c\U00008a9e"        // the explicit Unicode code points
'\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e'  // the explicit UTF-8 bytes
```

If the source code represents a character as two code points, such as a
combining form involving an accent and a letter, the result will appear as two
code points if placed in a string literal.

Strings and byte sequences have a multiline equivalent.
Multiline strings are like their single-line equivalent,
but allow newline characters.

Multiline strings and byte sequences respectively start with
a triple double quote (`"""`) or triple single quote (`'''`),
immediately followed by a newline, which is discarded from the string contents.
The string is closed by a matching triple quote, which must be by itself
on a new line, preceded by optional whitespace.
The newline preceding the closing quote is discarded from the string contents.
Every non-empty line after the opening quote must begin with the same
whitespace that precedes the closing triple quote;
this whitespace prefix is removed from each of these lines in the string value.
A closing triple quote may not appear in the string.
To include it, it suffices to escape one of the quotes.

```
"""
    lily:
    out of the water
    out of itself

    bass
    picking \
    bugs
    off the moon
        — Nick Virgilio, Selected Haiku, 1988
    """
```

This represents the same string as:

```
"lily:\nout of the water\nout of itself\n\n" +
"bass\npicking bugs\noff the moon\n" +
"    — Nick Virgilio, Selected Haiku, 1988"
```
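The indentation and line-continuation rules can be sketched in Go as a function over the raw source lines between the opening and closing delimiters. `multilineContent` is a hypothetical helper. The caller passes in the whitespace that precedes the closing `"""`. Only the trailing-backslash continuation is handled; other escapes and interpolations are not.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// multilineContent derives the value of a multiline string from the raw
// lines between the opening `"""` line and the closing `"""` line, given
// indent, the whitespace that precedes the closing quote.
// Only the line-continuation backslash is handled; other escapes and
// interpolations are left as they are.
func multilineContent(lines []string, indent string) (string, error) {
	var b strings.Builder
	for i, line := range lines {
		if line != "" {
			if !strings.HasPrefix(line, indent) {
				return "", errors.New("line lacks the closing quote's indentation")
			}
			line = line[len(indent):]
		}
		last := i == len(lines)-1
		if strings.HasSuffix(line, `\`) {
			if last {
				return "", errors.New("the final newline may not be escaped")
			}
			b.WriteString(line[:len(line)-1]) // elide the line terminator
			continue
		}
		b.WriteString(line)
		if !last { // the newline before the closing quote is discarded
			b.WriteByte('\n')
		}
	}
	return b.String(), nil
}

func main() {
	raw := []string{
		"    lily:",
		"    out of the water",
		"    out of itself",
		"",
		"    bass",
		`    picking \`,
		"    bugs",
		"    off the moon",
		"        \u2014 Nick Virgilio, Selected Haiku, 1988",
	}
	s, err := multilineContent(raw, "    ")
	fmt.Printf("%q %v\n", s, err)
}
```

Applied to the haiku above, it produces the same string as the concatenation shown. The attribution's leading dash is written as `\u2014` in the Go source.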

<!-- TODO: other values

Support for other values:
- Duration literals
- regular expressions: `re("[a-z]")`
-->


## Values

In addition to simple values like `"hello"` and `42.0`, CUE has [structs](#structs).
A struct is a map from labels to values, like `{a: 42.0, b: "hello"}`.
Structs are CUE's only way of building up complex values;
lists, which we will see later,
are defined in terms of structs.

All possible values are ordered in a lattice,
a partial order where every two elements have a single greatest lower bound
and a single least upper bound.
A value `a` is an _instance_ of a value `b`,
denoted `a ⊑ b`, if `b == a` or `b` is more general than `a`,
that is if `a` orders before `b` in the partial order
(`⊑` is _not_ a CUE operator).
We also say that `b` _subsumes_ `a` in this case.
In graphical terms, `b` is "above" `a` in the lattice.

<!-- TODO: link to https://cuelang.org/docs/concepts/logic/ as more reading
material, especially for those new to lattices
-->

At the top of the lattice is the single ancestor of all values, called
[top](#top), denoted `_` in CUE.
Every value is an instance of top.

At the bottom of the lattice is the value called [bottom](#bottom), denoted `_|_`.
A bottom value usually indicates an error.
Bottom is an instance of every value.

An _atom_ is any value whose only instances are itself and bottom.
Examples of atoms are `42.0`, `"hello"`, `true`, and `null`.

A value is _concrete_ if it is either an atom, or a struct whose field values
are all concrete, recursively.

CUE's values also include what we normally think of as types, like `string` and
`float`.
It does not distinguish between types and values:
only the relationship of values in the lattice is important.
Each CUE "type" subsumes the concrete values that one would normally think
of as part of that type.
For example, `"hello"` is an instance of `string`, and `42.0` is an instance of
`float`.
In addition to `string` and `float`, CUE has `null`, `int`, `bool`, and `bytes`.
We informally call these CUE's "basic types".


```
false ⊑ bool
true  ⊑ bool
true  ⊑ true
5.0   ⊑ float
bool  ⊑ _
_|_   ⊑ _
_|_   ⊑ _|_

_     ⋢ _|_
_     ⋢ bool
int   ⋢ bool
bool  ⋢ int
false ⋢ true
true  ⋢ false
float ⋢ 5.0
5     ⋢ 6
```


### Unification

The _unification_ of values `a` and `b`
is defined as the greatest lower bound of `a` and `b`. (That is, the
value `u` such that `u ⊑ a` and `u ⊑ b`,
and for any other value `v` for which `v ⊑ a` and `v ⊑ b`
it holds that `v ⊑ u`.)
Since CUE values form a lattice, the unification of two CUE values is
always unique.

These all follow from the definition of unification:
- The unification of `a` with itself is always `a`.
- The unification of values `a` and `b` where `a ⊑ b` is always `a`.
- The unification of a value with bottom is always bottom.

Unification in CUE is a [binary expression](#operands), written `a & b`.
It is commutative, associative, and idempotent.
As a consequence, order of evaluation is irrelevant, a property that is key
to many of the constructs in the CUE language as well as the tooling layered
on top of it.
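A toy model makes the lattice operations concrete. In the Go sketch below, `Value`, `instanceOf`, and `unify` are hypothetical names. The model covers only top, bottom, the basic types, and their atoms; structs and bounds are left out. In this restricted model any two values are either ordered or have only bottom below both. The greatest lower bound is therefore the more specific of the two values, or bottom if they are not ordered.

```go
package main

import "fmt"

// Value is a toy model of a CUE value: top (_), bottom (_|_), a basic
// type such as int, or a concrete atom of a basic type.
type Value struct {
	Top, Bottom bool
	Kind        string // "null", "bool", "int", "float", "string" or "bytes"
	Lit         string // non-empty for an atom, such as "5" or "true"
}

var (
	top    = Value{Top: true}
	bottom = Value{Bottom: true}
)

func kind(k string) Value      { return Value{Kind: k} }
func atom(k, lit string) Value { return Value{Kind: k, Lit: lit} }

// instanceOf reports whether a ⊑ b.
func instanceOf(a, b Value) bool {
	switch {
	case a.Bottom || b.Top:
		return true
	case a.Top || b.Bottom:
		return false
	case a.Kind != b.Kind:
		return false
	default:
		return b.Lit == "" || a.Lit == b.Lit
	}
}

// unify returns the greatest lower bound of a and b. In this model two
// values are either ordered, or bottom is the only value below both.
func unify(a, b Value) Value {
	switch {
	case instanceOf(a, b):
		return a
	case instanceOf(b, a):
		return b
	default:
		return bottom
	}
}

func main() {
	fmt.Println(instanceOf(atom("bool", "false"), kind("bool")))          // true
	fmt.Println(instanceOf(top, bottom))                                   // false
	fmt.Println(unify(kind("int"), atom("int", "5")) == atom("int", "5")) // true
	fmt.Println(unify(kind("int"), kind("bool")) == bottom)               // true
}
```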



<!-- TODO: explicitly mention that disjunction is not a binary operation
but a definition of a single value?-->


### Disjunction

The _disjunction_ of values `a` and `b`
is defined as the least upper bound of `a` and `b`.
(That is, the value `d` such that `a ⊑ d` and `b ⊑ d`,
and for any other value `e` for which `a ⊑ e` and `b ⊑ e`,
it holds that `d ⊑ e`.)
This style of disjunction is sometimes also referred to as a sum type.
Since CUE values form a lattice, the disjunction of two CUE values is always unique.


These all follow from the definition of disjunction:
- The disjunction of `a` with itself is always `a`.
- The disjunction of values `a` and `b` where `a ⊑ b` is always `b`.
- The disjunction of a value `a` with bottom is always `a`.
- The disjunction of two bottom values is bottom.

Disjunction in CUE is a [binary expression](#operands), written `a | b`.
It is commutative, associative, and idempotent.

The unification of a disjunction with another value is equal to the disjunction
composed of the unification of this value with all of the original elements
of the disjunction.
In other words, unification distributes over disjunction.

```
(a_0 | ... | a_n) & b ==> a_0&b | ... | a_n&b
```

```
Expression                Result
({a:1} | {b:2}) & {c:3}   {a:1, c:3} | {b:2, c:3}
(int | string) & "foo"    "foo"
("a" | "b") & "c"         _|_
```

A disjunction is _normalized_ if there is no element
`a` for which there is another element `b` such that `a ⊑ b`.
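Distribution followed by normalization can be sketched in Go with a toy model of atoms and basic types. `Value`, `Disj`, `unifyDisj`, and `normalize` are hypothetical names, and structs are not modeled. An element is dropped when it is an instance of another element. This also removes bottom elements and all but the first of any repeated elements.

```go
package main

import "fmt"

// Value is a toy model of a CUE value: top, bottom, a basic type such as
// int, or a concrete atom of a basic type (Lit holds its source text).
type Value struct {
	Top, Bottom bool
	Kind, Lit   string
}

var bottom = Value{Bottom: true}

func kind(k string) Value      { return Value{Kind: k} }
func atom(k, lit string) Value { return Value{Kind: k, Lit: lit} }

func (v Value) String() string {
	switch {
	case v.Top:
		return "_"
	case v.Bottom:
		return "_|_"
	case v.Lit != "":
		return v.Lit
	}
	return v.Kind
}

// instanceOf reports whether a ⊑ b.
func instanceOf(a, b Value) bool {
	switch {
	case a.Bottom || b.Top:
		return true
	case a.Top || b.Bottom || a.Kind != b.Kind:
		return false
	}
	return b.Lit == "" || a.Lit == b.Lit
}

// unify returns the greatest lower bound of a and b in this model.
func unify(a, b Value) Value {
	switch {
	case instanceOf(a, b):
		return a
	case instanceOf(b, a):
		return b
	}
	return bottom
}

// Disj lists the elements of a disjunction.
type Disj []Value

// unifyDisj computes (a_0 | ... | a_n) & b as a_0&b | ... | a_n&b.
func unifyDisj(d Disj, b Value) Disj {
	out := make(Disj, 0, len(d))
	for _, a := range d {
		out = append(out, unify(a, b))
	}
	return normalize(out)
}

// normalize drops each element that is an instance of another element;
// of several equal elements only the first is kept. Bottom survives only
// if every element is bottom.
func normalize(d Disj) Disj {
	var out Disj
	for i, a := range d {
		redundant := false
		for j, b := range d {
			if i != j && instanceOf(a, b) && (a != b || j < i) {
				redundant = true
				break
			}
		}
		if !redundant {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	str := func(s string) Value { return atom("string", `"`+s+`"`) }
	fmt.Println(unifyDisj(Disj{kind("int"), kind("string")}, str("foo")))    // ["foo"]
	fmt.Println(unifyDisj(Disj{str("a"), str("b")}, str("c")))              // [_|_]
	fmt.Println(normalize(Disj{kind("int"), atom("int", "1"), kind("int")})) // [int]
}
```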

<!--
Normalization is important, as we need to account for spurious elements
For instance "tcp" | "tcp" should resolve to "tcp".

Also consider

  ({a:1} | {b:1}) & ({a:1} | {b:2}) -> {a:1} | {a:1,b:1} | {a:1,b:2},

in this case, elements {a:1,b:1} and {a:1,b:2} are subsumed by {a:1} and thus
this expression is logically equivalent to {a:1} and should therefore be
considered to be unambiguous and resolve to {a:1} if a concrete value is needed.

For instance, in

  x: ({a:1} | {b:1}) & ({a:1} | {b:2}) // -> {a:1} | {a:1,b:1} | {a:1,b:2}
  y: x.a // 1

y should resolve to 1, and not an error.

For comparison, in

  x: ({a:1, b:1} | {b:2}) & {a:1} // -> {a:1,b:1} | {a:1,b:2}
  y: x.a // _|_

y should be an error as x is still ambiguous before the selector is applied,
even though `a` resolves to 1 in all cases.
-->


#### Default values

Any value `v` _may_ be associated with a default value `d`,
where `d` must be an instance of `v` (`d ⊑ v`).

Default values are introduced by means of disjunctions.
Any element of a disjunction can be _marked_ as a default
by prefixing it with an asterisk `*` ([a unary expression](#operators)).
Syntactically consecutive disjunctions are considered to be
part of a single disjunction,
whereby multiple disjuncts can be marked as default.
A _marked disjunction_ is one where any of its terms are marked.
So `a | b | *c | d` is a single marked disjunction of four terms,
whereas `a | (b | *c | d)` is an unmarked disjunction of two terms,
one of which is a marked disjunction of three terms.
During unification, if all the marked disjuncts of a marked disjunction are
eliminated, then the remaining unmarked disjuncts are considered as if they
originated from an unmarked disjunction.
<!-- TODO: this formulation should be worked out more.  -->
As explained below, distinguishing the nesting of disjunctions like this
is only relevant when both an outer and nested disjunction are marked.

Intuitively, when an expression needs to be resolved for an operation other
than unification or disjunction,
non-starred elements are dropped in favor of starred ones if the starred ones
do not resolve to bottom.

To define the unification and disjunction operations we use the notation
`⟨v⟩` to denote a CUE value `v` that is not associated with a default
and the notation `⟨v, d⟩` to denote a value `v` associated with a default
value `d`.

The rewrite rules for unifying such values are as follows:
```
U0: ⟨v1⟩ & ⟨v2⟩         => ⟨v1&v2⟩
U1: ⟨v1, d1⟩ & ⟨v2⟩     => ⟨v1&v2, d1&v2⟩
U2: ⟨v1, d1⟩ & ⟨v2, d2⟩ => ⟨v1&v2, d1&d2⟩
```

The rewrite rules for disjoining terms of unmarked disjunctions are as follows:
```
D0: ⟨v1⟩ | ⟨v2⟩         => ⟨v1|v2⟩
D1: ⟨v1, d1⟩ | ⟨v2⟩     => ⟨v1|v2, d1⟩
D2: ⟨v1, d1⟩ | ⟨v2, d2⟩ => ⟨v1|v2, d1|d2⟩
```

Terms of marked disjunctions are first rewritten according to the following
rules:
```
M0:  ⟨v⟩    => ⟨v⟩        don't introduce defaults for unmarked term
M1: *⟨v⟩    => ⟨v, v⟩     introduce identical default for marked term
M2: *⟨v, d⟩ => ⟨v, d⟩     keep existing defaults for marked term
M3:  ⟨v, d⟩ => ⟨v⟩        strip existing defaults from unmarked term
```

Note that for any marked disjunction `a`,
the expressions `a|a`, `*a|a` and `*a|*a` all resolve to `a`.

```
Expression               Value-default pair     Rules applied
*"tcp" | "udp"           ⟨"tcp"|"udp", "tcp"⟩    M1, D1
string | *"foo"          ⟨string, "foo"⟩         M1, D1

*1 | 2 | 3               ⟨1|2|3, 1⟩              M1, D1

(*1|2|3) | (1|*2|3)      ⟨1|2|3, 1|2⟩            M1, D1, D2
(*1|2|3) | *(1|*2|3)     ⟨1|2|3, 2⟩              M1, M2, M3, D1, D2
(*1|2|3) | (1|*2|3)&2    ⟨1|2|3, 1|2⟩            M1, D1, U1, D2

(*1|2) & (1|*2)          ⟨1|2, _|_⟩              M1, D1, U2
```
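The rewrite rules can be applied mechanically. The Go sketch below is a toy model and not CUE's evaluator. Each CUE value is replaced by the set of integers 0 to 7 that it admits (`Set`, a bitmask). Under this mapping `&` becomes intersection, `|` becomes union, and the empty set stands for bottom. `Pair`, `Term`, `Unify`, `Disjunction`, and `Resolve` are hypothetical names. `Resolve` follows the intuitive rule of preferring a default that is not bottom.

```go
package main

import "fmt"

// Set stands in for a CUE value: the integers 0..7 it admits, one bit each.
// & is intersection, | is union, and the empty set plays the role of bottom.
type Set uint8

func lit(ns ...int) (s Set) {
	for _, n := range ns {
		s |= 1 << n
	}
	return s
}

// Pair is ⟨V⟩ when HasD is false and ⟨V, D⟩ otherwise.
type Pair struct {
	V, D Set
	HasD bool
}

// Term is one disjunct; Marked records a leading '*'.
type Term struct {
	P      Pair
	Marked bool
}

// Unify applies U0, U1 (in either operand order) and U2.
func Unify(a, b Pair) Pair {
	switch {
	case a.HasD && b.HasD: // U2
		return Pair{a.V & b.V, a.D & b.D, true}
	case a.HasD: // U1
		return Pair{a.V & b.V, a.D & b.V, true}
	case b.HasD: // U1 with the operands swapped
		return Pair{a.V & b.V, a.V & b.D, true}
	}
	return Pair{V: a.V & b.V} // U0
}

// disjoin applies D0, D1 (in either operand order) and D2.
func disjoin(a, b Pair) Pair {
	switch {
	case a.HasD && b.HasD: // D2
		return Pair{a.V | b.V, a.D | b.D, true}
	case a.HasD: // D1
		return Pair{a.V | b.V, a.D, true}
	case b.HasD: // D1 with the operands swapped
		return Pair{a.V | b.V, b.D, true}
	}
	return Pair{V: a.V | b.V} // D0
}

// Disjunction evaluates the terms of one syntactic disjunction. If any term
// is marked, every term is first rewritten with M0 to M3.
func Disjunction(terms ...Term) (acc Pair) {
	marked := false
	for _, t := range terms {
		marked = marked || t.Marked
	}
	for i, t := range terms {
		p := t.P
		switch {
		case marked && t.Marked && !p.HasD: // M1
			p = Pair{p.V, p.V, true}
		case marked && !t.Marked && p.HasD: // M3
			p = Pair{V: p.V}
		} // M0 and M2 leave the term unchanged
		if i > 0 {
			p = disjoin(acc, p)
		}
		acc = p
	}
	return acc
}

// Resolve prefers the default, unless there is none or it is bottom.
func Resolve(p Pair) Set {
	if p.HasD && p.D != 0 {
		return p.D
	}
	return p.V
}

func main() {
	star := func(n int) Term { return Term{P: Pair{V: lit(n)}, Marked: true} }
	plain := func(n int) Term { return Term{P: Pair{V: lit(n)}} }
	a := Disjunction(star(1), plain(2), plain(3)) // *1|2|3
	b := Disjunction(plain(1), star(2), plain(3)) // 1|*2|3

	// (*1|2|3) | (1|*2|3) is ⟨1|2|3, 1|2⟩.
	fmt.Println(Disjunction(Term{P: a}, Term{P: b}) == Pair{lit(1, 2, 3), lit(1, 2), true}) // true
	// (*1|2|3) & (1|*2|3) is ⟨1|2|3, _|_⟩, which resolves to 1|2|3.
	and := Unify(a, b)
	fmt.Println(and == Pair{lit(1, 2, 3), 0, true}, Resolve(and) == lit(1, 2, 3)) // true true
}
```

In this model, rule U2 gives `(*1|2|3) & (1|*2|3)` a bottom default, so the expression resolves to `1|2|3`.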
   789
   790<!-- TODO: define and consistently use the value-default pair syntax -->
   791
   792The rules of subsumption for defaults can be derived from the above definitions
   793and are as follows.
   794
   795```
   796⟨v2, d2⟩ ⊑ ⟨v1, d1⟩  if v2 ⊑ v1 and d2 ⊑ d1
   797⟨v1, d1⟩ ⊑ ⟨v⟩       if v1 ⊑ v
   798⟨v⟩      ⊑ ⟨v1, d1⟩  if v ⊑ d1
   799```
   800
   801<!--
   802For the second rule, note that by definition d1 ⊑ v1, so d1 ⊑ v1 ⊑ v.
   803
   804The last one is so restrictive as v could still be made more specific by
   805associating it with a default that is not subsumed by d1.
   806
   807Proof:
   808  by definition for any d ⊑ v, it holds that (v, d) ⊑ v,
   809  where the most general value is (v, v).
   810  Given the subsumption rule for (v2, d2) ⊑ (v1, d1),
   811  from (v, v) ⊑ v ⊑ (v1, d1) it follows that v ⊑ d1
   812  exactly defines the boundary of this subsumption.
   813-->
   814
   815<!--
   816(non-normalized entries could also be implicitly marked, allowing writing
   817int | 1, instead of int | *1, but that can be done in a backwards
   818compatible way later if really desirable, as long as we require that
   819disjunction literals be normalized).
   820-->
   821
   822```
   823Expression                       Resolves to
   824"tcp" | "udp"                    "tcp" | "udp"
   825*"tcp" | "udp"                   "tcp"
   826float | *1                       1
   827*string | 1.0                    string
   828(*1|2) + (2|*3)                  4
   829
   830(*1|2|3) | (1|*2|3)              1|2
   831(*1|2|3) & (1|*2|3)              1|2|3 // default is _|_
   832
   833(* >=5 | int) & (* <=5 | int)    5
   834
   835(*"tcp"|"udp") & ("udp"|*"tcp")  "tcp"
   836(*"tcp"|"udp") & ("udp"|"tcp")   "tcp"
   837(*"tcp"|"udp") & "tcp"           "tcp"
   838(*"tcp"|"udp") & (*"udp"|"tcp")  "tcp" | "udp" // default is _|_
   839
   840(*true | false) & bool           true
   841(*true | false) & (true | false) true
   842
   843{a: 1} | {b: 1}                  {a: 1} | {b: 1}
   844{a: 1} | *{b: 1}                 {b:1}
   845*{a: 1} | *{b: 1}                {a: 1} | {b: 1}
   846({a: 1} | {b: 1}) & {a:1}        {a:1}  | {a: 1, b: 1}
   847({a:1}|*{b:1}) & ({a:1}|*{b:1})  {b:1}
   848```
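
The following sketch shows how defaults are typically used in a schema. It uses hypothetical names (`#Service`, `protocol`, `replicas`). A marked default supplies a value when nothing more specific is given, and a concrete value supplied elsewhere overrides it. The results in the comments follow from the rules above.

```
#Service: {
    protocol: *"tcp" | "udp"
    replicas: *1 | int
}

// Only replicas is specified; protocol falls back to its default.
s: #Service & {replicas: 3}
// s resolves to {protocol: "tcp", replicas: 3}
```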


### Bottom and errors

Any evaluation error in CUE results in a bottom value, represented by
the token `_|_`.
Bottom is an instance of every other value.

Implementations may associate error strings with different instances of bottom;
logically they all remain the same value.

```
bottom_lit = "_|_" .
```
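
A few illustrative expressions that evaluate to bottom follow. The field names are hypothetical, and this sketch does not show the error messages an implementation might attach.

```
a: 1 & 2            // _|_: conflicting concrete values
b: int & "foo"      // _|_: "foo" is not an int
c: {x: 1} & {x: 2}  // _|_: the conflict in field x makes the struct bottom
```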


### Top

Top is represented by the underscore character `_`, lexically an identifier.
Unifying any value `v` with top results in `v` itself.

```
Expr        Result
_ & 5       5
_ & _       _
_ & _|_     _|_
_ | _|_     _
```


### Null

The _null value_ is represented with the keyword `null`.
It has only one parent, top, and one child, bottom.
It is unordered with respect to any other value.

```
null_lit   = "null" .
```

```
null & 8     _|_
null & _     null
null & _|_   _|_
```


### Boolean values

A _boolean type_ represents the set of Boolean truth values denoted by
the keywords `true` and `false`.
The predeclared boolean type is `bool`; it is a defined type and a separate
element in the lattice.

```
bool_lit = "true" | "false" .
```

```
bool & true          true
true & true          true
true & false         _|_
bool & (false|true)  false | true
bool & (true|false)  true | false
```


### Numeric values

The _integer type_ represents the set of all integral numbers.
The _decimal floating-point type_ represents the set of all decimal floating-point
numbers.
They are two distinct types.
Both are instances of a generic `number` type.

<!--
TODO: would be nice to make this a rendered diagram with Mermaid.

                    number
                   /      \
                int      float
-->

The predeclared number, integer, and decimal floating-point types are
`number`, `int` and `float`; they are defined types.
<!--
TODO: should we drop float? It is somewhat more precise and probably a good idea
to have it in the programmatic API, but it may be confusing to have to deal
with it in the language.
-->

A decimal floating-point literal always has type `float`;
it is not an instance of `int` even if it is an integral number.

Integer literals are always of type `int` and don't match type `float`.
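
The distinction between `int`, `float`, and `number` can be illustrated as follows. The field names are hypothetical, and the results are those implied by the rules above.

```
i:  1 & int       // 1
f:  1.0 & float   // 1.0
n:  1 & number    // 1: int values are instances of number
m:  1.5 & number  // 1.5: so are float values
e1: 1 & float     // _|_: an integer literal is not a float
e2: 1.0 & int     // _|_: 1.0 is a float, even though it is integral
```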

Numeric literals are exact values of arbitrary precision.
If the operation permits it, numbers should be kept in arbitrary precision.

Implementation restriction: although numeric values have arbitrary precision
in the language, implementations may implement them using an internal
representation with limited precision.
That said, every implementation must:

- Represent integer values with at least 256 bits.
- Represent floating-point values with a mantissa of at least 256 bits and
  a signed binary exponent of at least 16 bits.
- Give an error if unable to represent an integer value precisely.
- Give an error if unable to represent a floating-point value due to overflow.
- Round to the nearest representable value if unable to represent
  a floating-point value due to limits on precision.

These requirements apply to the result of any expression except for builtin
functions, for which an unusual loss of precision must be explicitly documented.


### Strings

The _string type_ represents the set of UTF-8 strings,
not allowing surrogates.
The predeclared string type is `string`; it is a defined type.

The length of a string `s` (its size in bytes) can be discovered using
the builtin function `len`.
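
A minimal illustration follows. It assumes an ASCII-only string, so the byte length equals the number of characters.

```
s: "hello"
n: len(s)  // 5
```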


### Bytes

The _bytes type_ represents the set of byte sequences.
A byte sequence value is a (possibly empty) sequence of bytes.
The number of bytes is called the length of the byte sequence
and is never negative.
The predeclared byte sequence type is `bytes`; it is a defined type.
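
In CUE source, single-quoted literals denote byte sequences. The small sketch below uses hypothetical field names. It shows `len` applied to a bytes value and shows that `bytes` and `string` are distinct types.

```
b: 'abc'           // a bytes value
n: len(b)          // 3
x: 'abc' & string  // _|_: bytes and string are distinct types
```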


### Bounds

A _bound_, syntactically a [unary expression](#operands), defines
a logically infinite disjunction of concrete values represented as a single comparison.
For example, `>= 2` represents the infinite disjunction `2|3|4|5|6|7|…`.

For any [comparison operator](#comparison-operators) `op` except `==`,
`op a` is the disjunction of every `x` such that `x op a`.

```
2 & >=2 & <=5           // 2, where 2 is either an int or float.
2.5 & >=1 & <=5         // 2.5
2 & >=1.0 & <3.0        // 2.0
2 & >1 & <3.0           // 2.0
2.5 & int & >1 & <5     // _|_
2.5 & float & >1 & <5   // 2.5
int & 2 & >1.0 & <3.0   // _|_
2.5 & >=(int & 1) & <5  // _|_
>=0 & <=7 & >=3 & <=10  // >=3 & <=7
!=null & 1              // 1
>=5 & <=5               // 5
```
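
Bounds are commonly used to constrain fields in a schema, and the ordering comparisons also apply to strings. The sketch below uses hypothetical field names.

```
port: >=1 & <=65535
port: 8080          // 8080 satisfies both bounds

s1: "apple" & <"c"  // "apple": strings compare lexically
s2: "dog" & <"c"    // _|_: "dog" sorts after "c"
nz: 0 & !=0         // _|_
```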


### Structs

A _struct_ is a set of elements called _fields_, each of
which has a name, called a _label_, and a value.

We say a label is _defined_ for a struct if the struct has a field with the
corresponding label.
The value for a label `f` of struct `a` is denoted `a.f`.
A struct `a` is an instance of `b`, or `a ⊑ b`, if for any label `f`
defined for `b`, label `f` is also defined for `a` and `a.f ⊑ b.f`.
Note that if `a` is an instance of `b` it may have fields with labels that
are not defined for `b`.

The (unique) struct with no fields, written `{}`, has every struct as an
instance. It can be considered the type of all structs.

```
{a: 1} ⊑ {}
{a: 1, b: 1} ⊑ {a: 1}
{a: 1} ⊑ {a: int}
{a: 1, b: 1.0} ⊑ {a: int, b: number}

{} ⋢ {a: 1}
{a: 2} ⋢ {a: 1}
{a: 1} ⋢ {b: 1}
```

The successful unification of structs `a` and `b` is a new struct `c` which
has all fields of both `a` and `b`, where
the value of a field `f` in `c` is `a.f & b.f` if `f` is defined in both `a` and `b`,
or just `a.f` or `b.f` if `f` is in just `a` or `b`, respectively.
Any [references](#references) to `a` or `b`
in their respective field values need to be replaced with references to `c`.
The result of a unification is bottom (`_|_`) if any of its defined
fields evaluates to bottom, recursively.
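
The rule about references can be seen in a small example with hypothetical names. The reference to `x` inside `a` resolves against the unified struct, so it picks up the value contributed by `b`.

```
a: {x: int, y: x + 1}
b: {x: 3}
c: a & b  // {x: 3, y: 4}
```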

A struct literal may contain multiple fields with the same label,
the result of which is the unification of all those fields.

```
StructLit       = "{" { Declaration "," } "}" .
Declaration     = Field | Ellipsis | Embedding | LetClause | attribute .
Ellipsis        = "..." [ Expression ] .
Embedding       = Comprehension | AliasExpr .
Field           = Label ":" { Label ":" } AliasExpr { attribute } .
Label           = [ identifier "=" ] LabelExpr .
LabelExpr       = LabelName [ "?" | "!" ] | "[" AliasExpr "]" .
LabelName       = identifier | simple_string_lit | "(" AliasExpr ")" .

attribute       = "@" identifier "(" attr_tokens ")" .
attr_tokens     = { attr_token |
                    "(" attr_tokens ")" |
                    "[" attr_tokens "]" |
                    "{" attr_tokens "}" } .
attr_token      = /* any token except '(', ')', '[', ']', '{', or '}' */
```

```
Expression                             Result
{a: int, a: 1}                         {a: 1}
{a: int} & {a: 1}                      {a: 1}
{a: >=1 & <=7} & {a: >=5 & <=9}        {a: >=5 & <=7}
{a: >=1 & <=7, a: >=5 & <=9}           {a: >=5 & <=7}

{a: 1} & {b: 2}                        {a: 1, b: 2}
{a: 1, b: int} & {b: 2}                {a: 1, b: 2}

{a: 1} & {a: 2}                        _|_
```


#### Field constraints

A struct may declare _field constraints_ which define values
that should be unified with a given field once it is defined.
The existence of a field constraint declares, but does not define, that field.

Syntactically, a field is marked as a constraint
by following its label with an _optional_ marker `?`
or _required_ marker `!`.
These markers are not part of the field name.

A struct that has a required field constraint with a bottom value
evaluates to bottom.
An optional field constraint with a bottom value does _not_ invalidate
the struct that contains it
as long as it is not unified with a defined field.

The subsumption relation for fields with the various markers is defined as
```
{a?: x} ⊑ {a!: x} ⊑ {a: x}
```
for any given `x`.

Implementations may error upon encountering a required field constraint
when manifesting CUE as data.
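
The sketch below shows how optional and required field constraints are typically combined in a definition. `#Person`, `name`, and `age` are hypothetical names. As noted above, whether manifesting `missing` as data reports an error is up to the implementation.

```
#Person: {
    name!: string  // required: must be defined by instances
    age?:  int     // optional: may be omitted
}

ok: #Person & {name: "Ann"}  // name is defined; age may stay absent

// name! is never satisfied here. An implementation may report an
// error when manifesting this value as data.
missing: #Person & {age: 3}
```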

```
Expression                             Result
{foo?: 3} & {foo: 3}                   {foo: 3}
{foo!: 3} & {foo: 3}                   {foo: 3}

{foo!: int} & {foo: int}               {foo:  int}
{foo!: int} & {foo?: <1}               {foo!: <1}
{foo!: int} & {foo: <=3}               {foo:  <=3}
{foo!: int} & {foo: 3}                 {foo:  3}

{foo!: 3} & {foo: int}                 {foo: 3}
{foo!: 3} & {foo: <=4}                 {foo: 3}

{foo?: 1} & {foo?: 2}                  {foo?: _|_} // No error
{foo?: 1} & {foo!: 2}                  _|_
{foo?: 1} & {foo: 2}                   _|_
```

<!-- see https://github.com/cue-lang/proposal/blob/main/designs/1951-required-fields-v2.md -->

<!--NOTE: About bottom values for optional fields being okay.

The proposition ¬P is a close cousin of P → ⊥ and is often used
as an approximation to avoid the issues of using not.
Bottom (⊥) is also frequently used to mean undefined. This makes sense.
Consider `{a?: 2} & {a?: 3}`.
Both structs say `a` is optional; in other words, it may be omitted.
So we can still get a valid result by omitting `a`, even in
case of a conflict.

Granted, this definition may lead to confusing results, especially in
definitions, when tightening an optional field leads to unintentionally
discarding it.
It could be a role of vet checkers to identify such cases (and suggest users
to explicitly use `_|_` to discard a field, for instance).

TODO: These examples show also how field constraints interact with defaults.
Should we include this? Probably not necessary, as this is an orthogonal
concern.
```
Expression                             Result
a: { foo?: string }                    a: { foo?: string }
b: { foo: "bar" }                      b: { foo: "bar" }
c: { foo?: *"baz" | string }           c: { foo?: *"baz" | string }

d: a & b                               { foo: "bar" }
e: b & c                               { foo: "bar" }
f: a & c                               { foo?: *"baz" | string }
g: a & { foo?: number }                { foo?: _|_ } // This is fine
h: b & { foo?: number }                _|_
i: c & { foo: string }                 { foo: *"baz" | string }
```
-->


#### Dynamic fields

A _dynamic field_ is a field whose label is determined by
an expression wrapped in parentheses.
A dynamic field may be marked as optional or required.

```
Expression                             Result
a:   "foo"                             a:   "foo"
b:   "bar"                             b:   "bar"
(a): "baz"                             foo: "baz"

(a+b): "qux"                           foobar: "qux"

(a)?: string                           foo?: string
(b)!: string                           bar!: string
```


#### Pattern and default constraints

A struct may define constraints that apply to a collection of fields.

A _pattern constraint_, denoted `[pattern]: value`, defines a pattern, which
is a value of type string, and a value to unify with fields whose label
unifies with the pattern.
For a given struct `a` with pattern constraint `[p]: v`, `v` is unified
with any field with name `f` in `a` for which `p & f` is not bottom.
When unifying structs `a` and `b`,
any pattern constraints declared in `a` and `b`
are also declared in the result of unification.
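
For instance, pattern constraints from both operands of a unification apply to every matching field of the result. The names below are hypothetical.

```
x: {[=~"^n"]: int}
y: {[=~"^n"]: >=0}

z: x & y & {n1: 5}   // {n1: 5}
w: x & y & {n2: -1}  // _|_: n2 matches both patterns, and -1 is not >=0
```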

<!-- TODO: Update grammar and support this.
A pattern constraint with a pattern preceded by `...` indicates
the pattern can only match fields in `b` for which there
exists no field in `a` with the same label.
-->

Additionally, a _default constraint_, denoted `...value`, defines a value
to unify with any field for which there is no other declaration in a struct.
When unifying structs `a` and `b`,
a default constraint `...v` declared in `a`
defines that the value `v` should unify with any field in the resulting struct `c`
whose label does not unify with any of the patterns of the pattern
constraints defined for `a` _and_ for which there exists no field declaration
in `a` with that label.
The token `...` is a shorthand for `..._`.
_Note_: default constraints of the form `..._` are not yet implemented.


```
a: {
    foo:      string  // foo is a string
    [=~"^i"]: int     // all other fields starting with i are integers
    [=~"^b"]: bool    // all other fields starting with b are booleans
    [>"c"]:   string  // all other fields lexically after c are strings

    ...string         // all other fields must be a string. Note: default constraints are not yet implemented.
}

b: a & {
    i3:    3
    bar:   true
    other: "a string"
}
```

<!--
TODO: are these two equivalent? Rog says that maybe you'll be able to refer
to optional fields at some point, which will never make sense for patterns.
Marcel says this is already mentioned elsewhere.

a: {
	["foo"]: int
	foo?: int
}
-->

Concrete field labels may be an identifier or string, the latter of which may be
interpolated.
Fields with identifier labels can be referred to within the scope in which they
are defined; fields with string labels cannot.
References within such interpolated strings are resolved within
the scope of the struct in which the label sequence is
defined and can reference concrete labels lexically preceding
the label within a label sequence.
<!-- We allow this so that rewriting a CUE file to collapse or expand
field sequences has no impact on semantics.
-->
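
The following sketch illustrates the difference between identifier and string labels. The field names are hypothetical. An interpolated label may reference a field in scope, and an identifier label can be referenced directly. A string label cannot be referenced directly, but a field alias can give it a usable name.

```
name: "web"

// Interpolated string label; this defines the field "web-port".
"\(name)-port": 8080

// An identifier label can be referenced directly.
copy: name  // "web"

// A string label cannot; a field alias provides a name for it.
F="my-field": 1
useF: F + 1  // 2
```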

<!--TODO: first implementation round will not yet have expression labels

An ExpressionLabel sets a collection of optional fields to a field value.
By default it defines this value for all possible string labels.
An optional expression limits this to the set of optional fields which
labels match the expression.
-->


<!-- NOTE: if we allow ...Expr, as in list, it would mean something different. -->


<!-- NOTE:
A DefinitionDecl does not allow repeated labels. This is to avoid
any ambiguity or confusion about whether earlier path components
are to be interpreted as declarations or normal fields (they should
always be normal fields.)
-->

<!--NOTE:
The syntax has been deliberately restricted to allow for the following
future extensions and relaxations:
  - Allow omitting a "?" in an expression label to indicate a concrete
    string value (but maybe we want to use () for that).
  - Make the "?" in expression label optional if expression labels
    are always optional.
  - Or allow eliding the "?" if the expression has no references and
    is obviously not concrete (such as `[string]`).
  - The expression of an expression label may also indicate a struct with
    integer or even number labels
    (beware of imprecise computation in the latter).
      e.g. `{ [int]: string }` is a map of integers to strings.
  - Allow for associative lists (`foo [@.field]: {field: string}`)
  - The `...` notation can be extended analogously to that of a ListList,
    by allowing it to follow with an expression for the remaining properties.
    In that case it is no longer a shorthand for `[string]: _`, but rather
    would define the value for any other value for which there is no field
    defined.
    Like the definition with List, this is somewhat odd, but it allows the
    encoding of JSON schema's and (non-structural) OpenAPI's
    additionalProperties and additionalItems.
-->

```
intMap: [string]: int
intMap: {
    t1: 43
    t2: 2.4  // error: 2.4 is not an integer
}

nameMap: [string]: {
    firstName: string
    nickName:  *firstName | string
}

nameMap: hank: firstName: "Hank"
```

The pattern constraint declared for `nameMap` matches every field,
in this case just `hank`, and unifies the associated constraint
with the matched field, resulting in:

```
nameMap: hank: {
    firstName: "Hank"
    nickName:  "Hank"
}
```


#### Closed structs

By default, structs are open to adding fields.
Instances of an open struct `p` may contain fields not defined in `p`.
This makes it easy to add fields, but can lead to bugs:

```
S: {
    field1: string
}

S1: S & { field2: "foo" }

// S1 is { field1: string, field2: "foo" }


A: {
    field1: string
    field2: string
}

A1: A & {
    feild1: "foo"  // "field1" was accidentally misspelled
}

// A1 is
//    { field1: string, field2: string, feild1: "foo" }
// not the intended
//    { field1: "foo", field2: string }
```

A _closed struct_ `c` is a struct whose instances may not declare any field
with a name that does not match the name of a field
or the pattern of a pattern constraint defined in `c`.
Hidden fields are excluded from this limitation.
A struct that is the result of unifying any struct with a [`...`](#structs)
declaration is defined for all regular fields.
Closing a struct is equivalent to adding `..._|_` to it.

Syntactically, structs are closed explicitly with the `close` builtin or
implicitly and recursively by [definitions](#definitions-and-hidden-fields).


```
A: close({
    field1: string
    field2: string
})

A1: A & {
    feild1: string
} // _|_ feild1 not defined for A

A2: A & {
    for k,v in { feild1: string } {
        "\(k)": v
    }
}  // _|_ feild1 not defined for A

C: close({
    [_]: _
})

C2: C & {
    for k,v in { thisIsFine: string } {
        "\(k)": v
    }
}

D: close({
    // Values generated by comprehensions are treated as embeddings.
    for k,v in { x: string } {
        "\(k)": v
    }
})
```

<!-- (jba) Somewhere it should be said that optional fields are only
     interesting inside closed structs. -->

<!-- TODO: move embedding section to above the previous one -->

#### Embedding

A struct may contain an _embedded value_, an operand used as a declaration.
An embedded value of type struct is unified with the struct in which it is
embedded, but disregarding the restrictions imposed by closed structs.
So if an embedding resolves to a closed struct, the corresponding enclosing
struct will also be closed, but may have fields that are not allowed if
normal rules for closed structs were observed.

If an embedded value is not of type struct, the struct may only have
definitions or hidden fields. Regular fields are not allowed in such a case.
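
To illustrate the rule for non-struct embeddings, the sketch below uses hypothetical names. A struct that embeds a non-struct value may still carry definitions and hidden fields, but adding a regular field makes it bottom.

```
x: {
    #max:  10    // a definition is allowed
    _note: "ok"  // so is a hidden field
    5            // the embedded value; x evaluates to 5
}

y: {
    a: 1  // a regular field
    5     // combined with a non-struct embedding: _|_
}
```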

The result of `{ A }` is `A` for any `A` (including definitions).

Syntactically, embeddings may be any expression.

```
S1: {
    a: 1
    b: 2
    {
        c: 3
    }
}
// S1 is { a: 1, b: 2, c: 3 }

S2: close({
    a: 1
    b: 2
    {
        c: 3
    }
})
// same as close(S1)

S3: {
    a: 1
    b: 2
    close({
        c: 3
    })
}
// same as S2
```


#### Definitions and hidden fields

A field is a _definition_ if its identifier starts with `#` or `_#`.
A field is _hidden_ if its identifier starts with a `_`.
All other fields are _regular_.

Definitions and hidden fields are not emitted when converting a CUE program
to data and are never required to be concrete.
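
The sketch below uses hypothetical names. Converting it to data, for example with `cue export`, is expected to emit only `server`, because `#Port` is a definition and `_env` is a hidden field. `#Port` is also never made concrete on its own, which is permitted.

```
#Port: int & >0
_env:  "staging"

server: {
    port: #Port & 8080
    env:  _env
}
// Expected data output: server: {port: 8080, env: "staging"}
```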

Referencing a definition will recursively [close](#closed-structs) it.
That is, a referenced definition will not unify with a struct
that would add a field anywhere within the definition that it does not
already define or explicitly allow with a pattern constraint or `...`.
[Embedding](#embedding) allows bypassing this check.

If referencing a definition would always result in an error, implementations
may report this inconsistency at the point of its declaration.

```
#MyStruct: {
    sub: field:    string
}

#MyStruct: {
    sub: enabled?: bool
}

myValue: #MyStruct & {
    sub: feild:   2     // error, feild not defined in #MyStruct
    sub: enabled: true  // okay
}

#D: {
    #OneOf

    c: int // adds this field.
}

#OneOf: { a: int } | { b: int }


D1: #D & { a: 12, c: 22 }  // { a: 12, c: 22 }
D2: #D & { a: 12, b: 33 }  // _|_ // cannot define both `a` and `b`
```


```
#A: {a: int}

B: {
    #A
    b: c: int
}

x: B
x: d: 3  // not allowed, as closed by embedded #A

y: B.b
y: d: 3  // allowed as nothing closes b

#B: {
    #A
    b: c: int
}

z: #B.b
z: d: 3  // not allowed, as referencing #B closes b
```


<!---
JSON fields are usually camelCase. Clashes can be avoided by adopting the
convention that definitions be TitleCase. Unexported definitions are still
subject to clashes, but those are likely easier to resolve because they are
package internal.
--->


#### Attributes

Attributes allow associating meta information with values.
Their primary purpose is to define mappings between CUE and
other representations.
Attributes do not influence the evaluation of CUE.

An attribute associates an identifier with a value, a balanced token sequence,
which is a sequence of CUE tokens with balanced brackets (`()`, `[]`, and `{}`).
The sequence may not contain interpolations.

Fields, structs and packages can be associated with a set of attributes.
Attributes accumulate during unification, but implementations may remove
duplicates that have the same source string representation.
The interpretation of an attribute, including the handling of multiple
attributes for a given identifier, is up to the consumer of the attribute.

Field attributes define additional information about a field,
such as a mapping to a protocol buffer <!-- TODO: add link --> tag or alternative
name of the field when mapping to a different language.


```
// Package attribute
@protobuf(proto3)

myStruct1: {
    // Struct attribute:
    @jsonschema(id="https://example.org/mystruct1.json")

    // Field attributes
    field: string @go(Field)
    attr:  int    @xml(,attr) @go(Attr)
}

myStruct2: {
    field: string @go(Field)
    attr:  int    @xml(a1,attr) @go(Attr)
}

Combined: myStruct1 & myStruct2
// field: string @go(Field)
// attr:  int    @xml(,attr) @xml(a1,attr) @go(Attr)
```


#### Aliases

Aliases name values that can be referred to
within the [scope](#declarations-and-scopes) in which they are declared.
The name of an alias must be unique within its scope.

```
AliasExpr  = [ identifier "=" ] Expression .
```

Aliases can appear in several positions:

<!--- TODO: consider allowing this. It should be considered whether
having field aliases isn't already sufficient.

As a declaration in a struct (`X=value`):

- binds identifier `X` to a value embedded within the struct.
--->

In front of a Label (`X=label: value`):

- binds the identifier to the same value as `label` would be bound
  to if it were a valid identifier.

In front of a dynamic field (`X=(label): value`):

- binds the identifier to the same value as `label` if it were a valid
  static identifier.

In front of a dynamic field expression (`(X=expr): value`):

- binds the identifier to the concrete label resulting from evaluating `expr`.

In front of a pattern constraint (`X=[expr]: value`):

- binds the identifier to the same field as that matched by the pattern
  within the instance of the field value (`value`).

In front of a pattern constraint expression (`[X=expr]: value`):

- binds the identifier to the concrete label that matches `expr`
  within the instances of the field value (`value`).

Before a value (`foo: X=x`):

- binds the identifier to the value it precedes within the scope of that value.

Before a list element (`[ X=value, X+1 ]`) (not yet implemented):

- binds the identifier to the list element it precedes within the scope of the
  list expression.

<!-- TODO: explain the difference between aliases and definitions.
     Now that you have definitions, are aliases really necessary?
     Consider removing.
-->

```
// A field alias
foo: X  // 4
X="not an identifier": 4

// A value alias
foo: X={x: X.a}
bar: foo & {a: 1}  // {a: 1, x: 1}

// A label alias
[Y=string]: { name: Y }
foo: { value: 1 } // outputs: foo: { name: "foo", value: 1 }
```

<!-- TODO: also allow aliases as lists -->


#### Let declarations

_Let declarations_ bind an identifier to an expression.
The identifier is only visible within the [scope](#declarations-and-scopes)
in which it is declared.
The identifier must be unique within its scope.

```
let x = expr

a: x + 1
b: x + 2
```
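
Here is a concrete version of the example above. The name `base` and the value `10` are arbitrary.

```
let base = 10

a: base + 1  // 11
b: base + 2  // 12
// base is not a field, so it does not appear in the output.
```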

#### Shorthand notation for nested structs

A field whose value is a struct with a single field may be written as
a colon-separated sequence of the two field names,
followed by a colon and the value of that single field.

```
job: myTask: replicas: 2
```
expands to
```
job: {
    myTask: {
        replicas: 2
    }
}
```
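
Because repeated fields unify, several shorthand declarations can contribute to the same nested struct. The sketch below uses hypothetical field names.

```
job: myTask: replicas: 2
job: myTask: image: "nginx"

// Equivalent to:
// job: {myTask: {replicas: 2, image: "nginx"}}
```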

<!-- OPTIONAL FIELDS:

The optional marker solves the issue of having to print large amounts of
boilerplate when dealing with large types with many optional or default
values (such as Kubernetes).
Writing such optional values in terms of *null | value is tedious,
unpleasant to read, and as it is not well defined what can be dropped or not,
all null values have to be emitted from the output, even if the user
doesn't override them.
Part of the issue is how null is defined. We could adopt a TypeScript-like
approach of introducing "void" or "undefined" to mean "not defined and not
part of the output". But having all of null, undefined, and void can be
confusing. If these ever are introduced anyway, the ? operator could be
expressed along the lines of
   foo?: bar
being a shorthand for
   foo: void | bar
where void is the default if no other default is given.

The current mechanical definition of "?" is straightforward, though, and
probably avoids the need for void, while solving a big issue.

Caveats:
[1] this definition requires explicitly defined fields to be emitted, even
if they could be elided (for instance if the explicit value is the default
value defined for an optional field). This is probably a good thing.

[2] a default value may still need to be included in an output if it is not
the zero value for that field and it is not known if any outside system is
aware of defaults. For instance, which defaults are specified by the user
and which by the schema understood by the receiving system.
"?" should therefore be combined with defaults carefully
in non-schema definitions.
Problematic cases should be easy to detect by a vet-like check, though.

[3] It should be considered how this affects the trim command.
Should values implied by optional fields be allowed to be removed?
Probably not. This restriction is unlikely to limit the usefulness of trim,
though.

[4] There should be an option to emit all concrete optional values.
-->

### Lists

A list literal defines a new value of type list.
A list may be open or closed.
An open list is indicated with a `...` at the end of an element list,
optionally followed by a value for the remaining elements.

The length of a closed list is the number of elements it contains.
The length of an open list is bounded below by the number of elements
it contains and is unbounded above.

```
ListLit       = "[" [ ElementList [ "," ] ] "]" .
ElementList   = Ellipsis | Embedding { "," Embedding } [ "," Ellipsis ] .
```
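
The following sketch shows how open and closed lists combine under unification.
The field names are illustrative only, and the results in the comments are
derived from the rules above rather than taken from a particular implementation.

```
a: [1, 2]          // closed list of length 2
b: [1, 2, ...]     // open list: at least 2 elements, any values after that
c: [1, 2, ...int]  // open list: any further elements must be ints

d: b & [1, 2, "x"] // [1, 2, "x"]
e: c & [1, 2, 3]   // [1, 2, 3]
f: c & [1, 2, "x"] // _|_: "x" does not unify with int
g: a & [1, 2, 3]   // _|_: a closed list of length 2 cannot have 3 elements
```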

Lists can be thought of as structs:

```
List: *null | {
    Elem: _
    Tail: List
}
```

For closed lists, `Tail` is `null` for the last element; for open lists it is
`*null | List`, defaulting to the shortest variant.
For instance, the open list `[ 1, 2, ... ]` can be represented as:
```
open: List & { Elem: 1, Tail: { Elem: 2 } }
```
and the closed version of this list, `[ 1, 2 ]`, as
```
closed: List & { Elem: 1, Tail: { Elem: 2, Tail: null } }
```

Using this representation, the subsumption rules for lists can
be derived from those for structs.
Implementations are not required to implement lists as structs.
The `Elem` and `Tail` fields are not special, and builtins such as `len`
do not treat a struct of this shape as a list.


## Declarations and Scopes


### Blocks

A _block_ is a possibly empty sequence of declarations.
The braces of a struct literal `{ ... }` form a block, but there are
others as well:

- The _universe block_ encompasses all CUE source text.
- Each [package](#modules-instances-and-packages) has a _package block_
  containing all CUE source text in that package.
- Each file has a _file block_ containing all CUE source text in that file.
- Each `for` and `let` clause in a [comprehension](#comprehensions)
  is considered to be its own implicit block.

Blocks nest and influence scoping.


### Declarations and scope

A _declaration_ may bind an identifier to a field, alias, or package.
Every identifier in a program must be declared.
Other than for fields,
no identifier may be declared twice within the same block.
For fields, an identifier may be declared more than once within the same block,
resulting in a field with a value that is the result of unifying the values
of all fields with the same identifier.
String labels do not bind an identifier to the respective field.

The _scope_ of a declared identifier is the extent of source text in which the
identifier denotes the specified field, alias, or package.

CUE is lexically scoped using blocks:

1. The scope of a [predeclared identifier](#predeclared-identifiers) is the universe block.
1. The scope of an identifier denoting a field
   declared at top level (outside any struct literal) is the package block.
1. The scope of an identifier denoting an alias
   declared at top level (outside any struct literal) is the file block.
1. The scope of a let identifier
   declared at top level (outside any struct literal) is the file block.
1. The scope of the package name of an imported package is the file block of the
   file containing the import declaration.
1. The scope of a field, alias, or let identifier declared inside a struct
   literal is the innermost containing block.

An identifier declared in a block may be redeclared in an inner block.
While the identifier of the inner declaration is in scope, it denotes the entity
declared by the inner declaration.
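
As an illustration of these rules (the field names are hypothetical), an inner
declaration shadows an outer one only within its own block, while repeated
declarations of a field in the same block unify:

```
x: 1

a: {
    x: 2
    b: x    // 2: the inner declaration of x shadows the outer one
}
c: x        // 1: the outer declaration is unaffected

y: >=1
y: <=5
y: 3        // the three declarations of y unify to 3
```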

The package clause is not a declaration;
the package name does not appear in any scope.
Its purpose is to identify the files belonging to the same package
and to specify the default name for import declarations.


### Predeclared identifiers

CUE predefines a set of types and builtin functions.
For each of these there is a corresponding keyword which is the name
of the predefined identifier, prefixed with `__`.

```
Functions
len close and or

Types
null      The null type and value
bool      All boolean values
int       All integral numbers
float     All decimal floating-point numbers
string    Any valid UTF-8 sequence
bytes     Any valid byte sequence

Derived   Value
number    int | float
uint      >=0
uint8     >=0 & <=255
int8      >=-128 & <=127
uint16    >=0 & <=65535
int16     >=-32_768 & <=32_767
rune      >=0 & <=0x10FFFF
uint32    >=0 & <=4_294_967_295
int32     >=-2_147_483_648 & <=2_147_483_647
uint64    >=0 & <=18_446_744_073_709_551_615
int64     >=-9_223_372_036_854_775_808 & <=9_223_372_036_854_775_807
uint128   >=0 & <=340_282_366_920_938_463_463_374_607_431_768_211_455
int128    >=-170_141_183_460_469_231_731_687_303_715_884_105_728 &
          <=170_141_183_460_469_231_731_687_303_715_884_105_727
float32   >=-3.40282346638528859811704183484516925440e+38 &
          <=3.40282346638528859811704183484516925440e+38
float64   >=-1.797693134862315708145274237317043567981e+308 &
          <=1.797693134862315708145274237317043567981e+308
```
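
A brief sketch of the derived types acting as bounds, and of the `__` keywords
that remain available when a predeclared name is shadowed. It assumes an
implementation that accepts the `__`-prefixed forms described above; the field
names are illustrative.

```
a: uint8 & 200     // 200
b: uint8 & 300     // _|_: 300 is not within >=0 & <=255
c: number & 1.5    // 1.5

s: {
    int: "shadowed"  // a field named int shadows the predeclared identifier
    d:   int         // "shadowed": refers to the field above
    e:   __int & 3   // 3: __int always denotes the predeclared int type
}
```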


### Exported identifiers

<!-- move to a more logical spot -->

An identifier of a package may be exported to permit access to it
from another package.
All identifiers not starting with `_` (so all regular fields and definitions
starting with `#`) are exported.
Any identifier starting with `_` is not visible outside the package and resides
in a separate namespace from namesake identifiers of other packages.

```
package mypackage

foo:   string  // visible outside mypackage
"bar": string  // visible outside mypackage

#Foo: {      // visible outside mypackage
    a:  1    // visible outside mypackage
    _b: 2    // not visible outside mypackage

    #C: {    // visible outside mypackage
        d: 4 // visible outside mypackage
    }
    _#E: foo // not visible outside mypackage
}
```


### Uniqueness of identifiers

Given a set of identifiers, an identifier is called unique if it is different
from every other in the set, after applying normalization following
[Unicode Annex #31](https://unicode.org/reports/tr31/).
Two identifiers are different if they are spelled differently
or if they appear in different packages and are not exported.
Otherwise, they are the same.


### Field declarations

A field associates the value of an expression with a label within a struct.
If this label is an identifier, it binds the field to that identifier,
so the field's value can be referenced by writing the identifier.
String labels are not bound to fields.
```
a: {
    b: 2
    "s": 3

    c: b   // 2
    d: s   // _|_ unresolved identifier "s"
    e: a.s // 3
}
```

If an expression may result in a value associated with a default value
as described in [default values](#default-values), the field binds to this
value-default pair.


<!-- TODO: disallow creating identifiers starting with __
...and reserve them for builtin values.

The issue is with code generation. As no guarantee can be given that
a predeclared identifier is not overridden in one of the enclosing scopes,
code will have to handle detecting such cases and renaming them.
An alternative is to have the predeclared identifiers be aliases for namesake
equivalents starting with a double underscore (e.g. string -> __string),
allowing generated code (normal code would keep using `string`) to refer
to these directly.
-->


### Let declarations

<!--
TODO: why are there two "Let declarations" sections?
-->

Within a struct, a let clause binds an identifier to the given expression.

Within the scope of the identifier, the identifier refers to the
_locally declared_ expression.
The expression is evaluated in the scope in which it was declared.
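
A small example of let clauses at file level and within a struct, with
hypothetical names. A let-bound identifier can be referenced like a field,
but, unlike a field, it does not become part of the struct's value.

```
let domain = "example.com"   // file-level let: in scope throughout this file

site: {
    let n = replicas * 2     // n is not emitted as a field of site
    replicas: 3
    host:     "www.\(domain)" // "www.example.com"
    workers:  n               // 6
}
```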


## Expressions

An expression specifies the computation of a value by applying operators and
builtin functions to operands.

Expressions that require concrete values are called _incomplete_ if any of
their operands are not concrete but define a value that would be legal for
that expression.
Incomplete expressions may be left unevaluated until a concrete value is
requested at the application level.
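
For example, in the hypothetical struct `x` below, `b` is incomplete because
`a` is not concrete; unifying `x` with a concrete value for `a` makes `b`
computable:

```
x: {
    a: int
    b: a + 1      // incomplete: a is not concrete
}
y: x & {a: 2}     // {a: 2, b: 3}
```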

### Operands

Operands denote the elementary values in an expression.
An operand may be a literal, a (possibly qualified) identifier denoting
a field, alias, or let declaration, or a parenthesized expression.

```
Operand     = Literal | OperandName | "(" Expression ")" .
Literal     = BasicLit | ListLit | StructLit .
BasicLit    = int_lit | float_lit | string_lit |
              null_lit | bool_lit | bottom_lit .
OperandName = identifier | QualifiedIdent .
```

### Qualified identifiers

A qualified identifier is an identifier qualified with a package name prefix.

```
QualifiedIdent = PackageName "." identifier .
```

A qualified identifier accesses an identifier in a different package,
which must be [imported](#import-declarations).
The identifier must be exported and declared in the [package block](#blocks)
of that package.

```
math.Sin    // denotes the Sin function in package math
```
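
As a further sketch, using the standard library package `strings` and its
`ToUpper` function (the field name `a` is illustrative), a qualified identifier
becomes available once the package is imported in the same file:

```
import "strings"

a: strings.ToUpper("cue")  // "CUE": ToUpper is declared in package strings
```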

### References

An identifier operand refers to a field and is called a reference.
The value of a reference is a copy of the expression associated with the field
that it is bound to,
with any references within that expression bound to the respective copies of
the fields they were originally bound to.
Implementations may use a different mechanism to evaluate as long as
these semantics are maintained.

```
a: {
    place:    string
    greeting: "Hello, \(place)!"
}

b: a & { place: "world" }
c: a & { place: "you" }

d: b.greeting  // "Hello, world!"
e: c.greeting  // "Hello, you!"
```



### Primary expressions

Primary expressions are the operands for unary and binary expressions.

```
PrimaryExpr =
	Operand |
	PrimaryExpr Selector |
	PrimaryExpr Index |
	PrimaryExpr Arguments .

Selector       = "." (identifier | simple_string_lit) .
Index          = "[" Expression "]" .
Argument       = Expression .
Arguments      = "(" [ ( Argument { "," Argument } ) [ "," ] ] ")" .
```
<!---
TODO:
	PrimaryExpr Query |
Query          = "." Filters .
Filters        = Filter { Filter } .
Filter         = "[" [ "?" ] AliasExpr "]" .

TODO: maybe reintroduce slices, as they are useful in queries, probably this
time with Python semantics.
	PrimaryExpr Slice |
Slice          = "[" [ Expression ] ":" [ Expression ] [ ":" [Expression] ] "]" .

Argument       = Expression | ( identifier ":" Expression ).

// & expression type
// string_lit: same as label. Arguments is current node.
// If selector is applied to list, it performs the operation for each
// element.

TODO: considering allowing decimal_lit for selectors.
--->

```
x
2
(s + ".txt")
f(3.1415, true)
m["foo"]
obj.color
f.p[i].x
```


### Selectors

For a [primary expression](#primary-expressions) `x` that is not a [package name](#package-clause),
the selector expression

```
x.f
```

denotes the element of a <!--list or -->struct `x` identified by `f`.
<!--For structs, -->
`f` must be an identifier or a string literal identifying
any definition or regular non-optional field.
The identifier `f` is called the field selector.

<!--
Allowing strings to be used as field selectors obviates the need for
backquoted identifiers. Note that some standards use names for structs that
are not standard identifiers (such as "Fn::Foo"). Note that indexing does not
allow access to identifiers.
-->

<!--
For lists, `f` must be an integer and follows the same lookup rules as
for the index operation.
The type of the selector expression is the type of `f`.
-->

If `x` is a package name, see the section on [qualified identifiers](#qualified-identifiers).

<!--
TODO: consider allowing this and also for selectors. It needs to be considered
how defaults are carried forward in cases like:

    x: { a: string | *"foo" } | *{ a: int | *4 }
    y: x.a & string

What is y in this case?
   (x.a & string, _|_)
   (string|"foo", _|_)
   (string|"foo", "foo")
If the latter, then why?

For a disjunction of the form `x1 | ... | xn`,
the selector is applied to each element `x1.f | ... | xn.f`.
-->

Otherwise, if `x` is not a <!--list or -->struct,
or if `f` does not exist in `x`,
the result of the expression is bottom (an error).
In the latter case the expression is incomplete.
The operand of a selector may be associated with a default.

```
T: {
    x:     int
    y:     3
    "x-y": 4
}

a: T.x     // int
b: T.y     // 3
c: T.z     // _|_ // field 'z' not found in T
d: T."x-y" // 4

e: {a: 1|*2} | *{a: 3|*4}
f: e.a  // 4 (default value)
```

<!--
```
(v, d).f  =>  (v.f, d.f)

e: {a: 1|*2} | *{a: 3|*4}
f: e.a  // 4 after selecting default from (({a: 1|*2} | {a: 3|*4}).a, 4)

```
-->


### Index expressions

A primary expression of the form

```
a[x]
```

denotes the element of a list or struct `a` indexed by `x`.
The value `x` is called the index or field name, respectively.
The following rules apply.

If `a` is not a struct, it must be a list (which need not be complete), and:

- the index `x` unified with `int` must be concrete;
- the index `x` is in range if `0 <= x < len(a)`, where only the
  explicitly defined elements of an open list are considered;
  otherwise it is out of range;
- the result of `a[x]` is the list element at index `x` if `x` is in range,
  and bottom (an error) otherwise.

If `a` is a struct:

- the index `x` unified with `string` must be concrete;
- the result of `a[x]` is the value of the regular, non-optional field
  named `x` of struct `a` if this field exists, and bottom (an error) otherwise.


```
a: [ 1, 2 ][1]     // 2
b: [ 1, 2 ][2]     // _|_
c: [ 1, 2, ...][2] // _|_

// Defaults are selected for both operand and index:
x: [1, 2] | *[3, 4]
y: int | *1
z: x[y]  // 4
```
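
Indexing a struct follows the struct rules above with a string index.
A sketch with illustrative field names:

```
s: {
    a:     1
    "b-c": 2
    d?:    3
}
k: "b-c"

p: s["a"]  // 1
q: s[k]    // 2: any expression that evaluates to a concrete string may be used
r: s["d"]  // _|_: d is an optional field
t: s["z"]  // _|_: s has no field z
```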

### Operators

Operators combine operands into expressions.

```
Expression = UnaryExpr | Expression binary_op Expression .
UnaryExpr  = PrimaryExpr | unary_op UnaryExpr .

binary_op  = "|" | "&" | "||" | "&&" | "==" | rel_op | add_op | mul_op  .
rel_op     = "!=" | "<" | "<=" | ">" | ">=" | "=~" | "!~" .
add_op     = "+" | "-" .
mul_op     = "*" | "/" .
unary_op   = "+" | "-" | "!" | "*" | rel_op .
```

Comparisons are discussed [elsewhere](#comparison-operators).
For any binary operator, the operand types must unify.

<!-- TODO: durations
 unless the operation involves durations.

Except for duration operations, if one operand is an untyped [literal] and the
other operand is not, the constant is [converted] to the type of the other
operand.
-->

<!--
Operands of unary and binary expressions may be associated with a default using
the following:

```
O1: op (v1, d1)          => (op v1, op d1)

O2: (v1, d1) op (v2, d2) => (v1 op v2, d1 op d2)
and because v => (v, v)
O3: v1       op (v2, d2) => (v1 op v2, v1 op d2)
O4: (v1, d1) op v2       => (v1 op v2, d1 op v2)
```

```
Field               Resulting Value-Default pair
a: *1|2             (1|2, 1)
b: -a               (-a, -1)

c: a + 2            (a+2, 3)
d: a + a            (a+a, 2)
```
-->

#### Operator precedence

Unary operators have the highest precedence.

There are seven precedence levels for binary operators.
Multiplication operators bind strongest, followed by
addition operators, comparison operators,
`&&` (logical AND), `||` (logical OR), `&` (unification),
and finally `|` (disjunction):

```
Precedence    Operator
    7             *  /
    6             +  -
    5             ==  !=  <  <=  >  >= =~ !~
    4             &&
    3             ||
    2             &
    1             |
```

Binary operators of the same precedence associate from left to right.
For instance, `x / y * z` is the same as `(x / y) * z`.

```
+x
23 + 3*x[i]
x <= f()
f() || g()
x == y+1 && y == z-1
2 | int
{ a: 1 } & { b: 2 }
```
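
The following fields, with illustrative names, show how the precedence levels
above determine grouping; the comments give the implied parenthesization and
the resulting value.

```
a: 2 + 3 * 4              // 14: * binds more strongly than +
b: 1 < 2 && 3 < 4         // true: comparisons bind more strongly than &&
c: 1 | 2 & int            // 1 | 2, parsed as 1 | (2 & int)
d: int & 1 | string & "a" // 1 | "a", parsed as (int & 1) | (string & "a")
e: 10 - 4 - 3             // 3, parsed as (10 - 4) - 3
```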

#### Arithmetic operators

Arithmetic operators apply to numeric values and yield a result of the same type
as the first operand. The four standard arithmetic operators
(`+`, `-`, `*`, `/`) apply to integer and decimal floating-point types;
`+` and `*` also apply to strings and bytes.

```
+    sum                    integers, floats, strings, bytes
-    difference             integers, floats
*    product                integers, floats, strings, bytes
/    quotient               integers, floats
```

For any operator that accepts operands of type `float`, any operand may be
of type `int` or `float`, in which case the result will be `float`
if it cannot be represented as an `int` or if any of the operands are `float`,
or `int` otherwise.
So the result of `1 / 2` is `0.5` and is of type `float`.

The result of division by zero is bottom (an error).
<!-- TODO: consider making it +/- Inf -->
Integer division is implemented through the builtin functions
`quo`, `rem`, `div`, and `mod`.

The unary operators `+` and `-` are defined for numeric values as follows:

```
+x                          is 0 + x
-x    negation              is 0 - x
```
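
A few concrete cases, with illustrative names. The comments for the
integer-division builtins assume that `quo` and `rem` truncate toward zero and
that `div` and `mod` implement Euclidean division; those semantics are not
defined in this section.

```
a: 1 + 2      // 3 (int)
b: 1 + 2.0    // 3.0 (float): one operand is a float
c: 1 / 2      // 0.5 (float): the result cannot be represented as an int
d: 1 / 0      // _|_: division by zero
e: -(3 - 5)   // 2

// Assumed: quo/rem truncate toward zero, div/mod are Euclidean.
f: quo(-7, 2) // -3
g: rem(-7, 2) // -1
h: div(-7, 2) // -4
i: mod(-7, 2) // 1
```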

#### String operators

Strings can be concatenated using the `+` operator:
```
s: "hi " + name + " and good bye"
```
String addition creates a new string by concatenating the operands.

A string can be repeated by multiplying it by an integer:

```
s: "etc. "*3  // "etc. etc. etc. "
```
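
The arithmetic operator table above also lists `+` and `*` for byte sequences.
Assuming they behave as they do for strings, concatenation and repetition look
as follows; the field names, including a definition for `name`, are illustrative.

```
name: "world"
a: "hi " + name + " and good bye"  // "hi world and good bye"

// Assumed to mirror the string operators:
b: 'ab' + 'cd'                     // 'abcd'
c: 'ab' * 2                        // 'abab'
```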

<!-- jba: Do these work for byte sequences? If not, why not? -->


#### Comparison operators

Comparison operators compare two operands and yield an untyped boolean value.

```
==    equal
!=    not equal
<     less
<=    less or equal
>     greater
>=    greater or equal
=~    matches regular expression
!~    does not match regular expression
```

<!-- regular expression operator inspired by Bash, Perl, and Ruby. -->

In any comparison, the types of the two operands must unify or one of the
operands must be null.

The equality operators `==` and `!=` apply to operands that are comparable.
The ordering operators `<`, `<=`, `>`, and `>=` apply to operands that are ordered.
The matching operators `=~` and `!~` apply to a string and a regular
expression operand.
These terms and the result of the comparisons are defined as follows:

- Null is comparable with itself and any other type.
  Two null values are always equal; null is unequal to anything else.
- Boolean values are comparable.
  Two boolean values are equal if they are either both true or both false.
- Integer values are comparable and ordered, in the usual way.
- Floating-point values are comparable and ordered, as per the definitions
  for binary coded decimals in the IEEE-754-2008 standard.
- Floating-point numbers may be compared with integers.
- String and bytes values are comparable and ordered lexically byte-wise.
- Structs are not comparable.
- Lists are not comparable.
- The regular expression syntax is the one accepted by RE2,
  described in https://github.com/google/re2/wiki/Syntax,
  except for `\C`.
- `s =~ r` is true if `s` matches the regular expression `r`.
- `s !~ r` is true if `s` does not match regular expression `r`.

<!--- TODO: consider the following
- For regular expression, named capture groups are interpreted as CUE references
  that must unify with the strings matching this capture group.
--->
<!-- TODO: Implementations should adopt an algorithm that runs in linear time? -->
<!-- Consider implementing Level 2 of Unicode regular expression. -->

```
3 < 4       // true
3 < 4.0     // true
null == 2   // false
null != {}  // true
{} == {}    // _|_: structs are not comparable

"Wild cats" =~ "cat"   // true
"Wild cats" !~ "dog"   // true

"foo" =~ "^[a-z]{3}$"  // true
"foo" =~ "^[a-z]{4}$"  // false
```

<!-- jba
I think I know what `3 < a` should mean if

    a: >=1 & <=5

It should be a constraint on `a` that can be evaluated once `a`'s value is known more precisely.

But what does `3 < (>=1 & <=5)` mean? We'll never get more information, so it must have a definite value.
-->

#### Logical operators

Logical operators apply to boolean values and yield a result of the same type
as the operands. The right operand is evaluated conditionally.

```
&&    conditional AND    p && q  is  "if p then q else false"
||    conditional OR     p || q  is  "if p then true else q"
!     NOT                !p      is  "not p"
```
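
For example, with a hypothetical field `x`:

```
x: 5
a: x >= 1 && x <= 10  // true
b: x < 0 || x > 3     // true
c: !(x == 5)          // false
```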


<!--
### TODO TODO TODO

3.14 / 0.0   // illegal: division by zero
Illegal conversions always apply to CUE.

Implementation restriction: A compiler may use rounding while computing untyped floating-point or complex constant expressions; see the implementation restriction in the section on constants. This rounding may cause a floating-point constant expression to be invalid in an integer context, even if it would be integral when calculated using infinite precision, and vice versa.
-->

<!--- TODO(mpvl): conversions
### Conversions
Conversions are expressions of the form `T(x)` where `T` and `x` are
expressions.
The result is always an instance of `T`.

```
Conversion = Expression "(" Expression [ "," ] ")" .
```
--->
<!---

A literal value `x` can be converted to type T if `x` is representable by a
value of `T`.

As a special case, an integer literal `x` can be converted to a string type
using the same rule as for non-constant x.

Converting a literal yields a typed value as result.

```
uint(iota)               // iota value of type uint
float32(2.718281828)     // 2.718281828 of type float32
complex128(1)            // 1.0 + 0.0i of type complex128
float32(0.49999999)      // 0.5 of type float32
float64(-1e-1000)        // 0.0 of type float64
string('x')              // "x" of type string
string(0x266c)           // "♬" of type string
MyString("foo" + "bar")  // "foobar" of type MyString
string([]byte{'a'})      // not a constant: []byte{'a'} is not a constant
(*int)(nil)              // not a constant: nil is not a constant, *int is not a boolean, numeric, or string type
int(1.2)                 // illegal: 1.2 cannot be represented as an int
string(65.0)             // illegal: 65.0 is not an integer constant
```
--->
<!---

A conversion is always allowed if `x` is an instance of `T`.

If `T` and `x` are of different underlying types, a conversion is allowed if
`x` can be converted to a value `x'` of `T`'s type, and
`x'` is an instance of `T`.
A value `x` can be converted to the type of `T` in any of these cases:

- `x` is a struct and is subsumed by `T`.
- `x` and `T` are both integer or floating points.
- `x` is an integer or a byte sequence and `T` is a string.
- `x` is a string and `T` is a byte sequence.

Specific rules apply to conversions between numeric types, structs,
or to and from a string type. These conversions may change the representation
of `x`.
All other conversions only change the type but not the representation of x.


#### Conversions between numeric ranges
For the conversion of numeric values, the following rules apply:

1. Any integer value can be converted into any other integer value
   provided that it is within range.
2. When converting a decimal floating-point number to an integer, the fraction
   is discarded (truncation towards zero). TODO: or disallow truncating?

```
a: uint16(int(1000))  // uint16(1000)
b: uint8(1000)        // _|_ // overflow
c: int(2.5)           // 2  TODO: TBD
```


#### Conversions to and from a string type

Converting a list of bytes to a string type yields a string whose successive
bytes are the elements of the slice.
Invalid UTF-8 is converted to `"\uFFFD"`.

```
string('hell\xc3\xb8')   // "hellø"
string(bytes([0x20]))    // " "
```

A string value is always convertible to a list of bytes.

```
bytes("hellø")   // 'hell\xc3\xb8'
bytes("")        // ''
```

#### Conversions between list types

Conversions between list types are possible only if `T` strictly subsumes `x`
and the result will be the unification of `T` and `x`.

If we introduce named types this would be different from IP & [10, ...]

Consider removing this until it has a different meaning.

```
IP:        4*[byte]
Private10: IP([10, ...])  // [10, byte, byte, byte]
```

#### Conversions between struct types

A conversion from `x` to `T`
is applied using the following rules:

1. `x` must be an instance of `T`,
2. all fields defined for `x` that are not defined for `T` are removed from
  the result of the conversion, recursively.

<!-- jba: I don't think you say anywhere that the matching fields are unified.
mpvl: they are not, x must be an instance of T, in which case x == T&x,
so unification would be unnecessary.
-->
<!--
```
T: {
    a: { b: 1..10 }
}

x1: {
    a: { b: 8, c: 10 }
    d: 9
}

c1: T(x1)             // { a: { b: 8 } }
c2: T({})             // _|_  // missing field 'a' in '{}'
c3: T({ a: {b: 0} })  // _|_  // field a.b does not unify (0 & 1..10)
```
-->

### Calls

Calls can be made to core library functions, called builtins.
Given an expression `f` of function type `F`,
```
f(a1, a2, … an)
```
calls `f` with arguments `a1, a2, … an`. Arguments must be expressions
whose values are instances of the corresponding parameter types of `F`;
they are evaluated before the function is called.

```
a: math.Atan2(x, y)
```

In a function call, the function value and arguments are evaluated in the usual
order.
After they are evaluated, the parameters of the call are passed by value
to the function and the called function begins execution.
The return value of the function is passed by value back to the caller
when the function returns.
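
A sketch of calls to predeclared and standard library functions, with
illustrative field names. It uses `Join` and `Repeat` from the standard
`strings` package, and the predeclared `or` and `and`, which take a list of
values:

```
import "strings"

a: len([1, 2, 3])                 // 3
b: strings.Join(["a", "b"], "-")  // "a-b"
c: strings.Repeat("ab", 2)        // "abab"
d: or([1, 2, 3])                  // 1 | 2 | 3
e: and([int, >0])                 // equivalent to int & >0
```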


### Comprehensions

Lists and fields can be constructed using comprehensions.

Comprehensions define a clause sequence that consists of a sequence of
`for`, `if`, and `let` clauses, nesting from left to right.
The sequence must start with a `for` or `if` clause.
The `for` and `let` clauses each define a new scope in which new values are
bound to be available for the next clause.

The `for` clause binds the defined identifiers, on each iteration, to the next
value of some iterable value in a new scope.
A `for` clause may bind one or two identifiers.
If there is one identifier, it is bound to the value of
a list element or struct field.
If there are two identifiers, the first is bound to the key or index,
if available, and the second to the value.

For lists, `for` iterates over all elements in the list after closing it.
For structs, `for` iterates over all non-optional regular fields.

An `if` clause, or guard, specifies an expression that terminates the current
iteration if it evaluates to false.

The `let` clause binds the result of an expression to the defined identifier
in a new scope.

The current iteration is said to complete if the innermost block of the clause
sequence is reached.
Syntactically, the comprehension value is a struct.
A comprehension can generate non-struct values by embedding such values within
this struct.

Within lists, the values yielded by a comprehension are inserted in the list
at the position of the comprehension.
Within structs, the values yielded by a comprehension are embedded within the
struct.
Both structs and lists may contain multiple comprehensions.

```
Comprehension       = Clauses StructLit .

Clauses             = StartClause { [ "," ] Clause } .
StartClause         = ForClause | GuardClause .
Clause              = StartClause | LetClause .
ForClause           = "for" identifier [ "," identifier ] "in" Expression .
GuardClause         = "if" Expression .
LetClause           = "let" identifier "=" Expression .
```

```
a: [1, 2, 3, 4]
b: [for x in a if x > 1 { x+1 }]  // [3, 4, 5]

c: {
    for x in a
    if x < 4
    let y = 1 {
        "\(x)": x + y
    }
}
d: { "1": 2, "2": 3, "3": 4 }  // same value as c
```


### String interpolation

String interpolation allows constructing strings by replacing placeholder
expressions with their string representation.
String interpolation may be used in single- and double-quoted strings, as well
as their multiline equivalents.

A placeholder consists of `\(` followed by an expression and `)`.
The expression is evaluated in the scope within which the string is defined.

The result of the expression is substituted as follows:
- string: as is
- bool: the JSON representation of the bool
- number: a JSON representation of the number that preserves the
precision of the underlying binary coded decimal
- bytes: as is within a single-quoted string; otherwise, converted to
valid UTF-8, replacing each maximal subpart of an ill-formed subsequence
with a single replacement character (W3C encoding standard)
- list: illegal
- struct: illegal


```
a: "World"
b: "Hello \( a )!" // Hello World!
```
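
To illustrate the substitution rules, the Go sketch below maps a few Go stand-ins for CUE values to the text that would replace a placeholder in a double-quoted string. It is a hypothetical model, not part of any CUE API. Numbers are limited to `int64`, so the decimal precision rule is not exercised. Ill-formed bytes are handled with `strings.ToValidUTF8`, which replaces each run of invalid bytes with a single U+FFFD and therefore only approximates the W3C maximal-subpart rule.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
	"unicode/utf8"
)

// interpolate returns the text substituted for a placeholder \(v) in a
// double-quoted string. Only a few Go stand-ins for CUE values are covered.
func interpolate(v any) (string, error) {
	switch v := v.(type) {
	case string:
		return v, nil // strings: as is
	case bool:
		return strconv.FormatBool(v), nil // JSON form: true or false
	case int64:
		return strconv.FormatInt(v, 10), nil // integers only; decimals not modeled
	case []byte:
		if utf8.Valid(v) {
			return string(v), nil
		}
		// Approximation: ToValidUTF8 replaces each run of invalid bytes with
		// one U+FFFD, which can differ from the W3C maximal-subpart rule.
		return strings.ToValidUTF8(string(v), "\uFFFD"), nil
	case []any, map[string]any:
		return "", errors.New("cannot interpolate a list or struct")
	default:
		return "", fmt.Errorf("unsupported value of type %T", v)
	}
}

func main() {
	s, _ := interpolate("World")
	fmt.Printf("Hello %s!\n", s) // Hello World!
}
```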


## Builtin Functions

Builtin functions are predeclared. They are called like any other function.


### `len`

The builtin function `len` takes arguments of various types and returns
a result of type int.

```
Argument type    Result

bytes            length of byte sequence
string           length of its UTF-8 encoding in bytes
list             list length, smallest length for an open list
struct           number of distinct data fields, excluding field constraints
```
<!-- TODO: consider not supporting len, but instead rely on more
precisely named builtin functions:
  - strings.RuneLen(x)
  - bytes.Len(x)  // x may be a string
  - struct.NumFooFields(x)
  - list.Len(x)
-->

```
Expression           Result
len("Hellø")         6
len([1, 2, 3])       3
len([1, 2, ...])     2
```


### `close`

The builtin function `close` converts a partially defined, or open, struct
to a fully defined, or closed, struct.
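
The Go sketch below is a minimal model of the difference between open and closed structs. It assumes structs with only integer fields. Unification fails if one side is closed and the other side adds a field the closed struct does not define. The types and functions are hypothetical. Optional fields, pattern constraints, embeddings and definitions are ignored.

```go
package main

import (
	"errors"
	"fmt"
)

// structVal is a hypothetical miniature of a CUE struct with integer fields.
type structVal struct {
	fields map[string]int
	closed bool
}

// closeStruct models close: the same fields, but no new fields are allowed.
// It returns a copy; the argument itself stays open.
func closeStruct(s structVal) structVal {
	s.closed = true
	return s
}

// allowed reports whether s permits every field of other.
func allowed(s, other structVal) bool {
	if !s.closed {
		return true
	}
	for name := range other.fields {
		if _, ok := s.fields[name]; !ok {
			return false
		}
	}
	return true
}

// unify merges two structs. A conflicting value or a field rejected by a
// closed side is an error (bottom). The result is closed if either input is.
func unify(x, y structVal) (structVal, error) {
	if !allowed(x, y) || !allowed(y, x) {
		return structVal{}, errors.New("field not allowed in closed struct")
	}
	out := structVal{fields: map[string]int{}, closed: x.closed || y.closed}
	for _, s := range []structVal{x, y} {
		for name, v := range s.fields {
			if old, ok := out.fields[name]; ok && old != v {
				return structVal{}, fmt.Errorf("conflicting values for %s", name)
			}
			out.fields[name] = v
		}
	}
	return out, nil
}

func main() {
	open := structVal{fields: map[string]int{"a": 1}}
	extra := structVal{fields: map[string]int{"b": 2}}
	_, err := unify(open, extra)
	fmt.Println(err) // <nil>
	_, err = unify(closeStruct(open), extra)
	fmt.Println(err) // field not allowed in closed struct
}
```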


### `and`

The builtin function `and` takes a list and returns the result of applying
the `&` operator to all elements in the list.
It returns top for the empty list.

```
Expression           Result
and([a, b])          a & b
and([a])             a
and([])              _
```

### `or`

The builtin function `or` takes a list and returns the result of applying
the `|` operator to all elements in the list.
It returns bottom for the empty list.

```
Expression           Result
or([a, b])           a | b
or([a])              a
or([])               _|_
```
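
`and([])` is top and `or([])` is bottom because each is the identity element of its operator, which makes it the natural starting value of the fold. The Go sketch below shows this with a deliberately tiny, hypothetical model. A value is the set of integers 0 through 7 it admits, stored as a bitmask. `&` is intersection and `|` is union. Real CUE values are far richer than this.

```go
package main

import "fmt"

// value is a hypothetical stand-in for a CUE value: bit i is set if the
// integer i (0 through 7) is admitted.
type value uint8

const (
	top    value = 0xFF // _ admits every value
	bottom value = 0x00 // _|_ admits no value
)

// and folds & (intersection) over vs, starting from top, the identity of &.
func and(vs []value) value {
	r := top
	for _, v := range vs {
		r &= v
	}
	return r
}

// or folds | (union) over vs, starting from bottom, the identity of |.
func or(vs []value) value {
	r := bottom
	for _, v := range vs {
		r |= v
	}
	return r
}

func main() {
	small := value(0b00001111) // admits 0, 1, 2, 3
	even := value(0b01010101)  // admits 0, 2, 4, 6
	fmt.Printf("%08b\n", and([]value{small, even})) // 00000101
	fmt.Printf("%08b\n", or([]value{small, even}))  // 01011111
	fmt.Println(and(nil) == top, or(nil) == bottom) // true true
}
```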

### `div`, `mod`, `quo` and `rem`

For two integer values `x` and `y`,
the integer quotient `q = div(x, y)` and remainder `r = mod(x, y)`
implement Euclidean division and
satisfy the following relationship:

```
r = x - y*q  with 0 <= r < |y|
```
where `|y|` denotes the absolute value of `y`.

```
 x     y   div(x, y)  mod(x, y)
 5     3        1          2
-5     3       -2          1
 5    -3       -1          2
-5    -3        2          1
```

For two integer values `x` and `y`,
the integer quotient `q = quo(x, y)` and remainder `r = rem(x, y)`
implement truncated division and
satisfy the following relationship:

```
x = q*y + r  and  |r| < |y|
```

with `quo(x, y)` truncated towards zero.

```
 x     y   quo(x, y)  rem(x, y)
 5     3        1          2
-5     3       -1         -2
 5    -3       -1          2
-5    -3        1         -2
```

A zero divisor in either case results in bottom (an error).


## Cycles

Implementations are required to interpret or reject cycles encountered
during evaluation according to the rules in this section.


### Reference cycles

A _reference cycle_ occurs if a field references itself, either directly or
indirectly.

```
// x references itself
x: x

// indirect cycles
b: c
c: d
d: b
```

Implementations should treat these as `_`.
Two particular cases are discussed below.


#### Expressions that unify an atom with an expression

An expression of the form `a & e`, where `a` is an atom
and `e` is an expression, always evaluates to `a` or bottom.
As it does not matter how we fail, we can assume the result to be `a`
and postpone validating `a == e` until after all references
in `e` have been resolved.

```
// Config            Evaluates to (requiring concrete values)
x: {                  x: {
    a: b + 100            a: _|_ // cycle detected
    b: a - 100            b: _|_ // cycle detected
}                     }

y: x & {              y: {
    a: 200                a: 200 // asserted that 200 == b + 100
                          b: 100
}                     }
```
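
The evaluation order described above can be sketched in Go under strong simplifying assumptions. Each field holds an optional integer atom and an optional integer expression over other fields, declared together with its dependencies. Atoms are taken as field values first. The remaining expressions are resolved once their dependencies are known, and `a == e` is checked only at the end. All names are hypothetical; this is not how a CUE evaluator is structured.

```go
package main

import (
	"fmt"
	"sort"
)

// field is a hypothetical model of a field whose value is a & e: atom is the
// concrete value a (nil if absent), expr computes e from the fields in deps
// (nil if absent).
type field struct {
	atom *int
	deps []string
	expr func(env map[string]int) int
}

func intPtr(v int) *int { return &v }

// ready reports whether every dependency already has a value.
func ready(deps []string, env map[string]int) bool {
	for _, d := range deps {
		if _, ok := env[d]; !ok {
			return false
		}
	}
	return true
}

// evaluate assumes a & e is a, resolves the remaining expressions, and only
// then validates a == e for every field that has both.
func evaluate(fields map[string]field) (map[string]int, error) {
	env := map[string]int{}
	for name, f := range fields {
		if f.atom != nil {
			env[name] = *f.atom // a & e evaluates to a, or to bottom
		}
	}
	for progress := true; progress; {
		progress = false
		for name, f := range fields {
			if _, done := env[name]; !done && f.expr != nil && ready(f.deps, env) {
				env[name] = f.expr(env)
				progress = true
			}
		}
	}
	names := make([]string, 0, len(fields))
	for name := range fields {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic error reporting
	for _, name := range names {
		if _, done := env[name]; !done {
			return nil, fmt.Errorf("%s: cycle detected", name)
		}
	}
	for _, name := range names { // the postponed check a == e
		f := fields[name]
		if f.atom != nil && f.expr != nil {
			if got := f.expr(env); got != *f.atom {
				return nil, fmt.Errorf("%s: %d != %d", name, *f.atom, got)
			}
		}
	}
	return env, nil
}

func main() {
	a := field{deps: []string{"b"}, expr: func(env map[string]int) int { return env["b"] + 100 }}
	b := field{deps: []string{"a"}, expr: func(env map[string]int) int { return env["a"] - 100 }}

	_, err := evaluate(map[string]field{"a": a, "b": b})
	fmt.Println(err) // a: cycle detected

	a.atom = intPtr(200) // y: x & {a: 200}
	env, err := evaluate(map[string]field{"a": a, "b": b})
	fmt.Println(env, err) // map[a:200 b:100] <nil>
}
```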


#### Field values

A field value of the form `r & v`,
where `r` evaluates to a reference cycle and `v` is a concrete value,
evaluates to `v`.
Unification is idempotent and unifying a value with itself ad infinitum,
which is what the cycle represents, results in this value.
Implementations should detect cycles of this kind, ignore `r`,
and take `v` as the result of unification.

<!-- Tomabechi's graph unification algorithm
can detect such cycles at near-zero cost. -->

```
Configuration    Evaluated
//    c           Cycles in nodes of type struct evaluate
//  ↙︎   ↖         to the fixed point of unifying their
// a  →  b        values ad infinitum.

a: b & { x: 1 }   // a: { x: 1, y: 2, z: 3 }
b: c & { y: 2 }   // b: { x: 1, y: 2, z: 3 }
c: a & { z: 3 }   // c: { x: 1, y: 2, z: 3 }

// resolve a             b & {x:1}
// substitute b          c & {y:2} & {x:1}
// substitute c          a & {z:3} & {y:2} & {x:1}
// eliminate a (cycle)   {z:3} & {y:2} & {x:1}
// simplify              {x:1,y:2,z:3}
```

This rule also applies to field values that are disjunctions of unification
operations of the above form.

```
a: b&{x:1} | {y:1}  // {x:1,y:3,z:2} | {y:1}
b: {x:2} | c&{z:2}  // {x:2} | {x:1,y:3,z:2}
c: a&{y:3} | {z:3}  // {x:1,y:3,z:2} | {z:3}


// resolve a             b&{x:1} | {y:1}
// substitute b          ({x:2} | c&{z:2})&{x:1} | {y:1}
// simplify              c&{z:2}&{x:1} | {y:1}
// substitute c          (a&{y:3} | {z:3})&{z:2}&{x:1} | {y:1}
// simplify              a&{y:3}&{z:2}&{x:1} | {y:1}
// eliminate a (cycle)   {y:3}&{z:2}&{x:1} | {y:1}
// expand                {x:1,y:3,z:2} | {y:1}
```

Note that all nodes that together form a reference cycle of structs evaluate
to the same value.
If a field value is a disjunction, any element that is part of a cycle
evaluates to this value.


### Structural cycles

A structural cycle occurs when a node references one of its ancestor nodes.
It is possible to construct a structural cycle by unifying two acyclic values:
```
// acyclic
y: {
    f: h: g
    g: _
}
// acyclic
x: {
    f: _
    g: f
}
// introduces structural cycle
z: x & y
```
Implementations should be able to detect such structural cycles dynamically.

A structural cycle can result in infinite structure or evaluation loops.
```
// infinite structure
a: b: a

// infinite evaluation
f: {
    n:   int
    out: n + (f & {n: 1}).out
}
```
CUE must therefore determine when structural cycles are allowed and when
they are disallowed.

If a node `a` references an ancestor node, we call it and any of its
field values `a.f` _cyclic_.
So if `a` is cyclic, all of its descendants are also regarded as cyclic.
A given node `x`, whose value is composed of the conjuncts `c1 & ... & cn`,
is valid if at least one of its conjuncts is not cyclic.

```
// Disallowed: a list of infinite length with all elements being 1.
#List: {
    head: 1
    tail: #List
}

// Disallowed: another infinite structure (a:{b:{d:{b:{d:{...}}}}}, ...).
a: {
    b: c
}
c: {
    d: a
}

// #List defines a list of arbitrary length. Because the recursive reference
// is part of a disjunction, this does not result in a structural cycle.
#List: {
    head: _
    tail: null | #List
}

// Usage of #List. The value of tail in the most deeply nested element will
// be `null`: there, the disjunct referring to #List is the only conjunct,
// so all conjuncts are cyclic; that value is invalid and is therefore
// eliminated from the disjunction.
MyList: #List & { head: 1, tail: { head: 2 }}
```
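
Dynamic detection of the two disallowed infinite structures above can be sketched as an expansion that tracks each node's path and flags any reference to a proper ancestor of that path. The Go model below is hypothetical and deliberately narrow. It handles only references to top-level fields and struct literals. It does not model unification, so the `z: x & y` case is out of scope. It also does not implement the rule that a node stays valid when one of its conjuncts is acyclic.

```go
package main

import (
	"fmt"
	"strings"
)

// expr is a hypothetical miniature of a CUE value: either a reference to a
// top-level field (ref != "") or a struct literal mapping labels to exprs.
type expr struct {
	ref    string
	fields map[string]expr
}

// hasStructuralCycle expands e, which sits at the dotted path, and reports
// whether any reference refers to a proper ancestor of the node holding it.
// seen records references already followed at this path, so that plain
// reference cycles (b: c, c: b) terminate instead of looping.
func hasStructuralCycle(path string, e expr, env map[string]expr, seen map[string]bool) bool {
	if e.ref != "" {
		if strings.HasPrefix(path, e.ref+".") {
			return true // the node refers to one of its ancestors
		}
		if seen[e.ref] {
			return false // a reference cycle at the same node, not a structural one
		}
		seen[e.ref] = true
		return hasStructuralCycle(path, env[e.ref], env, seen)
	}
	for label, sub := range e.fields {
		if hasStructuralCycle(path+"."+label, sub, env, map[string]bool{}) {
			return true
		}
	}
	return false
}

// check reports whether any top-level field of env contains a structural cycle.
func check(env map[string]expr) bool {
	for name, e := range env {
		if hasStructuralCycle(name, e, env, map[string]bool{}) {
			return true
		}
	}
	return false
}

func main() {
	// a: b: a
	infinite := map[string]expr{"a": {fields: map[string]expr{"b": {ref: "a"}}}}
	// a: {b: c}, c: {d: a}
	indirect := map[string]expr{
		"a": {fields: map[string]expr{"b": {ref: "c"}}},
		"c": {fields: map[string]expr{"d": {ref: "a"}}},
	}
	// b: c, c: d, d: b is a reference cycle, not a structural one
	refCycle := map[string]expr{"b": {ref: "c"}, "c": {ref: "d"}, "d": {ref: "b"}}
	fmt.Println(check(infinite), check(indirect), check(refCycle)) // true true false
}
```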

<!--
### Unused fields

TODO: rules for detection of unused fields

1. Any alias value must be used
-->


## Modules, instances, and packages

CUE configurations are constructed by combining _instances_.
An instance, in turn, is constructed from one or more source files belonging
to the same _package_ that together declare the data representation.
Elements of this data representation may be exported and used
in other instances.

### Source file organization

Each source file consists of an optional package clause defining the collection
of files to which it belongs,
followed by a possibly empty set of import declarations that declare
packages whose contents it wishes to use, followed by a possibly empty set of
declarations.

As with a struct, a source file may contain embeddings.
Unlike in a struct, the embedded expressions may be any value.
If the result of the unification of all embedded values is not a struct,
it will be output instead of its enclosing file when exporting CUE
to a data format.

```
SourceFile = { attribute "," } [ PackageClause "," ] { ImportDecl "," } { Declaration "," } .
```

```
"Hello \(#place)!"

#place: "world"

// Outputs "Hello world!"
```

### Package clause

A package clause is an optional clause that defines the package to which
a source file belongs.

```
PackageClause  = "package" PackageName .
PackageName    = identifier .
```

The PackageName must not be the blank identifier or a definition identifier.

```
package math
```

### Modules and instances

A _module_ defines a tree of directories, rooted at the _module root_.

All source files within a module with the same package name belong to the same
package.
<!-- jba: I can't make sense of the above sentence. -->
A module may define multiple packages.

An _instance_ of a package is any subset of files belonging
to the same package.
<!-- jba: Are you saying that -->
<!-- if I have a package with files a, b and c, then there are 8 instances of -->
<!-- that package, some of which are {a, b}, {c}, {b, c}, and so on? What's the -->
<!-- purpose of that definition? -->
It is interpreted as the concatenation of these files.

An implementation may impose conventions on the layout of package files
to determine which files of a package belong to an instance.
For example, an instance may be defined as the subset of package files
belonging to a directory and all its ancestors.
<!-- jba: OK, that helps a little, but I still don't see what the purpose is. -->


### Import declarations

An import declaration states that the source file containing the declaration
depends on definitions of the _imported_ package
and enables access to exported identifiers of that package.
The import names an identifier (PackageName) to be used for access and an
ImportPath that specifies the package to be imported.

```
ImportDecl       = "import" ( ImportSpec | "(" { ImportSpec "," } ")" ) .
ImportSpec       = [ PackageName ] ImportPath .
ImportLocation   = { unicode_value } .
ImportPath       = `"` ImportLocation [ ":" identifier ] `"` .
```

The PackageName is used in qualified identifiers to access
exported identifiers of the package within the importing source file.
It is declared in the file block.
It defaults to the identifier specified in the package clause of the imported
package, which must match either the last path component of ImportLocation
or the identifier following it.

<!--
Note: this deviates from the Go spec where there is no such restriction.
This restriction has the benefit of being able to determine the identifiers
for packages from within the file itself. But for CUE it has another benefit:
when using package hierarchies, one is more likely to want to include multiple
packages within the same directory structure. This mechanism allows
disambiguation in these cases.
-->

The interpretation of the ImportPath is implementation-dependent but it is
typically either the path of a builtin package or a fully qualified location
of a package within a source code repository.

An ImportLocation must be a non-empty string using only characters belonging to
Unicode's L, M, N, P, and S general categories
(the Graphic characters without spaces)
and may not include the characters ``!"#$%&'()*,:;<=>?[\\]^`{|}``
or the Unicode replacement character U+FFFD.

Assume we have a package containing the package clause `package math`,
which exports function `Sin` at the path identified by `lib/math`.
This table illustrates how `Sin` is accessed in files
that import the package after the various types of import declaration.

<!-- TODO: a better example than lib/math:math, where the suffix is a no-op -->

```
Import declaration          Local name of Sin

import   "lib/math"         math.Sin
import   "lib/math:math"    math.Sin
import m "lib/math"         m.Sin
```
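
The rules for ImportLocation and the default PackageName can be checked mechanically. The Go sketch below is a hypothetical illustration, not the CUE tooling's API. It assumes one reading of the naming rule: an identifier after `:`, when present, determines the name, and otherwise the last path component does. Because `:` cannot appear in an ImportLocation, splitting on the first `:` is unambiguous.

```go
package main

import (
	"fmt"
	"path"
	"strings"
	"unicode"
)

// illegal lists the characters an ImportLocation may not contain.
const illegal = "!\"#$%&'()*,:;<=>?[\\]^`{|}"

// validLocation reports whether loc is non-empty, uses only characters from
// Unicode categories L, M, N, P and S, and contains neither an illegal
// character nor U+FFFD (invalid UTF-8 also decodes to U+FFFD here).
func validLocation(loc string) bool {
	if loc == "" {
		return false
	}
	for _, r := range loc {
		if !unicode.In(r, unicode.L, unicode.M, unicode.N, unicode.P, unicode.S) ||
			strings.ContainsRune(illegal, r) || r == '\uFFFD' {
			return false
		}
	}
	return true
}

// splitImportPath splits an ImportPath such as "lib/math:math" into its
// ImportLocation and the optional identifier after ':'.
func splitImportPath(p string) (loc, ident string) {
	if i := strings.IndexByte(p, ':'); i >= 0 {
		return p[:i], p[i+1:]
	}
	return p, ""
}

// defaultPackageName returns the name under the assumed reading of the rule:
// the identifier after ':' if present, else the last path component.
func defaultPackageName(importPath string) string {
	loc, ident := splitImportPath(importPath)
	if ident != "" {
		return ident
	}
	return path.Base(loc)
}

func main() {
	fmt.Println(validLocation("lib/math"), validLocation("lib/ma th"))           // true false
	fmt.Println(defaultPackageName("lib/math"), defaultPackageName("lib/math:math")) // math math
}
```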

An import declaration declares a dependency relation between the importing and
imported packages. It is illegal for a package to import itself, directly or
indirectly, or to directly import a package without referring to any of its
exported identifiers.
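
The ban on direct or indirect self-imports means the import graph must be acyclic. A standard depth-first search that marks the packages on the current path detects a violation, as in the Go sketch below. The graph representation and package names are hypothetical, and the rule about unused imports is not checked.

```go
package main

import "fmt"

// hasImportCycle reports whether some package imports itself, directly or
// indirectly, in graph (package -> packages it imports). It runs a
// depth-first search that marks the packages on the current path.
func hasImportCycle(graph map[string][]string) bool {
	const (
		unvisited = iota
		onPath
		done
	)
	state := map[string]int{} // missing entries read as unvisited
	var visit func(p string) bool
	visit = func(p string) bool {
		switch state[p] {
		case onPath:
			return true // p reached again before it finished: a cycle
		case done:
			return false
		}
		state[p] = onPath
		for _, q := range graph[p] {
			if visit(q) {
				return true
			}
		}
		state[p] = done
		return false
	}
	for p := range graph {
		if state[p] == unvisited && visit(p) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasImportCycle(map[string][]string{"a": {"a"}}))                          // true
	fmt.Println(hasImportCycle(map[string][]string{"a": {"b"}, "b": {"c"}, "c": {"a"}}))  // true
	fmt.Println(hasImportCycle(map[string][]string{"a": {"b", "c"}, "b": {"c"}, "c": {}})) // false
}
```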


### An example package

TODO