1<!--
2 Copyright 2018 The CUE Authors
3
4 Licensed under the Apache License, Version 2.0 (the "License");
5 you may not use this file except in compliance with the License.
6 You may obtain a copy of the License at
7
8 http://www.apache.org/licenses/LICENSE-2.0
9
10 Unless required by applicable law or agreed to in writing, software
11 distributed under the License is distributed on an "AS IS" BASIS,
12 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13 See the License for the specific language governing permissions and
14 limitations under the License.
15-->
16
17# The CUE Language Specification
18
19## Introduction
20
21This is a reference manual for the CUE data constraint language.
22CUE, pronounced cue or Q, is a general-purpose and strongly typed
23constraint-based language.
24It can be used for data templating, data validation, code generation, scripting,
25and many other applications involving structured data.
26The CUE tooling, layered on top of CUE, provides
a general-purpose scripting language for creating scripts as well as
28simple servers, also expressed in CUE.
29
30CUE was designed with cloud configuration and related systems in mind,
31but is not limited to this domain.
32It derives its formalism from relational programming languages.
33This formalism allows for managing and reasoning over large amounts of
34data in a straightforward manner.
35
36The grammar is compact and regular, allowing for easy analysis by automatic
37tools such as integrated development environments.
38
39This document is maintained by mpvl@golang.org.
40CUE has a lot of similarities with the Go language. This document draws heavily
41from the Go specification as a result.
42
43CUE draws its influence from many languages.
44Its main influences were BCL/GCL (internal to Google),
45LKB (LinGO), Go, and JSON.
Others are Swift, TypeScript, JavaScript, Prolog, NCL (internal to Google),
47Jsonnet, HCL, Flabbergast, Nix, JSONPath, Haskell, Objective-C, and Python.
48
49
50## Notation
51
52The syntax is specified using Extended Backus-Naur Form (EBNF):
53
54```
55Production = production_name "=" [ Expression ] "." .
56Expression = Alternative { "|" Alternative } .
57Alternative = Term { Term } .
58Term = production_name | token [ "…" token ] | Group | Option | Repetition .
59Group = "(" Expression ")" .
60Option = "[" Expression "]" .
61Repetition = "{" Expression "}" .
62```
63
64Productions are expressions constructed from terms and the following operators,
65in increasing precedence:
66
67```
68| alternation
69() grouping
70[] option (0 or 1 times)
71{} repetition (0 to n times)
72```
73
74Lower-case production names are used to identify lexical tokens. Non-terminals
75are in CamelCase. Lexical tokens are enclosed in double quotes `""` or back
76quotes ` `` `.
77
78The form `a … b` represents the set of characters from a through b as
79alternatives. The horizontal ellipsis `…` is also used elsewhere in the spec to
80informally denote various enumerations or code snippets that are not further
81specified. The character `…` (as opposed to the three characters `...`) is not a
82token of the CUE language.
83
84
85## Source code representation
86
87Source code is Unicode text encoded in UTF-8.
88Unless otherwise noted, the text is not canonicalized, so a single
89accented code point is distinct from the same character constructed from
90combining an accent and a letter; those are treated as two code points.
91For simplicity, this document will use the unqualified term character to refer
92to a Unicode code point in the source text.
93
94Each code point is distinct; for instance, upper and lower case letters are
95different characters.
96
97Implementation restriction: For compatibility with other tools, a compiler may
98disallow the NUL character (U+0000) in the source text.
99
100Implementation restriction: For compatibility with other tools, a compiler may
101ignore a UTF-8-encoded byte order mark (U+FEFF) if it is the first Unicode code
102point in the source text. A byte order mark may be disallowed anywhere else in
103the source.
104
105
106### Characters
107
108The following terms are used to denote specific Unicode character classes:
109
110```
111newline = /* the Unicode code point U+000A */ .
112unicode_char = /* an arbitrary Unicode code point except newline */ .
113unicode_letter = /* a Unicode code point classified as "Letter" */ .
114unicode_digit = /* a Unicode code point classified as "Number, decimal digit" */ .
115```
116
117In The Unicode Standard 8.0, Section 4.5 "General Category" defines a set of
118character categories.
119CUE treats all characters in any of the Letter categories Lu, Ll, Lt, Lm, or Lo
120as Unicode letters, and those in the Number category Nd as Unicode digits.
121
122
123### Letters and digits
124
The underscore character `_` (U+005F) and the dollar sign `$` (U+0024) are
considered letters.
126
127```
128letter = unicode_letter | "_" | "$" .
129decimal_digit = "0" … "9" .
130binary_digit = "0" … "1" .
131octal_digit = "0" … "7" .
132hex_digit = "0" … "9" | "A" … "F" | "a" … "f" .
133```
134
135
136## Lexical elements
137
138### Comments
139
140Comments serve as program documentation.
141CUE supports line comments that start with the character sequence `//`
142and stop at the end of the line.
143
144A comment cannot start inside a string literal or inside a comment.
145A comment acts like a newline.
146
147
148### Tokens
149
150Tokens form the vocabulary of the CUE language. There are four classes:
151identifiers, keywords, operators and punctuation, and literals. White space,
152formed from spaces (U+0020), horizontal tabs (U+0009), carriage returns
153(U+000D), and newlines (U+000A), is ignored except as it separates tokens that
154would otherwise combine into a single token. Also, a newline or end of file may
155trigger the insertion of a comma. While breaking the input into tokens, the
156next token is the longest sequence of characters that form a valid token.
157
158
159### Commas
160
161The formal grammar uses commas `,` as terminators in a number of productions.
162CUE programs may omit most of these commas using the following rules:
163
164When the input is broken into tokens, a comma is automatically inserted into
165the token stream immediately after a line's final token if that token is
166
167- an identifier, keyword, or bottom
168- a number or string literal, including an interpolation
169- one of the characters `)`, `]`, `}`, or `?`
170- an ellipsis `...`
171
172
173Although commas are automatically inserted, the parser will require
174explicit commas between two list elements.
175
176<!--
177TODO: remove the above exception
178-->
179
180To reflect idiomatic use, examples in this document elide commas using
181these rules.
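
The insertion rule can be modeled as a pass over input that has already been split into tokens. The following Python sketch is illustrative only: the `(kind, text)` token shape and the kind names are assumptions made for this example and are not part of CUE or its tooling. It also ignores the list-element exception described above.

```python
# Token kinds that trigger comma insertion when they end a line.
# The kind names are hypothetical labels chosen for this sketch.
TRIGGER_KINDS = {"identifier", "keyword", "bottom", "number", "string"}
# Punctuation that triggers comma insertion when it ends a line.
TRIGGER_TEXTS = {")", "]", "}", "?", "..."}


def insert_commas(lines):
    """Return a flat token stream with commas inserted after qualifying lines.

    `lines` is a list of token lists, one per source line; each token is a
    (kind, text) pair.
    """
    out = []
    for tokens in lines:
        out.extend(tokens)
        if tokens:
            kind, text = tokens[-1]
            if kind in TRIGGER_KINDS or text in TRIGGER_TEXTS:
                out.append(("comma", ","))
    return out
```

For the three source lines `a: 1`, `b: {`, and `}`, this sketch adds a comma after `1` and after `}`, but not after `{`.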
182
183
184### Identifiers
185
186Identifiers name entities such as fields and aliases.
187An identifier is a sequence of one or more letters (which includes `_` and `$`)
188and digits, optionally preceded by `#` or `_#`.
189It may not be `_` or `$`.
The first character in an identifier, or after a `#` if it contains one,
191must be a letter.
192Identifiers starting with a `#` or `_` are reserved for definitions and hidden
193fields.
194
195<!--
196TODO: allow identifiers as defined in Unicode UAX #31
197(https://unicode.org/reports/tr31/).
198
199Identifiers are normalized using the NFC normal form.
200-->
201
202```
203identifier = [ "#" | "_#" ] letter { letter | unicode_digit } .
204```
205
206```
207a
208_x9
209fieldName
210αβ
211```
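
As a rough check of the `identifier` production, the Python sketch below classifies characters with the standard `unicodedata` module. It is an approximation: `is_identifier` is a name invented for this example, and Python's Unicode tables follow whichever Unicode version the interpreter ships with, which may differ from the version referenced in this document.

```python
import unicodedata

LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}


def _is_letter(ch):
    # letter = unicode_letter | "_" | "$"
    return ch in ("_", "$") or unicodedata.category(ch) in LETTER_CATEGORIES


def _is_digit(ch):
    # unicode_digit: category "Number, decimal digit"
    return unicodedata.category(ch) == "Nd"


def is_identifier(s):
    """Check s against: [ "#" | "_#" ] letter { letter | unicode_digit }."""
    if s in ("_", "$"):
        return False  # these two are explicitly excluded
    for prefix in ("_#", "#"):
        if s.startswith(prefix):
            s = s[len(prefix):]
            break
    if not s or not _is_letter(s[0]):
        return False
    return all(_is_letter(c) or _is_digit(c) for c in s[1:])
```

Under these assumptions, `a`, `_x9`, `fieldName`, and `αβ` are accepted, as are `#Def` and `_#hidden`, while `_`, `$`, and `9a` are rejected.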
212
<!-- TODO: Allow Unicode identifiers TR 31 http://unicode.org/reports/tr31/ -->
214
215Some identifiers are [predeclared](#predeclared-identifiers).
216
217
218### Keywords
219
220CUE has a limited set of keywords.
221In addition, CUE reserves all identifiers starting with `__` (double underscores)
222as keywords.
These are typically targets of predeclared identifiers.
224
225All keywords may be used as labels (field names).
226Unless noted otherwise, they can also be used as identifiers to refer to
227the same name.
228
229
230#### Values
231
232The following keywords are values.
233
234```
235null true false
236```
237
238These can never be used to refer to a field of the same name.
239This restriction is to ensure compatibility with JSON configuration files.
240
241
242#### Preamble
243
The following keywords are used in the preamble of a CUE file.
245After the preamble, they may be used as identifiers to refer to namesake fields.
246
247```
248package import
249```
250
251
252#### Comprehension clauses
253
254The following keywords are used in comprehensions.
255
256```
257for in if let
258```
259
260<!--
261TODO:
262 reduce [to]
263 order [by]
264-->
265
266
267### Operators and punctuation
268
269The following character sequences represent operators and punctuation:
270
271```
272+ && == < = ( )
273- || != > : { }
274* & =~ <= ? [ ] ,
275/ | !~ >= ! _|_ ... .
276```
277<!--
278Free tokens: ; ~ ^
279// To be used:
280 @ at: associative lists.
281
282// Idea: use # instead of @ for attributes and allow then at declaration level.
283// This will open up the possibility of defining #! at the start of a file
284// without requiring special syntax. Although probably not quite.
285 -->
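
The longest-match rule from the Tokens section can be illustrated on the operator table above. This Python sketch only considers operators and punctuation, and `next_operator` is a hypothetical helper, not a function of any CUE implementation.

```python
# Operators and punctuation as listed in the table above.
OPERATORS = [
    "_|_", "...", "&&", "||", "==", "!=", "=~", "!~", "<=", ">=",
    "+", "-", "*", "/", "&", "|", "<", ">", "=", ":", "?", "!",
    "(", ")", "{", "}", "[", "]", ",", ".",
]


def next_operator(src, pos=0):
    """Return the longest operator that starts at src[pos], or None."""
    candidates = [op for op in OPERATORS if src.startswith(op, pos)]
    return max(candidates, key=len) if candidates else None
```

With this rule, the input `<=5` starts with the single token `<=` rather than `<` followed by `=`, and `...` is one ellipsis token rather than three `.` tokens.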
286
287
288### Numeric literals
289
290There are several kinds of numeric literals.
291
292```
293int_lit = decimal_lit | si_lit | octal_lit | binary_lit | hex_lit .
294decimal_lit = "0" | ( "1" … "9" ) { [ "_" ] decimal_digit } .
295decimals = decimal_digit { [ "_" ] decimal_digit } .
si_lit = decimals [ "." decimals ] multiplier |
297 "." decimals multiplier .
298binary_lit = "0b" binary_digit { [ "_" ] binary_digit } .
299hex_lit = "0" ( "x" | "X" ) hex_digit { [ "_" ] hex_digit } .
300octal_lit = "0o" octal_digit { [ "_" ] octal_digit } .
multiplier = ( "K" | "M" | "G" | "T" | "P" ) [ "i" ] .
302
303float_lit = decimals "." [ decimals ] [ exponent ] |
304 decimals exponent |
305 "." decimals [ exponent ].
306exponent = ( "e" | "E" ) [ "+" | "-" ] decimals .
307```
308
309An _integer literal_ is a sequence of digits representing an integer value.
310An optional prefix sets a non-decimal base: `0o` for octal,
311`0x` or `0X` for hexadecimal, and `0b` for binary.
312In hexadecimal literals, letters `a … f` and `A … F` represent values 10 through 15.
313All integers allow interstitial underscores `_`;
314these have no meaning and are solely for readability.
315
316Integer literals may have an SI or IEC multiplier.
317Multipliers can be used with fractional numbers.
318When multiplying a fraction by a multiplier, the result is truncated
319towards zero if it is not an integer.
320
321```
32242
3231.5G // 1_500_000_000
3241.3Ki // 1.3 * 1024 = trunc(1331.2) = 1331
325170_141_183_460_469_231_731_687_303_715_884_105_727
3260xBad_Face
3270o755
3280b0101_0001
329```
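
The multiplier arithmetic, including truncation towards zero, can be sketched in Python with exact rationals. The helper name `si_value` is an assumption for this example, the tables assume the conventional SI (powers of 1000) and IEC (powers of 1024) values, and the function performs no validation beyond what the arithmetic requires.

```python
import math
from fractions import Fraction

# Decimal (SI) and binary (IEC) multipliers, keyed by the multiplier letter.
SI = {"K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15}
IEC = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40, "P": 2**50}


def si_value(literal):
    """Evaluate a literal such as '1.5G' or '1.3Ki' to an integer."""
    text = literal.replace("_", "")  # underscores carry no meaning
    table = SI
    if text.endswith("i"):
        table, text = IEC, text[:-1]
    mantissa, unit = text[:-1], text[-1]
    if mantissa.startswith("."):
        mantissa = "0" + mantissa
    # Exact rational arithmetic, then truncation towards zero.
    return math.trunc(Fraction(mantissa) * table[unit])
```

Applied to the examples above, `si_value("1.5G")` gives `1500000000` and `si_value("1.3Ki")` gives `1331`.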
330
331A _decimal floating-point literal_ is a representation of
332a decimal floating-point value (a _float_).
333It has an integer part, a decimal point, a fractional part, and an
334exponent part.
335The integer and fractional part comprise decimal digits; the
336exponent part is an `e` or `E` followed by an optionally signed decimal exponent.
337One of the integer part or the fractional part may be elided; one of the decimal
338point or the exponent may be elided.
339
340```
3410.
34272.40
343072.40 // == 72.40
3442.71828
3451.e+0
3466.67428e-11
3471E6
348.25
349.12345E+5
350```
351
352<!--
353TODO: consider allowing Exo (and up), if not followed by a sign
354or number. Alternatively one could only allow Ei, Yi, and Zi.
355-->
356
357Neither a `float_lit` nor an `si_lit` may appear after a token that is:
358
359- an identifier, keyword, or bottom
360- a number or string literal, including an interpolation
361- one of the characters `)`, `]`, `}`, `?`, or `.`.
362
363<!--
364So
365`a + 3.2Ti` -> `a`, `+`, `3.2Ti`
366`a 3.2Ti` -> `a`, `3`, `.`, `2`, `Ti`
367`a + .5e3` -> `a`, `+`, `.5e3`
368`a .5e3` -> `a`, `.`, `5`, `e3`.
369-->
370
371
372### String and byte sequence literals
373
374A string literal represents a string constant obtained from concatenating a
375sequence of characters.
A byte sequence literal represents a sequence of bytes.
377
378String and byte sequence literals are character sequences between,
379respectively, double and single quotes, as in `"bar"` and `'bar'`.
380Within the quotes, any character may appear except newline and,
381respectively, unescaped double or single quote.
String literals must be valid UTF-8.
383Byte sequences may contain any sequence of bytes.
384
385Several escape sequences allow arbitrary values to be encoded as ASCII text.
386An escape sequence starts with an _escape delimiter_, which is `\` by default.
387The escape delimiter may be altered to be `\` plus a fixed number of
388hash symbols `#` by padding the start and end of a string or byte sequence
389literal with this number of hash symbols.
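
Put differently, the escape delimiter of a literal is determined by the number of hash symbols that precede its opening quote. A minimal Python sketch, where `escape_delimiter` is a hypothetical helper that receives the full source text of a literal:

```python
def escape_delimiter(literal):
    """Return the escape delimiter implied by a literal's leading hashes."""
    hashes = len(literal) - len(literal.lstrip("#"))
    return "\\" + "#" * hashes
```

For a literal written as `"bar"` the delimiter is `\`, and for one padded with a single hash, such as `#"bar"#`, it is `\#`.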
390
391<!--
392TODO: move these examples further up so it's evident why #" exists.
393 #"This is not an \(interpolation)"#
394 #"This is an \#(interpolation)"#
395 #"The sequence "\U0001F604" renders as \#U0001F604."#
396-->
397
398There are four ways to represent the integer value as a numeric constant: `\x`
399followed by exactly two hexadecimal digits; `\u` followed by exactly four
400hexadecimal digits; `\U` followed by exactly eight hexadecimal digits, and a
401plain backslash `\` followed by exactly three octal digits.
402In each case the value of the literal is the value represented by the
403digits in the corresponding base.
404Hexadecimal and octal escapes are only allowed within byte sequences
405(single quotes).
406
407Although these representations all result in an integer, they have different
408valid ranges.
409Octal escapes must represent a value between 0 and 255 inclusive.
410Hexadecimal escapes satisfy this condition by construction.
411The escapes `\u` and `\U` represent Unicode code points so within them
412some values are illegal, in particular those above `0x10FFFF`.
413Surrogate halves are allowed,
414but are translated into their non-surrogate equivalent internally.
415
416The three-digit octal (`\nnn`) and two-digit hexadecimal (`\xnn`) escapes
417represent individual bytes of the resulting string; all other escapes represent
418the (possibly multi-byte) UTF-8 encoding of individual characters.
Thus inside a byte sequence literal `\377` and `\xFF` represent a single byte of
value `0xFF=255`, while `ÿ`, `\u00FF`, `\U000000FF` and `\xc3\xbf` represent
the two bytes `0xc3 0xbf` of the UTF-8 encoding of character `U+00FF`.
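
The difference between byte-valued and code-point escapes can be reproduced with Python's own byte and string handling. This is only an analogy for the resulting byte values; it does not parse CUE literals.

```python
# Byte-valued escapes (octal and hexadecimal) each contribute exactly one byte.
octal_escape = bytes([0o377])  # models \377
hex_escape = bytes([0xFF])     # models \xFF

# Code-point escapes contribute the UTF-8 encoding of the character.
little_u = chr(0x00FF).encode("utf-8")    # models \u00FF
big_u = chr(0x000000FF).encode("utf-8")   # models \U000000FF
explicit_utf8 = bytes([0xC3, 0xBF])       # models \xc3\xbf
```

The first two values are the single byte `0xFF`; the last three are the same two-byte sequence `0xc3 0xbf`.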
422
423```
424\a U+0007 alert or bell
425\b U+0008 backspace
426\f U+000C form feed
427\n U+000A line feed or newline
428\r U+000D carriage return
429\t U+0009 horizontal tab
430\v U+000b vertical tab
431\/ U+002f slash (solidus)
432\\ U+005c backslash
433\' U+0027 single quote (valid escape only within single quoted literals)
434\" U+0022 double quote (valid escape only within double quoted literals)
435```
436
437The escape `\(` is used as an escape for string interpolation.
438A `\(` must be followed by a valid CUE Expression, followed by a `)`.
439
440A backslash at the end of a line elides the line terminator that follows it.
441This may not escape the final newline inside a multiline string: that
442newline is already implicitly elided.
443
444All other sequences starting with a backslash are illegal inside literals.
445
446```
447escaped_char = `\` { `#` } ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | "/" | `\` | "'" | `"` ) .
448byte_value = octal_byte_value | hex_byte_value .
449octal_byte_value = `\` { `#` } octal_digit octal_digit octal_digit .
450hex_byte_value = `\` { `#` } "x" hex_digit hex_digit .
451little_u_value = `\` { `#` } "u" hex_digit hex_digit hex_digit hex_digit .
452big_u_value = `\` { `#` } "U" hex_digit hex_digit hex_digit hex_digit
453 hex_digit hex_digit hex_digit hex_digit .
454unicode_value = unicode_char | little_u_value | big_u_value | escaped_char .
interpolation = `\` { `#` } "(" Expression ")" .
456
457string_lit = simple_string_lit |
458 multiline_string_lit |
459 simple_bytes_lit |
460 multiline_bytes_lit |
461 `#` string_lit `#` .
462
463simple_string_lit = `"` { unicode_value | interpolation } `"` .
464simple_bytes_lit = `'` { unicode_value | interpolation | byte_value } `'` .
465multiline_string_lit = `"""` newline
466 { unicode_value | interpolation | newline }
467 newline `"""` .
468multiline_bytes_lit = "'''" newline
469 { unicode_value | interpolation | byte_value | newline }
470 newline "'''" .
471```
472
473Carriage return characters (`\r`) inside string literals are discarded from
474the string value.
475
476```
477'a\000\xab'
478'\007'
479'\377'
480'\xa' // illegal: too few hexadecimal digits
481"\n"
482"\""
483'Hello, world!\n'
484"Hello, \( name )!"
485"日本語"
486"\u65e5本\U00008a9e"
487'\xff\u00FF'
488"\uD800" // illegal: surrogate half (TODO: probably should allow)
489"\U00110000" // illegal: invalid Unicode code point
490
491#"This is not an \(interpolation)"#
492#"This is an \#(interpolation)"#
493#"The sequence "\U0001F604" renders as \#U0001F604."#
494```
495
496These examples all represent the same string:
497
498```
499"日本語" // UTF-8 input text
500'日本語' // UTF-8 input text as byte sequence
501"\u65e5\u672c\u8a9e" // the explicit Unicode code points
502"\U000065e5\U0000672c\U00008a9e" // the explicit Unicode code points
503'\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e' // the explicit UTF-8 bytes
504```
505
506If the source code represents a character as two code points, such as a
507combining form involving an accent and a letter, the result will appear as two
508code points if placed in a string literal.
509
510Strings and byte sequences have a multiline equivalent.
511Multiline strings are like their single-line equivalent,
512but allow newline characters.
513
514Multiline strings and byte sequences respectively start with
515a triple double quote (`"""`) or triple single quote (`'''`),
516immediately followed by a newline, which is discarded from the string contents.
517The string is closed by a matching triple quote, which must be by itself
518on a new line, preceded by optional whitespace.
519The newline preceding the closing quote is discarded from the string contents.
520The whitespace before a closing triple quote must appear before any non-empty
521line after the opening quote and will be removed from each of these
522lines in the string literal.
523A closing triple quote may not appear in the string.
To include it, it suffices to escape one of the quotes.
525
526```
527"""
528 lily:
529 out of the water
530 out of itself
531
532 bass
533 picking \
534 bugs
535 off the moon
536 — Nick Virgilio, Selected Haiku, 1988
537 """
538```
539
540This represents the same string as:
541
542```
543"lily:\nout of the water\nout of itself\n\n" +
544"bass\npicking bugs\noff the moon\n" +
545" — Nick Virgilio, Selected Haiku, 1988"
546```
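
The indentation and line-continuation rules for multiline literals can be sketched as follows. This Python model is a simplification under stated assumptions: `multiline_value` and its parameters are hypothetical, the caller supplies the lines between the opening and closing triple quotes together with the whitespace found before the closing quote, and no escape other than a trailing backslash is interpreted.

```python
def multiline_value(body_lines, closing_indent):
    """Compute the contents of a multiline literal.

    body_lines: source lines between the opening and closing triple quotes,
                without their line terminators.
    closing_indent: the whitespace that precedes the closing triple quote.
    """
    stripped = []
    for line in body_lines:
        if line.strip() == "":
            stripped.append("")  # empty lines need no indentation
        elif line.startswith(closing_indent):
            stripped.append(line[len(closing_indent):])
        else:
            raise ValueError("line is indented less than the closing quote")
    text = "\n".join(stripped)
    # A backslash at the end of a line elides the newline that follows it.
    return text.replace("\\\n", "")
```

Joining with `"\n"` leaves no newline after the final line, which matches the rule that the newline preceding the closing quote is discarded.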
547
548<!-- TODO: other values
549
550Support for other values:
551- Duration literals
552- regular expressions: `re("[a-z]")`
553-->
554
555
556## Values
557
558In addition to simple values like `"hello"` and `42.0`, CUE has [structs](#structs).
559A struct is a map from labels to values, like `{a: 42.0, b: "hello"}`.
560Structs are CUE's only way of building up complex values;
561lists, which we will see later,
562are defined in terms of structs.
563
564All possible values are ordered in a lattice,
565a partial order where every two elements have a single greatest lower bound.
566A value `a` is an _instance_ of a value `b`,
567denoted `a ⊑ b`, if `b == a` or `b` is more general than `a`,
568that is if `a` orders before `b` in the partial order
569(`⊑` is _not_ a CUE operator).
570We also say that `b` _subsumes_ `a` in this case.
571In graphical terms, `b` is "above" `a` in the lattice.
572
573<!-- TODO: link to https://cuelang.org/docs/concepts/logic/ as more reading
574material, especially for those new to lattices
575-->
576
577At the top of the lattice is the single ancestor of all values, called
578[top](#top), denoted `_` in CUE.
579Every value is an instance of top.
580
581At the bottom of the lattice is the value called [bottom](#bottom), denoted `_|_`.
582A bottom value usually indicates an error.
583Bottom is an instance of every value.
584
585An _atom_ is any value whose only instances are itself and bottom.
586Examples of atoms are `42.0`, `"hello"`, `true`, and `null`.
587
588A value is _concrete_ if it is either an atom, or a struct whose field values
589are all concrete, recursively.
590
591CUE's values also include what we normally think of as types, like `string` and
592`float`.
593It does not distinguish between types and values:
594only the relationship of values in the lattice is important.
595Each CUE "type" subsumes the concrete values that one would normally think
596of as part of that type.
597For example, `"hello"` is an instance of `string`, and `42.0` is an instance of
598`float`.
599In addition to `string` and `float`, CUE has `null`, `int`, `bool`, and `bytes`.
600We informally call these CUE's "basic types".
601
602
603```
604false ⊑ bool
605true ⊑ bool
606true ⊑ true
6075.0 ⊑ float
608bool ⊑ _
609_|_ ⊑ _
610_|_ ⊑ _|_
611
612_ ⋢ _|_
613_ ⋢ bool
614int ⋢ bool
615bool ⋢ int
616false ⋢ true
617true ⋢ false
618float ⋢ 5.0
6195 ⋢ 6
620```
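
The relations listed above can be reproduced with a small Python model of the instance relation. The model is deliberately flat and only an illustration: it covers top, bottom, four basic types, and their atoms, and it omits structs, `number`, `null`, `bytes`, and bounds. The names `Kind` and `is_instance` are assumptions made for this sketch.

```python
class Kind:
    """A non-atom element of the model lattice (top, bottom, or a basic type)."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


TOP, BOTTOM = Kind("_"), Kind("_|_")
BOOL, INT, FLOAT, STRING = Kind("bool"), Kind("int"), Kind("float"), Kind("string")


def kind_of(atom):
    """Map a Python atom to its basic type in the model."""
    if isinstance(atom, bool):  # bool must be tested before int in Python
        return BOOL
    if isinstance(atom, int):
        return INT
    if isinstance(atom, float):
        return FLOAT
    if isinstance(atom, str):
        return STRING
    raise TypeError("unsupported atom: %r" % (atom,))


def is_instance(a, b):
    """Report whether a ⊑ b holds in the model."""
    if a is BOTTOM or b is TOP:
        return True
    if a is TOP or b is BOTTOM:
        return False
    if isinstance(b, Kind):
        return a is b if isinstance(a, Kind) else kind_of(a) is b
    # b is an atom: only the very same atom is an instance of it.
    return not isinstance(a, Kind) and kind_of(a) is kind_of(b) and a == b
```

The explicit kind comparison in the last line matters in Python, where `True == 1` and `5 == 5.0` would otherwise make distinct atoms look identical.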
621
622
623### Unification
624
625The _unification_ of values `a` and `b`
626is defined as the greatest lower bound of `a` and `b`. (That is, the
627value `u` such that `u ⊑ a` and `u ⊑ b`,
628and for any other value `v` for which `v ⊑ a` and `v ⊑ b`
629it holds that `v ⊑ u`.)
630Since CUE values form a lattice, the unification of two CUE values is
631always unique.
632
633These all follow from the definition of unification:
634- The unification of `a` with itself is always `a`.
635- The unification of values `a` and `b` where `a ⊑ b` is always `a`.
636- The unification of a value with bottom is always bottom.
637
638Unification in CUE is a [binary expression](#operands), written `a & b`.
639It is commutative, associative, and idempotent.
640As a consequence, order of evaluation is irrelevant, a property that is key
641to many of the constructs in the CUE language as well as the tooling layered
642on top of it.
643
644
645
646<!-- TODO: explicitly mention that disjunction is not a binary operation
647but a definition of a single value?-->
648
649
650### Disjunction
651
652The _disjunction_ of values `a` and `b`
653is defined as the least upper bound of `a` and `b`.
654(That is, the value `d` such that `a ⊑ d` and `b ⊑ d`,
655and for any other value `e` for which `a ⊑ e` and `b ⊑ e`,
656it holds that `d ⊑ e`.)
657This style of disjunctions is sometimes also referred to as sum types.
658Since CUE values form a lattice, the disjunction of two CUE values is always unique.
659
660
661These all follow from the definition of disjunction:
662- The disjunction of `a` with itself is always `a`.
- The disjunction of values `a` and `b` where `a ⊑ b` is always `b`.
664- The disjunction of a value `a` with bottom is always `a`.
665- The disjunction of two bottom values is bottom.
666
667Disjunction in CUE is a [binary expression](#operands), written `a | b`.
668It is commutative, associative, and idempotent.
669
670The unification of a disjunction with another value is equal to the disjunction
671composed of the unification of this value with all of the original elements
672of the disjunction.
673In other words, unification distributes over disjunction.
674
675```
676(a_0 | ... |a_n) & b ==> a_0&b | ... | a_n&b.
677```
678
679```
680Expression Result
681({a:1} | {b:2}) & {c:3} {a:1, c:3} | {b:2, c:3}
682(int | string) & "foo" "foo"
683("a" | "b") & "c" _|_
684```
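
One way to build intuition for the distribution law is a finite powerset model, in which a value is the set of atoms it admits, unification is set intersection, disjunction is set union, and bottom is the empty set. The Python sketch below uses that model; it is an analogy rather than CUE's actual lattice, and all names in it are assumptions for this example.

```python
BOTTOM = frozenset()


def atom(x):
    """A value that admits exactly one atom."""
    return frozenset({x})


def unify(a, b):
    """Greatest lower bound in the powerset model."""
    return a & b


def disjoin(a, b):
    """Least upper bound in the powerset model."""
    return a | b


def is_instance(a, b):
    """a ⊑ b in the powerset model."""
    return a <= b


def distributes(a, b, c):
    """Check (a | b) & c == a&c | b&c for three model values."""
    return unify(disjoin(a, b), c) == disjoin(unify(a, c), unify(b, c))
```

In this model `("a" | "b") & "c"` is the empty set, mirroring the `_|_` result in the table above.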
685
A disjunction is _normalized_ if there is no element
`a` for which there is another element `b` such that `a ⊑ b`.
688
689<!--
690Normalization is important, as we need to account for spurious elements
691For instance "tcp" | "tcp" should resolve to "tcp".
692
693Also consider
694
695 ({a:1} | {b:1}) & ({a:1} | {b:2}) -> {a:1} | {a:1,b:1} | {a:1,b:2},
696
697in this case, elements {a:1,b:1} and {a:1,b:2} are subsumed by {a:1} and thus
698this expression is logically equivalent to {a:1} and should therefore be
699considered to be unambiguous and resolve to {a:1} if a concrete value is needed.
700
701For instance, in
702
703 x: ({a:1} | {b:1}) & ({a:1} | {b:2}) // -> {a:1} | {a:1,b:1} | {a:1,b:2}
704 y: x.a // 1
705
706y should resolve to 1, and not an error.
707
708For comparison, in
709
710 x: ({a:1, b:1} | {b:2}) & {a:1} // -> {a:1,b:1} | {a:1,b:2}
711 y: x.a // _|_
712
713y should be an error as x is still ambiguous before the selector is applied,
714even though `a` resolves to 1 in all cases.
715-->
716
717
718#### Default values
719
720Any value `v` _may_ be associated with a default value `d`,
where `d` must be an instance of `v` (`d ⊑ v`).
722
723Default values are introduced by means of disjunctions.
724Any element of a disjunction can be _marked_ as a default
725by prefixing it with an asterisk `*` ([a unary expression](#operators)).
726Syntactically consecutive disjunctions are considered to be
727part of a single disjunction,
728whereby multiple disjuncts can be marked as default.
729A _marked disjunction_ is one where any of its terms are marked.
730So `a | b | *c | d` is a single marked disjunction of four terms,
731whereas `a | (b | *c | d)` is an unmarked disjunction of two terms,
732one of which is a marked disjunction of three terms.
733During unification, if all the marked disjuncts of a marked disjunction are
734eliminated, then the remaining unmarked disjuncts are considered as if they
originated from an unmarked disjunction.
736<!-- TODO: this formulation should be worked out more. -->
737As explained below, distinguishing the nesting of disjunctions like this
738is only relevant when both an outer and nested disjunction are marked.
739
740Intuitively, when an expression needs to be resolved for an operation other
741than unification or disjunction,
742non-starred elements are dropped in favor of starred ones if the starred ones
743do not resolve to bottom.
744
745To define the unification and disjunction operation we use the notation
746`⟨v⟩` to denote a CUE value `v` that is not associated with a default
747and the notation `⟨v, d⟩` to denote a value `v` associated with a default
748value `d`.
749
750The rewrite rules for unifying such values are as follows:
751```
752U0: ⟨v1⟩ & ⟨v2⟩ => ⟨v1&v2⟩
753U1: ⟨v1, d1⟩ & ⟨v2⟩ => ⟨v1&v2, d1&v2⟩
754U2: ⟨v1, d1⟩ & ⟨v2, d2⟩ => ⟨v1&v2, d1&d2⟩
755```
756
757The rewrite rules for disjoining terms of unmarked disjunctions are
758```
759D0: ⟨v1⟩ | ⟨v2⟩ => ⟨v1|v2⟩
760D1: ⟨v1, d1⟩ | ⟨v2⟩ => ⟨v1|v2, d1⟩
761D2: ⟨v1, d1⟩ | ⟨v2, d2⟩ => ⟨v1|v2, d1|d2⟩
762```
763
764Terms of marked disjunctions are first rewritten according to the following
765rules:
766```
767M0: ⟨v⟩ => ⟨v⟩ don't introduce defaults for unmarked term
768M1: *⟨v⟩ => ⟨v, v⟩ introduce identical default for marked term
769M2: *⟨v, d⟩ => ⟨v, d⟩ keep existing defaults for marked term
770M3: ⟨v, d⟩ => ⟨v⟩ strip existing defaults from unmarked term
771```
772
773Note that for any marked disjunction `a`,
774the expressions `a|a`, `*a|a` and `*a|*a` all resolve to `a`.
775
776```
777Expression Value-default pair Rules applied
778*"tcp" | "udp" ⟨"tcp"|"udp", "tcp"⟩ M1, D1
779string | *"foo" ⟨string, "foo"⟩ M1, D1
780
781*1 | 2 | 3 ⟨1|2|3, 1⟩ M1, D1
782
783(*1|2|3) | (1|*2|3) ⟨1|2|3, 1|2⟩ M1, D1, D2
784(*1|2|3) | *(1|*2|3) ⟨1|2|3, 2⟩ M1, M2, M3, D1, D2
785(*1|2|3) | (1|*2|3)&2 ⟨1|2|3, 1|2⟩ M1, D1, U1, D2
786
787(*1|2) & (1|*2) ⟨1|2, _|_⟩ M1, D1, U2
788```
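
The rewrite rules can be traced mechanically. The Python sketch below models a value as a finite set of atoms and a value-default pair as a tuple `(v, d)`, with `d` set to `None` when there is no default. This representation and the helper names are assumptions made for illustration; they do not describe how any CUE implementation stores defaults.

```python
from functools import reduce


def plain(*atoms):
    """A value without a default, written ⟨v⟩ above."""
    return (frozenset(atoms), None)


def unify(p, q):
    (v1, d1), (v2, d2) = p, q
    if d1 is None and d2 is None:
        return (v1 & v2, None)      # U0
    if d2 is None:
        return (v1 & v2, d1 & v2)   # U1
    if d1 is None:
        return (v1 & v2, v1 & d2)   # U1 with the operands swapped
    return (v1 & v2, d1 & d2)       # U2


def disjoin(p, q):
    (v1, d1), (v2, d2) = p, q
    if d1 is None and d2 is None:
        return (v1 | v2, None)      # D0
    if d2 is None:
        return (v1 | v2, d1)        # D1
    if d1 is None:
        return (v1 | v2, d2)        # D1 with the operands swapped
    return (v1 | v2, d1 | d2)       # D2


def marked(p):
    """Rewrite a term prefixed with * in a marked disjunction (M1, M2)."""
    v, d = p
    return (v, v) if d is None else (v, d)


def unmarked(p):
    """Rewrite an unprefixed term of a marked disjunction (M0, M3)."""
    v, _ = p
    return (v, None)


def resolve(p):
    """Pick the default if there is one and it is not bottom (the empty set)."""
    v, d = p
    return d if d else v


# *"tcp" | "udp"
tcp_or_udp = reduce(disjoin, [marked(plain("tcp")), unmarked(plain("udp"))])
```

Here `tcp_or_udp` is the pair with value `{"tcp", "udp"}` and default `{"tcp"}`, matching the first row of the table above, and `resolve(tcp_or_udp)` yields `{"tcp"}`.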
789
790<!-- TODO: define and consistently use the value-default pair syntax -->
791
792The rules of subsumption for defaults can be derived from the above definitions
793and are as follows.
794
795```
796⟨v2, d2⟩ ⊑ ⟨v1, d1⟩ if v2 ⊑ v1 and d2 ⊑ d1
797⟨v1, d1⟩ ⊑ ⟨v⟩ if v1 ⊑ v
798⟨v⟩ ⊑ ⟨v1, d1⟩ if v ⊑ d1
799```
800
801<!--
802For the second rule, note that by definition d1 ⊑ v1, so d1 ⊑ v1 ⊑ v.
803
804The last one is so restrictive as v could still be made more specific by
805associating it with a default that is not subsumed by d1.
806
807Proof:
808 by definition for any d ⊑ v, it holds that (v, d) ⊑ v,
809 where the most general value is (v, v).
810 Given the subsumption rule for (v2, d2) ⊑ (v1, d1),
811 from (v, v) ⊑ v ⊑ (v1, d1) it follows that v ⊑ d1
812 exactly defines the boundary of this subsumption.
813-->
814
815<!--
816(non-normalized entries could also be implicitly marked, allowing writing
817int | 1, instead of int | *1, but that can be done in a backwards
818compatible way later if really desirable, as long as we require that
819disjunction literals be normalized).
820-->
821
822```
823Expression Resolves to
824"tcp" | "udp" "tcp" | "udp"
825*"tcp" | "udp" "tcp"
826float | *1 1
827*string | 1.0 string
828(*1|2) + (2|*3) 4
829
830(*1|2|3) | (1|*2|3) 1|2
831(*1|2|3) & (1|*2|3) 1|2|3 // default is _|_
832
833(* >=5 | int) & (* <=5 | int) 5
834
835(*"tcp"|"udp") & ("udp"|*"tcp") "tcp"
836(*"tcp"|"udp") & ("udp"|"tcp") "tcp"
837(*"tcp"|"udp") & "tcp" "tcp"
838(*"tcp"|"udp") & (*"udp"|"tcp") "tcp" | "udp" // default is _|_
839
840(*true | false) & bool true
841(*true | false) & (true | false) true
842
843{a: 1} | {b: 1} {a: 1} | {b: 1}
844{a: 1} | *{b: 1} {b:1}
845*{a: 1} | *{b: 1} {a: 1} | {b: 1}
846({a: 1} | {b: 1}) & {a:1} {a:1} | {a: 1, b: 1}
847({a:1}|*{b:1}) & ({a:1}|*{b:1}) {b:1}
848```
849
850
851### Bottom and errors
852
Any evaluation error in CUE results in a bottom value, represented by
the token `_|_`.
Bottom is an instance of every other value.
857
858Implementations may associate error strings with different instances of bottom;
859logically they all remain the same value.
860
861```
862bottom_lit = "_|_" .
863```
864
865
866### Top
867
868Top is represented by the underscore character `_`, lexically an identifier.
869Unifying any value `v` with top results in `v` itself.
870
871```
872Expr Result
873_ & 5 5
874_ & _ _
875_ & _|_ _|_
876_ | _|_ _
877```
878
879
880### Null
881
882The _null value_ is represented with the keyword `null`.
883It has only one parent, top, and one child, bottom.
884It is unordered with respect to any other value.
885
886```
887null_lit = "null" .
888```
889
890```
891null & 8 _|_
892null & _ null
893null & _|_ _|_
894```
895
896
897### Boolean values
898
899A _boolean type_ represents the set of Boolean truth values denoted by
900the keywords `true` and `false`.
901The predeclared boolean type is `bool`; it is a defined type and a separate
902element in the lattice.
903
904```
905bool_lit = "true" | "false" .
906```
907
908```
909bool & true true
910true & true true
911true & false _|_
912bool & (false|true) false | true
913bool & (true|false) true | false
914```
915
916
917### Numeric values
918
919The _integer type_ represents the set of all integral numbers.
920The _decimal floating-point type_ represents the set of all decimal floating-point
921numbers.
922They are two distinct types.
Both are instances of a generic `number` type.
924
925<!--
926TODO: would be nice to make this a rendered diagram with Mermaid.
927
928 number
929 / \
930 int float
931-->
932
933The predeclared number, integer, and decimal floating-point types are
934`number`, `int` and `float`; they are defined types.
935<!--
TODO: should we drop float? It is somewhat more precise and probably a good idea
937to have it in the programmatic API, but it may be confusing to have to deal
938with it in the language.
939-->
940
941A decimal floating-point literal always has type `float`;
942it is not an instance of `int` even if it is an integral number.
943
944Integer literals are always of type `int` and don't match type `float`.
945
946Numeric literals are exact values of arbitrary precision.
947If the operation permits it, numbers should be kept in arbitrary precision.
948
949Implementation restriction: although numeric values have arbitrary precision
950in the language, implementations may implement them using an internal
951representation with limited precision.
952That said, every implementation must:
953
954- Represent integer values with at least 256 bits.
955- Represent floating-point values with a mantissa of at least 256 bits and
956a signed binary exponent of at least 16 bits.
957- Give an error if unable to represent an integer value precisely.
958- Give an error if unable to represent a floating-point value due to overflow.
- Round to the nearest representable value if unable to represent
a floating-point value due to limits on precision.

These requirements apply to the result of any expression except for builtin
functions, for which an unusual loss of precision must be explicitly documented.
963
964
965### Strings
966
967The _string type_ represents the set of UTF-8 strings,
968not allowing surrogates.
969The predeclared string type is `string`; it is a defined type.
970
971The length of a string `s` (its size in bytes) can be discovered using
972the builtin function `len`.
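
For illustration, and assuming the double-quoted string literal syntax defined elsewhere in this specification, `len` counts bytes rather than characters:

```
len("")         // 0
len("abc")      // 3
len("héllo")    // 6: "é" occupies two bytes in UTF-8
```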
973
974
975### Bytes
976
977The _bytes type_ represents the set of byte sequences.
978A byte sequence value is a (possibly empty) sequence of bytes.
979The number of bytes is called the length of the byte sequence
980and is never negative.
981The predeclared byte sequence type is `bytes`; it is a defined type.
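
As a brief sketch, assuming the single-quoted bytes literal syntax defined elsewhere in this specification:

```
Expression          Result
bytes & 'abc'       'abc'
len('abc')          3
len('')             0
string & 'abc'      _|_
```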
982
983
984### Bounds
985
986A _bound_, syntactically a [unary expression](#operands), defines
987a logically infinite disjunction of concrete values represented as a single comparison.
988For example, `>= 2` represents the infinite disjunction `2|3|4|5|6|7|…`.
989
990For any [comparison operator](#comparison-operators) `op` except `==`,
991`op a` is the disjunction of every `x` such that `x op a`.
992
993
994```
9952 & >=2 & <=5 // 2, where 2 is either an int or float.
9962.5 & >=1 & <=5 // 2.5
9972 & >=1.0 & <3.0 // 2.0
9982 & >1 & <3.0 // 2.0
9992.5 & int & >1 & <5 // _|_
10002.5 & float & >1 & <5 // 2.5
1001int & 2 & >1.0 & <3.0 // _|_
10022.5 & >=(int & 1) & <5 // _|_
1003>=0 & <=7 & >=3 & <=10 // >=3 & <=7
1004!=null & 1 // 1
1005>=5 & <=5 // 5
1006```
1007
1008
1009### Structs
1010
A _struct_ is a set of elements called _fields_, each of
which has a name, called a _label_, and a value.
1013
1014We say a label is _defined_ for a struct if the struct has a field with the
1015corresponding label.
1016The value for a label `f` of struct `a` is denoted `a.f`.
1017A struct `a` is an instance of `b`, or `a ⊑ b`, if for any label `f`
1018defined for `b`, label `f` is also defined for `a` and `a.f ⊑ b.f`.
1019Note that if `a` is an instance of `b` it may have fields with labels that
1020are not defined for `b`.
1021
1022The (unique) struct with no fields, written `{}`, has every struct as an
1023instance. It can be considered the type of all structs.
1024
1025```
1026{a: 1} ⊑ {}
1027{a: 1, b: 1} ⊑ {a: 1}
1028{a: 1} ⊑ {a: int}
1029{a: 1, b: 1.0} ⊑ {a: int, b: number}
1030
1031{} ⋢ {a: 1}
1032{a: 2} ⋢ {a: 1}
1033{a: 1} ⋢ {b: 1}
1034```
1035
1036The successful unification of structs `a` and `b` is a new struct `c` which
1037has all fields of both `a` and `b`, where
1038the value of a field `f` in `c` is `a.f & b.f` if `f` is defined in both `a` and `b`,
1039or just `a.f` or `b.f` if `f` is in just `a` or `b`, respectively.
1040Any [references](#references) to `a` or `b`
1041in their respective field values need to be replaced with references to `c`.
1042The result of a unification is bottom (`_|_`) if any of its defined
1043fields evaluates to bottom, recursively.
1044
1045A struct literal may contain multiple fields with the same label,
1046the result of which is the unification of all those fields.
1047
1048```
1049StructLit = "{" { Declaration "," } "}" .
1050Declaration = Field | Ellipsis | Embedding | LetClause | attribute .
1051Ellipsis = "..." [ Expression ] .
1052Embedding = Comprehension | AliasExpr .
1053Field = Label ":" { Label ":" } AliasExpr { attribute } .
1054Label = [ identifier "=" ] LabelExpr .
1055LabelExpr = LabelName [ "?" | "!" ] | "[" AliasExpr "]" .
1056LabelName = identifier | simple_string_lit | "(" AliasExpr ")" .
1057
1058attribute = "@" identifier "(" attr_tokens ")" .
1059attr_tokens = { attr_token |
1060 "(" attr_tokens ")" |
1061 "[" attr_tokens "]" |
1062 "{" attr_tokens "}" } .
1063attr_token = /* any token except '(', ')', '[', ']', '{', or '}' */
1064```
1065
1066```
Expression                             Result
{a: int, a: 1}                         {a: 1}
{a: int} & {a: 1}                      {a: 1}
{a: >=1 & <=7} & {a: >=5 & <=9}        {a: >=5 & <=7}
{a: >=1 & <=7, a: >=5 & <=9}           {a: >=5 & <=7}

{a: 1} & {b: 2}                        {a: 1, b: 2}
{a: 1, b: int} & {b: 2}                {a: 1, b: 2}

{a: 1} & {a: 2}                        _|_
1077```
1078
1079
1080#### Field constraints
1081
1082A struct may declare _field constraints_ which define values
1083that should be unified with a given field once it is defined.
1084The existence of a field constraint declares, but does not define, that field.
1085
1086Syntactically, a field is marked as a constraint
1087by following its label with an _optional_ marker `?`
1088or _required_ marker `!`.
1089These markers are not part of the field name.
1090
1091A struct that has a required field constraint with a bottom value
1092evaluates to bottom.
1093An optional field constraint with a bottom value does _not_ invalidate
1094the struct that contains it
1095as long as it is not unified with a defined field.
1096
1097The subsumption relation for fields with the various markers is defined as
1098```
{a: x} ⊑ {a!: x} ⊑ {a?: x}
1100```
1101for any given `x`.
1102
1103Implementations may error upon encountering a required field constraint
1104when manifesting CUE as data.
1105
1106```
Expression                     Result
{foo?: 3} & {foo: 3}           {foo: 3}
{foo!: 3} & {foo: 3}           {foo: 3}

{foo!: int} & {foo: int}       {foo: int}
{foo!: int} & {foo?: <1}       {foo!: <1}
{foo!: int} & {foo: <=3}       {foo: <=3}
{foo!: int} & {foo: 3}         {foo: 3}

{foo!: 3} & {foo: int}         {foo: 3}
{foo!: 3} & {foo: <=4}         {foo: 3}

{foo?: 1} & {foo?: 2}          {foo?: _|_} // No error
{foo?: 1} & {foo!: 2}          _|_
{foo?: 1} & {foo: 2}           _|_
1122```
1123
1124<!-- see https://github.com/cue-lang/proposal/blob/main/designs/1951-required-fields-v2.md -->
1125
1126<!--NOTE: About bottom values for optional fields being okay.
1127
1128The proposition ¬P is a close cousin of P → ⊥ and is often used
1129as an approximation to avoid the issues of using not.
1130Bottom (⊥) is also frequently used to mean undefined. This makes sense.
1131Consider `{a?: 2} & {a?: 3}`.
1132Both structs say `a` is optional; in other words, it may be omitted.
1133So we can still get a valid result by omitting `a`, even in
1134case of a conflict.
1135
1136Granted, this definition may lead to confusing results, especially in
1137definitions, when tightening an optional field leads to unintentionally
1138discarding it.
1139It could be a role of vet checkers to identify such cases (and suggest users
1140to explicitly use `_|_` to discard a field, for instance).
1141
1142TODO: These examples show also how field constraints interact with defaults.
Should we include this? Probably not necessary, as this is an orthogonal
1144concern.
1145```
1146Expression Result
1147a: { foo?: string } a: { foo?: string }
1148b: { foo: "bar" } b: { foo: "bar" }
1149c: { foo?: *"baz" | string } c: { foo?: *"baz" | string }
1150
1151d: a & b { foo: "bar" }
1152e: b & c { foo: "bar" }
1153f: a & c { foo?: *"baz" | string }
1154g: a & { foo?: number } { foo?: _|_ } // This is fine
1155h: b & { foo?: number } _|_
1156i: c & { foo: string } { foo: *"baz" | string }
1157```
1158-->
1159
1160
1161#### Dynamic fields
1162
1163A _dynamic field_ is a field whose label is determined by
1164an expression wrapped in parentheses.
1165A dynamic field may be marked as optional or required.
1166
1167```
Expression             Result
a: "foo"               a: "foo"
b: "bar"               b: "bar"
(a): "baz"             foo: "baz"

(a+b): "qux"           foobar: "qux"

(a)?: string           foo?: string
(b)!: string           bar!: string
1177```
1178
1179
1180#### Pattern and default constraints
1181
1182A struct may define constraints that apply to a collection of fields.
1183
1184A _pattern constraint_, denoted `[pattern]: value`, defines a pattern, which
1185is a value of type string, and a value to unify with fields whose label
1186unifies with the pattern.
1187For a given struct `a` with pattern constraint `[p]: v`, `v` is unified
1188with any field with name `f` in `a` for which `p & f` is not bottom.
When unifying structs `a` and `b`,
any pattern constraints declared in `a` or `b`
are also declared in the result of the unification.
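
As an illustrative sketch (the field names `a`, `b`, `c`, `xy`, and `xz` are made up for this example), both pattern constraints carry over to the unified struct:

```
a: {[=~"^x"]: int}
b: {[=~"y$"]: >=0}

c: a & b & {
    xy: 5   // matches both patterns: int & >=0 & 5
    xz: -1  // matches only the pattern from a: int & -1
}
```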
1192
1193<!-- TODO: Update grammar and support this.
A pattern constraint with a pattern preceded by `...` indicates
that the pattern can only match fields in `b` for which there
exists no field in `a` with the same label.
1197-->
1198
1199Additionally, a _default constraint_, denoted `...value`, defines a value
1200to unify with any field for which there is no other declaration in a struct.
1201When unifying structs `a` and `b`,
1202a default constraint `...v` declared in `a`
1203defines that the value `v` should unify with any field in the resulting struct `c`
1204whose label does not unify with any of the patterns of the pattern
1205constraints defined for `a` _and_ for which there exists no field declaration
1206in `a` with that label.
1207The token `...` is a shorthand for `..._`.
_Note_: default constraints of the form `...value` are not yet implemented.
1209
1210
1211```
1212a: {
1213 foo: string // foo is a string
1214 [=~"^i"]: int // all other fields starting with i are integers
1215 [=~"^b"]: bool // all other fields starting with b are booleans
1216 [>"c"]: string // all other fields lexically after c are strings
1217
1218 ...string // all other fields must be a string. Note: default constraints are not yet implemented.
1219}
1220
1221b: a & {
1222 i3: 3
1223 bar: true
1224 other: "a string"
1225}
1226```
1227
1228<!--
1229TODO: are these two equivalent? Rog says that maybe you'll be able to refer
1230to optional fields at some point, which will never make sense for patterns.
1231Marcel says this is already mentioned elsewhere.
1232
1233a: {
1234 ["foo"]: int
1235 foo?: int
1236}
1237-->
1238
1239Concrete field labels may be an identifier or string, the latter of which may be
1240interpolated.
Fields with identifier labels can be referred to within the scope in which
they are defined; fields with string labels cannot.
1243References within such interpolated strings are resolved within
1244the scope of the struct in which the label sequence is
1245defined and can reference concrete labels lexically preceding
1246the label within a label sequence.
1247<!-- We allow this so that rewriting a CUE file to collapse or expand
1248field sequences has no impact on semantics.
1249-->
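
A minimal sketch of an interpolated label; the names `env` and `port` are hypothetical:

```
env: "prod"
"\(env)-db": port: 5432   // defines the field "prod-db"
```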
1250
1251<!--TODO: first implementation round will not yet have expression labels
1252
1253An ExpressionLabel sets a collection of optional fields to a field value.
1254By default it defines this value for all possible string labels.
1255An optional expression limits this to the set of optional fields which
1256labels match the expression.
1257-->
1258
1259
1260<!-- NOTE: if we allow ...Expr, as in list, it would mean something different. -->
1261
1262
1263<!-- NOTE:
1264A DefinitionDecl does not allow repeated labels. This is to avoid
1265any ambiguity or confusion about whether earlier path components
1266are to be interpreted as declarations or normal fields (they should
1267always be normal fields.)
1268-->
1269
1270<!--NOTE:
1271The syntax has been deliberately restricted to allow for the following
1272future extensions and relaxations:
1273 - Allow omitting a "?" in an expression label to indicate a concrete
1274 string value (but maybe we want to use () for that).
1275 - Make the "?" in expression label optional if expression labels
1276 are always optional.
1277 - Or allow eliding the "?" if the expression has no references and
1278 is obviously not concrete (such as `[string]`).
1279 - The expression of an expression label may also indicate a struct with
1280 integer or even number labels
1281 (beware of imprecise computation in the latter).
1282 e.g. `{ [int]: string }` is a map of integers to strings.
1283 - Allow for associative lists (`foo [@.field]: {field: string}`)
1284 - The `...` notation can be extended analogously to that of a ListList,
1285 by allowing it to follow with an expression for the remaining properties.
1286 In that case it is no longer a shorthand for `[string]: _`, but rather
1287 would define the value for any other value for which there is no field
1288 defined.
1289 Like the definition with List, this is somewhat odd, but it allows the
1290 encoding of JSON schema's and (non-structural) OpenAPI's
1291 additionalProperties and additionalItems.
1292-->
1293
1294```
1295intMap: [string]: int
1296intMap: {
1297 t1: 43
1298 t2: 2.4 // error: 2.4 is not an integer
1299}
1300
1301nameMap: [string]: {
1302 firstName: string
1303 nickName: *firstName | string
1304}
1305
1306nameMap: hank: firstName: "Hank"
1307```
1308
The pattern constraint defined by `nameMap` matches every field,
1310in this case just `hank`, and unifies the associated constraint
1311with the matched field, resulting in:
1312
1313```
1314nameMap: hank: {
1315 firstName: "Hank"
1316 nickName: "Hank"
1317}
1318```
1319
1320
1321#### Closed structs
1322
1323By default, structs are open to adding fields.
1324Instances of an open struct `p` may contain fields not defined in `p`.
This makes it easy to add fields, but can lead to bugs:
1326
1327```
1328S: {
1329 field1: string
1330}
1331
1332S1: S & { field2: "foo" }
1333
1334// S1 is { field1: string, field2: "foo" }
1335
1336
1337A: {
1338 field1: string
1339 field2: string
1340}
1341
1342A1: A & {
1343 feild1: "foo" // "field1" was accidentally misspelled
1344}
1345
1346// A1 is
1347// { field1: string, field2: string, feild1: "foo" }
1348// not the intended
1349// { field1: "foo", field2: string }
1350```
1351
1352A _closed struct_ `c` is a struct whose instances may not declare any field
1353with a name that does not match the name of a field
1354or the pattern of a pattern constraint defined in `c`.
1355Hidden fields are excluded from this limitation.
1356A struct that is the result of unifying any struct with a [`...`](#structs)
1357declaration is defined for all regular fields.
1358Closing a struct is equivalent to adding `..._|_` to it.
1359
1360Syntactically, structs are closed explicitly with the `close` builtin or
1361implicitly and recursively by [definitions](#definitions-and-hidden-fields).
1362
1363
1364```
1365A: close({
1366 field1: string
1367 field2: string
1368})
1369
1370A1: A & {
1371 feild1: string
1372} // _|_ feild1 not defined for A
1373
1374A2: A & {
1375 for k,v in { feild1: string } {
1376 k: v
1377 }
1378} // _|_ feild1 not defined for A
1379
1380C: close({
1381 [_]: _
1382})
1383
1384C2: C & {
1385 for k,v in { thisIsFine: string } {
1386 "\(k)": v
1387 }
1388}
1389
1390D: close({
1391 // Values generated by comprehensions are treated as embeddings.
1392 for k,v in { x: string } {
1393 "\(k)": v
1394 }
1395})
1396```
1397
1398<!-- (jba) Somewhere it should be said that optional fields are only
1399 interesting inside closed structs. -->
1400
1401<!-- TODO: move embedding section to above the previous one -->
1402
1403#### Embedding
1404
1405A struct may contain an _embedded value_, an operand used as a declaration.
1406An embedded value of type struct is unified with the struct in which it is
1407embedded, but disregarding the restrictions imposed by closed structs.
1408So if an embedding resolves to a closed struct, the corresponding enclosing
1409struct will also be closed, but may have fields that are not allowed if
1410normal rules for closed structs were observed.
1411
If an embedded value is not of type struct, the struct may only have
definitions or hidden fields. Regular fields are not allowed in such a case.
1414
1415The result of `{ A }` is `A` for any `A` (including definitions).
1416
1417Syntactically, embeddings may be any expression.
1418
1419```
1420S1: {
1421 a: 1
1422 b: 2
1423 {
1424 c: 3
1425 }
1426}
1427// S1 is { a: 1, b: 2, c: 3 }
1428
1429S2: close({
1430 a: 1
1431 b: 2
1432 {
1433 c: 3
1434 }
1435})
1436// same as close(S1)
1437
1438S3: {
1439 a: 1
1440 b: 2
1441 close({
1442 c: 3
1443 })
1444}
1445// same as S2
1446```
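
The rule for embedded values that are not structs can be sketched as follows; the names are hypothetical and the results are derived from the rules above:

```
A: {
    "hello"           // embedded value of type string
    #kind: "greeting" // definitions are allowed
    _seen: true       // hidden fields are allowed
}
// A evaluates to "hello"

B: {5}  // 5, as { A } is A for any A

C: {
    5
    a: 1  // _|_: regular field next to a non-struct embedding
}
```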
1447
1448
1449#### Definitions and hidden fields
1450
1451A field is a _definition_ if its identifier starts with `#` or `_#`.
1452A field is _hidden_ if its identifier starts with a `_`.
1453All other fields are _regular_.
1454
1455Definitions and hidden fields are not emitted when converting a CUE program
1456to data and are never required to be concrete.
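
As a sketch of this rule, with hypothetical names: converting the following to data yields only the regular field `person`.

```
#Person: name: string     // definition: not emitted, need not be concrete
_defaultName: "Ada"       // hidden field: not emitted

person: #Person & {name: _defaultName}
// As data: person: {name: "Ada"}
```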
1457
1458Referencing a definition will recursively [close](#closed-structs) it.
1459That is, a referenced definition will not unify with a struct
1460that would add a field anywhere within the definition that it does not
1461already define or explicitly allow with a pattern constraint or `...`.
1462[Embedding](#embedding) allows bypassing this check.
1463
1464If referencing a definition would always result in an error, implementations
1465may report this inconsistency at the point of its declaration.
1466
1467```
1468#MyStruct: {
1469 sub: field: string
1470}
1471
1472#MyStruct: {
1473 sub: enabled?: bool
1474}
1475
1476myValue: #MyStruct & {
1477 sub: feild: 2 // error, feild not defined in #MyStruct
1478 sub: enabled: true // okay
1479}
1480
1481#D: {
1482 #OneOf
1483
1484 c: int // adds this field.
1485}
1486
1487#OneOf: { a: int } | { b: int }
1488
1489
1490D1: #D & { a: 12, c: 22 } // { a: 12, c: 22 }
1491D2: #D & { a: 12, b: 33 } // _|_ // cannot define both `a` and `b`
1492```
1493
1494
1495```
1496#A: {a: int}
1497
1498B: {
1499 #A
1500 b: c: int
1501}
1502
1503x: B
1504x: d: 3 // not allowed, as closed by embedded #A
1505
1506y: B.b
1507y: d: 3 // allowed as nothing closes b
1508
1509#B: {
1510 #A
1511 b: c: int
1512}
1513
1514z: #B.b
1515z: d: 3 // not allowed, as referencing #B closes b
1516```
1517
1518
1519<!---
JSON fields are usually camelCase. Clashes can be avoided by adopting the
1521convention that definitions be TitleCase. Unexported definitions are still
1522subject to clashes, but those are likely easier to resolve because they are
1523package internal.
1524--->
1525
1526
1527#### Attributes
1528
1529Attributes allow associating meta information with values.
1530Their primary purpose is to define mappings between CUE and
1531other representations.
1532Attributes do not influence the evaluation of CUE.
1533
1534An attribute associates an identifier with a value, a balanced token sequence,
1535which is a sequence of CUE tokens with balanced brackets (`()`, `[]`, and `{}`).
1536The sequence may not contain interpolations.
1537
1538Fields, structs and packages can be associated with a set of attributes.
1539Attributes accumulate during unification, but implementations may remove
1540duplicates that have the same source string representation.
1541The interpretation of an attribute, including the handling of multiple
1542attributes for a given identifier, is up to the consumer of the attribute.
1543
1544Field attributes define additional information about a field,
1545such as a mapping to a protocol buffer <!-- TODO: add link --> tag or alternative
1546name of the field when mapping to a different language.
1547
1548
1549```
1550// Package attribute
1551@protobuf(proto3)
1552
1553myStruct1: {
1554 // Struct attribute:
1555 @jsonschema(id="https://example.org/mystruct1.json")
1556
1557 // Field attributes
1558 field: string @go(Field)
1559 attr: int @xml(,attr) @go(Attr)
1560}
1561
1562myStruct2: {
1563 field: string @go(Field)
1564 attr: int @xml(a1,attr) @go(Attr)
1565}
1566
1567Combined: myStruct1 & myStruct2
1568// field: string @go(Field)
1569// attr: int @xml(,attr) @xml(a1,attr) @go(Attr)
1570```
1571
1572
1573#### Aliases
1574
1575Aliases name values that can be referred to
1576within the [scope](#declarations-and-scopes) in which they are declared.
1577The name of an alias must be unique within its scope.
1578
1579```
1580AliasExpr = [ identifier "=" ] Expression .
1581```
1582
1583Aliases can appear in several positions:
1584
1585<!--- TODO: consider allowing this. It should be considered whether
1586having field aliases isn't already sufficient.
1587
1588As a declaration in a struct (`X=value`):
1589
1590- binds identifier `X` to a value embedded within the struct.
1591--->
1592
1593In front of a Label (`X=label: value`):
1594
1595- binds the identifier to the same value as `label` would be bound
1596 to if it were a valid identifier.
1597
1598In front of a dynamic field (`X=(label): value`):
1599
1600- binds the identifier to the same value as `label` if it were a valid
1601 static identifier.
1602
1603In front of a dynamic field expression (`(X=expr): value`):
1604
1605- binds the identifier to the concrete label resulting from evaluating `expr`.
1606
1607In front of a pattern constraint (`X=[expr]: value`):
1608
- binds the identifier to the same field as the one matched by the pattern
  within the instance of the field value (`value`).
1611
1612In front of a pattern constraint expression (`[X=expr]: value`):
1613
1614- binds the identifier to the concrete label that matches `expr`
1615 within the instances of the field value (`value`).
1616
Before a value (`foo: X=x`):
1618
1619- binds the identifier to the value it precedes within the scope of that value.
1620
Before a list element (`[ X=value, X+1 ]`) (not yet implemented):
1622
1623- binds the identifier to the list element it precedes within the scope of the
1624 list expression.
1625
1626<!-- TODO: explain the difference between aliases and definitions.
1627 Now that you have definitions, are aliases really necessary?
1628 Consider removing.
1629-->
1630
1631```
1632// A field alias
1633foo: X // 4
1634X="not an identifier": 4
1635
1636// A value alias
1637foo: X={x: X.a}
1638bar: foo & {a: 1} // {a: 1, x: 1}
1639
1640// A label alias
1641[Y=string]: { name: Y }
1642foo: { value: 1 } // outputs: foo: { name: "foo", value: 1 }
1643```
1644
1645<!-- TODO: also allow aliases as lists -->
1646
1647
1648#### Let declarations
1649
1650_Let declarations_ bind an identifier to an expression.
1651The identifier is only visible within the [scope](#declarations-and-scopes)
1652in which it is declared.
1653The identifier must be unique within its scope.
1654
1655```
1656let x = expr
1657
1658a: x + 1
1659b: x + 2
1660```
1661
1662#### Shorthand notation for nested structs
1663
1664A field whose value is a struct with a single field may be written as
1665a colon-separated sequence of the two field names,
1666followed by a colon and the value of that single field.
1667
1668```
1669job: myTask: replicas: 2
1670```
1671expands to
1672```
1673job: {
1674 myTask: {
1675 replicas: 2
1676 }
1677}
1678```
1679
1680<!-- OPTIONAL FIELDS:
1681
1682The optional marker solves the issue of having to print large amounts of
1683boilerplate when dealing with large types with many optional or default
1684values (such as Kubernetes).
1685Writing such optional values in terms of *null | value is tedious,
1686unpleasant to read, and as it is not well defined what can be dropped or not,
1687all null values have to be emitted from the output, even if the user
1688doesn't override them.
1689Part of the issue is how null is defined. We could adopt a Typescript-like
1690approach of introducing "void" or "undefined" to mean "not defined and not
1691part of the output". But having all of null, undefined, and void can be
1692confusing. If these ever are introduced anyway, the ? operator could be
1693expressed along the lines of
1694 foo?: bar
1695being a shorthand for
1696 foo: void | bar
1697where void is the default if no other default is given.
1698
1699The current mechanical definition of "?" is straightforward, though, and
1700probably avoids the need for void, while solving a big issue.
1701
1702Caveats:
1703[1] this definition requires explicitly defined fields to be emitted, even
1704if they could be elided (for instance if the explicit value is the default
1705value defined an optional field). This is probably a good thing.
1706
1707[2] a default value may still need to be included in an output if it is not
1708the zero value for that field and it is not known if any outside system is
1709aware of defaults. For instance, which defaults are specified by the user
1710and which by the schema understood by the receiving system.
1711The use of "?" together with defaults should therefore be used carefully
1712in non-schema definitions.
1713Problematic cases should be easy to detect by a vet-like check, though.
1714
1715[3] It should be considered how this affects the trim command.
1716Should values implied by optional fields be allowed to be removed?
1717Probably not. This restriction is unlikely to limit the usefulness of trim,
1718though.
1719
1720[4] There should be an option to emit all concrete optional values.
1722-->
1723
1724### Lists
1725
1726A list literal defines a new value of type list.
1727A list may be open or closed.
1728An open list is indicated with a `...` at the end of an element list,
1729optionally followed by a value for the remaining elements.
1730
1731The length of a closed list is the number of elements it contains.
1732The length of an open list is the number of elements as a lower bound
1733and an unlimited number of elements as its upper bound.
1734
1735```
1736ListLit = "[" [ ElementList [ "," ] ] "]" .
1737ElementList = Ellipsis | Embedding { "," Embedding } [ "," Ellipsis ] .
1738```
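
The following sketch shows how open and closed lists unify; the results are derived from the rules above and from element-wise unification:

```
Expression                      Result
[1, 2] & [int, int]             [1, 2]
[1, 2] & [...int]               [1, 2]
[1, 2, ...] & [int, int, 3]     [1, 2, 3]
[1, 2] & [1, 2, 3]              _|_
[...string] & [1]               _|_
```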
1739
1740Lists can be thought of as structs:
1741
1742```
1743List: *null | {
1744 Elem: _
1745 Tail: List
1746}
1747```
1748
For closed lists, `Tail` is `null` for the last element; for open lists it is
`*null | List`, defaulting to the shortest variant.
For instance, the open list `[ 1, 2, ... ]` can be represented as:
1752```
1753open: List & { Elem: 1, Tail: { Elem: 2 } }
1754```
and the closed version of this list, `[ 1, 2 ]`, as
1756```
1757closed: List & { Elem: 1, Tail: { Elem: 2, Tail: null } }
1758```
1759
1760Using this representation, the subsumption rule for lists can
1761be derived from those of structs.
1762Implementations are not required to implement lists as structs.
1763The `Elem` and `Tail` fields are not special and `len` will not work as
1764expected in these cases.
1765
1766
1767## Declarations and Scopes
1768
1769
1770### Blocks
1771
1772A _block_ is a possibly empty sequence of declarations.
1773The braces of a struct literal `{ ... }` form a block, but there are
1774others as well:
1775
1776- The _universe block_ encompasses all CUE source text.
1777- Each [package](#modules-instances-and-packages) has a _package block_
1778 containing all CUE source text in that package.
1779- Each file has a _file block_ containing all CUE source text in that file.
1780- Each `for` and `let` clause in a [comprehension](#comprehensions)
1781 is considered to be its own implicit block.
1782
1783Blocks nest and influence scoping.
1784
1785
1786### Declarations and scope
1787
1788A _declaration_ may bind an identifier to a field, alias, or package.
1789Every identifier in a program must be declared.
1790Other than for fields,
1791no identifier may be declared twice within the same block.
1792For fields, an identifier may be declared more than once within the same block,
1793resulting in a field with a value that is the result of unifying the values
1794of all fields with the same identifier.
1795String labels do not bind an identifier to the respective field.
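
For instance, in the following sketch (hypothetical field `a`), the three declarations of `a` in the same block are unified into a single field:

```
a: int
a: >=0
a: 5    // a is int & >=0 & 5, which is 5
```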
1796
1797The _scope_ of a declared identifier is the extent of source text in which the
1798identifier denotes the specified field, alias, or package.
1799
1800CUE is lexically scoped using blocks:
1801
18021. The scope of a [predeclared identifier](#predeclared-identifiers) is the universe block.
18031. The scope of an identifier denoting a field
1804 declared at top level (outside any struct literal) is the package block.
18051. The scope of an identifier denoting an alias
1806 declared at top level (outside any struct literal) is the file block.
18071. The scope of a let identifier
1808 declared at top level (outside any struct literal) is the file block.
18091. The scope of the package name of an imported package is the file block of the
1810 file containing the import declaration.
18111. The scope of a field, alias or let identifier declared inside a struct
1812 literal is the innermost containing block.
1813
1814An identifier declared in a block may be redeclared in an inner block.
1815While the identifier of the inner declaration is in scope, it denotes the entity
1816declared by the inner declaration.
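
A short sketch of redeclaration in an inner block, using hypothetical field names:

```
a: 1
b: {
    a: 2
    c: a  // 2: refers to the a declared in b
}
d: a      // 1: refers to the a declared at top level
```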
1817
1818The package clause is not a declaration;
1819the package name does not appear in any scope.
1820Its purpose is to identify the files belonging to the same package
1821and to specify the default name for import declarations.
1822
1823
1824### Predeclared identifiers
1825
1826CUE predefines a set of types and builtin functions.
1827For each of these there is a corresponding keyword which is the name
1828of the predefined identifier, prefixed with `__`.
1829
1830```
Functions
len       close and or

Types
null      The null type and value
bool      All boolean values
int       All integral numbers
float     All decimal floating-point numbers
string    Any valid UTF-8 sequence
bytes     Any valid byte sequence

Derived   Value
number    int | float
uint      >=0
uint8     >=0 & <=255
int8      >=-128 & <=127
uint16    >=0 & <=65535
int16     >=-32_768 & <=32_767
rune      >=0 & <=0x10FFFF
uint32    >=0 & <=4_294_967_295
int32     >=-2_147_483_648 & <=2_147_483_647
uint64    >=0 & <=18_446_744_073_709_551_615
int64     >=-9_223_372_036_854_775_808 & <=9_223_372_036_854_775_807
uint128   >=0 & <=340_282_366_920_938_463_463_374_607_431_768_211_455
int128    >=-170_141_183_460_469_231_731_687_303_715_884_105_728 &
          <=170_141_183_460_469_231_731_687_303_715_884_105_727
float32   >=-3.40282346638528859811704183484516925440e+38 &
          <=3.40282346638528859811704183484516925440e+38
float64   >=-1.797693134862315708145274237317043567981e+308 &
          <=1.797693134862315708145274237317043567981e+308
1861```
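
A few illustrative uses of the predeclared identifiers; the field names are hypothetical and the results follow from the definitions listed above:

```
port:  uint16 & 8080        // 8080
level: int8 & 200           // _|_: 200 is outside >=-128 & <=127
n:     len([1, 2, 3])       // 3
kind:  or(["a", "b", "c"])  // "a" | "b" | "c"
pos:   and([int, >0])       // int & >0
```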
1862
1863
1864### Exported identifiers
1865
1866<!-- move to a more logical spot -->
1867
1868An identifier of a package may be exported to permit access to it
1869from another package.
1870All identifiers not starting with `_` (so all regular fields and definitions
1871starting with `#`) are exported.
Any identifier starting with `_` is not visible outside the package and resides
in a namespace separate from that of namesake identifiers of other packages.
1874
1875```
1876package mypackage
1877
1878foo: string // visible outside mypackage
1879"bar": string // visible outside mypackage
1880
1881#Foo: { // visible outside mypackage
1882 a: 1 // visible outside mypackage
1883 _b: 2 // not visible outside mypackage
1884
1885 #C: { // visible outside mypackage
1886 d: 4 // visible outside mypackage
1887 }
1888 _#E: foo // not visible outside mypackage
1889}
1890```
1891
1892
1893### Uniqueness of identifiers
1894
1895Given a set of identifiers, an identifier is called unique if it is different
1896from every other in the set, after applying normalization following
1897[Unicode Annex #31](https://unicode.org/reports/tr31/).
1898Two identifiers are different if they are spelled differently
1899or if they appear in different packages and are not exported.
1900Otherwise, they are the same.
1901
1902
1903### Field declarations
1904
1905A field associates the value of an expression to a label within a struct.
1906If this label is an identifier, it binds the field to that identifier,
1907so the field's value can be referenced by writing the identifier.
1908String labels are not bound to fields.
1909```
1910a: {
1911 b: 2
1912 "s": 3
1913
1914 c: b // 2
1915 d: s // _|_ unresolved identifier "s"
1916 e: a.s // 3
1917}
1918```
1919
1920If an expression may result in a value associated with a default value
1921as described in [default values](#default-values), the field binds to this
1922value-default pair.
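
As a rough illustration of a field bound to a value-default pair, consider the
sketch below. The names `#Server`, `port`, and `addr` are hypothetical and
chosen only for this example.

```
#Server: {
    port: int | *8080         // value-default pair: any int, defaulting to 8080
    addr: "localhost:\(port)" // uses the default unless port is made concrete
}

a: #Server                // a.addr: "localhost:8080"
b: #Server & {port: 9090} // b.addr: "localhost:9090"
```

In `a` the default is selected because a concrete value is needed for the
interpolation; in `b` the unification with `9090` replaces it.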
1923
1924
1925<!-- TODO: disallow creating identifiers starting with __
1926...and reserve them for builtin values.
1927
1928The issue is with code generation. As no guarantee can be given that
1929a predeclared identifier is not overridden in one of the enclosing scopes,
1930code will have to handle detecting such cases and renaming them.
1931An alternative is to have the predeclared identifiers be aliases for namesake
1932equivalents starting with a double underscore (e.g. string -> __string),
1933allowing generated code (normal code would keep using `string`) to refer
1934to these directly.
1935-->
1936
1937
1938### Let declarations
1939
1940<!--
1941TODO: why are there two "Let declarations" sections?
1942-->
1943
1944Within a struct, a let clause binds an identifier to the given expression.
1945
1946Within the scope of the identifier, the identifier refers to the
1947_locally declared_ expression.
The expression is evaluated in the scope in which it was declared.
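
A small sketch of a let declaration follows; the identifier `base` and the
field names are hypothetical. The let-bound identifier can be referenced within
its scope, but it is not a field and so does not appear in the resulting data.

```
let base = "example.com"

api: "api.\(base)" // "api.example.com"
web: "www.\(base)" // "www.example.com"
```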
1949
1950
1951## Expressions
1952
1953An expression specifies the computation of a value by applying operators and
1954builtin functions to operands.
1955
1956Expressions that require concrete values are called _incomplete_ if any of
1957their operands are not concrete, but define a value that would be legal for
1958that expression.
1959Incomplete expressions may be left unevaluated until a concrete value is
1960requested at the application level.
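
The following sketch, using hypothetical field names, shows an expression that
is incomplete in one struct and becomes evaluable once unification supplies a
concrete operand.

```
x: {
    a: int
    b: a + 1 // incomplete: a is not yet concrete
}

y: x & {a: 2} // y.b: 3
```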
1961
1962### Operands
1963
1964Operands denote the elementary values in an expression.
1965An operand may be a literal, a (possibly qualified) identifier denoting
1966a field, alias, or let declaration, or a parenthesized expression.
1967
1968```
1969Operand = Literal | OperandName | "(" Expression ")" .
1970Literal = BasicLit | ListLit | StructLit .
1971BasicLit = int_lit | float_lit | string_lit |
1972 null_lit | bool_lit | bottom_lit .
1973OperandName = identifier | QualifiedIdent .
1974```
1975
1976### Qualified identifiers
1977
1978A qualified identifier is an identifier qualified with a package name prefix.
1979
1980```
1981QualifiedIdent = PackageName "." identifier .
1982```
1983
1984A qualified identifier accesses an identifier in a different package,
1985which must be [imported](#import-declarations).
1986The identifier must be declared in the [package block](#blocks) of that package.
1987
1988```
1989math.Sin // denotes the Sin function in package math
1990```
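
For context, a sketch of a complete source file that imports a package and uses
a qualified identifier. It assumes the `strings` package of the CUE standard
library and its `ToUpper` function; the field names are hypothetical.

```
import "strings"

name:  "cue"
upper: strings.ToUpper(name) // "CUE"
```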
1991
1992### References
1993
1994An identifier operand refers to a field and is called a reference.
1995The value of a reference is a copy of the expression associated with the field
1996that it is bound to,
1997with any references within that expression bound to the respective copies of
1998the fields they were originally bound to.
1999Implementations may use a different mechanism to evaluate as long as
2000these semantics are maintained.
2001
2002```
2003a: {
2004 place: string
2005 greeting: "Hello, \(place)!"
2006}
2007
2008b: a & { place: "world" }
2009c: a & { place: "you" }
2010
2011d: b.greeting // "Hello, world!"
2012e: c.greeting // "Hello, you!"
2013```
2014
2015
2016
2017### Primary expressions
2018
2019Primary expressions are the operands for unary and binary expressions.
2020
2021```
2022PrimaryExpr =
2023 Operand |
2024 PrimaryExpr Selector |
2025 PrimaryExpr Index |
2026 PrimaryExpr Arguments .
2027
2028Selector = "." (identifier | simple_string_lit) .
2029Index = "[" Expression "]" .
2030Argument = Expression .
2031Arguments = "(" [ ( Argument { "," Argument } ) [ "," ] ] ")" .
2032```
2033<!---
2034TODO:
2035 PrimaryExpr Query |
2036Query = "." Filters .
2037Filters = Filter { Filter } .
2038Filter = "[" [ "?" ] AliasExpr "]" .
2039
2040TODO: maybe reintroduce slices, as they are useful in queries, probably this
2041time with Python semantics.
2042 PrimaryExpr Slice |
2043Slice = "[" [ Expression ] ":" [ Expression ] [ ":" [Expression] ] "]" .
2044
2045Argument = Expression | ( identifier ":" Expression ).
2046
2047// & expression type
2048// string_lit: same as label. Arguments is current node.
2049// If selector is applied to list, it performs the operation for each
2050// element.
2051
2052TODO: considering allowing decimal_lit for selectors.
2053--->
2054
2055```
2056x
20572
2058(s + ".txt")
2059f(3.1415, true)
2060m["foo"]
2061obj.color
2062f.p[i].x
2063```
2064
2065
2066### Selectors
2067
2068For a [primary expression](#primary-expressions) `x` that is not a [package name](#package-clause),
2069the selector expression
2070
2071```
2072x.f
2073```
2074
2075denotes the element of a <!--list or -->struct `x` identified by `f`.
2076<!--For structs, -->
2077`f` must be an identifier or a string literal identifying
2078any definition or regular non-optional field.
2079The identifier `f` is called the field selector.
2080
2081<!--
2082Allowing strings to be used as field selectors obviates the need for
2083backquoted identifiers. Note that some standards use names for structs that
2084are not standard identifiers (such "Fn::Foo"). Note that indexing does not
2085allow access to identifiers.
2086-->
2087
2088<!--
2089For lists, `f` must be an integer and follows the same lookup rules as
2090for the index operation.
2091The type of the selector expression is the type of `f`.
2092-->
2093
2094If `x` is a package name, see the section on [qualified identifiers](#qualified-identifiers).
2095
2096<!--
2097TODO: consider allowing this and also for selectors. It needs to be considered
2098how defaults are carried forward in cases like:
2099
2100 x: { a: string | *"foo" } | *{ a: int | *4 }
2101 y: x.a & string
2102
2103What is y in this case?
2104 (x.a & string, _|_)
2105 (string|"foo", _|_)
2106 (string|"foo", "foo)
2107If the latter, then why?
2108
2109For a disjunction of the form `x1 | ... | xn`,
2110the selector is applied to each element `x1.f | ... | xn.f`.
2111-->
2112
2113Otherwise, if `x` is not a <!--list or -->struct,
2114or if `f` does not exist in `x`,
2115the result of the expression is bottom (an error).
2116In the latter case the expression is incomplete.
2117The operand of a selector may be associated with a default.
2118
2119```
2120T: {
2121 x: int
2122 y: 3
2123 "x-y": 4
2124}
2125
2126a: T.x // int
2127b: T.y // 3
2128c: T.z // _|_ // field 'z' not found in T
2129d: T."x-y" // 4
2130
2131e: {a: 1|*2} | *{a: 3|*4}
2132f: e.a // 4 (default value)
2133```
2134
2135<!--
2136```
2137(v, d).f => (v.f, d.f)
2138
2139e: {a: 1|*2} | *{a: 3|*4}
2140f: e.a // 4 after selecting default from (({a: 1|*2} | {a: 3|*4}).a, 4)
2141
2142```
2143-->
2144
2145
2146### Index expressions
2147
2148A primary expression of the form
2149
2150```
2151a[x]
2152```
2153
2154denotes the element of a list or struct `a` indexed by `x`.
2155The value `x` is called the index or field name, respectively.
2156The following rules apply:
2157
2158If `a` is not a struct:
2159
2160- `a` is a list (which need not be complete)
2161- the index `x` unified with `int` must be concrete.
2162- the index `x` is in range if `0 <= x < len(a)`, where only the
2163 explicitly defined values of an open-ended list are considered,
2164 otherwise it is out of range
2165
2166The result of `a[x]` is
2167
2168for `a` of list type:
2169
2170- the list element at index `x`, if `x` is within range
2171- bottom (an error), otherwise
2172
2173
2174for `a` of struct type:
2175
2176- the index `x` unified with `string` must be concrete.
2177- the value of the regular and non-optional field named `x` of struct `a`,
2178 if this field exists
2179- bottom (an error), otherwise
2180
2181
2182```
2183a: [ 1, 2 ][1] // 2
2184b: [ 1, 2 ][2] // _|_
2185c: [ 1, 2, ...][2] // _|_
2186
2187// Defaults are selected for both operand and index:
2188x: [1, 2] | *[3, 4]
2189y: int | *1
2190z: x[y] // 4
2191```
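
The examples above index lists only. A comparable sketch for structs, with
hypothetical field names, might look as follows. Note that indexing can reach
fields whose labels are not valid identifiers.

```
s: {
    a:     1
    "b-c": 2
}
k: "b-c"

v: s[k]   // 2
w: s["a"] // 1
u: s["z"] // _|_ no field named "z" in s
```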
2192
2193### Operators
2194
2195Operators combine operands into expressions.
2196
2197```
2198Expression = UnaryExpr | Expression binary_op Expression .
2199UnaryExpr = PrimaryExpr | unary_op UnaryExpr .
2200
2201binary_op = "|" | "&" | "||" | "&&" | "==" | rel_op | add_op | mul_op .
2202rel_op = "!=" | "<" | "<=" | ">" | ">=" | "=~" | "!~" .
2203add_op = "+" | "-" .
2204mul_op = "*" | "/" .
2205unary_op = "+" | "-" | "!" | "*" | rel_op .
2206```
2207
2208Comparisons are discussed [elsewhere](#comparison-operators).
For any binary operator, the operand types must unify.
2210
2211<!-- TODO: durations
2212 unless the operation involves durations.
2213
2214Except for duration operations, if one operand is an untyped [literal] and the
2215other operand is not, the constant is [converted] to the type of the other
2216operand.
2217-->
2218
2219<!--
2220Operands of unary and binary expressions may be associated with a default using
2221the following:
2222
2223```
2224O1: op (v1, d1) => (op v1, op d1)
2225
2226O2: (v1, d1) op (v2, d2) => (v1 op v2, d1 op d2)
2227and because v => (v, v)
2228O3: v1 op (v2, d2) => (v1 op v2, v1 op d2)
2229O4: (v1, d1) op v2 => (v1 op v2, d1 op v2)
2230```
2231
2232```
2233Field Resulting Value-Default pair
2234a: *1|2 (1|2, 1)
2235b: -a (-a, -1)
2236
2237c: a + 2 (a+2, 3)
2238d: a + a (a+a, 2)
2239```
2240-->
2241
2242#### Operator precedence
2243
2244Unary operators have the highest precedence.
2245
There are seven precedence levels for binary operators.
Multiplication operators bind strongest, followed by
addition operators, comparison operators,
`&&` (logical AND), `||` (logical OR), `&` (unification),
and finally `|` (disjunction):
2251
2252```
Precedence    Operator
    7             *  /
    6             +  -
    5             ==  !=  <  <=  >  >=  =~  !~
    4             &&
    3             ||
    2             &
    1             |
2261```
2262
2263Binary operators of the same precedence associate from left to right.
2264For instance, `x / y * z` is the same as `(x / y) * z`.
2265
2266```
2267+x
226823 + 3*x[i]
2269x <= f()
2270f() || g()
2271x == y+1 && y == z-1
22722 | int
2273{ a: 1 } & { b: 2 }
2274```
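
To make the grouping implied by the precedence table explicit, here is a short
sketch with hypothetical field names. The comments show how each expression is
parsed.

```
a: 1 + 2*3           // 7: parsed as 1 + (2*3)
b: 10 - 4 - 3        // 3: parsed as (10 - 4) - 3
c: 1 + 2 < 4 && true // true: parsed as ((1 + 2) < 4) && true
d: int & >0 | string // parsed as (int & >0) | string
```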
2275
2276#### Arithmetic operators
2277
2278Arithmetic operators apply to numeric values and yield a result of the same type
2279as the first operand. The four standard arithmetic operators
2280`(+, -, *, /)` apply to integer and decimal floating-point types;
2281`+` and `*` also apply to strings and bytes.
2282
2283```
+    sum                    integers, floats, strings, bytes
-    difference             integers, floats
*    product                integers, floats, strings, bytes
/    quotient               integers, floats
2288```
2289
2290For any operator that accepts operands of type `float`, any operand may be
2291of type `int` or `float`, in which case the result will be `float`
2292if it cannot be represented as an `int` or if any of the operands are `float`,
2293or `int` otherwise.
2294So the result of `1 / 2` is `0.5` and is of type `float`.
2295
2296The result of division by zero is bottom (an error).
2297<!-- TODO: consider making it +/- Inf -->
2298Integer division is implemented through the builtin functions
2299`quo`, `rem`, `div`, and `mod`.
2300
2301The unary operators `+` and `-` are defined for numeric values as follows:
2302
2303```
+x                          is 0 + x
-x    negation              is 0 - x
2306```
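
A brief sketch of the arithmetic rules described in this section, using
hypothetical field names:

```
a: 1 + 2     // 3, an int
b: 1 + 2.0   // 3.0, a float because one operand is a float
c: 1 / 2     // 0.5, a float
d: quo(7, 2) // 3, integer division through a builtin
e: -a        // -3
f: 1 / 0     // _|_ division by zero
```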
2307
2308#### String operators
2309
2310Strings can be concatenated using the `+` operator:
2311```
2312s: "hi " + name + " and good bye"
2313```
2314String addition creates a new string by concatenating the operands.
2315
2316A string can be repeated by multiplying it:
2317
2318```
2319s: "etc. "*3 // "etc. etc. etc. "
2320```
2321
2322<!-- jba: Do these work for byte sequences? If not, why not? -->
2323
2324
#### Comparison operators
2326
2327Comparison operators compare two operands and yield an untyped boolean value.
2328
2329```
==    equal
!=    not equal
<     less
<=    less or equal
>     greater
>=    greater or equal
=~    matches regular expression
!~    does not match regular expression
2338```
2339
2340<!-- regular expression operator inspired by Bash, Perl, and Ruby. -->
2341
2342In any comparison, the types of the two operands must unify or one of the
2343operands must be null.
2344
2345The equality operators `==` and `!=` apply to operands that are comparable.
2346The ordering operators `<`, `<=`, `>`, and `>=` apply to operands that are ordered.
2347The matching operators `=~` and `!~` apply to a string and a regular
2348expression operand.
2349These terms and the result of the comparisons are defined as follows:
2350
2351- Null is comparable with itself and any other type.
2352 Two null values are always equal, null is unequal with anything else.
2353- Boolean values are comparable.
2354 Two boolean values are equal if they are either both true or both false.
2355- Integer values are comparable and ordered, in the usual way.
2356- Floating-point values are comparable and ordered, as per the definitions
2357 for binary coded decimals in the IEEE-754-2008 standard.
2358- Floating point numbers may be compared with integers.
2359- String and bytes values are comparable and ordered lexically byte-wise.
- Structs are not comparable.
2361- Lists are not comparable.
2362- The regular expression syntax is the one accepted by RE2,
2363 described in https://github.com/google/re2/wiki/Syntax,
2364 except for `\C`.
2365- `s =~ r` is true if `s` matches the regular expression `r`.
2366- `s !~ r` is true if `s` does not match regular expression `r`.
2367
2368<!--- TODO: consider the following
2369- For regular expression, named capture groups are interpreted as CUE references
2370 that must unify with the strings matching this capture group.
2371--->
2372<!-- TODO: Implementations should adopt an algorithm that runs in linear time? -->
2373<!-- Consider implementing Level 2 of Unicode regular expression. -->
2374
2375```
23763 < 4 // true
23773 < 4.0 // true
2378null == 2 // false
2379null != {} // true
2380{} == {} // _|_: structs are not comparable against structs
2381
2382"Wild cats" =~ "cat" // true
2383"Wild cats" !~ "dog" // true
2384
2385"foo" =~ "^[a-z]{3}$" // true
2386"foo" =~ "^[a-z]{4}$" // false
2387```
2388
2389<!-- jba
2390I think I know what `3 < a` should mean if
2391
2392 a: >=1 & <=5
2393
2394It should be a constraint on `a` that can be evaluated once `a`'s value is known more precisely.
2395
2396But what does `3 < (>=1 & <=5)` mean? We'll never get more information, so it must have a definite value.
2397-->
2398
2399#### Logical operators
2400
2401Logical operators apply to boolean values and yield a result of the same type
2402as the operands. The right operand is evaluated conditionally.
2403
2404```
&&    conditional AND    p && q  is  "if p then q else false"
||    conditional OR     p || q  is  "if p then true else q"
!     NOT                !p      is  "not p"
2408```
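
A minimal sketch of the logical operators applied to concrete boolean fields;
the names `p` and `q` are hypothetical:

```
p: true
q: false

a: p && q // false
b: p || q // true
c: !p     // false
```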
2409
2410
2411<!--
2412### TODO TODO TODO
2413
24143.14 / 0.0 // illegal: division by zero
2415Illegal conversions always apply to CUE.
2416
2417Implementation restriction: A compiler may use rounding while computing untyped floating-point or complex constant expressions; see the implementation restriction in the section on constants. This rounding may cause a floating-point constant expression to be invalid in an integer context, even if it would be integral when calculated using infinite precision, and vice versa.
2418-->
2419
2420<!--- TODO(mpvl): conversions
2421### Conversions
2422Conversions are expressions of the form `T(x)` where `T` and `x` are
2423expressions.
2424The result is always an instance of `T`.
2425
2426```
2427Conversion = Expression "(" Expression [ "," ] ")" .
2428```
2429--->
2430<!---
2431
2432A literal value `x` can be converted to type T if `x` is representable by a
2433value of `T`.
2434
2435As a special case, an integer literal `x` can be converted to a string type
2436using the same rule as for non-constant x.
2437
2438Converting a literal yields a typed value as result.
2439
2440```
2441uint(iota) // iota value of type uint
2442float32(2.718281828) // 2.718281828 of type float32
2443complex128(1) // 1.0 + 0.0i of type complex128
2444float32(0.49999999) // 0.5 of type float32
2445float64(-1e-1000) // 0.0 of type float64
2446string('x') // "x" of type string
2447string(0x266c) // "♬" of type string
2448MyString("foo" + "bar") // "foobar" of type MyString
2449string([]byte{'a'}) // not a constant: []byte{'a'} is not a constant
2450(*int)(nil) // not a constant: nil is not a constant, *int is not a boolean, numeric, or string type
2451int(1.2) // illegal: 1.2 cannot be represented as an int
2452string(65.0) // illegal: 65.0 is not an integer constant
2453```
2454--->
2455<!---
2456
2457A conversion is always allowed if `x` is an instance of `T`.
2458
2459If `T` and `x` of different underlying type, a conversion is allowed if
2460`x` can be converted to a value `x'` of `T`'s type, and
2461`x'` is an instance of `T`.
2462A value `x` can be converted to the type of `T` in any of these cases:
2463
2464- `x` is a struct and is subsumed by `T`.
2465- `x` and `T` are both integer or floating points.
2466- `x` is an integer or a byte sequence and `T` is a string.
2467- `x` is a string and `T` is a byte sequence.
2468
2469Specific rules apply to conversions between numeric types, structs,
2470or to and from a string type. These conversions may change the representation
2471of `x`.
2472All other conversions only change the type but not the representation of x.
2473
2474
2475#### Conversions between numeric ranges
2476For the conversion of numeric values, the following rules apply:
2477
24781. Any integer value can be converted into any other integer value
2479 provided that it is within range.
24802. When converting a decimal floating-point number to an integer, the fraction
2481 is discarded (truncation towards zero). TODO: or disallow truncating?
2482
2483```
2484a: uint16(int(1000)) // uint16(1000)
2485b: uint8(1000) // _|_ // overflow
2486c: int(2.5) // 2 TODO: TBD
2487```
2488
2489
2490#### Conversions to and from a string type
2491
2492Converting a list of bytes to a string type yields a string whose successive
2493bytes are the elements of the slice.
2494Invalid UTF-8 is converted to `"\uFFFD"`.
2495
2496```
2497string('hell\xc3\xb8') // "hellø"
2498string(bytes([0x20])) // " "
2499```
2500
2501As string value is always convertible to a list of bytes.
2502
2503```
2504bytes("hellø") // 'hell\xc3\xb8'
2505bytes("") // ''
2506```
2507
2508#### Conversions between list types
2509
2510Conversions between list types are possible only if `T` strictly subsumes `x`
2511and the result will be the unification of `T` and `x`.
2512
2513If we introduce named types this would be different from IP & [10, ...]
2514
2515Consider removing this until it has a different meaning.
2516
2517```
2518IP: 4*[byte]
2519Private10: IP([10, ...]) // [10, byte, byte, byte]
2520```
2521
2522#### Conversions between struct types
2523
2524A conversion from `x` to `T`
2525is applied using the following rules:
2526
25271. `x` must be an instance of `T`,
25282. all fields defined for `x` that are not defined for `T` are removed from
2529 the result of the conversion, recursively.
2530
2531<!-- jba: I don't think you say anywhere that the matching fields are unified.
2532mpvl: they are not, x must be an instance of T, in which case x == T&x,
2533so unification would be unnecessary.
2534-->
2535<!--
2536```
2537T: {
2538 a: { b: 1..10 }
2539}
2540
2541x1: {
2542 a: { b: 8, c: 10 }
2543 d: 9
2544}
2545
2546c1: T(x1) // { a: { b: 8 } }
2547c2: T({}) // _|_ // missing field 'a' in '{}'
2548c3: T({ a: {b: 0} }) // _|_ // field a.b does not unify (0 & 1..10)
2549```
2550-->
2551
2552### Calls
2553
2554Calls can be made to core library functions, called builtins.
2555Given an expression `f` of function type F,
2556```
2557f(a1, a2, … an)
2558```
calls `f` with arguments `a1, a2, … an`. Arguments must be expressions
whose values are instances of the corresponding parameter types of `F`;
they are evaluated before the function is called.
2562
2563```
2564a: math.Atan2(x, y)
2565```
2566
2567In a function call, the function value and arguments are evaluated in the usual
2568order.
2569After they are evaluated, the parameters of the call are passed by value
2570to the function and the called function begins execution.
2571The return parameters
2572of the function are passed by value back to the calling function when the
2573function returns.
2574
2575
2576### Comprehensions
2577
2578Lists and fields can be constructed using comprehensions.
2579
2580Comprehensions define a clause sequence that consists of a sequence of
2581`for`, `if`, and `let` clauses, nesting from left to right.
2582The sequence must start with a `for` or `if` clause.
2583The `for` and `let` clauses each define a new scope in which new values are
2584bound to be available for the next clause.
2585
2586The `for` clause binds the defined identifiers, on each iteration, to the next
2587value of some iterable value in a new scope.
2588A `for` clause may bind one or two identifiers.
2589If there is one identifier, it binds it to the value of
2590a list element or struct field value.
2591If there are two identifiers, the first value will be the key or index,
2592if available, and the second will be the value.
2593
2594For lists, `for` iterates over all elements in the list after closing it.
2595For structs, `for` iterates over all non-optional regular fields.
2596
2597An `if` clause, or guard, specifies an expression that terminates the current
2598iteration if it evaluates to false.
2599
2600The `let` clause binds the result of an expression to the defined identifier
2601in a new scope.
2602
2603A current iteration is said to complete if the innermost block of the clause
2604sequence is reached.
2605Syntactically, the comprehension value is a struct.
2606A comprehension can generate non-struct values by embedding such values within
2607this struct.
2608
2609Within lists, the values yielded by a comprehension are inserted in the list
2610at the position of the comprehension.
2611Within structs, the values yielded by a comprehension are embedded within the
2612struct.
2613Both structs and lists may contain multiple comprehensions.
2614
2615```
2616Comprehension = Clauses StructLit .
2617
2618Clauses = StartClause { [ "," ] Clause } .
2619StartClause = ForClause | GuardClause .
2620Clause = StartClause | LetClause .
2621ForClause = "for" identifier [ "," identifier ] "in" Expression .
2622GuardClause = "if" Expression .
2623LetClause = "let" identifier "=" Expression .
2624```
2625
2626```
a: [1, 2, 3, 4]
b: [for x in a if x > 1 { x+1 }] // [3, 4, 5]

c: {
    for x in a
    if x < 4
    let y = 1 {
        "\(x)": x + y
    }
}
// c evaluates to the same value as d
d: { "1": 2, "2": 3, "3": 4 }
2638```
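
The examples above bind a single identifier per `for` clause. The sketch below,
with hypothetical field names, illustrates the two-identifier form and a
comprehension that starts with an `if` clause.

```
src: {a: 1, b: 2}

// Two identifiers: k is bound to the field name, v to the field value.
dst: {
    for k, v in src {
        "\(k)_copy": v
    }
}
// dst: {a_copy: 1, b_copy: 2}

// For lists, the first identifier is bound to the index.
idx: [for i, v in ["x", "y"] {"\(i):\(v)"}] // ["0:x", "1:y"]

// A comprehension starting with an if clause adds fields conditionally.
flags: {
    if src.a > 0 {
        positive: true
    }
}
// flags: {positive: true}
```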
2639
2640
2641### String interpolation
2642
2643String interpolation allows constructing strings by replacing placeholder
2644expressions with their string representation.
2645String interpolation may be used in single- and double-quoted strings, as well
2646as their multiline equivalent.
2647
2648A placeholder consists of `\(` followed by an expression and `)`.
2649The expression is evaluated in the scope within which the string is defined.
2650
2651The result of the expression is substituted as follows:
2652- string: as is
2653- bool: the JSON representation of the bool
- number: a JSON representation of the number that preserves the
  precision of the underlying binary coded decimal
- bytes: as if substituted within single quotes or
  converted to valid UTF-8 replacing the
  maximal subpart of ill-formed subsequences with a single
  replacement character (W3C encoding standard) otherwise
2660- list: illegal
2661- struct: illegal
2662
2663
2664```
2665a: "World"
2666b: "Hello \( a )!" // Hello World!
2667```
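
A further sketch, with hypothetical field names, showing the substitution rules
for non-string values:

```
n:  42
ok: true

s: "n=\(n) ok=\(ok)" // "n=42 ok=true"
t: "\([1, 2])"       // _|_ lists cannot be interpolated
```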
2668
2669
2670## Builtin Functions
2671
2672Builtin functions are predeclared. They are called like any other function.
2673
2674
2675### `len`
2676
2677The builtin function `len` takes arguments of various types and returns
2678a result of type int.
2679
2680```
Argument type    Result

bytes            length of byte sequence
string           length of string in bytes
list             list length, smallest length for an open list
struct           number of distinct data fields, excluding field constraints
2686```
2687<!-- TODO: consider not supporting len, but instead rely on more
2688precisely named builtin functions:
2689 - strings.RuneLen(x)
2690 - bytes.Len(x) // x may be a string
2691 - struct.NumFooFields(x)
2692 - list.Len(x)
2693-->
2694
2695```
Expression          Result
len("Hellø")        6
len([1, 2, 3])      3
len([1, 2, ...])    2
2700```
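
The table above covers strings and lists. As a sketch of the remaining argument
types (field names are hypothetical), an optional field counts as a field
constraint and is therefore not included in the length of a struct:

```
a: len('abc')           // 3
b: len({x: 1, y: 2})    // 2
c: len({x: 1, y?: int}) // 1
```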
2701
2702
2703### `close`
2704
2705The builtin function `close` converts a partially defined, or open, struct
2706to a fully defined, or closed, struct.
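
A minimal sketch of the effect of `close`, using hypothetical field names. It
assumes the usual closed-struct rule that unification may not add fields that
the closed struct does not declare.

```
a: close({x: 1})

b: a & {x: 1} // {x: 1}
c: a & {y: 2} // _|_ field y is not allowed in a closed struct
```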
2707
2708
2709### `and`
2710
2711The builtin function `and` takes a list and returns the result of applying
2712the `&` operator to all elements in the list.
2713It returns top for the empty list.
2714
2715```
Expression:    Result
and([a, b])    a & b
and([a])       a
and([])        _
2720```
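
A concrete sketch of `and`, with hypothetical names, that builds a constraint
from a list and then unifies it with values:

```
constraints: [int, >0, <10]

digit: and(constraints) // int & >0 & <10
five:  digit & 5        // 5
bad:   digit & 12       // _|_ 12 does not satisfy <10
```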
2721
2722### `or`
2723
2724The builtin function `or` takes a list and returns the result of applying
2725the `|` operator to all elements in the list.
2726It returns bottom for the empty list.
2727
2728```
Expression:    Result
or([a, b])     a | b
or([a])        a
or([])         _|_
2733```
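
A concrete sketch of `or`, with hypothetical names, that derives an
enumeration-like disjunction from a list of allowed values:

```
protocols: ["tcp", "udp"]

#Protocol: or(protocols)       // "tcp" | "udp"
p:         #Protocol & "udp"   // "udp"
q:         #Protocol & "icmp"  // _|_ not one of the allowed values
```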
2734
2735### `div`, `mod`, `quo` and `rem`
2736
2737For two integer values `x` and `y`,
2738the integer quotient `q = div(x, y)` and remainder `r = mod(x, y)`
2739implement Euclidean division and
2740satisfy the following relationship:
2741
2742```
2743r = x - y*q with 0 <= r < |y|
2744```
2745where `|y|` denotes the absolute value of `y`.
2746
2747```
 x     y     div(x, y)     mod(x, y)
 5     3         1             2
-5     3        -2             1
 5    -3        -1             2
-5    -3         2             1
2753```
2754
2755For two integer values `x` and `y`,
2756the integer quotient `q = quo(x, y)` and remainder `r = rem(x, y)`
2757implement truncated division and
2758satisfy the following relationship:
2759
2760```
2761x = q*y + r and |r| < |y|
2762```
2763
2764with `quo(x, y)` truncated towards zero.
2765
2766```
 x     y     quo(x, y)     rem(x, y)
 5     3         1             2
-5     3        -1            -2
 5    -3        -1             2
-5    -3         1            -2
2772```
2773
2774A zero divisor in either case results in bottom (an error).
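
The difference between the two pairs of builtins shows up only for negative
operands. A short sketch with hypothetical field names:

```
a: div(-7, 2) // -4, since -7 = 2*(-4) + 1
b: mod(-7, 2) // 1
c: quo(-7, 2) // -3, truncated towards zero
d: rem(-7, 2) // -1
e: div(7, 0)  // _|_ zero divisor
```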
2775
2776
2777## Cycles
2778
2779Implementations are required to interpret or reject cycles encountered
2780during evaluation according to the rules in this section.
2781
2782
2783### Reference cycles
2784
2785A _reference cycle_ occurs if a field references itself, either directly or
2786indirectly.
2787
2788```
2789// x references itself
2790x: x
2791
2792// indirect cycles
2793b: c
2794c: d
2795d: b
2796```
2797
2798Implementations should treat these as `_`.
2799Two particular cases are discussed below.
2800
2801
2802#### Expressions that unify an atom with an expression
2803
2804An expression of the form `a & e`, where `a` is an atom
2805and `e` is an expression, always evaluates to `a` or bottom.
2806As it does not matter how we fail, we can assume the result to be `a`
2807and postpone validating `a == e` until after all references
2808in `e` have been resolved.
2809
2810```
// Config           Evaluates to (requiring concrete values)
x: {                x: {
    a: b + 100          a: _|_ // cycle detected
    b: a - 100          b: _|_ // cycle detected
}                   }

y: x & {            y: {
    a: 200              a: 200 // asserted that 200 == b + 100
                        b: 100
}                   }
2821```
2822
2823
2824#### Field values
2825
2826A field value of the form `r & v`,
2827where `r` evaluates to a reference cycle and `v` is a concrete value,
2828evaluates to `v`.
2829Unification is idempotent and unifying a value with itself ad infinitum,
2830which is what the cycle represents, results in this value.
2831Implementations should detect cycles of this kind, ignore `r`,
2832and take `v` as the result of unification.
2833
2834<!-- Tomabechi's graph unification algorithm
2835can detect such cycles at near-zero cost. -->
2836
2837```
Configuration    Evaluated
//    c          Cycles in nodes of type struct evaluate
//  ↙︎   ↖        to the fixed point of unifying their
// a  →  b       values ad infinitum.

a: b & { x: 1 }  // a: { x: 1, y: 2, z: 3 }
b: c & { y: 2 }  // b: { x: 1, y: 2, z: 3 }
c: a & { z: 3 }  // c: { x: 1, y: 2, z: 3 }

// resolve a             b & {x:1}
// substitute b          c & {y:2} & {x:1}
// substitute c          a & {z:3} & {y:2} & {x:1}
// eliminate a (cycle)   {z:3} & {y:2} & {x:1}
// simplify              {x:1,y:2,z:3}
2852```
2853
2854This rule also applies to field values that are disjunctions of unification
2855operations of the above form.
2856
2857```
a: b&{x:1} | {y:1}  // {x:1,y:3,z:2} | {y:1}
b: {x:2} | c&{z:2}  // {x:2} | {x:1,y:3,z:2}
c: a&{y:3} | {z:3}  // {x:1,y:3,z:2} | {z:3}

// resolving a           b&{x:1} | {y:1}
// substitute b          ({x:2} | c&{z:2})&{x:1} | {y:1}
// simplify              c&{z:2}&{x:1} | {y:1}
// substitute c          (a&{y:3} | {z:3})&{z:2}&{x:1} | {y:1}
// simplify              a&{y:3}&{z:2}&{x:1} | {y:1}
// eliminate a (cycle)   {y:3}&{z:2}&{x:1} | {y:1}
// expand                {x:1,y:3,z:2} | {y:1}
2870```
2871
Note that all nodes that are part of a reference cycle forming a struct evaluate
to the same value.
If a field value is a disjunction, any element that is part of a cycle will
evaluate to this value.
2876
2877
2878### Structural cycles
2879
A structural cycle occurs when a node references one of its ancestor nodes.
2881It is possible to construct a structural cycle by unifying two acyclic values:
2882```
2883// acyclic
2884y: {
2885 f: h: g
2886 g: _
2887}
2888// acyclic
2889x: {
2890 f: _
2891 g: f
2892}
2893// introduces structural cycle
2894z: x & y
2895```
2896Implementations should be able to detect such structural cycles dynamically.
2897
2898A structural cycle can result in infinite structure or evaluation loops.
2899```
2900// infinite structure
2901a: b: a
2902
2903// infinite evaluation
2904f: {
2905 n: int
2906 out: n + (f & {n: 1}).out
2907}
2908```
2909CUE must allow or disallow structural cycles under certain circumstances.
2910
2911If a node `a` references an ancestor node, we call it and any of its
2912field values `a.f` _cyclic_.
2913So if `a` is cyclic, all of its descendants are also regarded as cyclic.
2914A given node `x`, whose value is composed of the conjuncts `c1 & ... & cn`,
2915is valid if any of its conjuncts is not cyclic.
2916
2917```
2918// Disallowed: a list of infinite length with all elements being 1.
2919#List: {
2920 head: 1
2921 tail: #List
2922}
2923
2924// Disallowed: another infinite structure (a:{b:{d:{b:{d:{...}}}}}, ...).
2925a: {
2926 b: c
2927}
2928c: {
2929 d: a
2930}
2931
2932// #List defines a list of arbitrary length. Because the recursive reference
2933// is part of a disjunction, this does not result in a structural cycle.
2934#List: {
2935 head: _
2936 tail: null | #List
2937}
2938
2939// Usage of #List. The value of tail in the most deeply nested element will
2940// be `null`: as the value of the disjunct referring to list is the only
2941// conjunct, all conjuncts are cyclic and the value is invalid and so
2942// eliminated from the disjunction.
2943MyList: #List & { head: 1, tail: { head: 2 }}
2944```
2945
2946<!--
2947### Unused fields
2948
2949TODO: rules for detection of unused fields
2950
29511. Any alias value must be used
2952-->
2953
2954
2955## Modules, instances, and packages
2956
CUE configurations are constructed by combining _instances_.
2958An instance, in turn, is constructed from one or more source files belonging
2959to the same _package_ that together declare the data representation.
2960Elements of this data representation may be exported and used
2961in other instances.

### Source file organization

Each source file consists of an optional package clause defining the collection
of files to which it belongs,
followed by a possibly empty set of import declarations that declare
packages whose contents it wishes to use, followed by a possibly empty set of
declarations.

Like with a struct, a source file may contain embeddings.
Unlike with a struct, the embedded expressions may be any value.
If the result of the unification of all embedded values is not a struct,
it will be output instead of its enclosing file when exporting CUE
to a data format.

```
SourceFile = { attribute "," } [ PackageClause "," ] { ImportDecl "," } { Declaration "," } .
```

```
"Hello \(#place)!"

#place: "world"

// Outputs "Hello world!"
```

### Package clause

A package clause is an optional clause that defines the package to which
a source file belongs.

```
PackageClause  = "package" PackageName .
PackageName    = identifier .
```

The PackageName must not be the blank identifier or a definition identifier.

```
package math
```

### Modules and instances

A _module_ defines a tree of directories, rooted at the _module root_.

All source files within a module with the same package name belong to the same
package.
<!-- jba: I can't make sense of the above sentence. -->
A module may define multiple packages.

An _instance_ of a package is any subset of files belonging
to the same package.
<!-- jba: Are you saying that -->
<!-- if I have a package with files a, b and c, then there are 8 instances of -->
<!-- that package, some of which are {a, b}, {c}, {b, c}, and so on? What's the -->
<!-- purpose of that definition? -->
It is interpreted as the concatenation of these files.

An implementation may impose conventions on the layout of package files
to determine which files of a package belong to an instance.
For example, an instance may be defined as the subset of package files
belonging to a directory and all its ancestors.
<!-- jba: OK, that helps a little, but I still don't see what the purpose is. -->
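
As a rough illustration of the directory-based convention, assume a
hypothetical module with two files that both declare `package kube`.
The file names, field names, and layout are invented for this sketch;
the block shows the two files one after the other.

```
// Hypothetical layout (not prescribed by this specification):
//
//   kube.cue
//   services/frontend/deploy.cue

// kube.cue, in the module root
package kube

#Deployment: replicas: int & >=1

// services/frontend/deploy.cue
package kube

frontend: #Deployment & {replicas: 3}
```

Under that convention, the instance for `services/frontend` would consist of
both files, so the reference to `#Deployment` resolves even though it is
declared in an ancestor directory.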


### Import declarations

An import declaration states that the source file containing the declaration
depends on definitions of the _imported_ package
and enables access to exported identifiers of that package.
The import names an identifier (PackageName) to be used for access and an
ImportPath that specifies the package to be imported.

```
ImportDecl       = "import" ( ImportSpec | "(" { ImportSpec "," } ")" ) .
ImportSpec       = [ PackageName ] ImportPath .
ImportLocation   = { unicode_value } .
ImportPath       = `"` ImportLocation [ ":" identifier ] `"` .
```

The PackageName is used in qualified identifiers to access
exported identifiers of the package within the importing source file.
It is declared in the file block.
It defaults to the identifier specified in the package clause of the imported
package, which must match either the last path component of ImportLocation
or the identifier following it.

<!--
Note: this deviates from the Go spec where there is no such restriction.
This restriction has the benefit of being able to determine the identifiers
for packages from within the file itself. But for CUE it has another benefit:
when using package hierarchies, one is more likely to want to include multiple
packages within the same directory structure. This mechanism allows
disambiguation in these cases.
-->

The interpretation of the ImportPath is implementation-dependent but it is
typically either the path of a builtin package or a fully qualifying location
of a package within a source code repository.
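
For instance, the two kinds of import path might look as follows.
`strings` is a builtin package; the second path, `example.com/infra/schema`,
and its `#Service` definition are hypothetical and only stand in for a
package hosted in a source code repository.

```
import (
	"strings"
	"example.com/infra/schema"
)

name: strings.ToUpper("frontend")
svc:  schema.#Service & {port: 8080}
```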

An ImportLocation must be a non-empty string using only characters belonging to
Unicode's L, M, N, P, and S general categories
(the Graphic characters without spaces)
and may not include the characters ``!"#$%&'()*,:;<=>?[\\]^`{|}``
or the Unicode replacement character U+FFFD.
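
These character restrictions can be approximated as a CUE constraint.
This is only a sketch: `#ImportLocation` is a name made up for this example,
and it assumes the regular expression engine supports Unicode general
category classes such as `\p{L}`.

```
// Hypothetical helper, not part of the language.
// First pattern: non-empty, only L, M, N, P, and S characters.
// Second pattern: none of the explicitly excluded characters.
#ImportLocation: =~#"^[\p{L}\p{M}\p{N}\p{P}\p{S}]+$"# &
	!~##"[!"#$%&'()*,:;<=>?\[\\\]^`{|}\x{FFFD}]"##

ok: #ImportLocation & "lib/math"

// These would fail to unify:
//   #ImportLocation & ""           (empty)
//   #ImportLocation & "lib math"   (space)
//   #ImportLocation & "lib/math*"  (excluded character)
```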

Assume we have a package containing the package clause `package math`,
which exports a function `Sin` at the path identified by `lib/math`.
This table illustrates how `Sin` is accessed in files
that import the package after the various types of import declaration.

<!-- TODO: a better example than lib/math:math, where the suffix is a no-op -->

```
Import declaration          Local name of Sin

import   "lib/math"         math.Sin
import   "lib/math:math"    math.Sin
import m "lib/math"         m.Sin
```
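
The `:identifier` suffix matters when the package name differs from the last
path component. As a hypothetical variation on the example above, suppose the
directory `lib/math` also held files with the clause `package trig`, likewise
exporting `Sin`. Following the defaulting rule for PackageName, the
corresponding table would be:

```
Import declaration          Local name of Sin

import   "lib/math:trig"    trig.Sin
import t "lib/math:trig"    t.Sin
```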

An import declaration declares a dependency relation between the importing and
imported package. It is illegal for a package to import itself, directly or
indirectly, or to directly import a package without referring to any of its
exported identifiers.
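
A minimal sketch of the second rule, using the builtin `strings` package;
the package name `config` and the field `name` are assumptions made for
this example.

```
package config

import "strings"

// Allowed: the imported package is referenced.
name: strings.ToUpper("prod")

// Adding `import "list"` without any reference to list.* in this
// file would be an error.
```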


### An example package

TODO