...

Text file src/go.starlark.net/doc/impl.md

Documentation: go.starlark.net/doc

     1
     2# Starlark in Go: Implementation
     3
     4This document (a work in progress) describes some of the design
     5choices of the Go implementation of Starlark.
     6
     7  * [Scanner](#scanner)
     8  * [Parser](#parser)
     9  * [Resolver](#resolver)
    10  * [Evaluator](#evaluator)
    11    * [Data types](#data-types)
    12    * [Freezing](#freezing)
    13  * [Testing](#testing)
    14
    15
    16## Scanner
    17
    18The scanner is derived from Russ Cox's
    19[buildifier](https://github.com/bazelbuild/buildtools/tree/master/buildifier)
    20tool, which pretty-prints Bazel BUILD files.
    21
    22Most of the work happens in `(*scanner).nextToken`.
    23
    24## Parser
    25
    26The parser is hand-written recursive-descent parser. It uses the
    27technique of [precedence
    28climbing](http://www.engr.mun.ca/~theo/Misc/exp_parsing.htm#climbing)
    29to reduce the number of productions.
    30
    31In some places the parser accepts a larger set of programs than are
    32strictly valid, leaving the task of rejecting them to the subsequent
    33resolver pass. For example, in the function call `f(a, b=c)` the
    34parser accepts any expression for `a` and `b`, even though `b` may
    35legally be only an identifier. For the parser to distinguish these
    36cases would require additional lookahead.
    37
    38## Resolver
    39
    40The resolver reports structural errors in the program, such as the use
    41of `break` and `continue` outside of a loop.
    42
    43Starlark has stricter syntactic limitations than Python. For example,
    44it does not permit `for` loops or `if` statements at top level, nor
    45does it permit global variables to be bound more than once.
    46These limitations come from the Bazel project's desire to make it easy
    47to identify the sole statement that defines each global, permitting
    48accurate cross-reference documentation.
    49
    50In addition, the resolver validates all variable names, classifying
    51them as references to universal, global, local, or free variables.
    52Local and free variables are mapped to a small integer, allowing the
    53evaluator to use an efficient (flat) representation for the
    54environment.
    55
    56Not all features of the Go implementation are "standard" (that is,
    57supported by Bazel's Java implementation), at least for now, so
    58non-standard features such as `set`
    59are flag-controlled.  The resolver reports
    60any uses of dialect features that have not been enabled.
    61
    62
    63## Evaluator
    64
    65### Data types
    66
    67<b>Integers:</b> Integers are representing using `big.Int`, an
    68arbitrary precision integer. This representation was chosen because,
    69for many applications, Starlark must be able to handle without loss
    70protocol buffer values containing signed and unsigned 64-bit integers,
    71which requires 65 bits of precision.
    72
    73Small integers (<256) are preallocated, but all other values require
    74memory allocation. Integer performance is relatively poor, but it
    75matters little for Bazel-like workloads which depend much
    76more on lists of strings than on integers. (Recall that a typical loop
    77over a list in Starlark does not materialize the loop index as an `int`.)
    78
    79An optimization worth trying would be to represent integers using
    80either an `int32` or `big.Int`, with the `big.Int` used only when
    81`int32` does not suffice. Using `int32`, not `int64`, for "small"
    82numbers would make it easier to detect overflow from operations like
    83`int32 * int32`, which would trigger the use of `big.Int`.
    84
    85<b>Floating point</b>:
    86Floating point numbers are represented using Go's `float64`.
    87Again, `float` support is required to support protocol buffers. The
    88existence of floating-point NaN and its infamous comparison behavior
    89(`NaN != NaN`) had many ramifications for the API, since we cannot
    90assume the result of an ordered comparison is either less than,
    91greater than, or equal: it may also fail.
    92
    93<b>Strings</b>:
    94
    95TODO: discuss UTF-8 and string.bytes method.
    96
    97<b>Dictionaries and sets</b>:
    98Starlark dictionaries have predictable iteration order.
    99Furthermore, many Starlark values are hashable in Starlark even though
   100the Go values that represent them are not hashable in Go: big
   101integers, for example.
   102Consequently, we cannot use Go maps to implement Starlark's dictionary.
   103
   104We use a simple hash table whose buckets are linked lists, each
   105element of which holds up to 8 key/value pairs. In a well-distributed
   106table the list should rarely exceed length 1. In addition, each
   107key/value item is part of doubly-linked list that maintains the
   108insertion order of the elements for iteration.
   109
   110<b>Struct:</b>
   111The `starlarkstruct` Go package provides a non-standard Starlark
   112extension data type, `struct`, that maps field identifiers to
   113arbitrary values. Fields are accessed using dot notation: `y = s.f`.
   114This data type is extensively used in Bazel, but its specification is
   115currently evolving.
   116
   117Starlark has no `class` mechanism, nor equivalent of Python's
   118`namedtuple`, though it is likely that future versions will support
   119some way to define a record data type of several fields, with a
   120representation more efficient than a hash table.
   121
   122
   123### Freezing
   124
   125All mutable values created during module initialization are _frozen_
   126upon its completion. It is this property that permits a Starlark module
   127to be referenced by two Starlark threads running concurrently (such as
   128the initialization threads of two other modules) without the
   129possibility of a data race.
   130
   131The Go implementation supports freezing by storing an additional
   132"frozen" Boolean variable in each mutable object. Once this flag is set,
   133all subsequent attempts at mutation fail. Every value defines a
   134Freeze method that sets its own frozen flag if not already set, and
   135calls Freeze for each value that it contains.
   136For example, when a list is frozen, it freezes each of its elements;
   137when a dictionary is frozen, it freezes each of its keys and values;
   138and when a function value is frozen, it freezes each of the free
   139variables and parameter default values implicitly referenced by its closure.
   140Application-defined types must also follow this discipline.
   141
   142The freeze mechanism in the Go implementation is finer grained than in
   143the Java implementation: in effect, the latter has one "frozen" flag
   144per module, and every value holds a reference to the frozen flag of
   145its module. This makes setting the frozen flag more efficient---a
   146simple bit flip, no need to traverse the object graph---but coarser
   147grained. Also, it complicates the API slightly because to construct a
   148list, say, requires a reference to the frozen flag it should use.
   149
   150The Go implementation would also permit the freeze operation to be
   151exposed to the program, for example as a built-in function.
   152This has proven valuable in writing tests of the freeze mechanism
   153itself, but is otherwise mostly a curiosity.
   154
   155
   156### Fail-fast iterators
   157
   158In some languages (such as Go), a program may mutate a data structure
   159while iterating over it; for example, a range loop over a map may
   160delete map elements. In other languages (such as Java), iterators do
   161extra bookkeeping so that modification of the underlying collection
   162invalidates the iterator, and the next attempt to use it fails.
   163This often helps to detect subtle mistakes.
   164
   165Starlark takes this a step further. Instead of mutation of the
   166collection invalidating the iterator, the act of iterating makes the
   167collection temporarily immutable, so that an attempt to, say, delete a
   168dict element while looping over the dict, will fail. The error is
   169reported against the delete operation, not the iteration.
   170
   171This is implemented by having each mutable iterable value record a
   172counter of active iterators. Starting a loop increments this counter,
   173and completing a loop decrements it. A collection with a nonzero
   174counter behaves as if frozen. If the collection is actually frozen,
   175the counter bookkeeping is unnecessary. (Consequently, iterator
   176bookkeeping is needed only while objects are still mutable, before
   177they can have been published to another thread, and thus no
   178synchronization is necessary.)
   179
   180A consequence of this design is that in the Go API, it is imperative
   181to call `Done` on each iterator once it is no longer needed.
   182
   183```
   184TODO
   185starlark.Value interface and subinterfaces
   186argument passing to builtins: UnpackArgs, UnpackPositionalArgs.
   187```
   188
   189<b>Evaluation strategy:</b>
   190The evaluator uses a simple recursive tree walk, returning a value or
   191an error for each expression. We have experimented with just-in-time
   192compilation of syntax trees to bytecode, but two limitations in the
   193current Go compiler prevent this strategy from outperforming the
   194tree-walking evaluator.
   195
   196First, the Go compiler does not generate a "computed goto" for a
   197switch statement ([Go issue
   1985496](https://github.com/golang/go/issues/5496)). A bytecode
   199interpreter's main loop is a for-loop around a switch statement with
   200dozens or hundreds of cases, and the speed with which each case can be
   201dispatched strongly affects overall performance.
   202Currently, a switch statement generates a binary tree of ordered
   203comparisons, requiring several branches instead of one.
   204
   205Second, the Go compiler's escape analysis assumes that the underlying
   206array from a `make([]Value, n)` allocation always escapes
   207([Go issue 20533](https://github.com/golang/go/issues/20533)).
   208Because the bytecode interpreter's operand stack has a non-constant
   209length, it must be allocated with `make`. The resulting allocation
   210adds to the cost of each Starlark function call; this can be tolerated
   211by amortizing one very large stack allocation across many calls.
   212More problematic appears to be the cost of the additional GC write
   213barriers incurred by every VM operation: every intermediate result is
   214saved to the VM's operand stack, which is on the heap.
   215By contrast, intermediate results in the tree-walking evaluator are
   216never stored to the heap.
   217
   218```
   219TODO
   220frames, backtraces, errors.
   221threads
   222Print
   223Load
   224```
   225
   226## Testing
   227
   228```
   229TODO
   230starlarktest package
   231`assert` module
   232starlarkstruct
   233integration with Go testing.T
   234```
   235
   236
   237## TODO
   238
   239
   240```
   241Discuss practical separation of code and data.
   242```

View as plain text