Architecture of Gazelle
=======================

.. All external links are here.

.. Godoc links
.. _buildifier build: https://godoc.org/github.com/bazelbuild/buildtools/build
.. _config: https://godoc.org/github.com/bazelbuild/bazel-gazelle/internal/config
.. _go/build: https://godoc.org/go/build
.. _go/parser: https://godoc.org/go/parser
.. _merger: https://godoc.org/github.com/bazelbuild/bazel-gazelle/internal/merger
.. _packages: https://godoc.org/github.com/bazelbuild/bazel-gazelle/internal/packages
.. _resolve: https://godoc.org/github.com/bazelbuild/bazel-gazelle/internal/resolve
.. _rules: https://godoc.org/github.com/bazelbuild/bazel-gazelle/internal/rules
.. _CallExpr: https://godoc.org/github.com/bazelbuild/buildtools/build#CallExpr
.. _golang.org/x/tools/go/vcs: https://godoc.org/golang.org/x/tools/go/vcs

.. Other documentation links
.. _buildifier: https://github.com/bazelbuild/buildtools/tree/master/buildifier
.. _config_setting: https://docs.bazel.build/versions/master/be/general.html#config_setting
.. _Fix command transformations: README.rst#fix-command-transformations
.. _full list of directives: README.rst#Directives
.. _select: https://docs.bazel.build/versions/master/skylark/lib/globals.html#select

.. Issues
.. _#5: https://github.com/bazelbuild/bazel-gazelle/issues/5
.. _#7: https://github.com/bazelbuild/bazel-gazelle/issues/7

.. Actual content is below

Gazelle is a tool that generates and updates Bazel build files for Go projects
that follow the conventional "go build" project layout. It is intended to
simplify the maintenance of Bazel Go projects as much as possible.

This document describes how Gazelle works. It should help users understand why
Gazelle behaves as it does, and it should help developers understand
how to modify Gazelle and how to write similar tools.

.. contents::

Overview
--------

Gazelle generates and updates build files according to the algorithm outlined
below. Each step is described in more detail in the sections that follow.

* Build a configuration from command line arguments and special comments
  in the top-level build file. See Configuration_.

* For each directory in the repository:

  * Read the build file if one is present.

  * If the build file should be updated (based on configuration):

    * Apply transformations to the build file to migrate away from deprecated
      APIs. See `Fixing build files`_.

    * Scan the source files and collect metadata needed to generate rules
      for the directory. See `Scanning source files`_.

    * Generate new rules from the build metadata collected earlier. See
      `Generating rules`_.

    * Merge the new rules into the directory's build file. Delete any rules
      which are now empty. See `Merging and deleting rules`_.

  * Add the library rules in the directory's build file to a global table,
    indexed by import path.

* For each updated build file:

  * Use the library table to map import paths to Bazel labels for rules that
    were added or merged earlier. See `Resolving dependencies`_.

  * Merge the resolved rules back into the file.

  * Format the file using buildifier_ and emit it according to the output mode:
    write to disk, print the whole file, or print the diff.

Configuration
-------------

Godoc: config_

Gazelle stores configuration information in ``Config`` objects. These objects
contain settings that affect the behavior of most packages in the program.
For example:

* The list of directories that Gazelle should update.
* The path of the repository root directory. Bazel package names are based
  on paths relative to this location.
* The current import path prefix and the directory where it was set.
  Gazelle uses this to infer import paths for ``go_library`` rules.
* A list of build tags that Gazelle considers to be true on all platforms.

``Config`` objects apply to individual directories. Each directory inherits
the ``Config`` from its parent. Values in a ``Config`` may be modified within
a directory using *directives* written in the directory's build file. A
directive is a special comment formatted like this:

::

  # gazelle:key value

Here are a few examples. See the `full list of directives`_.

* ``# gazelle:prefix`` - sets the Go import path prefix for the current
  directory.
* ``# gazelle:build_tags`` - sets the list of build tags which Gazelle considers
  to be true on all platforms.

There are a few directives which are not applied to the ``Config`` object but
are interpreted directly in packages where they are relevant.

* ``# gazelle:ignore`` - the build file should not be updated by Gazelle.
  Gazelle may still index its contents so it can resolve dependencies in other
  build files.
* ``# gazelle:exclude path/to/file`` - the named file should not be read by
  Gazelle and should not be included in ``srcs`` lists. If this refers to
  a directory, Gazelle won't recurse into the directory. This directive may
  appear multiple times.

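To make the directive format concrete, here is a minimal sketch of how a
``# gazelle:key value`` comment could be split into a key and a value. The
``Directive`` type and ``parseDirective`` function are hypothetical names
chosen for this illustration; they are not Gazelle's actual API, and the real
parser may treat whitespace differently.

.. code:: go

  package main

  import (
      "fmt"
      "strings"
  )

  // Directive is a hypothetical type for this sketch; it is not Gazelle's API.
  type Directive struct {
      Key, Value string
  }

  // parseDirective reports whether line is a "# gazelle:key value" comment
  // and, if so, returns its key and value. The value may be empty.
  func parseDirective(line string) (Directive, bool) {
      const prefix = "# gazelle:"
      line = strings.TrimSpace(line)
      if !strings.HasPrefix(line, prefix) {
          return Directive{}, false
      }
      key, value, _ := strings.Cut(strings.TrimPrefix(line, prefix), " ")
      if key == "" {
          return Directive{}, false
      }
      return Directive{Key: key, Value: strings.TrimSpace(value)}, true
  }

  func main() {
      lines := []string{
          "# gazelle:prefix example.com/project",
          "# gazelle:ignore",
          "# an ordinary comment",
      }
      for _, line := range lines {
          if d, ok := parseDirective(line); ok {
              fmt.Printf("key=%q value=%q\n", d.Key, d.Value)
          }
      }
  }

Ordinary comments are skipped, so directives can sit alongside other comments
in a build file.
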
Fixing build files
------------------

Godoc: merger_

From time to time, APIs in rules_go are changed or updated. Gazelle helps
users stay up to date with these changes by automatically fixing deprecated
usage.

Minor fixes are applied by Gazelle automatically every time it runs. However,
some fixes may delete or rename existing rules. Users must run ``gazelle fix``
to apply these fixes. By default, Gazelle will only *warn* users that
``gazelle fix`` should be run.

Here are a few of the fixes Gazelle performs. See `Fix command transformations`_
for a full list.

* **Squash cgo libraries:** Gazelle will remove ``cgo_library`` rules and
  merge their attributes into ``go_library`` rules that reference them.
  This is a major fix and is only applied with ``gazelle fix``.
* **Migrate library attributes:** Gazelle replaces ``library`` attributes
  with ``embed`` attributes. The only difference between these is that
  ``library`` (which is now deprecated) accepts a single label, while ``embed``
  accepts a list. This is a minor fix and is always applied.

Users can prevent Gazelle from modifying rules, attributes, or individual
values by writing ``# keep`` comments above them.

Scanning source files
---------------------

Godoc: packages_

Nearly all of the information needed to build a program with the standard Go SDK
is implied by directory structure, file names, and file contents. This is why
``go build`` doesn't require any sort of build file. The `go/build`_ package in
the standard library collects this information.

Unfortunately, `go/build`_ can only collect information for one platform at
a time. Gazelle needs to generate build files that work on all platforms, so
we have our own implementation of this logic.

Information extracted from files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Gazelle extracts build metadata from source files and contents in much the
same way that the standard `go/build`_ package does. It gets the following
information from file names:

* File extension (e.g., .go, .c, .proto). Normally, only .go, .s, and .h files
  are included in Go rules. If any cgo code is present, then C/C++ files are
  also included. .proto files are also used to build proto rules. Other files
  (e.g., .txt) are ignored.
* Test suffix. For example, if a file is named ``foo_test.go``, it will be
  included in a test target instead of a library or binary target.
* OS and architecture suffixes. For example, a file named ``foo_linux_amd64.go``
  will be listed in the ``linux_amd64`` section of the target it belongs to.

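The file name rules above can be sketched as follows. This is only an
illustration: ``fileInfo`` and ``infoFromName`` are hypothetical names, and the
OS and architecture tables are small subsets of what the Go toolchain
recognizes.

.. code:: go

  package main

  import (
      "fmt"
      "strings"
  )

  // Illustrative subsets only; the Go toolchain recognizes many more values.
  var (
      knownOS   = map[string]bool{"darwin": true, "linux": true, "windows": true}
      knownArch = map[string]bool{"386": true, "amd64": true, "arm64": true}
  )

  // fileInfo is a hypothetical type holding what a file name alone implies.
  type fileInfo struct {
      ext    string // extension, including the dot
      isTest bool   // true for names like foo_test.go
      goos   string // OS suffix, or "" if none
      goarch string // architecture suffix, or "" if none
  }

  func infoFromName(name string) fileInfo {
      var info fileInfo
      if i := strings.LastIndex(name, "."); i >= 0 {
          info.ext, name = name[i:], name[:i]
      }
      if info.ext == ".go" && strings.HasSuffix(name, "_test") {
          info.isTest = true
          name = strings.TrimSuffix(name, "_test")
      }
      // Suffixes only count after at least one leading name component, so
      // "linux.go" carries no platform constraint.
      parts := strings.Split(name, "_")
      n := len(parts)
      switch {
      case n >= 3 && knownOS[parts[n-2]] && knownArch[parts[n-1]]:
          info.goos, info.goarch = parts[n-2], parts[n-1]
      case n >= 2 && knownOS[parts[n-1]]:
          info.goos = parts[n-1]
      case n >= 2 && knownArch[parts[n-1]]:
          info.goarch = parts[n-1]
      }
      return info
  }

  func main() {
      names := []string{"foo.go", "foo_test.go", "foo_linux_amd64.go", "linux.go"}
      for _, name := range names {
          fmt.Printf("%s: %+v\n", name, infoFromName(name))
      }
  }
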
Gazelle gets the following information from file contents:

* Package name. This is syntactically the first part of every .go file. All
  files in the same directory must have the same package name (except for
  external test sources, which have a package name ending with ``_test``). If
  there are multiple packages, Gazelle will choose one that matches the
  directory name (if present) or report an error.
* Imported libraries. Go import paths are usually URLs. Imports in
  platform-specific source files are also platform-specific.
* Build tags. The Go toolchain recognizes comments beginning with ``// +build``
  before the package declaration. These tags tell the build system that a file
  should only be built for specific platforms. See `this article
  <https://dave.cheney.net/2013/10/12/how-to-use-conditional-compilation-with-the-go-build-tool>`_
  for more information.
* Whether cgo code is present. This affects how packages are built and
  whether C/C++ files are included.
* C/C++ compile and link options (specified in ``#cgo`` directives in cgo
  comments). These may be platform-specific.

In most cases, only the top of the file is parsed. For Go files, we use the
standard `go/parser`_ package. For proto files, we use regular expressions that
match ``package``, ``go_package``, and ``import`` statements.

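As an illustration of parsing only the top of a Go file, the sketch below uses
the ``parser.ImportsOnly`` mode of ``go/parser``, which stops after the import
declarations. The ``packageAndImports`` helper and the sample source are
made up for this example; build tags and cgo options would need additional
handling that is not shown.

.. code:: go

  package main

  import (
      "fmt"
      "go/parser"
      "go/token"
      "strconv"
  )

  // src is a made-up source file used only for this example.
  const src = `// +build linux

  package terminal

  import (
      "fmt"
      "golang.org/x/sys/unix"
  )
  `

  // packageAndImports returns the package name and import paths of a Go
  // source file without parsing anything after the import declarations.
  func packageAndImports(filename, src string) (string, []string, error) {
      fset := token.NewFileSet()
      f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
      if err != nil {
          return "", nil, err
      }
      var imports []string
      for _, spec := range f.Imports {
          // Import paths are stored as quoted string literals.
          path, err := strconv.Unquote(spec.Path.Value)
          if err != nil {
              return "", nil, err
          }
          imports = append(imports, path)
      }
      return f.Name.Name, imports, nil
  }

  func main() {
      name, imports, err := packageAndImports("terminal_linux.go", src)
      if err != nil {
          fmt.Println("parse error:", err)
          return
      }
      fmt.Println("package:", name)
      fmt.Println("imports:", imports)
  }
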
The ``Package`` object
~~~~~~~~~~~~~~~~~~~~~~

Gazelle stores build metadata in a ``Package`` object. Currently, we only
support one ``Package`` per directory (which is also what the Go SDK supports),
but this will be expanded in the future. ``Package`` objects contain some
top-level metadata (like the package name and directory path), along with
several target objects (``GoTarget`` and ``ProtoTarget``).

Target objects correspond directly to rules that will be generated later. They
store lists of sources, imports, and flags in ``PlatformStrings`` objects.

``PlatformStrings`` objects store strings in four sections: a generic list, an
OS-specific dictionary, an architecture-specific dictionary, and an
OS-and-architecture-specific dictionary. The keys in the dictionaries are OS
names, architecture names, or OS-and-architecture pairs; the values are lists of
strings. The same string may not appear more than once in a list and may not
appear in more than one section. This is due to a Bazel requirement: the same
label may not appear more than once in a ``deps`` list.

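The four sections and the uniqueness constraint can be pictured with a
simplified stand-in like the one below. The field names, the string key format
for OS-and-architecture pairs, and the ``duplicates`` method are assumptions
made for this sketch; the real type in the packages_ library may differ.

.. code:: go

  package main

  import (
      "fmt"
      "sort"
  )

  // PlatformStrings is a simplified stand-in for the structure described
  // above. Field names and key formats are assumptions made for this sketch.
  type PlatformStrings struct {
      Generic  []string            // applies on all platforms
      OS       map[string][]string // keyed by OS name, e.g. "linux"
      Arch     map[string][]string // keyed by architecture, e.g. "amd64"
      Platform map[string][]string // keyed by OS and architecture, e.g. "linux_amd64"
  }

  func lists(m map[string][]string) [][]string {
      var out [][]string
      for _, l := range m {
          out = append(out, l)
      }
      return out
  }

  // duplicates returns, in sorted order, every string that appears more than
  // once in a single list or in more than one section.
  func (ps PlatformStrings) duplicates() []string {
      dup := make(map[string]bool)
      sectionCount := make(map[string]int)
      addSection := func(section ...[]string) {
          inSection := make(map[string]bool)
          for _, list := range section {
              inList := make(map[string]bool)
              for _, s := range list {
                  if inList[s] {
                      dup[s] = true
                  }
                  inList[s] = true
                  inSection[s] = true
              }
          }
          for s := range inSection {
              sectionCount[s]++
              if sectionCount[s] > 1 {
                  dup[s] = true
              }
          }
      }
      addSection(ps.Generic)
      addSection(lists(ps.OS)...)
      addSection(lists(ps.Arch)...)
      addSection(lists(ps.Platform)...)

      var out []string
      for s := range dup {
          out = append(out, s)
      }
      sort.Strings(out)
      return out
  }

  func main() {
      ps := PlatformStrings{
          Generic: []string{"terminal.go"},
          OS: map[string][]string{
              "darwin": {"util.go", "util_bsd.go"},
              "linux":  {"util.go", "util_linux.go"},
          },
      }
      fmt.Println("violations:", ps.duplicates())
  }

Note that a string may legitimately appear under several keys of the same
dictionary (``util.go`` above), since only one key is selected for any given
platform.
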
Generating rules
----------------

Godoc: rules_

Once build metadata has been extracted from the sources in a directory,
Gazelle generates rules for building those sources.

Generated rules are formatted as CallExpr_ objects. CallExpr_ is defined in the
`buildifier build`_ library. This is the same library used to parse and format
build files. This lets us manipulate newly generated rules and existing rules
with the same code.

We may generate the following rules:

* ``proto_library`` and ``go_proto_library`` are generated if there was at
  least one .proto source file.
* ``go_library`` is generated if there was at least one non-test source. This
  may embed the ``go_proto_library`` if there was one.
* ``go_test`` rules are generated for internal and external tests. Internal
  tests embed the ``go_library`` while external tests depend on the
  ``go_library`` as a separate package.
* ``go_binary`` is generated if the package name was ``main``. It embeds the
  ``go_library``.

Rules are named according to a pluggable naming policy, but there is currently
only one policy: libraries are named ``go_default_library``, tests are
named ``go_default_test``, and binaries are named after the directory. The
``go_default_library`` name is an historical artifact from before we had
index-based dependency resolution. We'll need to move away from this naming
scheme in the future (`#5`_) before we support multiple packages (`#7`_).

Sources, imports, and flags within each target are converted to expressions in a
straightforward fashion. The lists within ``PlatformStrings`` are converted to
list expressions. Dictionaries are converted to `select`_ expressions
(when Bazel evaluates a `select`_ expression, it chooses one of several
provided lists, based on `config_setting`_ rules). Lists and select expressions
may be added together. For example:

.. code:: bzl

  go_library(
      name = "go_default_library",
      srcs = [
          "terminal.go",
      ] + select({
          "@io_bazel_rules_go//go/platform:darwin": [
              "util.go",
              "util_bsd.go",
          ],
          "@io_bazel_rules_go//go/platform:linux": [
              "util.go",
              "util_linux.go",
          ],
          "@io_bazel_rules_go//go/platform:windows": [
              "util_windows.go",
          ],
          "//conditions:default": [],
      }),
      ...
  )

At this point, Gazelle does not have enough information to generate expressions
for ``deps`` attributes. We only have a list of import strings extracted from
source files. These imports are stored temporarily in a special
``_gazelle_imports`` attribute in each rule. Later, the imports are converted to
Bazel labels (see `Resolving dependencies`_), and this attribute is replaced
with ``deps``.

Merging and deleting rules
--------------------------

Godoc: merger_

Merging is the process of combining generated rules with the corresponding
rules in an existing build file. If no build file exists in a directory, a
new file is created with generated rules, and no merging is performed.

Merging occurs in two phases: pre-resolve, and post-resolve. This is due to an
interdependence with dependency resolution. Dependency resolution uses a table
of *merged* library rules, so it can't be performed until the pre-resolve merge
has occurred. After dependency resolution, we need to merge newly generated
``deps`` attributes; this is done in the post-resolve merge. The two phases use
the same algorithm.

During the merge process, Gazelle attempts to match generated rules with
existing rules that have the same name and same kind. Rules are only merged if
both name and kind match. If an existing rule has the same name as a generated
rule but a different kind, the generated rule will not be merged.  If no
existing rule matches a generated rule, the generated rule is simply appended to
the end of the file. Existing rules that don't match any generated rule are not
modified.

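The matching step described above amounts to a lookup on the pair of name and
kind. A minimal sketch, using a hypothetical ``rule`` type rather than the
CallExpr_ objects Gazelle actually works with:

.. code:: go

  package main

  import "fmt"

  // rule is a hypothetical, stripped-down representation of a build rule.
  type rule struct {
      kind, name string
  }

  // findMatch returns the index of the existing rule that has the same name
  // and the same kind as gen, or -1 if there is no such rule.
  func findMatch(existing []rule, gen rule) int {
      for i, e := range existing {
          if e.name == gen.name && e.kind == gen.kind {
              return i
          }
      }
      return -1
  }

  func main() {
      existing := []rule{
          {kind: "go_library", name: "go_default_library"},
          {kind: "filegroup", name: "go_default_test"},
      }
      for _, gen := range []rule{
          {kind: "go_library", name: "go_default_library"},
          {kind: "go_test", name: "go_default_test"},
      } {
          fmt.Printf("%s %s -> match index %d\n", gen.kind, gen.name, findMatch(existing, gen))
      }
  }

The second generated rule finds no match because the existing rule with the
same name has a different kind.
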
When Gazelle identifies a matching pair of rules, it combines each attribute
according to the algorithm below. If an attribute is present in the generated
rule but not in the existing rule, it is copied to the merged rule verbatim. If
an attribute is present in the existing rule but not the generated rule, Gazelle
behaves as if the generated attribute were present but empty.

* For each value in the existing rule's attribute:

  * If the value also appears in the generated rule's attribute or is marked
    with a ``# keep`` comment, preserve it. Otherwise, delete it.

* For each value in the generated rule's attribute:

  * If the value appears in the existing rule's attribute, ignore it.
    Otherwise, add it to the merged rule.

* If the merged attribute is empty, delete it.

When a value is present in both the existing and generated attributes, we use
the existing value instead of the generated value, since this preserves
comments.

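The attribute merge can be sketched for a flat list of strings as shown below.
This is an illustration under simplifying assumptions: the ``value`` type and
``mergeList`` function are hypothetical, values are plain strings rather than
build file expressions, and ``select`` expressions are not handled.

.. code:: go

  package main

  import (
      "fmt"
      "strings"
  )

  // value is a hypothetical representation of one entry in an existing
  // attribute: its text and whether it carries a "# keep" comment.
  type value struct {
      text string
      keep bool
  }

  // mergeList merges an existing attribute with a generated one. An empty
  // result means the merged attribute should be deleted.
  func mergeList(existing []value, generated []string) []string {
      inGenerated := make(map[string]bool)
      for _, g := range generated {
          inGenerated[g] = true
      }

      var merged []string
      inMerged := make(map[string]bool)

      // Preserve existing values that are still generated or marked "# keep".
      for _, e := range existing {
          if inGenerated[e.text] || e.keep {
              merged = append(merged, e.text)
              inMerged[e.text] = true
          }
      }
      // Add generated values that were not already present.
      for _, g := range generated {
          if !inMerged[g] {
              merged = append(merged, g)
              inMerged[g] = true
          }
      }
      return merged
  }

  func main() {
      existing := []value{
          {text: "a.go"},
          {text: "old.go"},
          {text: "custom.go", keep: true},
      }
      generated := []string{"a.go", "b.go"}
      fmt.Println(strings.Join(mergeList(existing, generated), ", "))
  }

Here ``old.go`` is dropped because it is no longer generated, ``custom.go``
survives because of its ``# keep`` comment, and ``b.go`` is added.
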
Some attributes are considered *unmergeable*, for example, ``visibility`` and
``gc_goopts``. Gazelle may add these attributes to existing rules if they are
not already present, but existing values won't be modified or deleted.

Preserving customizations
~~~~~~~~~~~~~~~~~~~~~~~~~

Gazelle has several mechanisms for preserving manual modifications to build
files. Some of these mechanisms work automatically; others require explicit
comments.

* Gazelle will not modify or delete rules that don't appear to have been
  generated by Gazelle.
* As mentioned above, some attributes are considered unmergeable. Gazelle may
  set initial values for these but won't delete or replace existing values.
* ``# keep`` comments may be attached to any rule, attribute, or value
  to prevent Gazelle from modifying it.
* ``# gazelle:exclude <file>`` directives can be used to prevent Gazelle from
  adding files to source lists (for example, checked-in .pb.go files). They
  can also prevent Gazelle from recursing into directories that contain
  unbuildable code (e.g., ``testdata``).
* ``# gazelle:ignore`` directives prevent Gazelle from making any modifications
  to build files that contain them.

Deleting rules
~~~~~~~~~~~~~~

Deletion is a special case of the merging algorithm.

When Gazelle generates rules for a package (see `Generating rules`_), it
actually produces two lists of rules: a list of rules for buildable targets,
and a list of empty rules that may be deleted. The empty rules have no
attributes other than ``name``.

The empty rules are merged using the same algorithm as the other generated
rules. If, after merging, an empty rule has no attributes that would make the
rule buildable (for example, ``srcs`` or ``deps``), the rule will be deleted.

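The deletion check that follows the merge can be sketched as below. The
``rule`` type, the ``shouldDelete`` function, and the list of attributes that
make a rule buildable are assumptions for this example; only ``srcs`` and
``deps`` are named in the text above.

.. code:: go

  package main

  import "fmt"

  // rule is a hypothetical representation of a merged rule.
  type rule struct {
      kind, name string
      attrs      map[string][]string
  }

  // buildableAttrs is an illustrative subset, taken from the examples above.
  var buildableAttrs = []string{"srcs", "deps"}

  // shouldDelete reports whether a merged rule has no attribute left that
  // would make it buildable.
  func shouldDelete(r rule) bool {
      for _, attr := range buildableAttrs {
          if len(r.attrs[attr]) > 0 {
              return false
          }
      }
      return true
  }

  func main() {
      empty := rule{kind: "go_test", name: "go_default_test"}
      kept := rule{
          kind:  "go_library",
          name:  "go_default_library",
          attrs: map[string][]string{"srcs": {"custom.go"}}, // e.g. kept by "# keep"
      }
      fmt.Println(empty.name, "delete:", shouldDelete(empty))
      fmt.Println(kept.name, "delete:", shouldDelete(kept))
  }
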
Resolving dependencies
----------------------

Godoc: resolve_

When Gazelle generates rules for a package (see `Generating
rules`_), it stores names of the libraries imported by each rule in a special
``_gazelle_imports`` attribute. During dependency resolution, Gazelle maps these
imports to Bazel labels and replaces ``_gazelle_imports`` with ``deps``.

Before dependency resolution starts, Gazelle builds a table of all known
libraries. This includes ``go_library``, ``go_proto_library``, and
``proto_library`` rules. The table is populated by scanning build files after
the pre-resolve merge, so existing and newly generated rules are included
in the table, and deleted rules are excluded. Once all library rules have been
added, Gazelle indexes the table by language-specific import path.

Gazelle resolves each import string in ``_gazelle_imports`` as follows:

* If the import is part of the standard library, it is dropped. Standard
  library dependencies are implicit.

* If the import is provided by exactly one rule in the library table, the label
  for that rule is used.

* If the import is provided by multiple libraries, we attempt to resolve
  the ambiguity.

  * For Go, we apply the vendoring algorithm. Vendored libraries aren't visible
    outside of the vendor directory's parent.

  * Go libraries that are embedded by other Go libraries are not considered.
    Embedded libraries may be incomplete.

  * When an ambiguity can't be resolved, Gazelle logs an error and skips
    the dependency.

* If the import is not provided by any rule in the library table, we attempt
  to resolve the dependency using heuristics:

  * If the import path starts with the current prefix (set with a
    ``# gazelle:prefix`` directive or on the command line), we construct a label
    by concatenating the prefix directory and the portion of the import path
    below the prefix into a package name.

  * Otherwise, the import path is considered external and is resolved
    according to the external mode set on the command line.

    * In ``external`` mode, Gazelle determines the portion of the import path
      that corresponds to a repository using `golang.org/x/tools/go/vcs`_. This
      part of the path is converted into a repository name (for example,
      ``@org_golang_x_tools``), and the rest is converted to a package name.

    * In ``vendored`` mode, Gazelle constructs a label by prepending ``vendor/``
      to the import path.

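The resolution order above can be sketched as follows. This is a simplified
illustration, not Gazelle's implementation: the ``resolver`` type is
hypothetical, the standard library check is a crude heuristic (no dot in the
first path element), ambiguous imports are reported without applying the
vendoring or embedding rules, and only ``vendored`` mode is shown for external
imports.

.. code:: go

  package main

  import (
      "errors"
      "fmt"
      "path"
      "strings"
  )

  var errStdlib = errors.New("standard library import; no dependency needed")

  // resolver is a hypothetical type for this sketch, not Gazelle's API.
  type resolver struct {
      index     map[string][]string // library table: import path -> labels
      prefix    string              // current import path prefix
      prefixDir string              // directory where the prefix was set ("" for the root)
  }

  // isStdlib is an assumption made for this sketch: an import is treated as
  // part of the standard library when its first path element has no dot.
  func isStdlib(imp string) bool {
      first, _, _ := strings.Cut(imp, "/")
      return !strings.Contains(first, ".")
  }

  func (r *resolver) resolve(imp string) (string, error) {
      if isStdlib(imp) {
          return "", errStdlib
      }
      labels := r.index[imp]
      if len(labels) == 1 {
          return labels[0], nil
      }
      if len(labels) > 1 {
          // Vendoring and embedding rules are omitted in this sketch.
          return "", fmt.Errorf("ambiguous import %q: %v", imp, labels)
      }
      // Not in the library table: fall back to heuristics.
      if imp == r.prefix || strings.HasPrefix(imp, r.prefix+"/") {
          rel := strings.TrimPrefix(strings.TrimPrefix(imp, r.prefix), "/")
          return "//" + path.Join(r.prefixDir, rel) + ":go_default_library", nil
      }
      // External import, resolved as in "vendored" mode.
      return "//vendor/" + imp + ":go_default_library", nil
  }

  func main() {
      r := &resolver{
          index: map[string][]string{
              "example.com/project/foo": {"//foo:go_default_library"},
          },
          prefix: "example.com/project",
      }
      for _, imp := range []string{
          "fmt",
          "example.com/project/foo",
          "example.com/project/bar/baz",
          "github.com/pkg/errors",
      } {
          label, err := r.resolve(imp)
          fmt.Printf("%s -> %q (err: %v)\n", imp, label, err)
      }
  }
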
Note that ``visibility`` attributes are not considered when resolving imports.
This was part of an initial prototype, but it was confusing in many situations.

Building and running Gazelle
----------------------------

Gazelle is a regular Go program. It can be built, installed, and run without
Bazel, using the regular Go SDK.

.. code:: bash

  $ go install github.com/bazelbuild/bazel-gazelle/cmd/gazelle@latest
  $ gazelle -go_prefix example.com/project

We lightly discourage this method of running Gazelle. All developers on a
project should use the same version of Gazelle to ensure the build files
they generate are consistent. The easiest way to accomplish this is to build
and run Gazelle through Bazel. Gazelle may be added to a WORKSPACE file,
built as a normal ``go_binary``, then installed or run from the ``bazel-bin/``
directory.

.. code:: bash

  $ bazel build @bazel_gazelle//cmd/gazelle
  $ bazel-bin/external/bazel_gazelle/cmd/gazelle/gazelle -go_prefix example.com/project

It's usually better to invoke Gazelle through a wrapper script though. This
saves typing and ensures Gazelle is run with a consistent set of arguments.
We provide a Bazel rule that generates such a wrapper script. Developers may
add a snippet like the one below to a build file:

.. code:: bzl

  load("@bazel_gazelle//:def.bzl", "gazelle")

  gazelle(
      name = "gazelle",
      command = "fix",
      external = "vendored",
      prefix = "example.com/project",
  )

This script may be built and executed in a single command with ``bazel run``.

.. code:: bash

  $ bazel run //:gazelle

This is the most convenient way to run Gazelle, and it's what we recommend to
users. However, there are two issues with running Gazelle in this
fashion. First, binaries executed by ``bazel run`` are run in the Bazel
execroot, not the user's current directory. The wrapper script uses a hack
(dereferencing symlinks) to jump to the top of the workspace source tree before
running Gazelle. Second, ``bazel run`` holds a lock on the Bazel output
directory. This means Gazelle cannot invoke Bazel without deadlocking. Commands
like ``bazel query`` would be helpful for detecting generated code, but it's not
safe to use them.

To avoid these limitations, the wrapper script may be copied to the workspace
and optionally checked into version control. When the wrapper script is run
directly (without ``bazel run``), it will rebuild itself to ensure no changes
are needed. If the rebuilt script differs from the running script, it will
prompt the user to copy the rebuilt script into the workspace again.

.. code:: bash

  $ bazel build //:gazelle
  Target //:gazelle up-to-date:
    bazel-bin/gazelle.bash
  ____Elapsed time: 1.326s, Critical Path: 0.00s
  $ cp bazel-bin/gazelle.bash gazelle.bash
  $ ./gazelle.bash

Dependencies
------------

Gazelle has the following dependencies:

github.com/bazelbuild/bazel-skylib
  Skylark utility used to generate wrapper script in the ``gazelle`` rule.
github.com/bazelbuild/buildtools/build
  Used to parse and rewrite build files.
github.com/bazelbuild/rules_go
  Used to build and test Gazelle through Bazel. Gazelle can also be built on its
  own with the Go SDK.
golang.org/x/tools/go/vcs
  Used during dependency resolution to determine the repository prefix for a
  given import path. This uses the network.
