Architecture of Gazelle
=======================

.. All external links are here.

.. Godoc links
.. _buildifier build: https://godoc.org/github.com/bazelbuild/buildtools/build
.. _config: https://godoc.org/github.com/bazelbuild/bazel-gazelle/internal/config
.. _go/build: https://godoc.org/go/build
.. _go/parser: https://godoc.org/go/parser
.. _merger: https://godoc.org/github.com/bazelbuild/bazel-gazelle/internal/merger
.. _packages: https://godoc.org/github.com/bazelbuild/bazel-gazelle/internal/packages
.. _resolve: https://godoc.org/github.com/bazelbuild/bazel-gazelle/internal/resolve
.. _rules: https://godoc.org/github.com/bazelbuild/bazel-gazelle/internal/rules
.. _CallExpr: https://godoc.org/github.com/bazelbuild/buildtools/build#CallExpr
.. _golang.org/x/tools/go/vcs: https://godoc.org/golang.org/x/tools/go/vcs

.. Other documentation links
.. _buildifier: https://github.com/bazelbuild/buildtools/tree/master/buildifier
.. _config_setting: https://docs.bazel.build/versions/master/be/general.html#config_setting
.. _Fix command transformations: README.rst#fix-command-transformations
.. _full list of directives: README.rst#Directives
.. _select: https://docs.bazel.build/versions/master/skylark/lib/globals.html#select

.. Issues
.. _#5: https://github.com/bazelbuild/bazel-gazelle/issues/5
.. _#7: https://github.com/bazelbuild/bazel-gazelle/issues/7

.. Actual content is below

Gazelle is a tool that generates and updates Bazel build files for Go projects
that follow the conventional "go build" project layout. It is intended to
simplify the maintenance of Bazel Go projects as much as possible.

This document describes how Gazelle works. It should help users understand why
Gazelle behaves as it does, and it should help developers understand
how to modify Gazelle and how to write similar tools.

.. contents::

Overview
--------

Gazelle generates and updates build files according to the algorithm outlined
below. Each step is described in more detail in the sections that follow.

* Build a configuration from command line arguments and special comments
  in the top-level build file. See Configuration_.

* For each directory in the repository:

  * Read the build file if one is present.

  * If the build file should be updated (based on configuration):

    * Apply transformations to the build file to migrate away from deprecated
      APIs. See `Fixing build files`_.

    * Scan the source files and collect metadata needed to generate rules
      for the directory. See `Scanning source files`_.

    * Generate new rules from the build metadata collected earlier. See
      `Generating rules`_.

    * Merge the new rules into the directory's build file. Delete any rules
      which are now empty. See `Merging and deleting rules`_.

  * Add the library rules in the directory's build file to a global table,
    indexed by import path.

* For each updated build file:

  * Use the library table to map import paths to Bazel labels for rules that
    were added or merged earlier. See `Resolving dependencies`_.

  * Merge the resolved rules back into the file.

  * Format the file using buildifier_ and emit it according to the output mode:
    write to disk, print the whole file, or print the diff.

Configuration
-------------

Godoc: config_

Gazelle stores configuration information in ``Config`` objects. These objects
contain settings that affect the behavior of most packages in the program.
For example:

* The list of directories that Gazelle should update.
* The path of the repository root directory. Bazel package names are based
  on paths relative to this location.
* The current import path prefix and the directory where it was set.
  Gazelle uses this to infer import paths for ``go_library`` rules.
* A list of build tags that Gazelle considers to be true on all platforms.

``Config`` objects apply to individual directories. Each directory inherits
the ``Config`` from its parent. Values in a ``Config`` may be modified within
a directory using *directives* written in the directory's build file. A
directive is a special comment formatted like this:

::

  # gazelle:key value

Here are a few examples. See the `full list of directives`_.

* ``# gazelle:prefix`` - sets the Go import path prefix for the current
  directory.
* ``# gazelle:build_tags`` - sets the list of build tags which Gazelle considers
  to be true on all platforms.

There are a few directives which are not applied to the ``Config`` object but
are interpreted directly in packages where they are relevant.

* ``# gazelle:ignore`` - the build file should not be updated by Gazelle.
  Gazelle may still index its contents so it can resolve dependencies in other
  build files.
* ``# gazelle:exclude path/to/file`` - the named file should not be read by
  Gazelle and should not be included in ``srcs`` lists. If this refers to
  a directory, Gazelle won't recurse into the directory. This directive may
  appear multiple times.
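
The directive format and the parent-to-child inheritance described above can be
sketched as follows. This is a minimal illustration, not Gazelle's actual code:
the ``Directive`` and ``Config`` types, their fields, and both function names
are hypothetical, and only the ``prefix`` directive is handled.

.. code:: go

  package main

  import (
      "fmt"
      "strings"
  )

  // Directive is a hypothetical type holding one "# gazelle:key value" comment.
  type Directive struct {
      Key, Value string
  }

  // Config is a hypothetical, heavily reduced stand-in for Gazelle's Config.
  type Config struct {
      Prefix string
  }

  // parseDirective reports whether a comment line is a directive and, if so,
  // returns its key and value. The value may be empty (e.g. "ignore").
  func parseDirective(line string) (Directive, bool) {
      const marker = "# gazelle:"
      line = strings.TrimSpace(line)
      if !strings.HasPrefix(line, marker) {
          return Directive{}, false
      }
      key, value, _ := strings.Cut(strings.TrimPrefix(line, marker), " ")
      if key == "" {
          return Directive{}, false
      }
      return Directive{Key: key, Value: strings.TrimSpace(value)}, true
  }

  // childConfig copies the parent's Config and applies a directory's
  // directives to the copy, so the parent is never modified.
  func childConfig(parent Config, directives []Directive) Config {
      child := parent
      for _, d := range directives {
          if d.Key == "prefix" {
              child.Prefix = d.Value
          }
      }
      return child
  }

  func main() {
      root := Config{Prefix: "example.com/project"}
      d, _ := parseDirective("# gazelle:prefix example.com/other")
      sub := childConfig(root, []Directive{d})
      fmt.Println(root.Prefix, sub.Prefix)
  }

Copying the parent by value keeps sibling directories independent: a directive
in one subtree cannot leak into another.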

Fixing build files
------------------

Godoc: merger_

From time to time, APIs in rules_go are changed or updated. Gazelle helps
users stay up to date with these changes by automatically fixing deprecated
usage.

Minor fixes are applied by Gazelle automatically every time it runs. However,
some fixes may delete or rename existing rules. Users must run ``gazelle fix``
to apply these fixes. By default, Gazelle will only *warn* users that
``gazelle fix`` should be run.

Here are a few of the fixes Gazelle performs. See `Fix command transformations`_
for a full list.

* **Squash cgo libraries:** Gazelle will remove ``cgo_library`` rules and
  merge their attributes into ``go_library`` rules that reference them.
  This is a major fix and is only applied with ``gazelle fix``.
* **Migrate library attributes:** Gazelle replaces ``library`` attributes
  with ``embed`` attributes. The only difference between these is that
  ``library`` (which is now deprecated) accepts a single label, while ``embed``
  accepts a list. This is a minor fix and is always applied.

Users can prevent Gazelle from modifying rules, attributes, or individual
values by writing ``# keep`` comments above them.

Scanning source files
---------------------

Godoc: packages_

Nearly all of the information needed to build a program with the standard Go SDK
is implied by directory structure, file names, and file contents. This is why
``go build`` doesn't require any sort of build file. The `go/build`_ package in
the standard library collects this information.

Unfortunately, `go/build`_ can only collect information for one platform at
a time. Gazelle needs to generate build files that work on all platforms, so
we have our own implementation of this logic.

Information extracted from files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Gazelle extracts build metadata from source file names and contents in much
the same way that the standard `go/build`_ package does. It gets the following
information from file names:

* File extension (e.g., .go, .c, .proto). Normally, only .go, .s, and .h files
  are included in Go rules. If any cgo code is present, then C/C++ files are
  also included. .proto files are also used to build proto rules. Other files
  (e.g., .txt) are ignored.
* Test suffix. For example, if a file is named ``foo_test.go``, it will be
  included in a test target instead of a library or binary target.
* OS and architecture suffixes. For example, a file named ``foo_linux_amd64.go``
  will be listed in the ``linux_amd64`` section of the target it belongs to.
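
The file name rules above can be sketched in a few lines of Go. The
``fileInfo`` type and ``classify`` function are hypothetical names, and the OS
and architecture tables are small illustrative subsets rather than the full
lists a real implementation would need.

.. code:: go

  package main

  import (
      "fmt"
      "path/filepath"
      "strings"
  )

  // fileInfo holds metadata inferred from a file name alone.
  type fileInfo struct {
      Ext    string
      IsTest bool
      OS     string
      Arch   string
  }

  // Illustrative subsets only.
  var knownOS = map[string]bool{"darwin": true, "linux": true, "windows": true}
  var knownArch = map[string]bool{"386": true, "amd64": true, "arm64": true}

  // classify extracts the extension, test suffix, and OS/architecture
  // suffixes from a file name.
  func classify(name string) fileInfo {
      info := fileInfo{Ext: filepath.Ext(name)}
      stem := strings.TrimSuffix(name, info.Ext)
      if info.Ext == ".go" && strings.HasSuffix(stem, "_test") {
          info.IsTest = true
          stem = strings.TrimSuffix(stem, "_test")
      }
      parts := strings.Split(stem, "_")
      // The first element is never a constraint: "linux.go" is not
      // OS-specific, but "foo_linux.go" is.
      if n := len(parts); n >= 3 && knownOS[parts[n-2]] && knownArch[parts[n-1]] {
          info.OS, info.Arch = parts[n-2], parts[n-1]
      } else if n >= 2 && knownOS[parts[n-1]] {
          info.OS = parts[n-1]
      } else if n >= 2 && knownArch[parts[n-1]] {
          info.Arch = parts[n-1]
      }
      return info
  }

  func main() {
      for _, name := range []string{"foo_linux_amd64.go", "foo_test.go", "linux.go"} {
          fmt.Printf("%s: %+v\n", name, classify(name))
      }
  }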

Gazelle gets the following information from file contents:

* Package name. This is syntactically the first part of every .go file. All
  files in the same directory must have the same package name (except for
  external test sources, which have a package name ending with ``_test``). If
  there are multiple packages, Gazelle will choose one that matches the
  directory name (if present) or report an error.
* Imported libraries. Go import paths are usually URLs. Imports in
  platform-specific source files are also platform-specific.
* Build tags. The Go toolchain recognizes comments beginning with ``// +build``
  before the package declaration. These tags tell the build system that a file
  should only be built for specific platforms. See `this article
  <https://dave.cheney.net/2013/10/12/how-to-use-conditional-compilation-with-the-go-build-tool>`_
  for more information.
* Whether cgo code is present. This affects how packages are built and
  whether C/C++ files are included.
* C/C++ compile and link options (specified in ``#cgo`` directives in cgo
  comments). These may be platform-specific.

In most cases, only the top of the file is parsed. For Go files, we use the
standard `go/parser`_ package. For proto files, we use regular expressions that
match ``package``, ``go_package``, and ``import`` statements.
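
As an illustration of parsing only the top of a Go file, the sketch below uses
the ``parser.ImportsOnly`` mode of `go/parser`_, which stops after the import
declarations. Whether Gazelle uses exactly this mode is an assumption; the
``packageAndImports`` helper and the sample source are hypothetical.

.. code:: go

  package main

  import (
      "fmt"
      "go/parser"
      "go/token"
      "strconv"
  )

  // src is a small sample file; the function body is never parsed.
  const src = `// +build linux

  package util

  import (
      "fmt"
      "golang.org/x/sys/unix"
  )

  func Hello() { fmt.Println(unix.Getpid()) }
  `

  // packageAndImports returns the package name and import paths of a Go
  // source file, reading no further than the import declarations.
  func packageAndImports(filename, src string) (string, []string, error) {
      fset := token.NewFileSet()
      f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly|parser.ParseComments)
      if err != nil {
          return "", nil, err
      }
      var imports []string
      for _, spec := range f.Imports {
          // Import paths are stored as quoted string literals.
          path, err := strconv.Unquote(spec.Path.Value)
          if err != nil {
              return "", nil, err
          }
          imports = append(imports, path)
      }
      return f.Name.Name, imports, nil
  }

  func main() {
      name, imports, err := packageAndImports("util_linux.go", src)
      if err != nil {
          panic(err)
      }
      fmt.Println(name, imports)
  }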

The ``Package`` object
~~~~~~~~~~~~~~~~~~~~~~

Gazelle stores build metadata in a ``Package`` object. Currently, we only
support one ``Package`` per directory (which is also what the Go SDK supports),
but this will be expanded in the future. ``Package`` objects contain some
top-level metadata (like the package name and directory path), along with
several target objects (``GoTarget`` and ``ProtoTarget``).

Target objects correspond directly to rules that will be generated later. They
store lists of sources, imports, and flags in ``PlatformStrings`` objects.

``PlatformStrings`` objects store strings in four sections: a generic list, an
OS-specific dictionary, an architecture-specific dictionary, and an
OS-and-architecture-specific dictionary. The keys in the dictionaries are OS
names, architecture names, or OS-and-architecture pairs; the values are lists of
strings. The same string may not appear more than once in a list and may not
appear in more than one section. This is due to a Bazel requirement: the same
label may not appear more than once in a ``deps`` list.
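
A rough sketch of the four-section layout and its uniqueness invariant is shown
below. The field names and the ``conflicts`` method are illustrative and may
not match Gazelle's actual definition. Note that a string may appear under
several keys of the same dictionary (only one key is selected on any given
platform), but not twice in one list or in two different sections.

.. code:: go

  package main

  import (
      "fmt"
      "sort"
  )

  // PlatformStrings mirrors the four sections described above.
  type PlatformStrings struct {
      Generic []string
      OS      map[string][]string // key: OS name, e.g. "linux"
      Arch    map[string][]string // key: architecture name, e.g. "amd64"
      OSArch  map[string][]string // key: OS-and-architecture pair
  }

  // conflicts returns, in sorted order, every string that is repeated within
  // a single list or that appears in more than one section.
  func (ps PlatformStrings) conflicts() []string {
      sections := []map[string][]string{
          {"": ps.Generic}, ps.OS, ps.Arch, ps.OSArch,
      }
      bad := map[string]bool{}
      sectionCount := map[string]int{}
      for _, section := range sections {
          inSection := map[string]bool{}
          for _, list := range section {
              inList := map[string]bool{}
              for _, s := range list {
                  if inList[s] {
                      bad[s] = true // repeated within one list
                  }
                  inList[s] = true
                  inSection[s] = true
              }
          }
          for s := range inSection {
              sectionCount[s]++
          }
      }
      for s, n := range sectionCount {
          if n > 1 {
              bad[s] = true // present in more than one section
          }
      }
      out := make([]string, 0, len(bad))
      for s := range bad {
          out = append(out, s)
      }
      sort.Strings(out)
      return out
  }

  func main() {
      srcs := PlatformStrings{
          Generic: []string{"terminal.go"},
          OS: map[string][]string{
              "darwin": {"util.go", "util_bsd.go"},
              "linux":  {"util.go", "util_linux.go"},
          },
      }
      fmt.Println(len(srcs.conflicts()))
  }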

Generating rules
----------------

Godoc: rules_

Once build metadata has been extracted from the sources in a directory,
Gazelle generates rules for building those sources.

Generated rules are formatted as CallExpr_ objects. CallExpr_ is defined in the
`buildifier build`_ library. This is the same library used to parse and format
build files. This lets us manipulate newly generated rules and existing rules
with the same code.

We may generate the following rules:

* ``proto_library`` and ``go_proto_library`` are generated if there was at
  least one .proto source file.
* ``go_library`` is generated if there was at least one non-test source. This
  may embed the ``go_proto_library`` if there was one.
* ``go_test`` rules are generated for internal and external tests. Internal
  tests embed the ``go_library`` while external tests depend on the
  ``go_library`` as a separate package.
* ``go_binary`` is generated if the package name was ``main``. It embeds the
  ``go_library``.

Rules are named according to a pluggable naming policy, but there is currently
only one policy: libraries are named ``go_default_library``, tests are
named ``go_default_test``, and binaries are named after the directory. The
``go_default_library`` name is a historical artifact from before we had
index-based dependency resolution. We'll need to move away from this naming
scheme in the future (`#5`_) before we support multiple packages (`#7`_).

Sources, imports, and flags within each target are converted to expressions in a
straightforward fashion. The lists within ``PlatformStrings`` are converted to
list expressions. Dictionaries are converted to `select`_ expressions
(when Bazel evaluates a `select`_ expression, it chooses one of several
provided lists, based on `config_setting`_ rules). Lists and select expressions
may be added together. For example:

.. code:: bzl

  go_library(
      name = "go_default_library",
      srcs = [
          "terminal.go",
      ] + select({
          "@io_bazel_rules_go//go/platform:darwin": [
              "util.go",
              "util_bsd.go",
          ],
          "@io_bazel_rules_go//go/platform:linux": [
              "util.go",
              "util_linux.go",
          ],
          "@io_bazel_rules_go//go/platform:windows": [
              "util_windows.go",
          ],
          "//conditions:default": [],
      }),
      ...
  )

At this point, Gazelle does not have enough information to generate expressions
for ``deps`` attributes. We only have a list of import strings extracted from
source files. These imports are stored temporarily in a special
``_gazelle_imports`` attribute in each rule. Later, the imports are converted to
Bazel labels (see `Resolving dependencies`_), and this attribute is replaced
with ``deps``.

Merging and deleting rules
--------------------------

Godoc: merger_

Merging is the process of combining generated rules with the corresponding
rules in an existing build file. If no build file exists in a directory, a
new file is created with generated rules, and no merging is performed.

Merging occurs in two phases: pre-resolve and post-resolve. This is due to an
interdependence with dependency resolution. Dependency resolution uses a table
of *merged* library rules, so it can't be performed until the pre-resolve merge
has occurred. After dependency resolution, we need to merge newly generated
``deps`` attributes; this is done in the post-resolve merge. The two phases use
the same algorithm.

During the merge process, Gazelle attempts to match generated rules with
existing rules that have the same name and same kind. Rules are only merged if
both name and kind match. If an existing rule has the same name as a generated
rule but a different kind, the generated rule will not be merged. If no
existing rule matches a generated rule, the generated rule is simply appended to
the end of the file. Existing rules that don't match any generated rule are not
modified.

When Gazelle identifies a matching pair of rules, it combines each attribute
according to the algorithm below. If an attribute is present in the generated
rule but not in the existing rule, it is copied to the merged rule verbatim. If
an attribute is present in the existing rule but not the generated rule, Gazelle
behaves as if the generated attribute were present but empty.

* For each value in the existing rule's attribute:

  * If the value also appears in the generated rule's attribute or is marked
    with a ``# keep`` comment, preserve it. Otherwise, delete it.

* For each value in the generated rule's attribute:

  * If the value appears in the existing rule's attribute, ignore it.
    Otherwise, add it to the merged rule.

* If the merged attribute is empty, delete it.

When a value is present in both the existing and generated attributes, we use
the existing value instead of the generated value, since this preserves
comments.
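
The attribute merge can be sketched for the simple case of string lists. This
is an illustration only: Gazelle operates on parsed build file expressions, not
plain strings, and the ``mergeList`` function and its ``keep`` parameter (the
set of existing values marked with ``# keep``) are hypothetical.

.. code:: go

  package main

  import "fmt"

  // mergeList combines an existing attribute's values with generated values.
  // An empty result means the attribute should be deleted.
  func mergeList(existing, generated []string, keep map[string]bool) []string {
      gen := make(map[string]bool, len(generated))
      for _, v := range generated {
          gen[v] = true
      }
      var merged []string
      have := map[string]bool{}
      // Preserve existing values that are still generated or explicitly
      // kept. Reusing the existing entry is what preserves its comments.
      for _, v := range existing {
          if gen[v] || keep[v] {
              merged = append(merged, v)
              have[v] = true
          }
      }
      // Add generated values that the existing attribute did not contain.
      for _, v := range generated {
          if !have[v] {
              merged = append(merged, v)
              have[v] = true
          }
      }
      return merged
  }

  func main() {
      existing := []string{"a.go", "old.go", "custom.go"}
      generated := []string{"a.go", "b.go"}
      keep := map[string]bool{"custom.go": true}
      fmt.Println(mergeList(existing, generated, keep))
  }

Here ``old.go`` is dropped because it is neither generated nor kept, while
``custom.go`` survives because of its ``# keep`` marker.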

Some attributes are considered *unmergeable*, for example, ``visibility`` and
``gc_goopts``. Gazelle may add these attributes to existing rules if they are
not already present, but existing values won't be modified or deleted.

Preserving customizations
~~~~~~~~~~~~~~~~~~~~~~~~~

Gazelle has several mechanisms for preserving manual modifications to build
files. Some of these mechanisms work automatically; others require explicit
comments.

* Gazelle will not modify or delete rules that don't appear to have been
  generated by Gazelle.
* As mentioned above, some attributes are considered unmergeable. Gazelle may
  set initial values for these but won't delete or replace existing values.
* ``# keep`` comments may be attached to any rule, attribute, or value
  to prevent Gazelle from modifying it.
* ``# gazelle:exclude <file>`` directives can be used to prevent Gazelle from
  adding files to source lists (for example, checked-in .pb.go files). They
  can also prevent Gazelle from recursing into directories that contain
  unbuildable code (e.g., ``testdata``).
* ``# gazelle:ignore`` directives prevent Gazelle from making any modifications
  to build files that contain them.

Deleting rules
~~~~~~~~~~~~~~

Deletion is a special case of the merging algorithm.

When Gazelle generates rules for a package (see `Generating rules`_), it
actually produces two lists of rules: a list of rules for buildable targets,
and a list of empty rules that may be deleted. The empty rules have no
attributes other than ``name``.

The empty rules are merged using the same algorithm as the other generated
rules. If, after merging, an empty rule has no attributes that would make the
rule buildable (for example, ``srcs`` or ``deps``), the rule will be deleted.
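
The deletion check might look like the following sketch. The ``Rule`` type is
a simplified, hypothetical stand-in for a parsed rule, and the set of
"buildable" attributes is an assumption based only on the two examples named
above.

.. code:: go

  package main

  import "fmt"

  // Rule is a simplified stand-in for a parsed build rule.
  type Rule struct {
      Kind, Name string
      Attrs      map[string][]string
  }

  // buildableAttrs lists attributes that make a rule worth keeping.
  // Assumed set; the real list may be longer.
  var buildableAttrs = []string{"srcs", "deps"}

  // shouldDelete reports whether a merged rule has no buildable attributes.
  func shouldDelete(r Rule) bool {
      for _, attr := range buildableAttrs {
          if len(r.Attrs[attr]) > 0 {
              return false
          }
      }
      return true
  }

  func main() {
      stale := Rule{Kind: "go_test", Name: "go_default_test"}
      kept := Rule{
          Kind:  "go_library",
          Name:  "go_default_library",
          Attrs: map[string][]string{"srcs": {"custom.go"}},
      }
      fmt.Println(shouldDelete(stale), shouldDelete(kept))
  }

Because the check runs after merging, a value protected by ``# keep`` in the
existing rule is enough to keep the rule alive.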

Resolving dependencies
----------------------

Godoc: resolve_

When Gazelle generates rules for a package (see `Generating
rules`_), it stores names of the libraries imported by each rule in a special
``_gazelle_imports`` attribute. During dependency resolution, Gazelle maps these
imports to Bazel labels and replaces ``_gazelle_imports`` with ``deps``.

Before dependency resolution starts, Gazelle builds a table of all known
libraries. This includes ``go_library``, ``go_proto_library``, and
``proto_library`` rules. The table is populated by scanning build files after
the pre-resolve merge, so existing and newly generated rules are included
in the table, and deleted rules are excluded. Once all library rules have been
added, Gazelle indexes the table by language-specific import path.

Gazelle resolves each import string in ``_gazelle_imports`` as follows:

* If the import is part of the standard library, it is dropped. Standard
  library dependencies are implicit.

* If the import is provided by exactly one rule in the library table, the label
  for that rule is used.

* If the import is provided by multiple libraries, we attempt to resolve
  the ambiguity.

  * For Go, we apply the vendoring algorithm. Vendored libraries aren't visible
    outside of the vendor directory's parent.

  * Go libraries that are embedded by other Go libraries are not considered.
    Embedded libraries may be incomplete.

  * When an ambiguity can't be resolved, Gazelle logs an error and skips
    the dependency.

* If the import is not provided by any rule in the library table, we attempt
  to resolve the dependency using heuristics:

  * If the import path starts with the current prefix (set with a
    ``# gazelle:prefix`` directive or on the command line), we construct a label
    by concatenating the prefix directory and the portion of the import path
    below the prefix into a package name.

  * Otherwise, the import path is considered external and is resolved
    according to the external mode set on the command line.

    * In ``external`` mode, Gazelle determines the portion of the import path
      that corresponds to a repository using `golang.org/x/tools/go/vcs`_. This
      part of the path is converted into a repository name (for example,
      ``golang.org/x/tools`` becomes ``@org_golang_x_tools``), and the rest is
      converted to a package name.

    * In ``vendored`` mode, Gazelle constructs a label by prepending ``vendor/``
      to the import path.
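
The fallback heuristics above can be sketched as three small functions. All
function names here are hypothetical. The repository root is passed in as a
parameter rather than looked up over the network, and the repository naming
scheme (reverse the host's components, then join everything with underscores)
is a simplified approximation. Labels use the ``go_default_library`` name from
the naming policy described earlier.

.. code:: go

  package main

  import (
      "fmt"
      "path"
      "strings"
  )

  // resolveWithPrefix builds a label for an import under the current prefix.
  // prefixDir is the repository-relative directory where the prefix was set
  // ("" for the repository root).
  func resolveWithPrefix(imp, prefix, prefixDir string) (string, bool) {
      if imp != prefix && !strings.HasPrefix(imp, prefix+"/") {
          return "", false
      }
      rel := strings.TrimPrefix(strings.TrimPrefix(imp, prefix), "/")
      return "//" + path.Join(prefixDir, rel) + ":go_default_library", true
  }

  // resolveVendored builds a label for an import in "vendored" mode.
  func resolveVendored(imp string) string {
      return "//vendor/" + imp + ":go_default_library"
  }

  // repoName converts a repository root such as "golang.org/x/tools" into a
  // Bazel repository name such as "org_golang_x_tools".
  func repoName(root string) string {
      components := strings.Split(strings.ToLower(root), "/")
      host := strings.Split(components[0], ".")
      for i, j := 0, len(host)-1; i < j; i, j = i+1, j-1 {
          host[i], host[j] = host[j], host[i]
      }
      name := strings.Join(append(host, components[1:]...), "_")
      return strings.NewReplacer("-", "_", ".", "_").Replace(name)
  }

  // resolveExternal builds a label for an import in "external" mode, given
  // the repository root that contains it.
  func resolveExternal(imp, root string) string {
      rel := strings.TrimPrefix(strings.TrimPrefix(imp, root), "/")
      return "@" + repoName(root) + "//" + rel + ":go_default_library"
  }

  func main() {
      label, _ := resolveWithPrefix("example.com/project/pkg/util", "example.com/project", "")
      fmt.Println(label)
      fmt.Println(resolveVendored("github.com/pkg/errors"))
      fmt.Println(resolveExternal("golang.org/x/tools/go/vcs", "golang.org/x/tools"))
  }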

Note that ``visibility`` attributes are not considered when resolving imports.
This was part of an initial prototype, but it was confusing in many situations.

Building and running Gazelle
----------------------------

Gazelle is a regular Go program. It can be built, installed, and run without
Bazel, using the regular Go SDK.

.. code:: bash

  $ go install github.com/bazelbuild/bazel-gazelle/cmd/gazelle@latest
  $ gazelle -go_prefix example.com/project

We lightly discourage this method of running Gazelle. All developers on a
project should use the same version of Gazelle to ensure the build files
they generate are consistent. The easiest way to accomplish this is to build
and run Gazelle through Bazel. Gazelle may be added to a WORKSPACE file,
built as a normal ``go_binary``, then installed or run from the ``bazel-bin/``
directory.

.. code:: bash

  $ bazel build @bazel_gazelle//cmd/gazelle
  $ bazel-bin/external/bazel_gazelle/cmd/gazelle/gazelle -go_prefix example.com/project

It's usually better to invoke Gazelle through a wrapper script though. This
saves typing and ensures Gazelle is run with a consistent set of arguments.
We provide a Bazel rule that generates such a wrapper script. Developers may
add a snippet like the one below to a build file:

.. code:: bzl

  load("@bazel_gazelle//:def.bzl", "gazelle")

  gazelle(
      name = "gazelle",
      command = "fix",
      external = "vendored",
      prefix = "example.com/project",
  )

This script may be built and executed in a single command with ``bazel run``.

.. code:: bash

  $ bazel run //:gazelle

This is the most convenient way to run Gazelle, and it's what we recommend to
users. However, there are two issues with running Gazelle in this
fashion. First, binaries executed by ``bazel run`` are run in the Bazel
execroot, not the user's current directory. The wrapper script uses a hack
(dereferencing symlinks) to jump to the top of the workspace source tree before
running Gazelle. Second, ``bazel run`` holds a lock on the Bazel output
directory. This means Gazelle cannot invoke Bazel without deadlocking. Commands
like ``bazel query`` would be helpful for detecting generated code, but it's not
safe to use them.

To avoid these limitations, the wrapper script may be copied to the workspace
and optionally checked into version control. When the wrapper script is run
directly (without ``bazel run``), it will rebuild itself to ensure no changes
are needed. If the rebuilt script differs from the running script, it will
prompt the user to copy the rebuilt script into the workspace again.

.. code:: bash

  $ bazel build //:gazelle
  Target //:gazelle up-to-date:
    bazel-bin/gazelle.bash
  ____Elapsed time: 1.326s, Critical Path: 0.00s
  $ cp bazel-bin/gazelle.bash gazelle.bash
  $ ./gazelle.bash

Dependencies
------------

Gazelle has the following dependencies:

github.com/bazelbuild/bazel-skylib
  Skylark utility library used to generate the wrapper script in the
  ``gazelle`` rule.
github.com/bazelbuild/buildtools/build
  Used to parse and rewrite build files.
github.com/bazelbuild/rules_go
  Used to build and test Gazelle through Bazel. Gazelle can also be built on its
  own with the Go SDK.
golang.org/x/tools/go/vcs
  Used during dependency resolution to determine the repository prefix for a
  given import path. This uses the network.