# CT Log Deployment (Manual)

This document describes the individual steps and components involved in the
deployment of a Trillian-based CT Log. These steps will gradually build up a
system as shown in the diagram below.

<img src="images/DeploymentFull.png" width="650">

The text here describes the general approach, and details key options for
the various binaries involved, but does not give full command-lines.
To see complete details in a machine-executable format (which is therefore
less likely to fall out of date), please consult the various testing
shell scripts:

 - [`trillian/integration/demo-script.sh`](../integration/demo-script.sh)
   runs a simple single-instance test, corresponding to the first few sections
   of this document.
 - [`trillian/integration/ct_integration_test.sh`](../integration/ct_integration_test.sh)
   runs a short integration test against a CT Log system.
 - [`trillian/integration/ct_hammer_test.sh`](../integration/ct_hammer_test.sh)
   runs a continuous integration test against a CT Log system.
 - Both of the previous two tests allow for a more complex CT Log system, using
   the shell functions in
   [`trillian/integration/ct_functions.sh`](../integration/ct_functions.sh)
   and
   [github.com/google/trillian/integration/functions.sh](https://github.com/google/trillian/blob/master/integration/functions.sh).
    - Multiple instances of each component can be invoked.
    - A simple etcd instance can be enabled (by setting the `ETCD_DIR`
      environment variable to the location of etcd binaries).
    - A Prometheus instance can be enabled (by setting the `PROMETHEUS_DIR`
      environment variable to the location of the Prometheus binary).

**Cross-check**s are given throughout the document to allow
confirmation of successful setup.

 - [Data Storage](#data-storage)
 - [Trillian Services](#trillian-services)
 - [Tree Provisioning](#tree-provisioning)
 - [CT Personality](#ct-personality)
    - [Key Generation](#key-generation)
    - [CA Certificates](#ca-certificates)
    - [CTFE Configuration](#ctfe-configuration)
    - [CTFE Start-up](#ctfe-start-up)
 - [Distribution](#distribution)
 - [Primary Signer Election](#primary-signer-election)
 - [Load Balancing](#load-balancing)
 - [Monitoring](#monitoring)
 - [DoS Protection](#dos-protection)
 - [Service Discovery](#service-discovery)

## Data Storage

Data storage for the logged certificates is the heart of a CT Log. The Trillian
project has an internal storage interface that allows a variety of different
implementations.

This document uses the
[MySQL storage implementation](https://github.com/google/trillian/blob/master/storage/mysql),
which is set up according to the
[instructions in the Trillian repo](https://github.com/google/trillian#mysql-setup);
these instructions configure the Trillian database according to its
[core schema file](https://github.com/google/trillian/blob/master/storage/mysql/schema/storage.sql).

**Cross-check**: At this point, manually connecting to the MySQL database should
succeed:
```
% mysql --host=127.0.0.1 --port=3306 --user=root --database=test
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 764
Server version: 10.1.29-MariaDB-6 Debian rodete

Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [test]> show tables;
+-------------------+
| Tables_in_test    |
+-------------------+
| LeafData          |
| SequencedLeafData |
| Subtree           |
| TreeControl       |
| TreeHead          |
| Trees             |
| Unsequenced       |
+-------------------+
7 rows in set (0.00 sec)

MariaDB [test]> exit
Bye
```
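
The same check can be scripted rather than read by eye. The sketch below is one possible approach and is not part of the Trillian tooling: `missing_tables` is a hypothetical helper that compares a table listing against the seven table names shown in the output above.

```bash
# Table names taken from the cross-check output above.
EXPECTED_TABLES="LeafData SequencedLeafData Subtree TreeControl TreeHead Trees Unsequenced"

# missing_tables reads one table name per line on stdin and prints each
# expected table that does not appear in that listing.
missing_tables() {
  local found t
  found="$(cat)"
  for t in $EXPECTED_TABLES; do
    # -x matches whole lines only, -F treats the name as a fixed string.
    printf '%s\n' "$found" | grep -qxF "$t" || printf '%s\n' "$t"
  done
}

# Usage against a live database (not run here):
#   mysql --host=127.0.0.1 --port=3306 --user=root --database=test \
#     --batch --skip-column-names -e 'SHOW TABLES;' | missing_tables
```

If the schema has been applied completely, the pipeline in the usage comment should print nothing.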

The setup so far is shown as:

<img src="images/Deployment1DB.png" width="650">

## Trillian Services

The next step is to deploy two Trillian processes, the log server and the log
signer. These binaries are not specific to CT or to WebPKI certificates; they
provide a general mechanism for transparently recording data in a Merkle tree.

The log server (`github.com/google/trillian/cmd/trillian_log_server`) exposes
a gRPC interface that allows various primitives for querying and adding to the
underlying Merkle tree. These operations are translated into operations on the
storage layer, which are SQL operations in this example.

 - The `--mysql_uri` option indicates where the MySQL database is available.
 - The `--rpc_endpoint` option for the log server indicates the port that the
   gRPC methods are available on.

e.g.:
```bash
$ go run github.com/google/trillian/cmd/trillian_log_server --mysql_uri="root@tcp(localhost:3306)/test" --rpc_endpoint=:8080 --http_endpoint=:8081 --logtostderr
I0424 18:36:20.378082 65882 main.go:97] **** Log Server Starting ****
I0424 18:36:20.378732 65882 quota_provider.go:46] Using MySQL QuotaManager
I0424 18:36:20.379453 65882 main.go:180] RPC server starting on :8080
I0424 18:36:20.379522 65882 main.go:141] HTTP server starting on :8081
I0424 18:36:20.379709 65882 main.go:188] Deleted tree GC started
...
```

However, add operations are not immediately incorporated into the Merkle tree.
Instead, pending add operations are queued up and a separate process, the log
signer (`github.com/google/trillian/cmd/trillian_log_signer`), periodically
reads pending entries from the queue. The signer gives these entries unique,
monotonically increasing sequence numbers and incorporates them into the
Merkle tree.

 - The `--mysql_uri` option indicates where the MySQL database is available.
 - The `--sequencer_interval`, `--batch_size` and `--num_sequencers` options
   provide control over the timing and batching of sequencing operations.
 - The `--force_master` option allows the signer to assume that it is the only
   instance running (more on this [later](#primary-signer-election)).
 - The `--logtostderr` option sends log output to standard error rather than to
   log files, which is helpful while getting a deployment running.

e.g.:
```bash
$ go run github.com/google/trillian/cmd/trillian_log_signer --mysql_uri="root@tcp(localhost:3306)/test" --force_master --rpc_endpoint=:8090 --http_endpoint=:8091 --logtostderr
I0424 18:37:17.716095 66067 main.go:108] **** Log Signer Starting ****
W0424 18:37:17.717141 66067 main.go:139] **** Acting as master for all logs ****
I0424 18:37:17.717154 66067 quota_provider.go:46] Using MySQL QuotaManager
I0424 18:37:17.717329 66067 operation_manager.go:328] Log operation manager starting
I0424 18:37:17.717431 66067 main.go:180] RPC server starting on :8090
I0424 18:37:17.717530 66067 main.go:141] HTTP server starting on :8091
I0424 18:37:17.717794 66067 operation_manager.go:285] Acting as master for 0 / 0 active logs: master for:
...
```

<img src="images/Deployment2Trillian.png" width="650">


## Tree Provisioning

The Trillian system is *multi-tenant*: a single Trillian system can support
multiple independent Merkle trees. However, this means that our particular tree
for holding Web PKI certificates needs to be provisioned in the system.

The `github.com/google/trillian/cmd/createtree` tool performs this provisioning
operation, and emits a **tree ID** that needs to be recorded for later in the
deployment process.

 - The `--admin_server` option for `createtree` indicates the address
   (host:port) that tree creation gRPC requests should be sent to; it should
   match the `--rpc_endpoint` for the log server.
 - The `--max_root_duration` option should be set to less than the log's
   maximum merge delay (MMD). This ensures that the log periodically produces a
   fresh STH even if there are no updates. Leave a reasonable safety margin:
   23h59m is risky for an MMD of 24h, while 1h or 12h leaves ample room.

e.g.:
```bash
$ go run github.com/google/trillian/cmd/createtree --admin_server=:8080
I0424 18:40:27.992970 66832 main.go:106] Creating tree tree_state:ACTIVE tree_type:LOG max_root_duration:{seconds:3600}
W0424 18:40:27.993107 66832 rpcflags.go:36] Using an insecure gRPC connection to Trillian
I0424 18:40:27.993276 66832 admin.go:50] CreateTree...
I0424 18:40:27.997381 66832 admin.go:95] Initialising Log 3871182205569895248...
I0424 18:40:28.000074 66832 admin.go:106] Initialised Log (3871182205569895248) with new SignedTreeHead:
log_root:"\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00 \xe3\xb0\xc4B\x98\xfc\x1c\x14\x9a\xfb\xf4șo\xb9$'\xaeA\xe4d\x9b\x93L\xa4\x95\x99\x1bxR\xb8U\x17Xﶃ\xe3\xf3=\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
3871182205569895248
```
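
The safety-margin advice for `--max_root_duration` can be turned into a small pre-flight check. This is a sketch under one stated assumption: requiring the duration to be at most half of the MMD is an illustrative threshold chosen here, not a Trillian rule, and `root_duration_ok` is a hypothetical helper name.

```bash
# root_duration_ok MMD_SEC DURATION_SEC
# Succeeds when DURATION_SEC is positive and no more than half of MMD_SEC.
# The "half of the MMD" threshold is an assumption made for illustration.
root_duration_ok() {
  local mmd="$1" dur="$2"
  [ "$dur" -gt 0 ] && [ $((dur * 2)) -le "$mmd" ]
}

# Usage (not run here): a 12h root duration for a log with a 24h MMD.
#   root_duration_ok 86400 43200 && \
#     go run github.com/google/trillian/cmd/createtree \
#       --admin_server=:8080 --max_root_duration=12h
```

With this threshold, the 1h and 12h values mentioned above pass for a 24h MMD, while 23h59m is rejected.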

**Cross-check**: Once a new tree has been provisioned, the debug logging for the running
`trillian_log_signer` should include a mention of the new tree (the tree ID in
this sample comes from a different run than the `createtree` example above).
```
I1011 16:44:16.160069 176101 log_operation_manager.go:210] create master election goroutine for 2385931157013381257
I1011 16:44:17.160875 176101 log_operation_manager.go:246] now acting as master for 1 / 1, master for: <log-2385931157013381257>
```

## CT Personality

Trillian provides a general gRPC API for Merkle tree operations, but relies on a
*personality* to perform operations that are specific to the particular
[transparency application](https://github.com/google/trillian/blob/master/docs/TransparentLogging.md).

For Certificate Transparency, the `ctfe/` directory holds a Trillian personality
which:
 - provides the HTTP/JSON API entrypoints described by [RFC 6962](https://tools.ietf.org/html/rfc6962)
 - checks that submissions to the Log are valid X.509 certificates, with a
   chain of signatures that reaches an acceptable root.

The CTFE personality is generally stateless, and is controlled by a
[configuration file](#ctfe-configuration); the following subsections describe
the key components of this file.

As with the Trillian services, the CTFE is multi-tenant and supports parallel
log instances, each configured separately in the config file.

### Key Generation

Each CT Log needs to have a unique private key that is used to sign
cryptographic content from the Log. The [OpenSSL](https://www.openssl.org/)
command line can be used to
[generate](https://wiki.openssl.org/index.php/Command_Line_Elliptic_Curve_Operations#Generating_EC_Keys_and_Parameters)
a suitable private key.

```bash
% openssl ecparam -name prime256v1 > privkey.pem # generate parameters file
% openssl ecparam -in privkey.pem -genkey -noout >> privkey.pem # generate and append private key
% openssl ec -in privkey.pem -pubout -out pubkey.pem # generate corresponding public key
```

The private key must either be for elliptic curves using NIST P-256 (as shown
here), or for RSA signatures with SHA-256 and a 2048 bit (or larger) key
([RFC 6962 s2.1.4](https://tools.ietf.org/html/rfc6962#section-2.1.4)).

**Cross-check**: Confirm that the key is well-formed and readable:
```bash
% openssl ec -in privkey.pem -noout -text # check key is readable
read EC key
Private-Key: (256 bit)
priv:
    00:b5:99:8c:7b:f2:5b:0c:a1:3a:26:b0:12:e2:b7:
    dd:c6:89:a6:49:3c:1d:26:70:44:ad:4a:34:91:2d:
    b6:33:a3
pub:
    04:16:44:9b:04:47:ae:93:f4:14:94:7b:f7:ba:ae:
    5e:6b:53:e3:b4:85:55:ab:f4:06:0f:65:36:bd:f7:
    5f:d7:74:0c:e5:30:c6:a9:0e:0d:40:70:5d:b2:70:
    92:cc:b9:bc:c7:d4:16:7e:96:24:52:6e:1a:a4:28:
    43:d0:b5:97:72
ASN1 OID: prime256v1
NIST CURVE: P-256
% openssl pkey -pubin -in pubkey.pem -text -noout
Public-Key: (256 bit)
pub:
    04:16:44:9b:04:47:ae:93:f4:14:94:7b:f7:ba:ae:
    5e:6b:53:e3:b4:85:55:ab:f4:06:0f:65:36:bd:f7:
    5f:d7:74:0c:e5:30:c6:a9:0e:0d:40:70:5d:b2:70:
    92:cc:b9:bc:c7:d4:16:7e:96:24:52:6e:1a:a4:28:
    43:d0:b5:97:72
ASN1 OID: prime256v1
NIST CURVE: P-256
```

**Cross-check**: Once the CTFE is configured and running
([below](#ctfe-start-up)), the `ctclient` command-line tool allows signature
checking against the public key with the `--pub_key` option:
```bash
% go install github.com/google/certificate-transparency-go/client/ctclient@master
% ctclient get-sth --log_uri http://localhost:6966/aramis --pub_key pubkey.pem
2018-10-12 11:28:08.544 +0100 BST (timestamp 1539340088544): Got STH for V1 log (size=11718) at http://localhost:6966/aramis, hash 6fb36fcca60d61aa85e04ff0c34a87782f12d08568118602eec0208d85c3a40d
Signature: Hash=SHA256 Sign=ECDSA
Value=3045022100df855f0fd097a45070e2eb244c7cb63effda942f2d30308e3b84a72e1d16118b0220038e55f142501402cf03790b3997081f82ffe47f2d3f3b667e1c484aecf40a33
```

### CA Certificates

Each Log must decide on its own policy about which CAs' certificates are to be
accepted for inclusion in the Log; this section therefore just provides an
*example* of the process of configuring this set for the CT Log software.

On a Debian-based system, the `ca-certificates` package includes a collection
of CA certificates under `/etc/ssl/certs/`. A set of certificates suitable
for feeding to `ct_server` can thus be produced with:

```bash
% sudo apt-get install -qy ca-certificates
% sudo update-ca-certificates
% cat /etc/ssl/certs/* > ca-roots.pem
```
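
A quick sanity check on the resulting file is to count the certificates it holds. The helper below is a hypothetical convenience (the name `count_pem_certs` is not part of any CT tooling); it simply counts PEM `BEGIN CERTIFICATE` markers.

```bash
# count_pem_certs FILE prints the number of PEM certificate blocks in FILE.
count_pem_certs() {
  grep -c -- '-----BEGIN CERTIFICATE-----' "$1"
}

# Usage (not run here):
#   count_pem_certs ca-roots.pem
```

A count of zero indicates that the concatenation did not pick up any PEM data. Note that `/etc/ssl/certs/` may hold both individual certificates and a combined bundle, so the same root can appear more than once in the output file.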

### CTFE Configuration

The information from the previous steps now needs to be assembled into a
configuration file for the CTFE, in
[text protocol buffer format](https://developers.google.com/protocol-buffers/docs/overview).

Each Log instance needs configuration for:
 - `log_id`: The Trillian tree ID from an [earlier step](#tree-provisioning)
 - `prefix`: The path prefix the log will be served at.
 - `max_merge_delay_sec`: The MMD for the log (typically 86400, which is 24 hours).
 - `roots_pem_file`: The files holding
   [accepted root CA certificates](#ca-certificates) (repeated).
 - `private_key`: The [private key](#key-generation) for the log instance.
   For a private key held in an external PEM file, this is of the form:
   ```
   private_key: {
     [type.googleapis.com/keyspb.PEMKeyFile] {
       path: "privkey.pem"
     }
   }
   ```
 - `public_key`: The corresponding public key for the log instance. When
   both the public and private keys are specified, they will be checked for
   consistency. (The public key is also worth including for reference and for
   use by test tools.)
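
Putting these fields together, a single-log configuration file could be generated as sketched below. This is an illustration rather than a definitive template: `write_ctfe_config` is a hypothetical helper, the file names `ca-roots.pem` and `privkey.pem` are the ones produced in the earlier steps, and the enclosing `config { ... }` wrapper is an assumption that should be checked against the sample configs shipped with your CTFE version. The optional `public_key` field is omitted.

```bash
# write_ctfe_config TREE_ID PREFIX OUT_FILE
# Writes a minimal single-log CTFE config in text protobuf format.
# Assumption: each log lives inside a `config { ... }` block.
write_ctfe_config() {
  local tree_id="$1" prefix="$2" out="$3"
  cat > "$out" <<EOF
config {
  log_id: ${tree_id}
  prefix: "${prefix}"
  max_merge_delay_sec: 86400
  roots_pem_file: "ca-roots.pem"
  private_key: {
    [type.googleapis.com/keyspb.PEMKeyFile] {
      path: "privkey.pem"
    }
  }
}
EOF
}

# Usage (not run here), with the tree ID emitted by createtree:
#   write_ctfe_config 3871182205569895248 aramis ctfe.cfg
```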

**Cross-check**: The config file should be accepted at start-up by the
`ct_server` binary, with the `--log_config` option.


### CTFE Start-up

Once the CTFE config file has been assembled, the CTFE personality
(`github.com/google/certificate-transparency-go/trillian/ctfe/ct_server`)
can be started.

 - The `--log_config` option gives the location of the configuration file.
 - The `--log_rpc_server` option gives the location of the Trillian log server;
   it should match the `--rpc_endpoint` for the [log server](#trillian-services).
 - The `--http_endpoint` option indicates the port that the CTFE should respond
   to HTTP(S) requests on.

e.g.:
```bash
CTFE_CONFIG=/path/to/your/ctfe_config_file
TRILLIAN_LOG_SERVER_RPC_ENDPOINT=localhost:8080
go run github.com/google/certificate-transparency-go/trillian/ctfe/ct_server --log_config "${CTFE_CONFIG}" --http_endpoint=localhost:6966 --log_rpc_server "${TRILLIAN_LOG_SERVER_RPC_ENDPOINT}" --logtostderr
```

At this point, a complete (but minimal) CT Log setup is available. The manual
setup steps up to this point match the
[integration demo script](../integration/demo-script.sh); the contents of that
script should (mostly) make sense.

**Cross-check**: Opening `http://localhost:<port>/<prefix>/ct/v1/get-sth` in a
browser should show JSON that indicates an empty tree.

Alternatively, the `ctclient` command-line tool shows the same information.
The sample output here comes from a log that already holds entries; a freshly
provisioned log would report a size of 0:
```bash
go run github.com/google/certificate-transparency-go/client/ctclient@master get-sth --log_uri http://localhost:6966/aramis
2018-10-12 11:28:08.544 +0100 BST (timestamp 1539340088544): Got STH for V1 log (size=11718) at http://localhost:6966/aramis, hash 6fb36fcca60d61aa85e04ff0c34a87782f12d08568118602eec0208d85c3a40d
Signature: Hash=SHA256 Sign=ECDSA
Value=3045022100df855f0fd097a45070e2eb244c7cb63effda942f2d30308e3b84a72e1d16118b0220038e55f142501402cf03790b3997081f82ffe47f2d3f3b667e1c484aecf40a33
```
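
For scripted checks, the tree size can be pulled straight out of the `get-sth` JSON response. The helper below is a minimal sketch that relies only on the `tree_size` field defined for this response in RFC 6962 section 4.3; the name `sth_tree_size` is hypothetical, and a real deployment might prefer a proper JSON parser.

```bash
# sth_tree_size reads a get-sth JSON response on stdin and prints the
# numeric value of its tree_size field.
sth_tree_size() {
  sed -n 's/.*"tree_size"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Usage against a running CTFE (not run here):
#   curl -s http://localhost:6966/aramis/ct/v1/get-sth | sth_tree_size
```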

**Cross-check**: Once the CTFE is configured and running, opening
`http://localhost:<port>/<prefix>/ct/v1/get-roots` shows the configured roots.

Alternatively, the `ctclient` command-line tool shows the same information in a
more friendly way, e.g.:
```bash
go run github.com/google/certificate-transparency-go/client/ctclient@master get-roots --log_uri http://localhost:6966/aramis
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 67554046 (0x406cafe)
    Signature Algorithm: ECDSA-SHA256
...
```


<img src="images/Deployment3CTFE.png" width="650">


## Distribution

For any real-world deployment, running a single instance of each binary in the
system is not enough, let alone for a CT Log that will form part of the
WebPKI ecosystem.

 - Running multiple binary instances allows the Log to scale with traffic
   levels, by adjusting the number of instances.
 - Running instances in distinct locations reduces the chance of a
   single external event affecting all instances simultaneously. (In terms of
   cloud computing providers, this means that instances should be run in
   different zones/regions/availability zones.)

For the [CTFE personality](#ctfe-start-up), running multiple instances is
straightforward: just run more copies of the `ct_server` binary.

> Note that for a test of this with multiple *local* instances, each instance
> will need to be configured to listen on a distinct port.

Running multiple instances of the [log server](#trillian-services) process also
just involves running more copies of the `trillian_log_server` binary.
However, this does need the CTFE personality to be configured with the locations
of all of the different log server instances.

The simplest (but not very flexible) way to do this is a comma-separated list:
```bash
go run github.com/google/certificate-transparency-go/trillian/ctfe/ct_server --log_rpc_server host1:port1,host2:port2,host3:port3
```

(More flexible approaches are discussed [below](#service-discovery).)
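
For a local test with several log servers on one machine, the comma-separated list can be built rather than typed. The sketch below assumes the log servers listen on consecutive gRPC ports, which is purely a convention chosen for this example; `rpc_endpoints` is a hypothetical helper name.

```bash
# rpc_endpoints HOST FIRST_PORT COUNT
# Prints COUNT comma-separated HOST:PORT pairs on consecutive ports.
rpc_endpoints() {
  local host="$1" port="$2" count="$3" out="" i=0
  while [ "$i" -lt "$count" ]; do
    # Prefix a comma only when the list already has an entry.
    out="${out:+${out},}${host}:$((port + i))"
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}

# Usage (not run here):
#   go run github.com/google/certificate-transparency-go/trillian/ctfe/ct_server \
#     --log_rpc_server "$(rpc_endpoints localhost 8080 3)"
```

Each log server instance also needs its own `--http_endpoint` port, which this helper does not allocate.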

<img src="images/Deployment4Distribute.png" width="650">


## Primary Signer Election

The Trillian log signer requires more care to convert to a multiple-instance
system. The underlying Merkle tree relies on there being a unique
sequencing of the entries in the tree, and the signer is responsible for
generating that sequence.

As a result, multiple instances of the log signer are run to improve resilience,
not scalability. At any time only a single signer instance is responsible
for the sequencing of a particular Merkle tree.

This single-signer constraint is implemented as an *election* process, and the
provided implementation of this process relies on an
[etcd](https://coreos.com/etcd/) cluster to provide data synchronization and
replication facilities.

For resilience, the `etcd` cluster for a CT Log should have multiple `etcd`
instances, but does not need large numbers of instances (and in fact large
numbers of `etcd` instances will slow down replication).

The
[CoreOS `etcd` documentation](https://coreos.com/etcd/docs/latest/clustering.html)
covers the process of setting up an `etcd` cluster. Once this is set up,
multiple instances of the `trillian_log_signer` binary can be run with:

 - The `--etcd_servers` option set to the location of the etcd cluster (as a
   comma-separated list of host:port pairs).
 - The `--force_master` option removed.
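
The switch between the single-signer and elected-signer invocations can be captured in a small helper, sketched below. `signer_flags` is a hypothetical name, and the etcd host names in the usage comment are placeholders rather than values from this deployment.

```bash
# signer_flags ETCD_SERVERS
# Prints the election-related flag for one trillian_log_signer instance:
# --etcd_servers when a cluster is given, --force_master otherwise.
signer_flags() {
  if [ -n "$1" ]; then
    printf '%s\n' "--etcd_servers=$1"
  else
    printf '%s\n' "--force_master"
  fi
}

# Usage (not run here), with placeholder etcd host names:
#   go run github.com/google/trillian/cmd/trillian_log_signer \
#     --mysql_uri="root@tcp(localhost:3306)/test" \
#     $(signer_flags "etcd1:2379,etcd2:2379,etcd3:2379") \
#     --rpc_endpoint=:8090 --http_endpoint=:8091 --logtostderr
```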

<img src="images/Deployment5Election.png" width="650">


## Load Balancing

The deployment described so far involves a collection of CTFE personalities, each
serving HTTP(S) at a particular end-point. A real deployment is likely to
involve front-end load balancing between these instances, possibly also
including SSL termination for HTTPS.

Setup and configuration of these reverse-proxy instances is beyond the scope of
this document, but note that cloud environments often provide this functionality
(e.g. [Google Cloud Platform](https://cloud.google.com/compute/docs/load-balancing/http/),
[Amazon EC2](http://aws.amazon.com/documentation/elastic-load-balancing/)).

<img src="images/Deployment6LB.png" width="650">


## Monitoring

A live CT Log deployment needs to be monitored so that availability and
performance can be tracked, and alerts generated for failure conditions.

Monitoring can be broken down into two main styles:
 - *Black-box* monitoring, which queries the system from the outside, using the
   same mechanisms that real user traffic uses.
 - *White-box* monitoring, which queries the internal state of the system,
   using information that is not available to external users.

Black-box monitoring is beyond the scope of this document, but tools such as
[Blackbox exporter](https://github.com/prometheus/blackbox_exporter) can be
used to (say) check the `https://<log>/ct/v1/get-sth` entrypoint and export the
resulting data to Prometheus.

For white-box monitoring, all of the binaries in the system export metrics at
the `/metrics` path of an HTTP server, provided that the `--http_endpoint`
option was specified on their invocation.

This allows a pull-based monitoring system such as
[Prometheus](https://prometheus.io/) to poll for information/statistics, which
can then feed into alerts and dashboards.
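
Before wiring up Prometheus, it can be useful to confirm by hand that a binary is actually exporting metrics. The sketch below counts sample lines in the Prometheus text format, skipping blank lines and `#` comment lines; `count_samples` is a hypothetical helper, and port 8081 in the usage comment is the `--http_endpoint` from the log server example earlier in this document.

```bash
# count_samples reads Prometheus text-format metrics on stdin and prints
# the number of sample lines (everything except blanks and "#" comments).
count_samples() {
  grep -v -e '^#' -e '^[[:space:]]*$' | wc -l | tr -d ' '
}

# Usage against a running log server (not run here):
#   curl -s http://localhost:8081/metrics | count_samples
```

A result of zero suggests that the endpoint is unreachable or that `--http_endpoint` was not set.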

Configuration of Prometheus is beyond the scope of this document, but a minimal
sample console is [available](../integration/consoles/).

**Cross-check**: Once running, Prometheus shows the expected collection of
targets to monitor under `http://<prometheus-host>:9090/targets`.

<img src="images/PrometheusTargets.png" width="650">

**Cross-check**: Once running, Prometheus shows a sensible tree size graph
under `http://<prometheus-host>:9090/consoles/trillian.html`.

<img src="images/PrometheusTargets.png" width="650">

The addition of Prometheus for monitoring yields a system setup as shown.

<img src="images/Deployment7Prometheus.png" width="650">


## DoS Protection

A live production system that is exposed to the general Internet needs
protection against traffic overload and denial-of-service attacks.

The `--quota_system=etcd` option (which requires the `--etcd_servers` option)
for the log server and log signer enables a simple etcd-based quota system,
[documented here](https://github.com/google/trillian/blob/master/quota/etcd/README.md).

At this point, we have configured a full CT Log system.

<img src="images/DeploymentFull.png" width="650">


## Service Discovery

The distributed configuration described in the previous sections was not very
flexible, as it involved lists of host:port entries. However, now that an
etcd cluster is [available](#primary-signer-election), it can be used to allow
more dynamic discovery of running services: services register themselves with
etcd so that other services can find their locations.

A log server that is run with the `--etcd_servers` option can also take
an `--etcd_service` option, which indicates the service name it registers
under. Likewise, if the CTFE is run with the `--etcd_servers` option, the
`--log_rpc_server` argument is interpreted as an etcd service name to query for
gRPC endpoint resolution.

Similarly, the set of metrics-displaying targets that are available for
monitoring can be registered as an etcd service using the `--etcd_http_service`
option to indicate the relevant service name.

**Cross-check**: The current registered endpoints for a service can be queried
with the `etcdctl` tool:
```bash
% export ETCDCTL_API=3
% etcdctl get trillian-logserver/ --prefix
trillian-logserver/localhost:6962
{"Op":0,"Addr":"localhost:6962","Metadata":null}
```
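
When scripting around this cross-check, the registered addresses can be extracted from the `etcdctl` output. The sketch below assumes the value format shown in the sample above (one JSON object per line with an `Addr` field); `registered_addrs` is a hypothetical helper name.

```bash
# registered_addrs reads `etcdctl get --prefix` output on stdin and prints
# the "Addr" value from each JSON value line; key lines are ignored.
registered_addrs() {
  sed -n 's/.*"Addr":"\([^"]*\)".*/\1/p'
}

# Usage against a live etcd cluster (not run here):
#   ETCDCTL_API=3 etcdctl get trillian-logserver/ --prefix | registered_addrs
```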