=====================================
Server Discovery And Monitoring Tests
=====================================

.. contents::

----

The YAML and JSON files in this directory tree are platform-independent tests
that drivers can use to prove their conformance to the
Server Discovery And Monitoring Spec.

Additional prose tests that cannot be represented as spec tests are
described below and MUST be implemented.

Version
-------

Files in the "specifications" repository have no version scheme. They are not
tied to a MongoDB server version.

Format
------

Each YAML file has the following keys:

- description: A textual description of the test.
- uri: A connection string.
- phases: An array of "phase" objects.
  A phase of the test optionally sends inputs to the client,
  then tests the client's resulting TopologyDescription.

Each phase object has the following keys:

- description: (optional) A textual description of this phase.
- responses: (optional) An array of "response" objects. If not provided,
  the test runner should construct the client and perform assertions specified
  in the outcome object without processing any responses.
- applicationErrors: (optional) An array of "applicationError" objects.
- outcome: An "outcome" object representing the TopologyDescription.

A response is a pair of values:

- The source, for example "a:27017".
  This is the address the client sent the "hello" or legacy hello command to.
- A hello or legacy hello response, for example ``{ok: 1, helloOk: true, isWritablePrimary: true}``.
  If the response includes an electionId it is shown in extended JSON like
  ``{"$oid": "000000000000000000000002"}``.
  The empty response ``{}`` indicates a network error
  when attempting to call "hello" or legacy hello.
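
A test runner might normalize each response pair before handing it to the
driver code. The following is a minimal sketch, not part of any driver API:
``parse_response`` is a hypothetical helper name, and representing a network
error as ``None`` is an assumption made here for illustration.

```python
from typing import Any, Dict, List, Optional, Tuple


def parse_response(pair: List[Any]) -> Tuple[str, Optional[Dict[str, Any]]]:
    """Split a [source, reply] pair taken from a phase's "responses" array.

    Hypothetical helper: returns (address, reply), with reply set to None
    when the test file uses the empty document to request a network error.
    """
    address, reply = pair
    if reply == {}:
        return address, None
    return address, reply
```

A runner built this way can branch on ``reply is None`` to decide whether to
simulate a network error or to deliver the hello reply.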

An "applicationError" object has the following keys:

- address: The source address, for example "a:27017".
- generation: (optional) The error's generation number, for example ``1``.
  When absent this value defaults to the pool's current generation number.
- maxWireVersion: The ``maxWireVersion`` of the connection the error occurs
  on, for example ``9``. Added to support testing the behavior of "not writable primary"
  errors on <4.2 and >=4.2 servers.
- when: A string describing when this mock error should occur. Supported
  values are:

  - "beforeHandshakeCompletes": Simulate this mock error as if it occurred
    during a new connection's handshake for an application operation.
  - "afterHandshakeCompletes": Simulate this mock error as if it occurred
    on an established connection for an application operation (i.e. after
    the connection pool check out succeeds).

- type: The type of error to mock. Supported values are:

  - "command": A command error. Always accompanied by a "response".
  - "network": A non-timeout network error.
  - "timeout": A network timeout error.

- response: (optional) A command error response, for example
  ``{ok: 0, errmsg: "not primary"}``. Present if and only if ``type`` is
  "command". Note the server only returns "not primary" if the "hello" command
  has been run on this connection. Otherwise the legacy error message is returned.
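
The constraints on an "applicationError" object can be checked when the test
file is loaded. This is a sketch under stated assumptions:
``normalize_application_error`` is a hypothetical helper, and the pool's
current generation is passed in by the caller because how a driver exposes it
is not specified here.

```python
from typing import Any, Dict

VALID_WHEN = {"beforeHandshakeCompletes", "afterHandshakeCompletes"}
VALID_TYPES = {"command", "network", "timeout"}


def normalize_application_error(err: Dict[str, Any], pool_generation: int) -> Dict[str, Any]:
    """Validate one applicationError and fill in the default generation.

    Hypothetical helper for a test runner; it does not touch the driver.
    """
    if err["when"] not in VALID_WHEN:
        raise ValueError("unsupported 'when': %r" % (err["when"],))
    if err["type"] not in VALID_TYPES:
        raise ValueError("unsupported 'type': %r" % (err["type"],))
    # "response" is present if and only if the type is "command".
    if ("response" in err) != (err["type"] == "command"):
        raise ValueError("'response' must be present if and only if type is 'command'")
    normalized = dict(err)
    # An absent generation defaults to the pool's current generation.
    normalized.setdefault("generation", pool_generation)
    return normalized
```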

In non-monitoring tests, an "outcome" represents the correct
TopologyDescription that results from processing the responses in the phases
so far. It has the following keys:

- topologyType: A string like "ReplicaSetNoPrimary".
- setName: A string with the expected replica set name, or null.
- servers: An object whose keys are addresses like "a:27017", and whose values
  are "server" objects.
- logicalSessionTimeoutMinutes: null or an integer.
- maxSetVersion: absent or an integer.
- maxElectionId: absent or a BSON ObjectId.
- compatible: absent or a bool.

A "server" object represents a correct ServerDescription within the client's
current TopologyDescription. It has the following keys:

- type: A ServerType name, like "RSSecondary".
- setName: A string with the expected replica set name, or null.
- setVersion: absent or an integer.
- electionId: absent, null, or an ObjectId.
- logicalSessionTimeoutMinutes: absent, null, or an integer.
- minWireVersion: absent or an integer.
- maxWireVersion: absent or an integer.
- topologyVersion: absent, null, or a topologyVersion document.
- pool: (optional) A "pool" object.

A "pool" object represents a correct connection pool for a given server.
It has the following keys:

- generation: This server's expected pool generation, like ``0``.

In monitoring tests, an "outcome" contains a list of SDAM events that should
have been published by the client as a result of processing hello or legacy hello
responses in the current phase. Any SDAM events published by the client during its
construction (that is, prior to processing any of the responses) should be
combined with the events published during processing of hello or legacy hello
responses of the first phase of the test. A test MAY explicitly verify events
published during client construction by providing an empty responses array for the
first phase.


Use as unittests
----------------

Mocking
~~~~~~~

Drivers should be able to test their server discovery and monitoring logic without
any network I/O, by parsing hello (or legacy hello) responses and application errors
from the test file and passing them into the driver code. Parts of the client and
monitoring code may need to be mocked or subclassed to achieve this.
`A reference implementation for PyMongo 3.10.1 is available here
<https://github.com/mongodb/mongo-python-driver/blob/3.10.1/test/test_discovery_and_monitoring.py>`_.

Initialization
~~~~~~~~~~~~~~

For each file, create a fresh client object initialized with the file's "uri".

All files in the "single" directory include a connection string with one host
and no "replicaSet" option.
Set the client's initial TopologyType to Single, however that is achieved using the client's API.
(The spec says "The user MUST be able to set the initial TopologyType to Single"
without specifying how.)

All files in the "sharded" directory include a connection string with multiple hosts
and no "replicaSet" option.
Set the client's initial TopologyType to Unknown or Sharded, depending on the client's API.

All files in the "rs" directory include a connection string with a "replicaSet" option.
Set the client's initial TopologyType to ReplicaSetNoPrimary.
(For most clients, parsing a connection string with a "replicaSet" option
automatically sets the TopologyType to ReplicaSetNoPrimary.)

Set up a listener to collect SDAM events published by the client, including
events published during client construction.
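
The directory rules above can be captured in a small lookup. This sketch
covers only the three directories named in this section;
``initial_topology_type`` is a hypothetical helper, and choosing "Unknown"
rather than "Sharded" for the "sharded" directory is an arbitrary pick between
the two permitted values.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical mapping from test directory to the initial TopologyType.
INITIAL_TOPOLOGY_TYPE = {
    "single": "Single",
    "sharded": "Unknown",  # "Sharded" is also allowed, depending on the client's API
    "rs": "ReplicaSetNoPrimary",
}


def initial_topology_type(directory: str, uri: str) -> str:
    """Pick the initial TopologyType and sanity-check the file's "uri"."""
    if directory not in INITIAL_TOPOLOGY_TYPE:
        raise ValueError("directory not covered by this sketch: %r" % (directory,))
    options = {name.lower() for name in parse_qs(urlsplit(uri).query)}
    # Only files in the "rs" directory are expected to carry a replicaSet option.
    if (directory == "rs") != ("replicaset" in options):
        raise ValueError("unexpected replicaSet option state for %r" % (directory,))
    return INITIAL_TOPOLOGY_TYPE[directory]
```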

Test Phases
~~~~~~~~~~~

For each phase in the file:

#. Parse the "responses" array. Pass in the responses in order to the driver
   code. If a response is the empty object ``{}``, simulate a network error.

#. Parse the "applicationErrors" array. For each element, simulate the given
   error as if it occurred while running an application operation. Note that
   it is sufficient to construct a mock error and call the procedure which
   updates the topology, e.g.
   ``topology.handleApplicationError(address, generation, maxWireVersion, error)``.

For non-monitoring tests,
once all responses are processed, assert that the phase's "outcome" object
is equivalent to the driver's current TopologyDescription.

For monitoring tests, once all responses are processed, assert that the
events collected so far by the SDAM event listener are equivalent to the
events specified in the phase.

Some fields such as "logicalSessionTimeoutMinutes", "compatible", and
"topologyVersion" were added later and haven't been added to all test files.
If these fields are present, test that they are equivalent to the fields of
the driver's current TopologyDescription or ServerDescription.

For monitoring tests, clear the list of events collected so far.

Continue until all phases have been executed.
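
The phase loop described above can be sketched as follows. Everything here is
illustrative: ``RecordingTopology`` is a hypothetical stand-in that only
records what the runner feeds it, and its method names are assumptions rather
than any driver's real API. A real runner would call into the driver's
topology code and compare the resulting TopologyDescription.

```python
from typing import Any, Callable, Dict, List, Tuple


class RecordingTopology:
    """Hypothetical stand-in for driver internals; it only records inputs."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str]] = []

    def on_hello_reply(self, address: str, reply: Dict[str, Any]) -> None:
        self.calls.append(("reply", address))

    def on_network_error(self, address: str) -> None:
        self.calls.append(("network_error", address))

    def handle_application_error(self, address, generation, max_wire_version, error) -> None:
        self.calls.append(("application_error", address))


def mismatched_optional_fields(expected: Dict[str, Any], actual: Dict[str, Any]) -> List[str]:
    """Compare only the later-added fields that the test file actually lists."""
    optional = ("logicalSessionTimeoutMinutes", "compatible", "topologyVersion")
    return [key for key in optional if key in expected and expected[key] != actual.get(key)]


def run_phases(test: Dict[str, Any], topology: RecordingTopology,
               check_outcome: Callable[[Dict[str, Any]], None]) -> None:
    for phase in test["phases"]:
        # Step 1: feed responses in order; the empty document means a network error.
        for address, reply in phase.get("responses", []):
            if reply == {}:
                topology.on_network_error(address)
            else:
                topology.on_hello_reply(address, reply)
        # Step 2: simulate each application error.
        for err in phase.get("applicationErrors", []):
            topology.handle_application_error(
                err["address"], err.get("generation"), err["maxWireVersion"], err)
        # Finally, let the caller assert on this phase's outcome.
        check_outcome(phase["outcome"])
```

Passing the outcome check in as a callback keeps the loop identical for
monitoring and non-monitoring tests; only the assertion differs.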

Integration Tests
-----------------

Integration tests are provided in the "integration" directory.

Test Format
~~~~~~~~~~~

The same as the `Transactions Spec Test format
</source/transactions/tests/README.rst#test-format>`_ with the following
additions:

- The ``runOn`` requirement gains a new field:

  - ``authEnabled`` (optional): If True, skip this test if auth is not enabled.
    If False, skip this test if auth is enabled. If this field is omitted,
    this test can be run on clusters with or without auth.
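
The ``authEnabled`` rule reduces to a three-way check. In this sketch,
``should_skip_for_auth`` is a hypothetical helper and ``None`` stands for an
omitted field.

```python
from typing import Optional


def should_skip_for_auth(auth_enabled: Optional[bool], cluster_has_auth: bool) -> bool:
    """Return True when the runOn requirement rules out the current cluster."""
    if auth_enabled is None:
        # Field omitted: the test runs with or without auth.
        return False
    return auth_enabled != cluster_has_auth
```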

Special Test Operations
~~~~~~~~~~~~~~~~~~~~~~~

Certain operations that appear in the "operations" array do not correspond to
API methods but instead represent special test operations. Such operations are
defined on the "testRunner" object and are documented in the
`Transactions Spec Test
</source/transactions/tests/README.rst#special-test-operations>`_.

Additional SDAM-specific test operations are documented here:

configureFailPoint
''''''''''''''''''

The "configureFailPoint" operation instructs the test runner to configure
the given server failpoint on the "admin" database. The runner MUST disable
this failpoint at the end of the test. For example::

    - name: configureFailPoint
      object: testRunner
      arguments:
        failPoint:
          configureFailPoint: failCommand
          mode: { times: 1 }
          data:
            failCommands: ["insert"]
            closeConnection: true

Tests that use the "configureFailPoint" operation do not include
``configureFailPoint`` commands in their command expectations. Drivers MUST
ensure that ``configureFailPoint`` commands do not appear in the list of logged
commands, either by manually filtering it from the list of observed commands or
by using a different MongoClient to execute ``configureFailPoint``.
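
For runners that take the manual filtering route, the filter can be a single
pass over the observed commands. The event shape used here (a dict with a
``command_name`` key) is an assumption for illustration; real drivers expose
command monitoring events through their own types.

```python
from typing import Any, Dict, List


def without_fail_point_commands(events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop configureFailPoint commands from the observed command list."""
    return [event for event in events if event["command_name"] != "configureFailPoint"]
```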

Note that, as with the ``tests.failPoint`` field described in the `Transactions
Spec Test format </source/transactions/tests/README.rst#test-format>`_, tests
with ``useMultipleMongoses: true`` will not contain a ``configureFailPoint``
operation.

wait
''''

The "wait" operation instructs the test runner to sleep for "ms"
milliseconds. For example::

    - name: wait
      object: testRunner
      arguments:
        ms: 1000

waitForEvent
''''''''''''

The "waitForEvent" operation instructs the test runner to wait until the test's
MongoClient has published a specific event a given number of times. For
example, the following instructs the test runner to wait for at least one
PoolClearedEvent to be published::

    - name: waitForEvent
      object: testRunner
      arguments:
        event: PoolClearedEvent
        count: 1

Note that "count" includes events that were published while running previous
operations.

If the "waitForEvent" operation is not satisfied after 10 seconds, the
operation is considered an error.
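
One way to implement this is a collector that counts events by name and lets
the runner block on a condition variable. This is a sketch: ``EventCollector``
is a hypothetical class, and a real runner would call ``publish`` from the
driver's event listener callbacks.

```python
import threading
from collections import Counter


class EventCollector:
    """Hypothetical event sink that supports waiting for a cumulative count."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()
        self._cond = threading.Condition()

    def publish(self, event_name: str) -> None:
        with self._cond:
            self._counts[event_name] += 1
            self._cond.notify_all()

    def count(self, event_name: str) -> int:
        with self._cond:
            return self._counts[event_name]

    def wait_for_event(self, event_name: str, count: int, timeout: float = 10.0) -> None:
        """Block until event_name has been seen at least count times in total."""
        with self._cond:
            satisfied = self._cond.wait_for(
                lambda: self._counts[event_name] >= count, timeout=timeout)
        if not satisfied:
            raise TimeoutError("%s not seen %d time(s) within %.1fs" % (event_name, count, timeout))
```

Because the counts are cumulative, events published during earlier operations
satisfy a later wait, which matches the note about "count" above.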

ServerMarkedUnknownEvent
````````````````````````

The ServerMarkedUnknownEvent may appear as an event in `waitForEvent`_ and
`assertEventCount`_. This event is defined as ServerDescriptionChangedEvent
where newDescription.type is ``Unknown``.
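
Since ServerMarkedUnknownEvent is derived from another event, a runner can
detect it with a predicate over the events it already collects. The dict
layout below is an assumed, simplified event shape used only for illustration.

```python
from typing import Any, Dict


def is_server_marked_unknown(event: Dict[str, Any]) -> bool:
    """True for a ServerDescriptionChangedEvent whose new type is Unknown."""
    return (
        event.get("name") == "ServerDescriptionChangedEvent"
        and event["newDescription"]["type"] == "Unknown"
    )
```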

assertEventCount
''''''''''''''''

The "assertEventCount" operation instructs the test runner to assert the test's
MongoClient has published a specific event a given number of times. For
example, the following instructs the test runner to assert that a single
PoolClearedEvent was published::

    - name: assertEventCount
      object: testRunner
      arguments:
        event: PoolClearedEvent
        count: 1

recordPrimary
'''''''''''''

The "recordPrimary" operation instructs the test runner to record the current
primary of the test's MongoClient. For example::

    - name: recordPrimary
      object: testRunner

runAdminCommand
'''''''''''''''

The "runAdminCommand" operation instructs the test runner to run the given
command on the admin database. Drivers MUST run this command on a different
MongoClient from the one used for test operations. For example::

    - name: runAdminCommand
      object: testRunner
      command_name: replSetFreeze
      arguments:
        command:
          replSetFreeze: 0
        readPreference:
          mode: Secondary

waitForPrimaryChange
''''''''''''''''''''

The "waitForPrimaryChange" operation instructs the test runner to wait up to
"timeoutMS" milliseconds for the MongoClient to discover a new primary server.
The new primary should be different from the one recorded by "recordPrimary".
For example::

    - name: waitForPrimaryChange
      object: testRunner
      arguments:
        timeoutMS: 15000

To implement this, drivers can subscribe to ServerDescriptionChangedEvents and wait
for an event where newDescription.type is ``RSPrimary`` and the address is
different from the one previously recorded by "recordPrimary".
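
The suggested implementation can be sketched as a scan over collected events
plus a polling loop. Both function names and the simplified event dicts are
assumptions made for illustration; ``get_events`` is a caller-supplied
callable returning the ServerDescriptionChangedEvents seen so far.

```python
import time
from typing import Any, Callable, Dict, List, Optional


def find_new_primary(events: List[Dict[str, Any]], recorded_primary: str) -> Optional[str]:
    """Return the address of the first RSPrimary that differs from the recorded one."""
    for event in events:
        if event["newDescription"]["type"] == "RSPrimary" and event["address"] != recorded_primary:
            return event["address"]
    return None


def wait_for_primary_change(get_events: Callable[[], List[Dict[str, Any]]],
                            recorded_primary: str, timeout_ms: int,
                            poll_interval: float = 0.01) -> str:
    """Poll until a different primary is discovered or timeout_ms elapses."""
    deadline = time.monotonic() + timeout_ms / 1000.0
    while True:
        found = find_new_primary(get_events(), recorded_primary)
        if found is not None:
            return found
        if time.monotonic() >= deadline:
            raise TimeoutError("no new primary within %d ms" % timeout_ms)
        time.sleep(poll_interval)
```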

startThread
'''''''''''

The "startThread" operation instructs the test runner to start a new thread
with the provided "name". The `runOnThread`_ and `waitForThread`_ operations
reference a thread by its "name". For example::

    - name: startThread
      object: testRunner
      arguments:
        name: thread1

runOnThread
'''''''''''

The "runOnThread" operation instructs the test runner to schedule an operation
to be run on the given thread. runOnThread MUST NOT wait for the scheduled
operation to complete. For example::

    - name: runOnThread
      object: testRunner
      arguments:
        name: thread1
        operation:
          name: insertOne
          object: collection
          arguments:
            document:
              _id: 2
          error: true

waitForThread
'''''''''''''

The "waitForThread" operation instructs the test runner to stop the given
thread, wait for it to complete, and assert that the thread exited without
any errors. For example::

    - name: waitForThread
      object: testRunner
      arguments:
        name: thread1
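
The three thread operations map naturally onto a worker that drains a queue.
This is one possible design, not a required one: ``RunnerThread`` and its
methods are hypothetical names, and each scheduled operation is modeled as a
plain callable.

```python
import queue
import threading
from typing import Callable, List


class RunnerThread:
    """Hypothetical worker backing startThread, runOnThread and waitForThread."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._operations: queue.Queue = queue.Queue()
        self._errors: List[BaseException] = []
        self._thread = threading.Thread(target=self._run, name=name, daemon=True)
        self._thread.start()  # startThread

    def _run(self) -> None:
        while True:
            operation = self._operations.get()
            if operation is None:  # sentinel queued by stop_and_join
                return
            try:
                operation()
            except Exception as exc:
                self._errors.append(exc)

    def schedule(self, operation: Callable[[], None]) -> None:
        """runOnThread: enqueue and return without waiting for completion."""
        self._operations.put(operation)

    def stop_and_join(self, timeout: float = 10.0) -> None:
        """waitForThread: stop, wait for completion, surface any error."""
        self._operations.put(None)
        self._thread.join(timeout)
        if self._thread.is_alive():
            raise TimeoutError("thread %s did not exit" % self.name)
        if self._errors:
            raise self._errors[0]
```

Queuing a sentinel after the pending operations means the thread finishes all
scheduled work before it stops.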

Prose Tests
-----------

The following prose tests cannot be represented as spec tests and MUST be
implemented.

Streaming protocol Tests
~~~~~~~~~~~~~~~~~~~~~~~~

Drivers that implement the streaming protocol (multi-threaded or
asynchronous drivers) must implement the following tests. Each test should be
run against a standalone, replica set, and sharded cluster unless otherwise
noted.

Some of these cases should already be tested with the old protocol; in
that case just verify the test cases succeed with the new protocol.

1. Configure the client with heartbeatFrequencyMS set to 500,
   overriding the default of 10000. Assert the client processes
   hello and legacy hello replies more frequently (approximately every 500ms).
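
One way to make "approximately every 500ms" checkable is to record a
timestamp for each processed reply and look at the mean gap. The helper names
and the 250ms tolerance below are assumptions chosen for illustration; this
document does not prescribe a tolerance.

```python
from typing import List


def heartbeat_intervals_ms(timestamps: List[float]) -> List[float]:
    """Gaps between consecutive reply timestamps (seconds in, milliseconds out)."""
    return [(later - earlier) * 1000.0 for earlier, later in zip(timestamps, timestamps[1:])]


def looks_like_frequency(timestamps: List[float], expected_ms: float = 500.0,
                         tolerance_ms: float = 250.0) -> bool:
    """True when the mean gap is within tolerance_ms of expected_ms."""
    gaps = heartbeat_intervals_ms(timestamps)
    if not gaps:
        return False
    mean_gap = sum(gaps) / len(gaps)
    return abs(mean_gap - expected_ms) <= tolerance_ms
```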

RTT Tests
~~~~~~~~~

Run the following test(s) on MongoDB 4.4+.

1. Test that RTT is continuously updated.

   #. Create a client with ``heartbeatFrequencyMS=500``,
      ``appName=streamingRttTest``, and subscribe to server events.

   #. Run a find command to wait for the server to be discovered.

   #. Sleep for 2 seconds. This must be long enough for multiple heartbeats
      to succeed.

   #. Assert that each ``ServerDescriptionChangedEvent`` includes a non-zero
      RTT.

   #. Configure the following failpoint to block hello or legacy hello commands
      for 500ms which should add extra latency to each RTT check::

        db.adminCommand({
            configureFailPoint: "failCommand",
            mode: {times: 1000},
            data: {
              failCommands: ["hello"], // or the legacy hello command
              blockConnection: true,
              blockTimeMS: 500,
              appName: "streamingRttTest",
            },
        });

   #. Wait for the server's RTT to exceed 250ms. Eventually the average RTT
      should also exceed 500ms but we use 250ms to speed up the test. Note
      that the `Server Description Equality`_ rule means that
      ServerDescriptionChangedEvents will not be published. This test may
      need to use a driver specific helper to obtain the latest RTT instead.
      If the RTT does not exceed 250ms after 10 seconds, consider the test
      failed.

   #. Disable the failpoint::

        db.adminCommand({
            configureFailPoint: "failCommand",
            mode: "off",
        });
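
To see why a 250ms threshold is reached well before the average approaches
500ms, it helps to model the average RTT. The sketch below assumes the
exponentially weighted moving average with a weighting factor of 0.2 that the
SDAM and Server Selection specs describe; that formula is not stated in this
document, and the starting RTT of 1ms is an arbitrary illustrative value.

```python
from typing import Optional


def update_average_rtt(previous_ms: Optional[float], sample_ms: float,
                       alpha: float = 0.2) -> float:
    """One EWMA step; the first sample becomes the average (alpha is assumed)."""
    if previous_ms is None:
        return sample_ms
    return alpha * sample_ms + (1 - alpha) * previous_ms


def samples_until_exceeds(start_ms: float, sample_ms: float, threshold_ms: float) -> int:
    """Count how many identical samples it takes for the average to pass threshold_ms."""
    if sample_ms <= threshold_ms:
        raise ValueError("the average can never exceed a threshold at or above the sample")
    average, samples = start_ms, 0
    while average <= threshold_ms:
        average = update_average_rtt(average, sample_ms)
        samples += 1
    return samples


# Starting from a 1ms average with 500ms samples, four samples pass 250ms.
print(samples_until_exceeds(1.0, 500.0, 250.0))
```

Under these assumptions the average moves 1 -> 100.8 -> 180.64 -> 244.51 ->
295.61, so the 250ms threshold is crossed on the fourth blocked check, while
reaching values close to 500ms takes many more samples.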

.. Section for links.

.. _Server Description Equality: /source/server-discovery-and-monitoring/server-discovery-and-monitoring.rst#server-description-equality