...
1# TestSnapshotSucceedViaAppResp regression tests the situation in which a snap-
2# shot is sent to a follower at the most recent index (i.e. the snapshot index
3# is the leader's last index is the committed index). In that situation, a bug
4# in the past left the follower in probing status until the next log entry was
5# committed.
6#
7# See https://github.com/etcd-io/etcd/pull/10308 for additional background.
8
9# Turn off output during the setup of the test.
10log-level none
11----
12ok
13
14# Start with two nodes, but the config already has a third.
15add-nodes 2 voters=(1,2,3) index=10
16----
17ok
18
19campaign 1
20----
21ok
22
23# Fully replicate everything, including the leader's empty index.
24stabilize
25----
26ok (quiet)
27
28compact 1 11
29----
30ok (quiet)
31
32# Drop inflight messages to n3.
33deliver-msgs drop=(3)
34----
35ok (quiet)
36
37# Show the Raft log messages from now on.
38log-level debug
39----
40ok
41
42status 1
43----
441: StateReplicate match=11 next=12 inactive
452: StateReplicate match=11 next=12
463: StateProbe match=0 next=11 paused inactive
47
48# Add the node that will receive a snapshot (it has no state at all, does not
49# even have a config).
50add-nodes 1
51----
52INFO 3 switched to configuration voters=()
53INFO 3 became follower at term 0
54INFO newRaft 3 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
55
56# Time passes on the leader so that it will try the previously missing follower
57# again.
58tick-heartbeat 1
59----
60ok
61
62process-ready 1
63----
64Ready MustSync=false:
65Messages:
661->2 MsgHeartbeat Term:1 Log:0/0 Commit:11
671->3 MsgHeartbeat Term:1 Log:0/0
68
69# Iterate until no more work is done by the new peer. It receives the heartbeat
70# and responds.
71stabilize 3
72----
73> 3 receiving messages
74 1->3 MsgHeartbeat Term:1 Log:0/0
75 INFO 3 [term: 0] received a MsgHeartbeat message with higher term from 1 [term: 1]
76 INFO 3 became follower at term 1
77> 3 handling Ready
78 Ready MustSync=true:
79 Lead:1 State:StateFollower
80 HardState Term:1 Commit:0
81 Messages:
82 3->1 MsgHeartbeatResp Term:1 Log:0/0
83
84# The leader in turn will realize that n3 needs a snapshot, which it initiates.
85stabilize 1
86----
87> 1 receiving messages
88 3->1 MsgHeartbeatResp Term:1 Log:0/0
89 DEBUG 1 [firstindex: 12, commit: 11] sent snapshot[index: 11, term: 1] to 3 [StateProbe match=0 next=11]
90 DEBUG 1 paused sending replication messages to 3 [StateSnapshot match=0 next=11 paused pendingSnap=11]
91> 1 handling Ready
92 Ready MustSync=false:
93 Messages:
94 1->3 MsgSnap Term:1 Log:0/0 Snapshot: Index:11 Term:1 ConfState:Voters:[1 2 3] VotersOutgoing:[] Learners:[] LearnersNext:[] AutoLeave:false
95
96status 1
97----
981: StateReplicate match=11 next=12 inactive
992: StateReplicate match=11 next=12
1003: StateSnapshot match=0 next=11 paused pendingSnap=11
101
102# Follower applies the snapshot. Note how it reacts with a MsgAppResp upon completion.
103# The snapshot fully catches the follower up (i.e. there are no more log entries it
104# needs to apply after). The bug was that the leader failed to realize that the follower
105# was now fully caught up.
106stabilize 3
107----
108> 3 receiving messages
109 1->3 MsgSnap Term:1 Log:0/0 Snapshot: Index:11 Term:1 ConfState:Voters:[1 2 3] VotersOutgoing:[] Learners:[] LearnersNext:[] AutoLeave:false
110 INFO log [committed=0, applied=0, unstable.offset=1, len(unstable.Entries)=0] starts to restore snapshot [index: 11, term: 1]
111 INFO 3 switched to configuration voters=(1 2 3)
112 INFO 3 [commit: 11, lastindex: 11, lastterm: 1] restored snapshot [index: 11, term: 1]
113 INFO 3 [commit: 11] restored snapshot [index: 11, term: 1]
114> 3 handling Ready
115 Ready MustSync=false:
116 HardState Term:1 Commit:11
117 Snapshot Index:11 Term:1 ConfState:Voters:[1 2 3] VotersOutgoing:[] Learners:[] LearnersNext:[] AutoLeave:false
118 Messages:
119 3->1 MsgAppResp Term:1 Log:0/11
120
121# The MsgAppResp lets the leader move the follower back to replicating state.
122# Leader sends another MsgAppResp, to communicate the updated commit index.
123stabilize 1
124----
125> 1 receiving messages
126 3->1 MsgAppResp Term:1 Log:0/11
127 DEBUG 1 recovered from needing snapshot, resumed sending replication messages to 3 [StateSnapshot match=11 next=12 paused pendingSnap=11]
128> 1 handling Ready
129 Ready MustSync=false:
130 Messages:
131 1->3 MsgApp Term:1 Log:1/11 Commit:11
132
133status 1
134----
1351: StateReplicate match=11 next=12 inactive
1362: StateReplicate match=11 next=12
1373: StateReplicate match=11 next=12
138
139# Let things settle.
140stabilize
141----
142> 2 receiving messages
143 1->2 MsgHeartbeat Term:1 Log:0/0 Commit:11
144> 3 receiving messages
145 1->3 MsgApp Term:1 Log:1/11 Commit:11
146> 2 handling Ready
147 Ready MustSync=false:
148 Messages:
149 2->1 MsgHeartbeatResp Term:1 Log:0/0
150> 3 handling Ready
151 Ready MustSync=false:
152 Messages:
153 3->1 MsgAppResp Term:1 Log:0/11
154> 1 receiving messages
155 2->1 MsgHeartbeatResp Term:1 Log:0/0
156 3->1 MsgAppResp Term:1 Log:0/11
View as plain text