krammar: kafka grammar
=======

- [Comments, field versioning](#comments,_field_versioning)
- [Types](#types)
  - [Primitives](#primitives)
    - [Boolean and numeric](#boolean_and_numeric)
    - [Text](#text)
  - [Complex types](#complex_types)
    - [Arrays](#arrays)
    - [Structs](#structs)
- [Named struct modifiers](#named_struct_modifiers)
- [Miscellaneous](#miscellaneous)

Comments, field versioning
--------

All comments begin with `//`.
There are two types of comments: struct/field comments and field version comments.

**Struct** or **field** comments go on the line above what they describe:

```
// FooRequest is a foo request.
FooRequest =>
  // FooField is a string.
  FooField: string
```

**Field version** comments follow the field being specified:

```
FooRequest =>
  FooField: string // v1+
```

Version comments must be of the form `v\d+(\+|\-v\d+)`:

```
v1+
v8-v9
```
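
As a rough illustration of the form above, the following sketch parses a version comment into a pair of bounds. The function name `parse_version_comment` and the `(min, max)` return shape are assumptions made for this example, not part of the grammar.

```python
import re

# The pattern from the text, with capture groups added for the two numbers.
VERSION_RE = re.compile(r"v(\d+)(?:\+|-v(\d+))")


def parse_version_comment(comment):
    """Return (min_version, max_version); max_version is None for an open-ended `vN+`."""
    match = VERSION_RE.fullmatch(comment.strip())
    if match is None:
        raise ValueError(f"invalid version comment: {comment!r}")
    low = int(match.group(1))
    high = int(match.group(2)) if match.group(2) is not None else None
    return low, high
```

Here `v1+` yields an open upper bound, `v8-v9` yields both bounds, and a bare `v1` is rejected because the form requires either `+` or a second version.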

If a version comment applies to a struct field,
it is implied that all fields within that struct are bounded
by the overarching struct field version constraints:

```
FooRequest =>
  // All fields in FooField are bounded between version 3 and 5.
  FooField: [=>] // v3-v5
    Biz: int8
    Bar: int8
    Baz: int8
```
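
One way to picture the bounding rule is as a chain of constraints from the outermost struct field down to the field in question, all of which must hold. The sketch below assumes each constraint is a `(min, max)` pair with `max` set to `None` for open-ended `vN+` comments; the function name `field_is_active` is hypothetical.

```python
def field_is_active(version, bounds_chain):
    """Check a field against every enclosing version constraint.

    bounds_chain lists (min, max) pairs from the outermost struct field to the
    field itself. Fields without a version comment contribute no entry.
    """
    for low, high in bounds_chain:
        if version < low or (high is not None and version > high):
            return False
    return True
```

For the example above, `Biz` carries no comment of its own, so its chain is just `[(3, 5)]` from `FooField`.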

Fields that do not have version comments
are valid for all versions of a struct.

Types
-----

Kafka has a few basic primitive types and some more confusing, one-off types.

### Primitives

#### Boolean and numeric

```
bool     one byte; 0 == false
int8     one byte, signed
int16    two bytes, signed big endian
int32    four bytes, signed big endian
int64    eight bytes, signed big endian
uint32   four bytes, unsigned big endian
varint   one to five bytes encoding an int32, using protocol buffer varint encoding
varlong  one to ten bytes encoding an int64, using protocol buffer varint encoding
```
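
To make the table concrete, here is a minimal decoding sketch for three of these types. The function names and the `(value, next_position)` return shape are assumptions for illustration. The zigzag step in `read_varint` is also an assumption: it follows Kafka's protocol documentation for signed varints and is not stated in this document.

```python
import struct


def read_int16(buf, pos):
    """Two bytes, signed big endian."""
    (value,) = struct.unpack_from(">h", buf, pos)
    return value, pos + 2


def read_uint32(buf, pos):
    """Four bytes, unsigned big endian."""
    (value,) = struct.unpack_from(">I", buf, pos)
    return value, pos + 4


def read_varint(buf, pos):
    """One to five bytes: base-128 groups, low group first, then zigzag (assumed)."""
    raw = 0
    shift = 0
    for i in range(5):
        byte = buf[pos + i]
        raw |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:  # high bit clear marks the last byte
            return (raw >> 1) ^ -(raw & 1), pos + i + 1
        shift += 7
    raise ValueError("varint longer than five bytes")
```

Each reader returns the decoded value together with the position of the next unread byte, so calls can be chained over a buffer.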

#### Text

```
bytes            int32 size specifier followed by that many bytes
nullable-bytes   like bytes, but a negative size signifies null
varint-bytes     like nullable-bytes, but with a varint size specifier
string           int16 size specifier followed by that many bytes of a string
nullable-string  like string, but a negative size signifies a null string
varint-string    like nullable-string, but with a varint size specifier
```
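
The size-prefixed layouts above can be sketched as follows for the two nullable fixed-width variants. The function names are hypothetical, and decoding strings as UTF-8 is an assumption not stated in this document.

```python
import struct


def read_nullable_string(buf, pos):
    """int16 size, then that many bytes; a negative size means null."""
    (size,) = struct.unpack_from(">h", buf, pos)
    pos += 2
    if size < 0:
        return None, pos
    return buf[pos:pos + size].decode("utf-8"), pos + size


def read_nullable_bytes(buf, pos):
    """int32 size, then that many bytes; a negative size means null."""
    (size,) = struct.unpack_from(">i", buf, pos)
    pos += 4
    if size < 0:
        return None, pos
    return bytes(buf[pos:pos + size]), pos + size
```

The non-nullable `string` and `bytes` types would differ only in treating a negative size as an error, and the `varint-` variants in how the size itself is read.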

### Complex types

#### Arrays

There are three types of arrays: normal, nullable, and varint length.
Normal arrays have int32 size specifiers;
nullable arrays allow the size specifier to be negative,
while varint arrays use a varint size specifier.
Arrays are specified by wrapping a type in brackets.
If the type is an inner unnamed struct, the array is specified as `[=>]`,
and the following lines continue specifying the struct.
If the inner struct is not anonymous, the struct must have been already defined.

Examples:

```
[bool]          an array of booleans
nullable[int8]  a nullable array of int8s
varint[Header]  a varint length array containing Header structs
[=>]            an array of anonymous structs that are specified on the following lines
```
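
A parser might classify these array specifiers roughly as below. This is a sketch under assumptions: the function name `parse_array_type`, the `'normal'` label, and the character set allowed in type names are all choices made for this example.

```python
import re

# Optional prefix, then either `=>` (anonymous struct) or a type name in brackets.
ARRAY_RE = re.compile(r"(nullable|varint)?\[(=>|[A-Za-z0-9-]+)\]")


def parse_array_type(spec):
    """Return (kind, element); element is None for an anonymous struct array."""
    match = ARRAY_RE.fullmatch(spec)
    if match is None:
        raise ValueError(f"not an array type: {spec!r}")
    kind = match.group(1) or "normal"
    element = None if match.group(2) == "=>" else match.group(2)
    return kind, element
```

This sketch does not handle a name hint following `[=>]`, which is described in the Structs section.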

#### Structs

There are two types of structs: named structs and anonymous structs.
Named structs exist as a top level definition and have a name.
Anonymous structs exist inside a top level struct and do not have a name.
However, parsers may name anonymous structs themselves so that users can more easily create types.
Named structs can take modifiers that change how they are parsed.

Only structs are allowed in the `DEFINITIONS` file;
there are no primitives or arrays at the top level.
Struct fields can be of any type, including other structs.
A struct definition cannot be circular.

Spacing in struct fields is important:
all fields of a struct must have the same amount of spacing.
Inner structs must nest their fields with more spacing.
To stop defining an inner struct and continue defining the outer struct,
simply drop back to the outer struct's spacing level.

If a struct field is an array of anonymous structs,
the `[=>]` may be followed by a singular name hint.
This aids generators that want to de-anonymize the anonymous struct.

Top level structs do not have a colon following their name,
but all struct fields do have a colon following the field name.

Examples:
```
SimpleType =>
  Field1: int8

FooRequest =>
  Field1: string
  Field2: int8
  Field3Pluralies: [=>]Field3Pluraly
    InnerField1: int8
    InnerField2: nullable-bytes
    InnerField3: [SimpleType]
    InnerField4s: [=>]
      InnerInner: string
    InnerField5: =>
      SuperIn: varint
  Field4: int8
```

The above shows two type definitions,
the second with multiple nested inner structs.
Two of the nested structs are anonymous and in arrays,
one is named and in an array,
and the other is anonymous but not in an array.
One of the anonymous array fields (`Field3Pluralies`) has a name hint.
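
The indentation rules can be illustrated with a small sketch that turns the field lines of one struct into a tree. It assumes two spaces per nesting level (as listed under Miscellaneous) and that the struct's header line has already been removed; the function name `parse_fields` and the tuple layout are hypothetical.

```python
def parse_fields(lines):
    """Turn indented `Name: type` lines into nested (name, type, children) tuples."""
    root = []
    # stack[d] is the list that fields at nesting depth d + 1 are appended to.
    stack = [root]
    for line in lines:
        stripped = line.lstrip(" ")
        spaces = len(line) - len(stripped)
        if spaces == 0 or spaces % 2 != 0:
            raise ValueError(f"bad indentation: {line!r}")
        depth = spaces // 2
        if depth > len(stack):
            raise ValueError(f"field nested too deeply: {line!r}")
        del stack[depth:]  # drop back to the enclosing struct's level
        name, _, type_spec = stripped.partition(": ")
        children = []
        stack[-1].append((name, type_spec, children))
        stack.append(children)
    return root
```

The sketch does not check that only struct-typed fields have children, and it leaves any trailing version comment inside the type string.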

There is one special type of struct field: `length-field-minus`.
This field is raw bytes whose size is determined from another field
earlier in the struct.
That is, if there exists a field `Length` in the struct,
and then some other fields,
and then finally the `length-field-minus` field `Foo`,
the `Foo` field can be specified as `Foo: length-field-minus => Length - 49`,
signifying that `Foo` is a struct that is encoded as raw bytes of length `Length` minus 49.
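
The size computation for such a field could look like the sketch below. It assumes earlier fields have already been decoded into a dictionary keyed by field name; the function name `raw_length` and the regular expression are hypothetical.

```python
import re

LENGTH_FIELD_MINUS_RE = re.compile(r"length-field-minus => (\w+) - (\d+)")


def raw_length(type_spec, decoded_fields):
    """Return how many raw bytes a `length-field-minus` field occupies."""
    match = LENGTH_FIELD_MINUS_RE.fullmatch(type_spec)
    if match is None:
        raise ValueError(f"not a length-field-minus type: {type_spec!r}")
    size = decoded_fields[match.group(1)] - int(match.group(2))
    if size < 0:
        raise ValueError("length field is smaller than the subtracted amount")
    return size
```

With `Length` decoded as 100, the `Foo` field from the text would span 51 raw bytes.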

The only place this is used is the `RecordBatch` field,
mostly for the historical reason that the length field was
defined separately from the message set in Kafka's original message set encoding.

Named struct modifiers
----------------------

Named structs have a few potential modifiers.

If a struct is not top level,
meaning it is not a request or response struct,
the first modifier must be `not top level`.
Non-top-level structs define structs that are embedded in requests or responses
and that are commonly used or more specialized than anonymous structs.
For example, a `Record`, `RecordBatch`, or `StickyMemberMetadata`.

For "not top level" structs, there are two other potential, mutually exclusive modifiers:

1) `, with version field` indicates that the struct contains a Version field.
If a version field exists, it must be the first field.
This modifier allows non-top-level structs to still have proper version fields.
2) `, no encoding` indicates that encode and decode functions should not be generated for this struct.

If a struct is top level and is a request type,
the first two modifiers, in order, must be `key` and `max version`.
These correspond to the request's key and the max supported version of that request.
A top level request type also supports the modifiers
`admin`, `group coordinator`, or `txn coordinator`.
These three modifiers signify whether the request needs to go to the
controller,
group coordinator,
or transaction coordinator.

If the struct is a response type,
there must be no modifier,
the response type must have the same name as the request it is for (`s/Request/Response/`),
and it must follow the request in the `DEFINITIONS` file.
This is required to enforce that responses are tied to their requests appropriately.

Examples:
```
Record => not top level
  Key: bytes
  Value: bytes

ProduceRequest => key 0, max version 10
  Records: [Record]

ProduceResponse =>
  ErrorCode: int16

DeleteTopicRequest => key 1, max version 2, admin
  Topic: string

DeleteTopicResponse =>
  ErrorCode: int16

JoinGroupRequest => key 2, max version 3, group coordinator
  GroupID: string

JoinGroupResponse =>
  ErrorCode: int16

TxnCommitRequest => key 3, max version 8, txn coordinator
  TxnID: string

TxnCommitResponse =>
  ErrorCode: int16
```
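
A header line such as those above could be split into its name and modifiers along these lines. This is only a sketch: the function name `parse_header` and the dictionary keys are assumptions, and the ordering and mutual-exclusion rules described earlier are not enforced.

```python
def parse_header(line):
    """Split `Name => modifier, modifier, ...` into a dictionary."""
    name, arrow, rest = line.partition(" =>")
    if not arrow:
        raise ValueError(f"not a struct header: {line!r}")
    info = {"name": name, "top_level": True, "key": None,
            "max_version": None, "flags": []}
    # Modifiers are separated by a comma and a single space.
    for modifier in [m for m in rest.strip().split(", ") if m]:
        if modifier == "not top level":
            info["top_level"] = False
        elif modifier.startswith("key "):
            info["key"] = int(modifier[len("key "):])
        elif modifier.startswith("max version "):
            info["max_version"] = int(modifier[len("max version "):])
        else:
            info["flags"].append(modifier)
    return info
```

A response header such as `ProduceResponse =>` comes back with no key, no max version, and no flags, matching the rule that responses take no modifiers.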

Miscellaneous
-------------

- Spacing is important.
- Between every field name and type, there must be only one space.
- Between every word / symbol / number, there must be only one space.
- Lines cannot have trailing spaces.
- Internal struct fields must be nested two more spaces than the encompassing struct.
- There must be one blank line between type definitions.
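
Some of these rules lend themselves to a simple line-by-line check, sketched below. The function name `lint` and the problem messages are hypothetical. The check treats "indentation is a multiple of two" as a stand-in for the nesting rule and does not verify blank lines between definitions.

```python
def lint(lines):
    """Return (line_number, problem) pairs for basic spacing violations."""
    problems = []
    for number, line in enumerate(lines, start=1):
        body = line.lstrip(" ")
        indent = len(line) - len(body)
        if line != line.rstrip():
            problems.append((number, "trailing space"))
        if indent % 2 != 0:
            problems.append((number, "indentation is not a multiple of two"))
        if "  " in body.rstrip():
            problems.append((number, "more than one space between tokens"))
    return problems
```

An empty result means none of the checked rules were violated; it does not mean the definitions are otherwise valid.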