FlatBuffers white paper    {#flatbuffers_white_paper}
=======================

This document tries to shed some light on the "why" of FlatBuffers, a
new serialization library.

## Motivation

Back in the good old days, performance was all about instructions and
cycles. Nowadays, processing units have run so far ahead of the memory
subsystem that making an efficient application should start and finish
with thinking about memory. How much of it you use. How you lay it out
and access it. How you allocate it. When you copy it.

Serialization is a pervasive activity in a lot of programs, and a common
source of memory inefficiency, with lots of temporary data structures
needed to parse and represent data, inefficient allocation patterns,
and poor locality.

If it were possible to do serialization with no temporary objects,
no additional allocation, no copying, and good locality, this could be
of great value. The reason serialization systems usually don't manage
this is that it runs counter to forwards/backwards compatibility and to
platform specifics like endianness and alignment.

FlatBuffers is what you get if you try anyway.

In particular, FlatBuffers' focus is on mobile hardware (where memory
size and memory bandwidth are even more constrained than on desktop
hardware), and on applications that have the highest performance needs:
games.

## FlatBuffers

*This is a summary of FlatBuffers functionality, with some rationale.
A more detailed description can be found in the FlatBuffers
documentation.*

### Summary

A FlatBuffer is a binary buffer containing nested objects (structs,
tables, vectors, ...) organized using offsets so that the data can be
traversed in-place just like any pointer-based data structure. Unlike
most in-memory data structures, however, it uses strict rules of
alignment and endianness (always little) to ensure these buffers are
cross-platform. Additionally, for objects that are tables, FlatBuffers
provides forwards/backwards compatibility and general optionality of
fields, to support most forms of format evolution.
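
To make the idea of offset-based, in-place traversal concrete, here is a minimal Python sketch. It uses a toy layout invented for this illustration, not the actual FlatBuffers wire format, and the helper names `read_u32` and `follow` are hypothetical. It only shows that objects can be reached by following little-endian offsets inside the buffer, without unpacking the data into separate objects first.

```python
import struct

# Toy layout (an assumption for illustration, NOT the real wire format):
#   bytes 0..3    uint32 offset to the root object, relative to position 0
#   root object   uint32 offset to a child object, relative to the field itself
#   child object  one int32 value
buf = bytearray(16)
struct.pack_into("<I", buf, 0, 4)      # root object lives at position 4
struct.pack_into("<I", buf, 4, 8)      # child is 8 bytes past this field: 12
struct.pack_into("<i", buf, 12, -42)   # the child's payload


def read_u32(data, pos):
    """Read a little-endian uint32 straight out of the buffer."""
    return struct.unpack_from("<I", data, pos)[0]


def follow(data, pos):
    """Follow the relative offset stored at pos; return the target position."""
    return pos + read_u32(data, pos)


view = memoryview(buf)        # a view onto the same bytes, no copy
root = follow(view, 0)        # 0 + 4
child = follow(view, root)    # 4 + 8
# "<" fixes the byte order to little-endian regardless of the host CPU.
value = struct.unpack_from("<i", view, child)[0]
```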

You define your object types in a schema, which can then be compiled to
C++ or Java for low to zero overhead reading & writing.
Optionally, JSON data can be dynamically parsed into buffers.

### Tables

Tables are the cornerstone of FlatBuffers, since format evolution is
essential for most applications of serialization. Typically, dealing
with format changes is something that can be done transparently during
the parsing process of most serialization solutions out there.
But a FlatBuffer isn't parsed before it is accessed.

Tables get around this by using an extra indirection to access fields,
through a *vtable*. Each table comes with a vtable (which may be shared
between multiple tables with the same layout), which records where the
fields of this particular table instance are stored.
The vtable may also indicate that a field is not present (because this
FlatBuffer was written with an older version of the software, or simply
because the information was not necessary for this instance, or deemed
deprecated), in which case a default value is returned.
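
The lookup described above can be sketched in a few lines of Python. The layout below is hand-assembled and only loosely modelled on the real encoding, so treat the entry sizes and the "0 means absent" convention as assumptions of this sketch. The function and field names (`read_int16_field`, `hp`, `mana`, `armor`) are hypothetical.

```python
import struct


def build_example():
    """Hand-assemble a vtable plus one table holding a single int16 field."""
    buf = bytearray(16)
    # vtable at position 0: four uint16 entries.
    #   [0] vtable size in bytes, [1] table size in bytes,
    #   [2] offset of field 0 inside the table, [3] offset of field 1 (0 = absent)
    struct.pack_into("<4H", buf, 0, 8, 6, 4, 0)
    # Table at position 8: int32 distance back to its vtable, then field 0.
    struct.pack_into("<i", buf, 8, 8)      # vtable position = 8 - 8 = 0
    struct.pack_into("<h", buf, 12, 150)   # field 0 stored at table + 4
    return buf, 8


def read_int16_field(buf, table_pos, field_index, default):
    """Read a field through the vtable, falling back to the default."""
    vtable_pos = table_pos - struct.unpack_from("<i", buf, table_pos)[0]
    vtable_size = struct.unpack_from("<H", buf, vtable_pos)[0]
    slot = 4 + 2 * field_index             # skip the two size entries
    if slot >= vtable_size:                # writer did not know this field
        return default
    field_off = struct.unpack_from("<H", buf, vtable_pos + slot)[0]
    if field_off == 0:                     # field omitted for this instance
        return default
    return struct.unpack_from("<h", buf, table_pos + field_off)[0]


buf, table = build_example()
hp = read_int16_field(buf, table, 0, default=100)     # stored in the table
mana = read_int16_field(buf, table, 1, default=100)   # marked absent
armor = read_int16_field(buf, table, 2, default=0)    # beyond the vtable
```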

Tables have a low overhead in memory (since vtables are small and
shared) and in access cost (an extra indirection), but provide great
flexibility. Tables may even cost less memory than the equivalent
struct, since fields do not need to be stored when they are equal to
their default.
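
A rough back-of-the-envelope model of that last claim, in Python. It is deliberately simplified: it assumes the vtable is shared by enough instances that its cost can be ignored, it charges each table a fixed 4 bytes for pointing at its vtable, and it ignores alignment padding. The record and its field names are hypothetical.

```python
import struct

VTABLE_OFFSET_SIZE = 4  # assumed per-table cost of referring to its vtable

# Hypothetical record: (field name, struct format code, default value).
FIELDS = [("x", "f", 0.0), ("y", "f", 0.0), ("z", "f", 0.0),
          ("hp", "h", 100), ("mana", "h", 100), ("level", "i", 1)]


def struct_size(fields):
    """A struct stores every field, whatever its value."""
    return sum(struct.calcsize("<" + fmt) for _, fmt, _ in fields)


def table_size(fields, values):
    """A table stores the vtable reference plus only the non-default fields."""
    stored = sum(struct.calcsize("<" + fmt)
                 for name, fmt, default in fields
                 if values.get(name, default) != default)
    return VTABLE_OFFSET_SIZE + stored


mostly_default = {"x": 1.5, "hp": 100}
fully_set = {"x": 1.0, "y": 2.0, "z": 3.0, "hp": 50, "mana": 20, "level": 7}

fixed = struct_size(FIELDS)                   # same for every instance
sparse = table_size(FIELDS, mostly_default)   # only "x" is stored
dense = table_size(FIELDS, fully_set)         # every field is stored
```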

FlatBuffers additionally offers "naked" structs, which do not offer
forwards/backwards compatibility, but can be even smaller (useful for
very small objects that are unlikely to change, e.g. a coordinate
pair or an RGBA color).
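
As an illustration of why such structs can be so small, the following Python sketch packs a coordinate pair and an RGBA color as fixed little-endian layouts. The layouts and the names `VEC2` and `RGBA` are assumptions made for this example rather than generated FlatBuffers code.

```python
import struct

# Hypothetical "naked" struct layouts: fixed fields, fixed order, little-endian.
VEC2 = struct.Struct("<ff")    # coordinate pair: two float32 values
RGBA = struct.Struct("<4B")    # color: four uint8 channels

buf = bytearray(VEC2.size + RGBA.size)
VEC2.pack_into(buf, 0, 1.5, -2.0)
RGBA.pack_into(buf, VEC2.size, 255, 128, 0, 255)

# Every field sits at a position known in advance, so reads need no lookup.
x, y = VEC2.unpack_from(buf, 0)
r, g, b, a = RGBA.unpack_from(buf, VEC2.size)
```

There is no per-field metadata at all here, which is also why a field cannot be added or dropped later without breaking existing data.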

### Schemas

While schemas reduce some generality (you can't just read any data
without having its schema), they have a lot of upsides:

-   Most information about the format can be factored into the generated
    code, reducing memory needed to store data, and time to access it.

-   The strong typing of the data definitions means less error
    checking/handling at runtime (less can go wrong).

-   A schema enables us to access a buffer without parsing.

FlatBuffer schemas are fairly similar to those of the incumbent,
Protocol Buffers, and generally should be readable to those familiar
with the C family of languages. We chose to improve upon the features
offered by .proto files in the following ways:

-   Deprecation of fields instead of manual field id assignment.
    Extending an object in a .proto means hunting for a free slot among
    the numbers (preferring lower numbers since they have a more compact
    representation). Besides being inconvenient, it also makes removing
    fields problematic: you either have to keep them, which does not
    make it obvious that a field shouldn't be read/written anymore and
    still generates accessors, or you remove them, and risk that there
    is still old data around that uses a field by the time someone
    reuses its field id, with nasty consequences.

-   Differentiating between tables and structs (see above). Effectively
    all table fields are `optional`, and all struct fields are
    `required`.

-   Having a native vector type instead of `repeated`. This gives you a
    length without having to collect all items, and in the case of
    scalars provides for a more compact representation, and one that
    guarantees adjacency.

-   Having a native `union` type instead of using a series of `optional`
    fields, all of which must be checked individually.

-   Being able to define defaults for all scalars, instead of having to
    deal with their optionality at each access.

-   A parser that can deal with both schemas and data definitions (JSON
    compatible) uniformly.
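
To illustrate the vector point from the list above, here is a small Python sketch of a length-prefixed vector of scalars. It assumes a uint32 element count followed by tightly packed little-endian int32 values. The helper names are hypothetical and the sketch is not taken from the FlatBuffers implementation.

```python
import struct


def write_int32_vector(values):
    """Encode a uint32 element count followed by adjacent little-endian int32s."""
    return struct.pack(f"<I{len(values)}i", len(values), *values)


def vector_length(buf, pos=0):
    """The length is a single read; no element needs to be visited."""
    return struct.unpack_from("<I", buf, pos)[0]


def vector_item(buf, index, pos=0):
    """Elements are adjacent, so item i sits at a computable position."""
    if not 0 <= index < vector_length(buf, pos):
        raise IndexError(index)
    return struct.unpack_from("<i", buf, pos + 4 + 4 * index)[0]


data = write_int32_vector([10, -20, 30])
count = vector_length(data)
last = vector_item(data, 2)
```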

<br>