# S2 Compression
     2
     3S2 is an extension of [Snappy](https://github.com/google/snappy).
     4
     5S2 is aimed for high throughput, which is why it features concurrent compression for bigger payloads.
     6
     7Decoding is compatible with Snappy compressed content, but content compressed with S2 cannot be decompressed by Snappy.
     8This means that S2 can seamlessly replace Snappy without converting compressed content.
     9
    10S2 can produce Snappy compatible output, faster and better than Snappy.
If you want the full benefit of the changes, you should use S2 without Snappy compatibility.
    12
    13S2 is designed to have high throughput on content that cannot be compressed.
    14This is important, so you don't have to worry about spending CPU cycles on already compressed data. 
    15
    16## Benefits over Snappy
    17
    18* Better compression
    19* Adjustable compression (3 levels) 
    20* Concurrent stream compression
    21* Faster decompression, even for Snappy compatible content
    22* Concurrent Snappy/S2 stream decompression
    23* Skip forward in compressed stream
    24* Random seeking with indexes
    25* Compatible with reading Snappy compressed content
    26* Smaller block size overhead on incompressible blocks
    27* Block concatenation
    28* Block Dictionary support
    29* Uncompressed stream mode
    30* Automatic stream size padding
    31* Snappy compatible block compression
    32
    33## Drawbacks over Snappy
    34
    35* Not optimized for 32 bit systems
    36* Streams use slightly more memory due to larger blocks and concurrency (configurable)
    37
    38# Usage
    39
    40Installation: `go get -u github.com/klauspost/compress/s2`
    41
    42Full package documentation:
    43 
    44[![godoc][1]][2]
    45
    46[1]: https://godoc.org/github.com/klauspost/compress?status.svg
    47[2]: https://godoc.org/github.com/klauspost/compress/s2
    48
    49## Compression
    50
    51```Go
    52func EncodeStream(src io.Reader, dst io.Writer) error {
    53    enc := s2.NewWriter(dst)
    54    _, err := io.Copy(enc, src)
    55    if err != nil {
    56        enc.Close()
    57        return err
    58    }
    59    // Blocks until compression is done.
    60    return enc.Close() 
    61}
    62```
    63
    64You should always call `enc.Close()`, otherwise you will leak resources and your encode will be incomplete.
    65
    66For the best throughput, you should attempt to reuse the `Writer` using the `Reset()` method.
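
A minimal sketch of that reuse pattern (the helper name `compressMany` is illustrative, and it assumes the destination can be supplied later via `Reset`):

```Go
// compressMany writes one compressed stream per source, reusing a single Writer.
func compressMany(dsts []io.Writer, srcs []io.Reader) error {
    enc := s2.NewWriter(nil) // Destination is supplied via Reset below.
    for i, src := range srcs {
        enc.Reset(dsts[i]) // Reuse the Writer for the next stream.
        if _, err := io.Copy(enc, src); err != nil {
            enc.Close()
            return err
        }
        // Close finishes this stream; the Writer can be Reset and reused afterwards.
        if err := enc.Close(); err != nil {
            return err
        }
    }
    return nil
}
```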
    67
    68The Writer in S2 is always buffered, therefore `NewBufferedWriter` in Snappy can be replaced with `NewWriter` in S2.
    69It is possible to flush any buffered data using the `Flush()` method. 
    70This will block until all data sent to the encoder has been written to the output.
    71
    72S2 also supports the `io.ReaderFrom` interface, which will consume all input from a reader.
    73
Finally, if you have a single block of data you would like to have encoded as a stream,
a slightly more efficient approach is to use the `EncodeBuffer` method.
    76This will take ownership of the buffer until the stream is closed.
    77
```Go
func EncodeStream(src []byte, dst io.Writer) error {
    enc := s2.NewWriter(dst)
    // The encoder owns the buffer until Flush or Close is called.
    err := enc.EncodeBuffer(src)
    if err != nil {
        enc.Close()
        return err
    }
    // Blocks until compression is done.
    return enc.Close()
}
```
    91
    92Each call to `EncodeBuffer` will result in discrete blocks being created without buffering, 
    93so it should only be used a single time per stream.
    94If you need to write several blocks, you should use the regular io.Writer interface.
    95
    96
    97## Decompression
    98
    99```Go
   100func DecodeStream(src io.Reader, dst io.Writer) error {
   101    dec := s2.NewReader(src)
   102    _, err := io.Copy(dst, dec)
   103    return err
   104}
   105```
   106
   107Similar to the Writer, a Reader can be reused using the `Reset` method.
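
A minimal sketch of Reader reuse (the helper name `decodeMany` is illustrative):

```Go
// decodeMany decompresses several streams into dst, reusing a single Reader.
func decodeMany(dst io.Writer, srcs []io.Reader) error {
    dec := s2.NewReader(nil) // Source is supplied via Reset below.
    for _, src := range srcs {
        dec.Reset(src) // Point the reused Reader at the next stream.
        if _, err := io.Copy(dst, dec); err != nil {
            return err
        }
    }
    return nil
}
```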
   108
For the best possible write throughput, there is an `EncodeBuffer(buf []byte)` method available on the `Writer`, as described above.
However, it requires that the provided buffer isn't used after it is handed over to S2 until the stream is flushed or closed.
   111
   112For smaller data blocks, there is also a non-streaming interface: `Encode()`, `EncodeBetter()` and `Decode()`.
Do however note that these functions (similar to Snappy) do not provide validation of data,
so data corruption may go undetected. Stream encoding provides CRC checks of data.
   115
   116It is possible to efficiently skip forward in a compressed stream using the `Skip()` method. 
   117For big skips the decompressor is able to skip blocks without decompressing them.
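
A small sketch of skipping ahead before reading (the helper name `skipThenRead` is illustrative):

```Go
// skipThenRead discards the first n uncompressed bytes of the stream
// and returns the rest. Whole blocks are skipped without decompression.
func skipThenRead(src io.Reader, n int64) ([]byte, error) {
    dec := s2.NewReader(src)
    if err := dec.Skip(n); err != nil {
        return nil, err
    }
    return io.ReadAll(dec)
}
```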
   118
   119## Single Blocks
   120
Similar to Snappy, S2 offers single block compression.
   122Blocks do not offer the same flexibility and safety as streams,
   123but may be preferable for very small payloads, less than 100K.
   124
   125Using a simple `dst := s2.Encode(nil, src)` will compress `src` and return the compressed result. 
   126It is possible to provide a destination buffer. 
   127If the buffer has a capacity of `s2.MaxEncodedLen(len(src))` it will be used. 
If not, a new one will be allocated.
   129
   130Alternatively `EncodeBetter`/`EncodeBest` can also be used for better, but slightly slower compression.
   131
Similarly, to decompress a block you can use `dst, err := s2.Decode(nil, src)`.
   133Again an optional destination buffer can be supplied. 
   134The `s2.DecodedLen(src)` can be used to get the minimum capacity needed. 
   135If that is not satisfied a new buffer will be allocated.
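
A sketch of a block round trip with pre-sized destination buffers (the helper name `roundTrip` is illustrative):

```Go
// roundTrip compresses and decompresses a single block, sizing buffers up front.
func roundTrip(src []byte) ([]byte, error) {
    // Encode reuses comp since its capacity is at least MaxEncodedLen.
    comp := make([]byte, 0, s2.MaxEncodedLen(len(src)))
    comp = s2.Encode(comp, src)

    // DecodedLen reports the capacity needed for decoding.
    n, err := s2.DecodedLen(comp)
    if err != nil {
        return nil, err
    }
    return s2.Decode(make([]byte, n), comp)
}
```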
   136
Block functions always operate on a single goroutine since they should only be used for small payloads.
   138
   139# Commandline tools
   140
Some very simple command line tools are provided: `s2c` for compression and `s2d` for decompression.
   142
   143Binaries can be downloaded on the [Releases Page](https://github.com/klauspost/compress/releases).
   144
Installing the tools requires Go to be installed. To install them, use:
   146
   147`go install github.com/klauspost/compress/s2/cmd/s2c@latest && go install github.com/klauspost/compress/s2/cmd/s2d@latest`
   148
   149To build binaries to the current folder use:
   150
   151`go build github.com/klauspost/compress/s2/cmd/s2c && go build github.com/klauspost/compress/s2/cmd/s2d`
   152
   153
   154## s2c
   155
   156```
   157Usage: s2c [options] file1 file2
   158
   159Compresses all files supplied as input separately.
   160Output files are written as 'filename.ext.s2' or 'filename.ext.snappy'.
   161By default output files will be overwritten.
   162Use - as the only file name to read from stdin and write to stdout.
   163
   164Wildcards are accepted: testdir/*.txt will compress all files in testdir ending with .txt
   165Directories can be wildcards as well. testdir/*/*.txt will match testdir/subdir/b.txt
   166
   167File names beginning with 'http://' and 'https://' will be downloaded and compressed.
   168Only http response code 200 is accepted.
   169
   170Options:
   171  -bench int
   172    	Run benchmark n times. No output will be written
   173  -blocksize string
   174    	Max  block size. Examples: 64K, 256K, 1M, 4M. Must be power of two and <= 4MB (default "4M")
   175  -c	Write all output to stdout. Multiple input files will be concatenated
   176  -cpu int
   177    	Compress using this amount of threads (default 32)
   178  -faster
   179    	Compress faster, but with a minor compression loss
   180  -help
   181    	Display help
   182  -index
   183        Add seek index (default true)    	
   184  -o string
   185        Write output to another file. Single input file only
   186  -pad string
   187    	Pad size to a multiple of this value, Examples: 500, 64K, 256K, 1M, 4M, etc (default "1")
   188  -q	Don't write any output to terminal, except errors
   189  -rm
   190    	Delete source file(s) after successful compression
   191  -safe
   192    	Do not overwrite output files
   193  -slower
   194    	Compress more, but a lot slower
   195  -snappy
   196        Generate Snappy compatible output stream
   197  -verify
   198    	Verify written files  
   199
   200```
   201
   202## s2d
   203
   204```
   205Usage: s2d [options] file1 file2
   206
   207Decompresses all files supplied as input. Input files must end with '.s2' or '.snappy'.
   208Output file names have the extension removed. By default output files will be overwritten.
   209Use - as the only file name to read from stdin and write to stdout.
   210
   211Wildcards are accepted: testdir/*.txt will compress all files in testdir ending with .txt
   212Directories can be wildcards as well. testdir/*/*.txt will match testdir/subdir/b.txt
   213
   214File names beginning with 'http://' and 'https://' will be downloaded and decompressed.
   215Extensions on downloaded files are ignored. Only http response code 200 is accepted.
   216
   217Options:
   218  -bench int
   219    	Run benchmark n times. No output will be written
   220  -c	Write all output to stdout. Multiple input files will be concatenated
   221  -help
   222    	Display help
   223  -o string
   224        Write output to another file. Single input file only
   225  -offset string
   226        Start at offset. Examples: 92, 64K, 256K, 1M, 4M. Requires Index
   227  -q    Don't write any output to terminal, except errors
   228  -rm
   229        Delete source file(s) after successful decompression
   230  -safe
   231        Do not overwrite output files
   232  -tail string
   233        Return last of compressed file. Examples: 92, 64K, 256K, 1M, 4M. Requires Index
   234  -verify
   235    	Verify files, but do not write output                                      
   236```
   237
   238## s2sx: self-extracting archives
   239
   240s2sx allows creating self-extracting archives with no dependencies.
   241
By default, executables are created for the same platforms as the host OS,
but this can be overridden with the `-os` and `-arch` parameters.

Extracted files have 0666 permissions, except when the untar option is used.
   246
   247```
   248Usage: s2sx [options] file1 file2
   249
   250Compresses all files supplied as input separately.
   251If files have '.s2' extension they are assumed to be compressed already.
   252Output files are written as 'filename.s2sx' and with '.exe' for windows targets.
   253If output is big, an additional file with ".more" is written. This must be included as well.
   254By default output files will be overwritten.
   255
   256Wildcards are accepted: testdir/*.txt will compress all files in testdir ending with .txt
   257Directories can be wildcards as well. testdir/*/*.txt will match testdir/subdir/b.txt
   258
   259Options:
   260  -arch string
   261        Destination architecture (default "amd64")
   262  -c    Write all output to stdout. Multiple input files will be concatenated
   263  -cpu int
   264        Compress using this amount of threads (default 32)
   265  -help
   266        Display help
   267  -max string
   268        Maximum executable size. Rest will be written to another file. (default "1G")
   269  -os string
   270        Destination operating system (default "windows")
   271  -q    Don't write any output to terminal, except errors
   272  -rm
   273        Delete source file(s) after successful compression
   274  -safe
   275        Do not overwrite output files
   276  -untar
   277        Untar on destination
   278```
   279
   280Available platforms are:
   281
   282 * darwin-amd64
   283 * darwin-arm64
   284 * linux-amd64
   285 * linux-arm
   286 * linux-arm64
   287 * linux-mips64
   288 * linux-ppc64le
   289 * windows-386
   290 * windows-amd64                                                                             
   291
   292By default, there is a size limit of 1GB for the output executable.
   293
When this is exceeded the remaining file content is written to a file called
output+`.more`. This file must be included and placed alongside the executable
for a successful extraction.
   297
   298This file *must* have the same name as the executable, so if the executable is renamed, 
   299so must the `.more` file. 
   300
   301This functionality is disabled with stdin/stdout. 
   302
   303### Self-extracting TAR files
   304
   305If you wrap a TAR file you can specify `-untar` to make it untar on the destination host.
   306
   307Files are extracted to the current folder with the path specified in the tar file.
   308
   309Note that tar files are not validated before they are wrapped.
   310
   311For security reasons files that move below the root folder are not allowed.
   312
   313# Performance
   314
   315This section will focus on comparisons to Snappy. 
   316This package is solely aimed at replacing Snappy as a high speed compression package.
   317If you are mainly looking for better compression [zstandard](https://github.com/klauspost/compress/tree/master/zstd#zstd)
   318gives better compression, but typically at speeds slightly below "better" mode in this package.
   319
   320Compression is increased compared to Snappy, mostly around 5-20% and the throughput is typically 25-40% increased (single threaded) compared to the Snappy Go implementation.
   321
   322Streams are concurrently compressed. The stream will be distributed among all available CPU cores for the best possible throughput.
   323
   324A "better" compression mode is also available. This allows to trade a bit of speed for a minor compression gain.
   325The content compressed in this mode is fully compatible with the standard decoder.
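
A sketch of selecting these behaviors with writer options; `WriterConcurrency` and `WriterBetterCompression` are the relevant options in the package documentation, and the concurrency value is just an example:

```Go
func encodeBetter(dst io.Writer, src io.Reader) error {
    enc := s2.NewWriter(dst,
        s2.WriterConcurrency(4),      // Cap the number of concurrent block encoders.
        s2.WriterBetterCompression(), // Trade some speed for smaller output.
    )
    if _, err := io.Copy(enc, src); err != nil {
        enc.Close()
        return err
    }
    return enc.Close()
}
```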
   326
   327Snappy vs S2 **compression** speed on 16 core (32 thread) computer, using all threads and a single thread (1 CPU):
   328
   329| File                                                                                                    | S2 Speed | S2 Throughput | S2 % smaller | S2 "better" | "better" throughput | "better" % smaller |
   330|---------------------------------------------------------------------------------------------------------|----------|---------------|--------------|-------------|---------------------|--------------------|
   331| [rawstudio-mint14.tar](https://files.klauspost.com/compress/rawstudio-mint14.7z)                        | 16.33x   | 10556 MB/s    | 8.0%         | 6.04x       | 5252 MB/s           | 14.7%              |
   332| (1 CPU)                                                                                                 | 1.08x    | 940 MB/s      | -            | 0.46x       | 400 MB/s            | -                  |
   333| [github-june-2days-2019.json](https://files.klauspost.com/compress/github-june-2days-2019.json.zst)     | 16.51x   | 15224 MB/s    | 31.70%       | 9.47x       | 8734 MB/s           | 37.71%             |
   334| (1 CPU)                                                                                                 | 1.26x    | 1157 MB/s     | -            | 0.60x       | 556 MB/s            | -                  |
   335| [github-ranks-backup.bin](https://files.klauspost.com/compress/github-ranks-backup.bin.zst)             | 15.14x   | 12598 MB/s    | -5.76%       | 6.23x       | 5675 MB/s           | 3.62%              |
   336| (1 CPU)                                                                                                 | 1.02x    | 932 MB/s      | -            | 0.47x       | 432 MB/s            | -                  |
   337| [consensus.db.10gb](https://files.klauspost.com/compress/consensus.db.10gb.zst)                         | 11.21x   | 12116 MB/s    | 15.95%       | 3.24x       | 3500 MB/s           | 18.00%             |
   338| (1 CPU)                                                                                                 | 1.05x    | 1135 MB/s     | -            | 0.27x       | 292 MB/s            | -                  |
   339| [apache.log](https://files.klauspost.com/compress/apache.log.zst)                                       | 8.55x    | 16673 MB/s    | 20.54%       | 5.85x       | 11420 MB/s          | 24.97%             |
   340| (1 CPU)                                                                                                 | 1.91x    | 1771 MB/s     | -            | 0.53x       | 1041 MB/s           | -                  |
   341| [gob-stream](https://files.klauspost.com/compress/gob-stream.7z)                                        | 15.76x   | 14357 MB/s    | 24.01%       | 8.67x       | 7891 MB/s           | 33.68%             |
   342| (1 CPU)                                                                                                 | 1.17x    | 1064 MB/s     | -            | 0.65x       | 595 MB/s            | -                  |
   343| [10gb.tar](http://mattmahoney.net/dc/10gb.html)                                                         | 13.33x   | 9835 MB/s     | 2.34%        | 6.85x       | 4863 MB/s           | 9.96%              |
   344| (1 CPU)                                                                                                 | 0.97x    | 689 MB/s      | -            | 0.55x       | 387 MB/s            | -                  |
   345| sharnd.out.2gb                                                                                          | 9.11x    | 13213 MB/s    | 0.01%        | 1.49x       | 9184 MB/s           | 0.01%              |
   346| (1 CPU)                                                                                                 | 0.88x    | 5418 MB/s     | -            | 0.77x       | 5417 MB/s           | -                  |
   347| [sofia-air-quality-dataset csv](https://files.klauspost.com/compress/sofia-air-quality-dataset.tar.zst) | 22.00x   | 11477 MB/s    | 18.73%       | 11.15x      | 5817 MB/s           | 27.88%             |
   348| (1 CPU)                                                                                                 | 1.23x    | 642 MB/s      | -            | 0.71x       | 642 MB/s            | -                  |
   349| [silesia.tar](http://sun.aei.polsl.pl/~sdeor/corpus/silesia.zip)                                        | 11.23x   | 6520 MB/s     | 5.9%         | 5.35x       | 3109 MB/s           | 15.88%             |
   350| (1 CPU)                                                                                                 | 1.05x    | 607 MB/s      | -            | 0.52x       | 304 MB/s            | -                  |
   351| [enwik9](https://files.klauspost.com/compress/enwik9.zst)                                               | 19.28x   | 8440 MB/s     | 4.04%        | 9.31x       | 4076 MB/s           | 18.04%             |
   352| (1 CPU)                                                                                                 | 1.12x    | 488 MB/s      | -            | 0.57x       | 250 MB/s            | -                  |
   353
   354### Legend
   355
* `S2 Speed`: Speed of S2 compared to Snappy, using 16 cores and 1 core.
* `S2 Throughput`: Throughput of S2 in MB/s.
* `S2 % smaller`: How many percent smaller the S2 output is compared to Snappy.
* `S2 "better"`: Speed when enabling "better" compression mode in S2 compared to Snappy.
* `"better" throughput`: Throughput of S2 in "better" mode in MB/s.
* `"better" % smaller`: How many percent smaller the "better" output is compared to Snappy.
   362
   363There is a good speedup across the board when using a single thread and a significant speedup when using multiple threads.
   364
   365Machine generated data gets by far the biggest compression boost, with size being reduced by up to 35% of Snappy size.
   366
   367The "better" compression mode sees a good improvement in all cases, but usually at a performance cost.
   368
   369Incompressible content (`sharnd.out.2gb`, 2GB random data) sees the smallest speedup. 
   370This is likely dominated by synchronization overhead, which is confirmed by the fact that single threaded performance is higher (see above). 
   371
   372## Decompression
   373
   374S2 attempts to create content that is also fast to decompress, except in "better" mode where the smallest representation is used.
   375
   376S2 vs Snappy **decompression** speed. Both operating on single core:
   377
   378| File                                                                                                | S2 Throughput | vs. Snappy | Better Throughput | vs. Snappy |
   379|-----------------------------------------------------------------------------------------------------|---------------|------------|-------------------|------------|
   380| [rawstudio-mint14.tar](https://files.klauspost.com/compress/rawstudio-mint14.7z)                    | 2117 MB/s     | 1.14x      | 1738 MB/s         | 0.94x      |
   381| [github-june-2days-2019.json](https://files.klauspost.com/compress/github-june-2days-2019.json.zst) | 2401 MB/s     | 1.25x      | 2307 MB/s         | 1.20x      |
   382| [github-ranks-backup.bin](https://files.klauspost.com/compress/github-ranks-backup.bin.zst)         | 2075 MB/s     | 0.98x      | 1764 MB/s         | 0.83x      |
   383| [consensus.db.10gb](https://files.klauspost.com/compress/consensus.db.10gb.zst)                     | 2967 MB/s     | 1.05x      | 2885 MB/s         | 1.02x      |
   384| [adresser.json](https://files.klauspost.com/compress/adresser.json.zst)                             | 4141 MB/s     | 1.07x      | 4184 MB/s         | 1.08x      |
   385| [gob-stream](https://files.klauspost.com/compress/gob-stream.7z)                                    | 2264 MB/s     | 1.12x      | 2185 MB/s         | 1.08x      |
   386| [10gb.tar](http://mattmahoney.net/dc/10gb.html)                                                     | 1525 MB/s     | 1.03x      | 1347 MB/s         | 0.91x      |
   387| sharnd.out.2gb                                                                                      | 3813 MB/s     | 0.79x      | 3900 MB/s         | 0.81x      |
   388| [enwik9](http://mattmahoney.net/dc/textdata.html)                                                   | 1246 MB/s     | 1.29x      | 967 MB/s          | 1.00x      |
   389| [silesia.tar](http://sun.aei.polsl.pl/~sdeor/corpus/silesia.zip)                                    | 1433 MB/s     | 1.12x      | 1203 MB/s         | 0.94x      |
   390| [enwik10](https://encode.su/threads/3315-enwik10-benchmark-results)                                 | 1284 MB/s     | 1.32x      | 1010 MB/s         | 1.04x      |
   391
   392### Legend
   393
* `S2 Throughput`: Decompression speed of S2 encoded content.
* `Better Throughput`: Decompression speed of S2 "better" encoded content.
* `vs. Snappy`: Decompression speed relative to Snappy decompressing the same content.
   397
   398
   399While the decompression code hasn't changed, there is a significant speedup in decompression speed. 
   400S2 prefers longer matches and will typically only find matches that are 6 bytes or longer. 
   401While this reduces compression a bit, it improves decompression speed.
   402
   403The "better" compression mode will actively look for shorter matches, which is why it has a decompression speed quite similar to Snappy.   
   404
Decompression is also very fast without assembly. Single goroutine decompression speed, no assembly:

| File                           | Speedup vs. Snappy | S2 Throughput |
|--------------------------------|--------------------|---------------|
| consensus.db.10gb.s2           | 1.84x              | 2289.8 MB/s   |
| 10gb.tar.s2                    | 1.30x              | 867.07 MB/s   |
| rawstudio-mint14.tar.s2        | 1.66x              | 1329.65 MB/s  |
| github-june-2days-2019.json.s2 | 2.36x              | 1831.59 MB/s  |
| github-ranks-backup.bin.s2     | 1.73x              | 1390.7 MB/s   |
| enwik9.s2                      | 1.67x              | 681.53 MB/s   |
| adresser.json.s2               | 3.41x              | 4230.53 MB/s  |
| silesia.tar.s2                 | 1.52x              | 811.58 MB/s   |
   417
Even though S2 typically compresses better than Snappy, decompression speed always remains better.
   419
   420### Concurrent Stream Decompression
   421
For full stream decompression, S2 offers a [DecodeConcurrent](https://pkg.go.dev/github.com/klauspost/compress/s2#Reader.DecodeConcurrent) method
that will decode a full stream using multiple goroutines.
   424
   425Example scaling, AMD Ryzen 3950X, 16 cores, decompression using `s2d -bench=3 <input>`, best of 3: 
   426
   427| Input                                     | `-cpu=1`   | `-cpu=2`   | `-cpu=4`   | `-cpu=8`   | `-cpu=16`   |
   428|-------------------------------------------|------------|------------|------------|------------|-------------|
   429| enwik10.snappy                            | 1098.6MB/s | 1819.8MB/s | 3625.6MB/s | 6910.6MB/s | 10818.2MB/s |
   430| enwik10.s2                                | 1303.5MB/s | 2606.1MB/s | 4847.9MB/s | 8878.4MB/s | 9592.1MB/s  |
   431| sofia-air-quality-dataset.tar.snappy      | 1302.0MB/s | 2165.0MB/s | 4244.5MB/s | 8241.0MB/s | 12920.5MB/s |
   432| sofia-air-quality-dataset.tar.s2          | 1399.2MB/s | 2463.2MB/s | 5196.5MB/s | 9639.8MB/s | 11439.5MB/s |
   433| sofia-air-quality-dataset.tar.s2 (no asm) | 837.5MB/s  | 1652.6MB/s | 3183.6MB/s | 5945.0MB/s | 9620.7MB/s  |
   434
   435Scaling can be expected to be pretty linear until memory bandwidth is saturated. 
   436
For now, `DecodeConcurrent` can only be used for full streams without seeking or combining with regular reads.
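
A minimal sketch of concurrent stream decompression (the helper name `decodeAll` is illustrative):

```Go
// decodeAll decompresses a full stream to dst using up to GOMAXPROCS goroutines.
func decodeAll(dst io.Writer, src io.Reader) (int64, error) {
    dec := s2.NewReader(src)
    return dec.DecodeConcurrent(dst, runtime.GOMAXPROCS(0))
}
```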
   438
   439## Block compression
   440
   441
When compressing blocks, no concurrent compression is performed, just as in Snappy.
   443This is because blocks are for smaller payloads and generally will not benefit from concurrent compression.
   444
An important change is that incompressible blocks will be at most 10 bytes bigger than the input.
In rare, worst-case scenarios Snappy blocks could be significantly bigger than the input.
   447
   448### Mixed content blocks
   449
The most reliable benchmark is a wide dataset.
   451For this we use [`webdevdata.org-2015-01-07-subset`](https://files.klauspost.com/compress/webdevdata.org-2015-01-07-4GB-subset.7z),
   45253927 files, total input size: 4,014,735,833 bytes. Single goroutine used.
   453
   454| *                 | Input      | Output     | Reduction  | MB/s       |
   455|-------------------|------------|------------|------------|------------|
   456| S2                | 4014735833 | 1059723369 | 73.60%     | **936.73** |
   457| S2 Better         | 4014735833 | 961580539  | 76.05%     | 451.10     |
   458| S2 Best           | 4014735833 | 899182886  | **77.60%** | 46.84      |
   459| Snappy            | 4014735833 | 1128706759 | 71.89%     | 790.15     |
   460| S2, Snappy Output | 4014735833 | 1093823291 | 72.75%     | 936.60     |
   461| LZ4               | 4014735833 | 1063768713 | 73.50%     | 452.02     |
   462
   463S2 delivers both the best single threaded throughput with regular mode and the best compression rate with "best".
   464"Better" mode provides the same compression speed as LZ4 with better compression ratio. 
   465
When producing Snappy compatible output, S2 still delivers better throughput (about 150 MB/s more) and better compression.
   467
   468As can be seen from the other benchmarks decompression should also be easier on the S2 generated output.
   469
Though they cannot be directly compared due to different decompression speeds, here are the speed/size comparisons for
other Go compressors:
   472
   473| *                 | Input      | Output     | Reduction | MB/s   |
   474|-------------------|------------|------------|-----------|--------|
   475| Zstd Fastest (Go) | 4014735833 | 794608518  | 80.21%    | 236.04 |
   476| Zstd Best (Go)    | 4014735833 | 704603356  | 82.45%    | 35.63  |
   477| Deflate (Go) l1   | 4014735833 | 871294239  | 78.30%    | 214.04 |
   478| Deflate (Go) l9   | 4014735833 | 730389060  | 81.81%    | 41.17  |
   479
   480### Standard block compression
   481
   482Benchmarking single block performance is subject to a lot more variation since it only tests a limited number of file patterns.
   483So individual benchmarks should only be seen as a guideline and the overall picture is more important.
   484
   485These micro-benchmarks are with data in cache and trained branch predictors. For a more realistic benchmark see the mixed content above. 
   486
   487Block compression. Parallel benchmark running on 16 cores, 16 goroutines.
   488
AMD64 assembly is used for both S2 and Snappy.
   490
   491| Absolute Perf         | Snappy size | S2 Size | Snappy Speed | S2 Speed    | Snappy dec  | S2 dec      |
   492|-----------------------|-------------|---------|--------------|-------------|-------------|-------------|
   493| html                  | 22843       | 20868   | 16246 MB/s   | 18617 MB/s  | 40972 MB/s  | 49263 MB/s  |
   494| urls.10K              | 335492      | 286541  | 7943 MB/s    | 10201 MB/s  | 22523 MB/s  | 26484 MB/s  |
   495| fireworks.jpeg        | 123034      | 123100  | 349544 MB/s  | 303228 MB/s | 718321 MB/s | 827552 MB/s |
   496| fireworks.jpeg (200B) | 146         | 155     | 8869 MB/s    | 20180 MB/s  | 33691 MB/s  | 52421 MB/s  |
   497| paper-100k.pdf        | 85304       | 84202   | 167546 MB/s  | 112988 MB/s | 326905 MB/s | 291944 MB/s |
   498| html_x_4              | 92234       | 20870   | 15194 MB/s   | 54457 MB/s  | 30843 MB/s  | 32217 MB/s  |
   499| alice29.txt           | 88034       | 85934   | 5936 MB/s    | 6540 MB/s   | 12882 MB/s  | 20044 MB/s  |
   500| asyoulik.txt          | 77503       | 79575   | 5517 MB/s    | 6657 MB/s   | 12735 MB/s  | 22806 MB/s  |
   501| lcet10.txt            | 234661      | 220383  | 6235 MB/s    | 6303 MB/s   | 14519 MB/s  | 18697 MB/s  |
   502| plrabn12.txt          | 319267      | 318196  | 5159 MB/s    | 6074 MB/s   | 11923 MB/s  | 19901 MB/s  |
   503| geo.protodata         | 23335       | 18606   | 21220 MB/s   | 25432 MB/s  | 56271 MB/s  | 62540 MB/s  |
   504| kppkn.gtb             | 69526       | 65019   | 9732 MB/s    | 8905 MB/s   | 18491 MB/s  | 18969 MB/s  |
   505| alice29.txt (128B)    | 80          | 82      | 6691 MB/s    | 17179 MB/s  | 31883 MB/s  | 38874 MB/s  |
   506| alice29.txt (1000B)   | 774         | 774     | 12204 MB/s   | 13273 MB/s  | 48056 MB/s  | 52341 MB/s  |
   507| alice29.txt (10000B)  | 6648        | 6933    | 10044 MB/s   | 12824 MB/s  | 32378 MB/s  | 46322 MB/s  |
   508| alice29.txt (20000B)  | 12686       | 13516   | 7733 MB/s    | 12160 MB/s  | 30566 MB/s  | 58969 MB/s  |
   509
   510
Speed is generally at or above Snappy. Small blocks get a significant speedup, although at the expense of size.
   512
   513Decompression speed is better than Snappy, except in one case. 
   514
   515Since payloads are very small the variance in terms of size is rather big, so they should only be seen as a general guideline.
   516
   517Size is on average around Snappy, but varies on content type. 
   518In cases where compression is worse, it usually is compensated by a speed boost. 
   519
   520
   521### Better compression
   522
   523Benchmarking single block performance is subject to a lot more variation since it only tests a limited number of file patterns.
   524So individual benchmarks should only be seen as a guideline and the overall picture is more important.
   525
   526| Absolute Perf         | Snappy size | Better Size | Snappy Speed | Better Speed | Snappy dec  | Better dec  |
   527|-----------------------|-------------|-------------|--------------|--------------|-------------|-------------|
   528| html                  | 22843       | 18972       | 16246 MB/s   | 8621 MB/s    | 40972 MB/s  | 40292 MB/s  |
   529| urls.10K              | 335492      | 248079      | 7943 MB/s    | 5104 MB/s    | 22523 MB/s  | 20981 MB/s  |
   530| fireworks.jpeg        | 123034      | 123100      | 349544 MB/s  | 84429 MB/s   | 718321 MB/s | 823698 MB/s |
   531| fireworks.jpeg (200B) | 146         | 149         | 8869 MB/s    | 7125 MB/s    | 33691 MB/s  | 30101 MB/s  |
   532| paper-100k.pdf        | 85304       | 82887       | 167546 MB/s  | 11087 MB/s   | 326905 MB/s | 198869 MB/s |
   533| html_x_4              | 92234       | 18982       | 15194 MB/s   | 29316 MB/s   | 30843 MB/s  | 30937 MB/s  |
   534| alice29.txt           | 88034       | 71611       | 5936 MB/s    | 3709 MB/s    | 12882 MB/s  | 16611 MB/s  |
   535| asyoulik.txt          | 77503       | 65941       | 5517 MB/s    | 3380 MB/s    | 12735 MB/s  | 14975 MB/s  |
   536| lcet10.txt            | 234661      | 184939      | 6235 MB/s    | 3537 MB/s    | 14519 MB/s  | 16634 MB/s  |
   537| plrabn12.txt          | 319267      | 264990      | 5159 MB/s    | 2960 MB/s    | 11923 MB/s  | 13382 MB/s  |
   538| geo.protodata         | 23335       | 17689       | 21220 MB/s   | 10859 MB/s   | 56271 MB/s  | 57961 MB/s  |
   539| kppkn.gtb             | 69526       | 55398       | 9732 MB/s    | 5206 MB/s    | 18491 MB/s  | 16524 MB/s  |
   540| alice29.txt (128B)    | 80          | 78          | 6691 MB/s    | 7422 MB/s    | 31883 MB/s  | 34225 MB/s  |
   541| alice29.txt (1000B)   | 774         | 746         | 12204 MB/s   | 5734 MB/s    | 48056 MB/s  | 42068 MB/s  |
   542| alice29.txt (10000B)  | 6648        | 6218        | 10044 MB/s   | 6055 MB/s    | 32378 MB/s  | 28813 MB/s  |
   543| alice29.txt (20000B)  | 12686       | 11492       | 7733 MB/s    | 3143 MB/s    | 30566 MB/s  | 27315 MB/s  |
   544
   545
Except for the mostly incompressible JPEG image, compression is better and usually in the
double digits in terms of percentage reduction over Snappy.
   548
   549The PDF sample shows a significant slowdown compared to Snappy, as this mode tries harder 
   550to compress the data. Very small blocks are also not favorable for better compression, so throughput is way down.
   551
   552This mode aims to provide better compression at the expense of performance and achieves that 
   553without a huge performance penalty, except on very small blocks. 
   554
   555Decompression speed suffers a little compared to the regular S2 mode, 
   556but still manages to be close to Snappy in spite of increased compression.  
   557 
   558# Best compression mode
   559
   560S2 offers a "best" compression mode. 
   561
   562This will compress as much as possible with little regard to CPU usage.
   563
It is mainly intended for offline compression, where decompression speed should still
be high and the output remains compatible with other S2 compressed data.
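
For streams, "best" mode is selected with the `WriterBestCompression` option; for blocks, use `EncodeBest`. A minimal sketch (the helper name `encodeBest` is illustrative):

```Go
func encodeBest(dst io.Writer, src io.Reader) error {
    enc := s2.NewWriter(dst, s2.WriterBestCompression())
    if _, err := io.Copy(enc, src); err != nil {
        enc.Close()
        return err
    }
    return enc.Close()
}

// For single blocks, s2.EncodeBest(nil, data) selects the same mode.
```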
   566
   567Some examples compared on 16 core CPU, amd64 assembly used:
   568
```
* enwik10
Default... 10000000000 -> 4759950115 [47.60%]; 1.03s, 9263.0MB/s
Better...  10000000000 -> 4084706676 [40.85%]; 2.16s, 4415.4MB/s
Best...    10000000000 -> 3615520079 [36.16%]; 42.259s, 225.7MB/s

* github-june-2days-2019.json
Default... 6273951764 -> 1041700255 [16.60%]; 431ms, 13882.3MB/s
Better...  6273951764 -> 945841238 [15.08%]; 547ms, 10938.4MB/s
Best...    6273951764 -> 826392576 [13.17%]; 9.455s, 632.8MB/s

* nyc-taxi-data-10M.csv
Default... 3325605752 -> 1093516949 [32.88%]; 324ms, 9788.7MB/s
Better...  3325605752 -> 885394158 [26.62%]; 491ms, 6459.4MB/s
Best...    3325605752 -> 773681257 [23.26%]; 8.29s, 412.0MB/s

* 10gb.tar
Default... 10065157632 -> 5915541066 [58.77%]; 1.028s, 9337.4MB/s
Better...  10065157632 -> 5453844650 [54.19%]; 1.597s, 4862.7MB/s
Best...    10065157632 -> 5192495021 [51.59%]; 32.78s, 308.2MB/s

* consensus.db.10gb
Default... 10737418240 -> 4549762344 [42.37%]; 882ms, 12118.4MB/s
Better...  10737418240 -> 4438535064 [41.34%]; 1.533s, 3500.9MB/s
Best...    10737418240 -> 4210602774 [39.21%]; 42.96s, 254.4MB/s
```
   595
   596Decompression speed should be around the same as using the 'better' compression mode. 
   597
   598## Dictionaries
   599
*Note: S2 dictionary compression is currently at an early implementation stage, with no assembly for
either encoding or decoding. Performance improvements can be expected in the future.*
   602
Adding a dictionary allows providing custom data that will serve as a lookup at the beginning of blocks.
   604
   605The same dictionary *must* be used for both encoding and decoding. 
   606S2 does not keep track of whether the same dictionary is used,
   607and using the wrong dictionary will most often not result in an error when decompressing.
   608
   609Blocks encoded *without* dictionaries can be decompressed seamlessly *with* a dictionary.
   610This means it is possible to switch from an encoding without dictionaries to an encoding with dictionaries
   611and treat the blocks similarly.
   612
   613Similar to [zStandard dictionaries](https://github.com/facebook/zstd#the-case-for-small-data-compression), 
   614the same usage scenario applies to S2 dictionaries.  
   615
   616> Training works if there is some correlation in a family of small data samples. The more data-specific a dictionary is, the more efficient it is (there is no universal dictionary). Hence, deploying one dictionary per type of data will provide the greatest benefits. Dictionary gains are mostly effective in the first few KB. Then, the compression algorithm will gradually use previously decoded content to better compress the rest of the file.
   617
   618S2 further limits the dictionary to only be enabled on the first 64KB of a block.
   619This will remove any negative (speed) impacts of the dictionaries on bigger blocks. 
   620
   621### Compression
   622
Using the [github_users_sample_set](https://github.com/facebook/zstd/releases/download/v1.1.3/github_users_sample_set.tar.zst)
and a 64KB dictionary trained with zStandard, the following sizes can be achieved.
   625
   626|                    | Default          | Better           | Best                  |
   627|--------------------|------------------|------------------|-----------------------|
   628| Without Dictionary | 3362023 (44.92%) | 3083163 (41.19%) | 3057944 (40.86%)      |
   629| With Dictionary    | 921524 (12.31%)  | 873154 (11.67%)  | 785503 bytes (10.49%) |
   630
   631So for highly repetitive content, this case provides an almost 3x reduction in size.
   632
   633For less uniform data we will use the Go source code tree.
Compressing the first 64KB of all `.go` files in `go/src`, Go 1.19.5, 8912 files, 51253563 bytes input:
   635
|                    | Default           | Better            | Best              |
|--------------------|-------------------|-------------------|-------------------|
| Without Dictionary | 22955767 (44.79%) | 20189613 (39.39%) | 19482828 (38.01%) |
| With Dictionary    | 19654568 (38.35%) | 16289357 (31.78%) | 15184589 (29.63%) |
| Saving/file        | 362 bytes         | 428 bytes         | 472 bytes         |
   641
   642
   643### Creating Dictionaries
   644
   645There are no tools to create dictionaries in S2. 
   646However, there are multiple ways to create a useful dictionary:
   647
   648#### Using a Sample File
   649
   650If your input is very uniform, you can just use a sample file as the dictionary.
   651
   652For example in the `github_users_sample_set` above, the average compression only goes up from 
   65310.49% to 11.48% by using the first file as dictionary compared to using a dedicated dictionary.
   654
   655```Go
   656    // Read a sample
   657    sample, err := os.ReadFile("sample.json")
   658
   659    // Create a dictionary.
   660    dict := s2.MakeDict(sample, nil)
   661	
   662    // b := dict.Bytes() will provide a dictionary that can be saved
   663    // and reloaded with s2.NewDict(b).
   664	
   665    // To encode:
   666    encoded := dict.Encode(nil, file)
   667
   668    // To decode:
   669    decoded, err := dict.Decode(nil, file)
   670```
   671
   672#### Using Zstandard
   673
   674Zstandard dictionaries can easily be converted to S2 dictionaries.
   675
   676This can be helpful to generate dictionaries for files that don't have a fixed structure.
   677
   678
Example, with training set files placed in `./training-set`:
   680
   681`λ zstd -r --train-fastcover training-set/* --maxdict=65536 -o name.dict`
   682
This will create a 64KB dictionary, which can be converted to an S2 dictionary like this:
   684
   685```Go
   686    // Decode the Zstandard dictionary.
   687    insp, err := zstd.InspectDictionary(zdict)
   688    if err != nil {
   689        panic(err)
   690    }
   691	
   692    // We are only interested in the contents.
   693    // Assume that files start with "// Copyright (c) 2023".
   694    // Search for the longest match for that.
   695    // This may save a few bytes.
   696    dict := s2.MakeDict(insp.Content(), []byte("// Copyright (c) 2023"))
   697
   698    // b := dict.Bytes() will provide a dictionary that can be saved
   699    // and reloaded with s2.NewDict(b).
   700
   701    // We can now encode using this dictionary
   702    encodedWithDict := dict.Encode(nil, payload)
   703
   704    // To decode content:
   705    decoded, err := dict.Decode(nil, encodedWithDict)
   706```
   707
It is recommended to save the dictionary returned by `b := dict.Bytes()`, since that will contain only the S2 dictionary.
   709
   710This dictionary can later be loaded using `s2.NewDict(b)`. The dictionary then no longer requires `zstd` to be initialized.
   711
   712Also note how `s2.MakeDict` allows you to search for a common starting sequence of your files.
   713This can be omitted, at the expense of a few bytes.
   714
   715# Snappy Compatibility
   716
   717S2 now offers full compatibility with Snappy.
   718
   719This means that the efficient encoders of S2 can be used to generate fully Snappy compatible output.
   720
   721There is a [snappy](https://github.com/klauspost/compress/tree/master/snappy) package that can be used by
   722simply changing imports from `github.com/golang/snappy` to `github.com/klauspost/compress/snappy`.
   723This uses "better" mode for all operations.
   724If you would like more control, you can use the s2 package as described below: 
   725
   726## Blocks
   727
   728Snappy compatible blocks can be generated with the S2 encoder. 
Compression and speed are typically a bit better, and `MaxEncodedLen` is also smaller, reducing memory usage. Replace:
   730
   731| Snappy                    | S2 replacement        |
   732|---------------------------|-----------------------|
   733| snappy.Encode(...)        | s2.EncodeSnappy(...)  |
   734| snappy.MaxEncodedLen(...) | s2.MaxEncodedLen(...) |
   735
   736`s2.EncodeSnappy` can be replaced with `s2.EncodeSnappyBetter` or `s2.EncodeSnappyBest` to get more efficiently compressed snappy compatible output. 
   737
   738`s2.ConcatBlocks` is compatible with snappy blocks.
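
A small sketch of producing a Snappy compatible block (the helper name `encodeSnappyBlock` is illustrative):

```Go
// encodeSnappyBlock returns a block that any Snappy compatible decoder can read.
func encodeSnappyBlock(src []byte) []byte {
    dst := make([]byte, 0, s2.MaxEncodedLen(len(src)))
    return s2.EncodeSnappy(dst, src)
}
```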
   739
   740Comparison of [`webdevdata.org-2015-01-07-subset`](https://files.klauspost.com/compress/webdevdata.org-2015-01-07-4GB-subset.7z),
   74153927 files, total input size: 4,014,735,833 bytes. amd64, single goroutine used:
   742
   743| Encoder               | Size       | MB/s       | Reduction  |
   744|-----------------------|------------|------------|------------|
   745| snappy.Encode         | 1128706759 | 725.59     | 71.89%     |
   746| s2.EncodeSnappy       | 1093823291 | **899.16** | 72.75%     |
   747| s2.EncodeSnappyBetter | 1001158548 | 578.49     | 75.06%     |
   748| s2.EncodeSnappyBest   | 944507998  | 66.00      | **76.47%** |
   749
   750## Streams
   751
   752For streams, replace `enc = snappy.NewBufferedWriter(w)` with `enc = s2.NewWriter(w, s2.WriterSnappyCompat())`.
All other options are available, but note that the block size limit is different for Snappy.
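
A minimal sketch of writing a Snappy compatible stream (the helper name `encodeSnappyStream` is illustrative):

```Go
// encodeSnappyStream writes a stream that any Snappy stream decoder can read.
func encodeSnappyStream(dst io.Writer, src io.Reader) error {
    enc := s2.NewWriter(dst, s2.WriterSnappyCompat())
    if _, err := io.Copy(enc, src); err != nil {
        enc.Close()
        return err
    }
    return enc.Close()
}
```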
   754
   755Comparison of different streams, AMD Ryzen 3950x, 16 cores. Size and throughput: 
   756
   757| File                        | snappy.NewWriter         | S2 Snappy                 | S2 Snappy, Better        | S2 Snappy, Best         |
   758|-----------------------------|--------------------------|---------------------------|--------------------------|-------------------------|
   759| nyc-taxi-data-10M.csv       | 1316042016 - 539.47MB/s  | 1307003093 - 10132.73MB/s | 1174534014 - 5002.44MB/s | 1115904679 - 177.97MB/s |
   760| enwik10 (xml)               | 5088294643 - 451.13MB/s  | 5175840939 -  9440.69MB/s | 4560784526 - 4487.21MB/s | 4340299103 - 158.92MB/s |
   761| 10gb.tar (mixed)            | 6056946612 - 729.73MB/s  | 6208571995 -  9978.05MB/s | 5741646126 - 4919.98MB/s | 5548973895 - 180.44MB/s |
   762| github-june-2days-2019.json | 1525176492 - 933.00MB/s  | 1476519054 - 13150.12MB/s | 1400547532 - 5803.40MB/s | 1321887137 - 204.29MB/s |
   763| consensus.db.10gb (db)      | 5412897703 - 1102.14MB/s | 5354073487 - 13562.91MB/s | 5335069899 - 5294.73MB/s | 5201000954 - 175.72MB/s |
   764
## Decompression
   766
   767All decompression functions map directly to equivalent s2 functions.
   768
   769| Snappy                 | S2 replacement     |
   770|------------------------|--------------------|
   771| snappy.Decode(...)     | s2.Decode(...)     |
   772| snappy.DecodedLen(...) | s2.DecodedLen(...) |
   773| snappy.NewReader(...)  | s2.NewReader(...)  |
   774
   775Features like [quick forward skipping without decompression](https://pkg.go.dev/github.com/klauspost/compress/s2#Reader.Skip)
   776are also available for Snappy streams.
   777
   778If you know you are only decompressing snappy streams, setting [`ReaderMaxBlockSize(64<<10)`](https://pkg.go.dev/github.com/klauspost/compress/s2#ReaderMaxBlockSize)
   779on your Reader will reduce memory consumption.
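
For example, a reader limited to the 64KB maximum block size of Snappy streams could be set up like this (sketch):

```Go
// Snappy streams never contain blocks larger than 64KB,
// so the reader allocation can be limited accordingly.
dec := s2.NewReader(r, s2.ReaderMaxBlockSize(64<<10))
```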
   780
   781# Concatenating blocks and streams.
   782
   783Concatenating streams will concatenate the output of both without recompressing them. 
   784While this is inefficient in terms of compression it might be usable in certain scenarios. 
   785The 10 byte 'stream identifier' of the second stream can optionally be stripped, but it is not a requirement.
   786
   787Blocks can be concatenated using the `ConcatBlocks` function.
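
A small sketch of block concatenation (assuming `blockA` and `blockB` are already compressed blocks):

```Go
// Concatenate two compressed blocks into a single block.
// The result decodes to the concatenation of the original inputs.
joined, err := s2.ConcatBlocks(nil, blockA, blockB)
```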
   788
   789Snappy blocks/streams can safely be concatenated with S2 blocks and streams.
   790Streams with indexes (see below) will currently not work on concatenated streams.
   791
   792# Stream Seek Index
   793
   794S2 and Snappy streams can have indexes. These indexes will allow random seeking within the compressed data.
   795
   796The index can either be appended to the stream as a skippable block or returned for separate storage.
   797
   798When the index is appended to a stream it will be skipped by regular decoders, 
   799so the output remains compatible with other decoders. 
   800
   801## Creating an Index
   802
   803To automatically add an index to a stream, add `WriterAddIndex()` option to your writer.
   804Then the index will be added to the stream when `Close()` is called.
   805
   806```
   807	// Add Index to stream...
   808	enc := s2.NewWriter(w, s2.WriterAddIndex())
   809	io.Copy(enc, r)
   810	enc.Close()
   811```
   812
   813If you want to store the index separately, you can use `CloseIndex()` instead of the regular `Close()`.
   814This will return the index. Note that `CloseIndex()` should only be called once, and you shouldn't call `Close()`.
   815
   816```
   817	// Get index for separate storage... 
   818	enc := s2.NewWriter(w)
   819	io.Copy(enc, r)
   820	index, err := enc.CloseIndex()
   821```
   822
The `index` can then be used when reading from the stream.
This means the index can be used without needing to seek to the end of the stream,
or for manually forwarding streams. See below.
   826
   827Finally, an existing S2/Snappy stream can be indexed using the `s2.IndexStream(r io.Reader)` function.
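
A small sketch of indexing an existing file (the file name is illustrative):

```Go
// Index an existing stream without re-compressing it.
f, err := os.Open("stream.s2")
if err != nil {
    return err
}
defer f.Close()
index, err := s2.IndexStream(f)
```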
   828
   829## Using Indexes
   830
   831To use indexes there is a `ReadSeeker(random bool, index []byte) (*ReadSeeker, error)` function available.
   832
   833Calling ReadSeeker will return an [io.ReadSeeker](https://pkg.go.dev/io#ReadSeeker) compatible version of the reader.
   834
   835If 'random' is specified the returned io.Seeker can be used for random seeking, otherwise only forward seeking is supported.
   836Enabling random seeking requires the original input to support the [io.Seeker](https://pkg.go.dev/io#Seeker) interface.
   837
   838```
   839	dec := s2.NewReader(r)
   840	rs, err := dec.ReadSeeker(false, nil)
   841	rs.Seek(wantOffset, io.SeekStart)	
   842```
   843
This gets a seeker that can seek forward. Since no index is provided, the index is read from the stream.
This requires that an index was added and that `r` supports the [io.Seeker](https://pkg.go.dev/io#Seeker) interface.
   846
   847A custom index can be specified which will be used if supplied.
   848When using a custom index, it will not be read from the input stream.
   849
   850```
   851	dec := s2.NewReader(r)
   852	rs, err := dec.ReadSeeker(false, index)
   853	rs.Seek(wantOffset, io.SeekStart)	
   854```
   855
This will read the index from `index`. Since we specify non-random (forward-only) seeking, `r` does not have to be an io.Seeker.
   857
   858```
   859	dec := s2.NewReader(r)
   860	rs, err := dec.ReadSeeker(true, index)
   861	rs.Seek(wantOffset, io.SeekStart)	
   862```
   863
Finally, since we specify that we want to do random seeking, `r` must be an io.Seeker.
   865
   866The returned [ReadSeeker](https://pkg.go.dev/github.com/klauspost/compress/s2#ReadSeeker) contains a shallow reference to the existing Reader,
meaning changes performed to one are reflected in the other.
   868
   869To check if a stream contains an index at the end, the `(*Index).LoadStream(rs io.ReadSeeker) error` can be used.
   870
   871## Manually Forwarding Streams
   872
   873Indexes can also be read outside the decoder using the [Index](https://pkg.go.dev/github.com/klauspost/compress/s2#Index) type.
   874This can be used for parsing indexes, either separate or in streams.
   875
   876In some cases it may not be possible to serve a seekable stream.
   877This can for instance be an HTTP stream, where the Range request 
   878is sent at the start of the stream. 
   879
   880With a little bit of extra code it is still possible to use indexes
to forward to a specific offset with a single forward skip.
   882
   883It is possible to load the index manually like this: 
   884```
   885	var index s2.Index
   886	_, err = index.Load(idxBytes)
   887```
   888
   889This can be used to figure out how much to offset the compressed stream:
   890
   891```
   892	compressedOffset, uncompressedOffset, err := index.Find(wantOffset)
   893```
   894
   895The `compressedOffset` is the number of bytes that should be skipped 
   896from the beginning of the compressed file.
   897
The `uncompressedOffset` will then be the offset of the uncompressed bytes returned
when decoding from that position. This will always be <= wantOffset.
   900
   901When creating a decoder it must be specified that it should *not* expect a stream identifier
   902at the beginning of the stream. Assuming the io.Reader `r` has been forwarded to `compressedOffset`
   903we create the decoder like this:
   904
   905```
   906	dec := s2.NewReader(r, s2.ReaderIgnoreStreamIdentifier())
   907```
   908
We are not completely done. We still need to forward the stream past the uncompressed bytes we didn't want.
This is done using the regular `Skip` function:
   911
   912```
   913	err = dec.Skip(wantOffset - uncompressedOffset)
   914```
   915
   916This will ensure that we are at exactly the offset we want, and reading from `dec` will start at the requested offset.
   917
   918# Compact storage
   919
For compact storage, [RemoveIndexHeaders](https://pkg.go.dev/github.com/klauspost/compress/s2#RemoveIndexHeaders) can be used to remove any redundant info from
a serialized index. If you remove the header it must be restored before [Loading](https://pkg.go.dev/github.com/klauspost/compress/s2#Index.Load).

This is expected to save 20 bytes. The headers can be restored using [RestoreIndexHeaders](https://pkg.go.dev/github.com/klauspost/compress/s2#RestoreIndexHeaders). This removes a layer of security, but is the most compact representation. `RestoreIndexHeaders` returns nil if the headers contain errors.
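
A small sketch of the round trip (assuming `index` holds a serialized index):

```Go
// Strip the redundant framing before storing the index...
compact := s2.RemoveIndexHeaders(index)

// ...and restore it before loading.
var idx s2.Index
_, err := idx.Load(s2.RestoreIndexHeaders(compact))
```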
   924
   925## Index Format:
   926
   927Each block is structured as a snappy skippable block, with the chunk ID 0x99.
   928
   929The block can be read from the front, but contains information so it can be read from the back as well.
   930
   931Numbers are stored as fixed size little endian values or [zigzag encoded](https://developers.google.com/protocol-buffers/docs/encoding#signed_integers) [base 128 varints](https://developers.google.com/protocol-buffers/docs/encoding), 
   932with un-encoded value length of 64 bits, unless other limits are specified. 
   933
   934| Content                              | Format                                                                                                                        |
   935|--------------------------------------|-------------------------------------------------------------------------------------------------------------------------------|
   936| ID, `[1]byte`                        | Always 0x99.                                                                                                                  |
   937| Data Length, `[3]byte`               | 3 byte little-endian length of the chunk in bytes, following this.                                                            |
   938| Header `[6]byte`                     | Header, must be `[115, 50, 105, 100, 120, 0]` or in text: "s2idx\x00".                                                        |
   939| UncompressedSize, Varint             | Total Uncompressed size.                                                                                                      |
   940| CompressedSize, Varint               | Total Compressed size if known. Should be -1 if unknown.                                                                      |
   941| EstBlockSize, Varint                 | Block Size, used for guessing uncompressed offsets. Must be >= 0.                                                             |
   942| Entries, Varint                      | Number of Entries in index, must be < 65536 and >=0.                                                                          |
   943| HasUncompressedOffsets `byte`        | 0 if no uncompressed offsets are present, 1 if present. Other values are invalid.                                             |
   944| UncompressedOffsets, [Entries]VarInt | Uncompressed offsets. See below how to decode.                                                                                |
   945| CompressedOffsets, [Entries]VarInt   | Compressed offsets. See below how to decode.                                                                                  |
   946| Block Size, `[4]byte`                | Little Endian total encoded size (including header and trailer). Can be used for searching backwards to start of block.       |
   947| Trailer `[6]byte`                    | Trailer, must be `[0, 120, 100, 105, 50, 115]` or in text: "\x00xdi2s". Can be used for identifying block from end of stream. |
   948
   949For regular streams the uncompressed offsets are fully predictable,
so `HasUncompressedOffsets` allows specifying that compressed blocks all have
   951exactly `EstBlockSize` bytes of uncompressed content.
   952
   953Entries *must* be in order, starting with the lowest offset, 
   954and there *must* be no uncompressed offset duplicates.  
   955Entries *may* point to the start of a skippable block, 
   956but it is then not allowed to also have an entry for the next block since 
   957that would give an uncompressed offset duplicate.
   958
   959There is no requirement for all blocks to be represented in the index. 
   960In fact there is a maximum of 65536 block entries in an index.
   961
   962The writer can use any method to reduce the number of entries.
   963An implicit block start at 0,0 can be assumed.
   964
### Decoding entries:

```
// Read Uncompressed entries.
// Each assumes EstBlockSize delta from previous.
for each entry {
    uOff = 0
    if HasUncompressedOffsets == 1 {
        uOff = ReadVarInt // Read value from stream
    }

    // Except for the first entry, use previous values.
    if entryNum == 0 {
        entry[entryNum].UncompressedOffset = uOff
        continue
    }

    // Uncompressed uses previous offset and adds EstBlockSize
    entry[entryNum].UncompressedOffset = entry[entryNum-1].UncompressedOffset + EstBlockSize + uOff
}

// Guess that the first block will be 50% of uncompressed size.
// Integer truncating division must be used.
CompressGuess := EstBlockSize / 2

// Read Compressed entries.
// Each assumes CompressGuess delta from previous.
// CompressGuess is adjusted for each value.
for each entry {
    cOff = ReadVarInt // Read value from stream

    // Except for the first entry, use previous values.
    if entryNum == 0 {
        entry[entryNum].CompressedOffset = cOff
        continue
    }

    // Compressed uses previous and our estimate.
    entry[entryNum].CompressedOffset = entry[entryNum-1].CompressedOffset + CompressGuess + cOff

    // Adjust compressed offset for next loop, integer truncating division must be used.
    CompressGuess += cOff / 2
}
```

To decode from any given uncompressed offset `wantOffset`:

* Iterate entries until `entry[n].UncompressedOffset > wantOffset`.
* Start decoding from `entry[n-1].CompressedOffset`.
* Discard `wantOffset - entry[n-1].UncompressedOffset` bytes from the decoded stream.

See [using indexes](https://github.com/klauspost/compress/tree/master/s2#using-indexes) for functions that perform the operations with a simpler interface.
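
For example, a sketch using the package's `Index` type instead of hand-rolling the steps above (`seekToOffset` and its variables are illustrative; assumes the `s2` and `fmt` packages are imported):

```Go
// seekToOffset loads a serialized index and reports where to start
// decompressing in order to reach wantOffset in the uncompressed stream.
func seekToOffset(indexBytes []byte, wantOffset int64) error {
	var idx s2.Index
	if _, err := idx.Load(indexBytes); err != nil {
		return err
	}
	// Find returns the closest indexed offset pair at or before wantOffset.
	compOff, uncompOff, err := idx.Find(wantOffset)
	if err != nil {
		return err
	}
	// Start decoding at compOff in the compressed stream and discard
	// wantOffset-uncompOff bytes of decompressed output.
	fmt.Println("start at", compOff, "discard", wantOffset-uncompOff, "bytes")
	return nil
}
```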


# Format Extensions

* Frame [Stream identifier](https://github.com/google/snappy/blob/master/framing_format.txt#L68) changed from `sNaPpY` to `S2sTwO`.
* [Framed compressed blocks](https://github.com/google/snappy/blob/master/format_description.txt) can be up to 4MB (up from 64KB).
* Compressed blocks can have an offset of `0`, which indicates to repeat the last seen offset.

A repeat offset must be encoded as a [2.2.1. Copy with 1-byte offset (01)](https://github.com/google/snappy/blob/master/format_description.txt#L89), where the offset is 0.

The length is found by reading the 3-bit length code in the tag and decoding it using this table:

| Length | Actual Length        |
|--------|----------------------|
| 0      | 4                    |
| 1      | 5                    |
| 2      | 6                    |
| 3      | 7                    |
| 4      | 8                    |
| 5      | 8 + read 1 byte      |
| 6      | 260 + read 2 bytes   |
| 7      | 65540 + read 3 bytes |

This allows any repeat offset + length to be represented by 2 to 5 bytes.
It also allows emitting matches longer than 64 bytes with one copy + one repeat instead of several 64 byte copies.

Lengths are stored as little endian values.
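
As a sketch, assuming the repeat is stored as a 1-byte-offset copy tag with the 3-bit length code in bits 2-4 (`decodeRepeatLength` is an illustrative helper, not part of the package API):

```Go
// decodeRepeatLength decodes the length of a repeat operation from its
// copy tag and the bytes that follow it, returning the length and the
// number of extra bytes consumed.
func decodeRepeatLength(tag byte, extra []byte) (length, consumed int) {
	code := int(tag>>2) & 7 // 3-bit length code stored in the copy tag.
	switch code {
	case 5:
		return 8 + int(extra[0]), 1
	case 6:
		return 260 + (int(extra[0]) | int(extra[1])<<8), 2
	case 7:
		return 65540 + (int(extra[0]) | int(extra[1])<<8 | int(extra[2])<<16), 3
	default:
		// Codes 0-4 map directly to lengths 4-8.
		return code + 4, 0
	}
}
```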

The first copy of a block cannot be a repeat offset and the offset is reset on every block in streams.

Default streaming block size is 1MB.

# Dictionary Encoding

Dictionary encoding allows providing a custom dictionary that serves as a lookup at the beginning of blocks.

A dictionary provides an initial repeat value that can be used to point to a common header.

Other than that, the dictionary contains values that can be used as back-references.

Often used data should be placed at the *end* of the dictionary, since offsets < 2048 bytes are encoded with fewer bytes.

## Format

Dictionary *content* must be at least 16 bytes and at most 64KiB (65536 bytes).

Encoding: `[repeat value (uvarint)][dictionary content...]`

Before the dictionary content, an unsigned base-128 (uvarint) encoded value specifies the initial repeat offset.
This value is an offset into the dictionary content and not a back-reference offset,
so setting this to 0 will make the repeat value point to the first value of the dictionary.

The value must be less than the dictionary length minus 8.
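
A minimal sketch of serializing a dictionary in this format (`buildDict` is illustrative only, not the package API; assumes `encoding/binary` and `fmt` are imported):

```Go
// buildDict serializes a dictionary: a uvarint repeat offset followed by
// the raw dictionary content.
func buildDict(repeatOffset uint64, content []byte) ([]byte, error) {
	if len(content) < 16 || len(content) > 65536 {
		return nil, fmt.Errorf("dictionary content must be 16-65536 bytes, got %d", len(content))
	}
	if repeatOffset >= uint64(len(content)-8) {
		return nil, fmt.Errorf("repeat offset %d must be less than len(content)-8", repeatOffset)
	}
	dict := binary.AppendUvarint(nil, repeatOffset)
	return append(dict, content...), nil
}
```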

## Encoding

From the decoder's point of view, the dictionary content is seen as preceding the encoded content:

`[dictionary content][decoded output]`

Back-references to the dictionary are encoded as ordinary back-references that have an offset before the start of the decoded block.

Matches copying from the dictionary are **not** allowed to cross from the dictionary into the decoded data.
However, if a copy ends at the end of the dictionary, the next repeat will point to the start of the decoded buffer, which is allowed.

The first match can be a repeat value, which will use the repeat offset stored in the dictionary.

When 64KB (65536 bytes) has been en/decoded, it is no longer allowed to reference the dictionary,
neither by copy nor by repeat operations.
If the boundary is crossed while copying from the dictionary, the operation should complete,
but the next instruction is not allowed to reference the dictionary.

Valid blocks encoded *without* a dictionary can be decoded with any dictionary.
There are no checks of whether the supplied dictionary is the correct one for a block.
Because of this, there is no overhead to using a dictionary.

## Example

This is the dictionary content. Elements are separated by `[]`.

Dictionary: `[0x0a][Yesterday 25 bananas were added to Benjamins brown bag]`.

The initial repeat offset is set to 10, which is the letter `2`.

Encoded: `[LIT "10"][REPEAT len=10][LIT "hich"][MATCH off=50 len=6][MATCH off=31 len=6][MATCH off=61 len=10]`

Decoded: `[10][ bananas w][hich][ were ][brown ][were added]`

Output: `10 bananas which were brown were added`


## Streams

For streams, each block can use the dictionary.

The dictionary cannot currently be provided on the stream.


# LICENSE

This code is based on the [Snappy-Go](https://github.com/golang/snappy) implementation.

Use of this source code is governed by a BSD-style license that can be found in the LICENSE file.
