Max and Min constants for the IndexType
const (
	MaxIndexType = math.MaxInt32
	MinIndexType = math.MinInt32
)
const (
MaxValuesPerLiteralRun = (1 << 6) * 8
)
func BytesToBools(in []byte, out []bool)
BytesToBools efficiently populates a slice of booleans from an input bitmap
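As an illustration of what expanding a bitmap into booleans involves, here is a minimal sketch. It is not the package's implementation: `bitmapToBools` is a hypothetical stand-in, and it assumes the bitmap is LSB-first (bit 0 of byte 0 is the first value).

```go
package main

import "fmt"

// bitmapToBools is a hypothetical stand-in, not the package's code.
// It expands an LSB-first bitmap into out, one bool per bit, and stops
// when either the bitmap or out is exhausted.
func bitmapToBools(in []byte, out []bool) {
	for i := range out {
		byteIdx := i / 8
		if byteIdx >= len(in) {
			return
		}
		out[i] = in[byteIdx]&(byte(1)<<uint(i%8)) != 0
	}
}

func main() {
	out := make([]bool, 10)
	// Byte 0 has bits 0 and 2 set; byte 1 has bit 1 set (overall index 9).
	bitmapToBools([]byte{0b00000101, 0b00000010}, out)
	fmt.Println(out)
}
```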
func MaxRLEBufferSize(width, numValues int) int
func MinRLEBufferSize(bitWidth int) int
BitReader reads bits or bytes from a reader, buffering up to a uint64 at a time to improve efficiency. It also provides methods that read multi-byte values, such as encoded ints, in a single call.
BitReader is the basis for the other utilities in this package, such as RLE decoding, and provides the functions they need to interpret values.
type BitReader struct {
// contains filtered or unexported fields
}
func NewBitReader(r reader) *BitReader
NewBitReader takes in a reader that implements io.Reader, io.ReaderAt and io.Seeker interfaces and returns a BitReader for use with various bit level manipulations.
func (b *BitReader) CurOffset() int64
CurOffset returns the current Byte offset into the data that the reader is at.
func (b *BitReader) GetAligned(nbytes int, v interface{}) bool
GetAligned reads nbytes from the underlying stream into the passed interface value, aligned to byte boundaries. It returns false if there aren't enough bytes remaining in the stream or if an invalid type is passed.
v must be a pointer to a byte or sized uint type (*byte, *uint16, *uint32, *uint64). Encoded values are assumed to be little endian.
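The little-endian, fixed-width read described above can be sketched with the standard library alone. `readAlignedUint32` is a hypothetical helper that works on a plain byte slice rather than on a BitReader, and it only covers widths up to four bytes.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// readAlignedUint32 is a hypothetical helper: it decodes nbytes (1 to 4)
// little-endian bytes from the front of buf and reports whether enough
// bytes were available.
func readAlignedUint32(buf []byte, nbytes int) (uint32, bool) {
	if nbytes < 1 || nbytes > 4 || len(buf) < nbytes {
		return 0, false
	}
	var tmp [4]byte // zero-padded so short widths decode correctly
	copy(tmp[:], buf[:nbytes])
	return binary.LittleEndian.Uint32(tmp[:]), true
}

func main() {
	v, ok := readAlignedUint32([]byte{0x34, 0x12}, 2)
	fmt.Printf("%#x %v\n", v, ok)
}
```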
func (b *BitReader) GetBatch(bits uint, out []uint64) (int, error)
GetBatch fills out with values decoded from the stream, where each value is bit packed using bits as the number of bits per value. The values are unpacked as they are read.
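To make the bit-packed layout concrete, the sketch below unpacks fixed-width values from a byte slice. It assumes values are packed LSB-first, as in Parquet's bit-packing; `unpack` is a hypothetical function written bit by bit for clarity, not the batched routine the package uses.

```go
package main

import "fmt"

// unpack is a hypothetical helper. It reads up to len(out) values of
// `bits` bits each from data (LSB-first) and returns how many values
// were decoded before the data ran out.
func unpack(data []byte, bits uint, out []uint64) int {
	var bitPos uint
	for i := range out {
		if bitPos+bits > uint(len(data))*8 {
			return i
		}
		var v uint64
		for b := uint(0); b < bits; b++ {
			p := bitPos + b
			if data[p/8]&(byte(1)<<(p%8)) != 0 {
				v |= uint64(1) << b
			}
		}
		out[i] = v
		bitPos += bits
	}
	return len(out)
}

func main() {
	// The values 0 through 7 packed with 3 bits each occupy three bytes.
	out := make([]uint64, 8)
	n := unpack([]byte{0x88, 0xC6, 0xFA}, 3, out)
	fmt.Println(n, out)
}
```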
func (b *BitReader) GetBatchBools(out []bool) (int, error)
GetBatchBools is like GetBatch but optimized for reading bits as boolean values
func (b *BitReader) GetBatchIndex(bits uint, out []IndexType) (i int, err error)
GetBatchIndex is like GetBatch but for IndexType (used for dictionary decoding)
func (b *BitReader) GetValue(width int) (uint64, bool)
GetValue returns a single value that is bit packed using width as the number of bits and returns false if there weren't enough bits remaining.
func (b *BitReader) GetVlqInt() (uint64, bool)
GetVlqInt reads a VLQ-encoded int from the stream. The encoded value must start at the beginning of a byte, and this returns false if there weren't enough bytes in the buffer or reader. This calls `ReadByte`, which in turn retrieves byte-aligned values from the reader.
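A rough sketch of VLQ decoding follows. It assumes the encoding is the little-endian base-128 varint used by Parquet (seven payload bits per byte, high bit set on every byte except the last). `decodeVlq` is a hypothetical function that reads from a byte slice instead of a BitReader.

```go
package main

import "fmt"

// decodeVlq is a hypothetical helper. It decodes one base-128 varint from
// the front of buf and returns the value, the number of bytes consumed,
// and whether a complete value was found.
func decodeVlq(buf []byte) (v uint64, n int, ok bool) {
	for shift := uint(0); shift < 64 && n < len(buf); shift += 7 {
		b := buf[n]
		n++
		v |= uint64(b&0x7F) << shift // low seven bits carry the payload
		if b&0x80 == 0 {             // a clear high bit marks the last byte
			return v, n, true
		}
	}
	return 0, 0, false // ran out of bytes, or more than ten bytes long
}

func main() {
	v, n, ok := decodeVlq([]byte{0xAC, 0x02})
	fmt.Println(v, n, ok)
}
```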
func (b *BitReader) GetZigZagVlqInt() (int64, bool)
GetZigZagVlqInt reads a zigzag encoded integer, returning false if there weren't enough bytes remaining.
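Zigzag encoding maps signed integers to unsigned ones so that values of small magnitude stay small once VLQ-encoded: 0, -1, 1, -2 become 0, 1, 2, 3. The sketch below shows only the mapping itself; the function names are hypothetical and the VLQ step is omitted.

```go
package main

import "fmt"

// zigZagEncode interleaves negative and non-negative values:
// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, and so on.
func zigZagEncode(v int64) uint64 {
	return uint64(v<<1) ^ uint64(v>>63)
}

// zigZagDecode reverses zigZagEncode.
func zigZagDecode(u uint64) int64 {
	return int64(u>>1) ^ -int64(u&1)
}

func main() {
	for _, v := range []int64{0, -1, 1, -2, 2} {
		u := zigZagEncode(v)
		fmt.Println(v, u, zigZagDecode(u))
	}
}
```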
func (b *BitReader) ReadByte() (byte, error)
ReadByte reads a single aligned byte from the underlying stream, returning an error if there aren't enough bytes left.
func (b *BitReader) Reset(r reader)
Reset allows reusing a BitReader by setting a new reader and resetting the internal state back to zeros.
BitWriter is a utility for writing values of specific bit widths to a stream using a uint64 as a buffer to build up between flushing for efficiency.
type BitWriter struct {
// contains filtered or unexported fields
}
func NewBitWriter(w WriterAtWithLen) *BitWriter
NewBitWriter initializes a new bit writer to write to the passed in interface using WriteAt to write the appropriate offsets and values.
func (b *BitWriter) Clear()
Clear resets the writer so that subsequent writes will start from offset 0, allowing reuse of the underlying buffer and writer.
func (b *BitWriter) Flush(align bool)
Flush flushes any buffered data to the underlying writer. Pass true if the next write should be byte-aligned after this flush.
func (b *BitWriter) SkipBytes(nbytes int) (int, error)
SkipBytes reserves the next aligned nbytes, skipping them and returning the offset to use with WriteAt to write to those reserved bytes. Used for RLE encoding to fill in the indicators after encoding.
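The reserve-then-back-fill pattern can be sketched without the BitWriter at all. In this hypothetical example, a one-byte count header is reserved before the payload is written and filled in afterwards, once the count is known. The zero-skipping rule is invented purely so that the count is not known up front.

```go
package main

import "fmt"

// encodeWithHeader is a hypothetical illustration of reserving space for
// a header and filling it in after the payload has been written. It
// assumes fewer than 256 non-zero values so the count fits in one byte.
func encodeWithHeader(values []byte) []byte {
	buf := make([]byte, 0, len(values)+1)
	headerOff := len(buf) // remember where the reserved byte lives
	buf = append(buf, 0)  // reserve one byte for the header
	count := 0
	for _, v := range values {
		if v == 0 {
			continue // skipped values make the final count unknown up front
		}
		buf = append(buf, v)
		count++
	}
	buf[headerOff] = byte(count) // back-fill the reserved header
	return buf
}

func main() {
	fmt.Println(encodeWithHeader([]byte{5, 0, 7}))
}
```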
func (b *BitWriter) WriteAligned(val uint64, nbytes int) bool
WriteAligned writes val as a little-endian value in exactly nbytes, byte-aligned, to the underlying writer. It first flushes via Flush(true) and then writes the nbytes without buffering.
func (b *BitWriter) WriteAt(val []byte, off int64) (int, error)
WriteAt fulfills the io.WriterAt interface to write len(val) bytes from val to the underlying byte slice starting at offset off. It returns the number of bytes written from val (0 <= n <= len(val)) and any error encountered. This allows writing full bytes directly to the underlying writer.
func (b *BitWriter) WriteValue(v uint64, nbits uint) error
WriteValue writes the value v using nbits to pack it, returning an error if it fails for some reason.
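The idea of accumulating bits in a uint64 and draining whole bytes can be sketched as follows. `bitPacker` is a hypothetical type, not the BitWriter: it appends to a slice instead of using WriteAt, packs LSB-first, and assumes each width is at most 56 bits so the accumulator cannot overflow.

```go
package main

import "fmt"

// bitPacker is a hypothetical, simplified packer.
type bitPacker struct {
	out   []byte
	buf   uint64 // pending bits, LSB-first
	nbits uint   // number of valid bits in buf (always < 8 between calls)
}

// write appends the low `width` bits of v. It assumes width <= 56.
func (p *bitPacker) write(v uint64, width uint) {
	v &= uint64(1)<<width - 1 // drop any bits above width
	p.buf |= v << p.nbits
	p.nbits += width
	for p.nbits >= 8 { // drain every complete byte
		p.out = append(p.out, byte(p.buf))
		p.buf >>= 8
		p.nbits -= 8
	}
}

// flush emits any partial byte, zero-padded, and returns the output.
func (p *bitPacker) flush() []byte {
	if p.nbits > 0 {
		p.out = append(p.out, byte(p.buf))
		p.buf, p.nbits = 0, 0
	}
	return p.out
}

func main() {
	var p bitPacker
	for v := uint64(0); v < 8; v++ {
		p.write(v, 3)
	}
	fmt.Printf("% x\n", p.flush())
}
```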
func (b *BitWriter) WriteVlqInt(v uint64) bool
WriteVlqInt writes v as a vlq encoded integer byte-aligned to the underlying writer without buffering.
func (b *BitWriter) WriteZigZagVlqInt(v int64) bool
WriteZigZagVlqInt writes a zigzag encoded integer byte-aligned to the underlying writer without buffering.
func (b *BitWriter) Written() int
Written returns the number of bytes that have been written to the BitWriter, not how many bytes have been flushed. Use Flush to ensure that all data is flushed to the underlying writer.
BitmapWriter is an interface for bitmap writers so that we can use multiple implementations or swap if necessary.
type BitmapWriter interface {
	// Set sets the current bit that will be written
	Set()
	// Clear clears the current bit that will be written
	Clear()
	// Next advances to the next bit for the writer
	Next()
	// Finish flushes the current byte out to the bitmap slice
	Finish()
	// AppendWord takes nbits from word which should be an LSB bitmap and appends them to the bitmap.
	AppendWord(word uint64, nbits int64)
	// AppendBools appends the bit representation of the bools slice, returning the number
	// of bools that were able to fit in the remaining length of the bitmapwriter.
	AppendBools(in []bool) int
	// Pos is the current position that will be written next
	Pos() int
	// Reset allows reusing the bitmapwriter by resetting Pos to start with length as
	// the number of bits that the writer can write.
	Reset(start, length int)
}
func NewBitmapWriter(bitmap []byte, start, length int) BitmapWriter
func NewFirstTimeBitmapWriter(buf []byte, start, length int64) BitmapWriter
NewFirstTimeBitmapWriter creates a bitmap writer that might clobber any bit values following the bits written to the bitmap. In exchange, it is faster than the bitmap writer created with NewBitmapWriter.
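One way to picture the trade-off between the two writers is the sketch below. Both functions are hypothetical and only guess at the general idea: the first updates bits one at a time and leaves its neighbours alone, while the second builds each byte in a local variable and stores it whole, which zeroes any bits after the written range in the last byte.

```go
package main

import "fmt"

// writePreserving sets or clears each bit in place, so bits outside
// [start, start+len(bits)) keep their values.
func writePreserving(bitmap []byte, start int, bits []bool) {
	for i, b := range bits {
		pos := start + i
		mask := byte(1) << uint(pos%8)
		if b {
			bitmap[pos/8] |= mask
		} else {
			bitmap[pos/8] &^= mask
		}
	}
}

// writeFirstTime keeps the bits before start but stores whole bytes,
// so trailing bits in the last touched byte are overwritten with zeros.
func writeFirstTime(bitmap []byte, start int, bits []bool) {
	if len(bits) == 0 {
		return
	}
	byteIdx := start / 8
	bitOff := uint(start % 8)
	cur := bitmap[byteIdx] & (byte(1)<<bitOff - 1) // keep bits before start
	for _, b := range bits {
		if b {
			cur |= byte(1) << bitOff
		}
		bitOff++
		if bitOff == 8 {
			bitmap[byteIdx] = cur
			byteIdx++
			cur, bitOff = 0, 0
		}
	}
	if bitOff > 0 {
		bitmap[byteIdx] = cur
	}
}

func main() {
	a, b := []byte{0xFF}, []byte{0xFF}
	writePreserving(a, 2, []bool{false, true})
	writeFirstTime(b, 2, []bool{false, true})
	fmt.Printf("%08b %08b\n", a[0], b[0])
}
```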
DictionaryConverter is an interface used for dealing with RLE decoding and encoding when working with dictionaries to get values from indexes.
type DictionaryConverter interface {
	// Copy takes an interface{} which must be a slice of the appropriate type, and will be populated
	// by the dictionary values at the indexes from the IndexType slice
	Copy(interface{}, []IndexType) error
	// Fill fills interface{} which must be a slice of the appropriate type, with the value
	// specified by the dictionary index passed in.
	Fill(interface{}, IndexType) error
	// FillZero fills interface{}, which must be a slice of the appropriate type, with the zero value
	// for the given type.
	FillZero(interface{})
	// IsValid validates that all of the indexes passed in are valid indexes for the dictionary
	IsValid(...IndexType) bool
}
IndexType is the type we're going to use for Dictionary indexes, currently an alias to int32
type IndexType = int32
type RleDecoder struct {
// contains filtered or unexported fields
}
func NewRleDecoder(data *bytes.Reader, width int) *RleDecoder
func (r *RleDecoder) GetBatch(values []uint64) int
func (r *RleDecoder) GetBatchSpaced(vals []uint64, nullcount int, validBits []byte, validBitsOffset int64) (int, error)
func (r *RleDecoder) GetBatchWithDict(dc DictionaryConverter, vals interface{}) (int, error)
func (r *RleDecoder) GetBatchWithDictByteArray(dc DictionaryConverter, vals []parquet.ByteArray) (int, error)
func (r *RleDecoder) GetBatchWithDictFixedLenByteArray(dc DictionaryConverter, vals []parquet.FixedLenByteArray) (int, error)
func (r *RleDecoder) GetBatchWithDictFloat32(dc DictionaryConverter, vals []float32) (int, error)
func (r *RleDecoder) GetBatchWithDictFloat64(dc DictionaryConverter, vals []float64) (int, error)
func (r *RleDecoder) GetBatchWithDictInt32(dc DictionaryConverter, vals []int32) (int, error)
func (r *RleDecoder) GetBatchWithDictInt64(dc DictionaryConverter, vals []int64) (int, error)
func (r *RleDecoder) GetBatchWithDictInt96(dc DictionaryConverter, vals []parquet.Int96) (int, error)
func (r *RleDecoder) GetBatchWithDictSpaced(dc DictionaryConverter, vals interface{}, nullCount int, validBits []byte, validBitsOffset int64) (int, error)
func (r *RleDecoder) GetBatchWithDictSpacedByteArray(dc DictionaryConverter, vals []parquet.ByteArray, nullCount int, validBits []byte, validBitsOffset int64) (totalProcessed int, err error)
func (r *RleDecoder) GetBatchWithDictSpacedFixedLenByteArray(dc DictionaryConverter, vals []parquet.FixedLenByteArray, nullCount int, validBits []byte, validBitsOffset int64) (totalProcessed int, err error)
func (r *RleDecoder) GetBatchWithDictSpacedFloat32(dc DictionaryConverter, vals []float32, nullCount int, validBits []byte, validBitsOffset int64) (totalProcessed int, err error)
func (r *RleDecoder) GetBatchWithDictSpacedFloat64(dc DictionaryConverter, vals []float64, nullCount int, validBits []byte, validBitsOffset int64) (totalProcessed int, err error)
func (r *RleDecoder) GetBatchWithDictSpacedInt32(dc DictionaryConverter, vals []int32, nullCount int, validBits []byte, validBitsOffset int64) (totalProcessed int, err error)
func (r *RleDecoder) GetBatchWithDictSpacedInt64(dc DictionaryConverter, vals []int64, nullCount int, validBits []byte, validBitsOffset int64) (totalProcessed int, err error)
func (r *RleDecoder) GetBatchWithDictSpacedInt96(dc DictionaryConverter, vals []parquet.Int96, nullCount int, validBits []byte, validBitsOffset int64) (totalProcessed int, err error)
func (r *RleDecoder) GetValue() (uint64, bool)
func (r *RleDecoder) Next() bool
func (r *RleDecoder) Reset(data *bytes.Reader, width int)
type RleEncoder struct {
	BitWidth int
	// contains filtered or unexported fields
}
func NewRleEncoder(w WriterAtWithLen, width int) *RleEncoder
func (r *RleEncoder) Clear()
func (r *RleEncoder) Flush() int
func (r *RleEncoder) Put(value uint64) error
Put buffers input values 8 at a time. After seeing all 8 values, it decides whether they should be encoded as a literal or repeated run.
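The group-of-eight decision can be sketched in a much simplified form. The planner below is hypothetical: it only classifies complete groups of eight and merges adjacent runs of the same kind, whereas a real encoder also has to track repeats that cross group boundaries and emit the run headers and bit-packed bytes.

```go
package main

import "fmt"

// run describes one planned run. All names here are hypothetical.
type run struct {
	repeated bool   // true for a repeated run, false for a literal run
	value    uint64 // the repeated value (unused for literal runs)
	count    int    // number of values covered by the run
}

// planRuns looks at the input eight values at a time. A group whose
// values are all equal extends or starts a repeated run; any other group
// extends or starts a literal run. Trailing values that do not fill a
// group of eight are ignored in this sketch.
func planRuns(values []uint64) []run {
	var runs []run
	for i := 0; i+8 <= len(values); i += 8 {
		group := values[i : i+8]
		same := true
		for _, v := range group[1:] {
			if v != group[0] {
				same = false
				break
			}
		}
		last := len(runs) - 1
		switch {
		case same && last >= 0 && runs[last].repeated && runs[last].value == group[0]:
			runs[last].count += 8
		case same:
			runs = append(runs, run{repeated: true, value: group[0], count: 8})
		case last >= 0 && !runs[last].repeated:
			runs[last].count += 8
		default:
			runs = append(runs, run{count: 8})
		}
	}
	return runs
}

func main() {
	vals := make([]uint64, 0, 24)
	for i := 0; i < 16; i++ {
		vals = append(vals, 7)
	}
	for i := 0; i < 8; i++ {
		vals = append(vals, uint64(i))
	}
	fmt.Printf("%+v\n", planRuns(vals))
}
```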
TellWrapper wraps any io.Writer to add a Tell function that tracks the position based on calls to Write. It does not take into account any calls to Seek or any Writes that don't go through the TellWrapper
type TellWrapper struct {
	io.Writer
	// contains filtered or unexported fields
}
func (w *TellWrapper) Close() error
Close makes TellWrapper an io.Closer so that calling Close will also call Close on the wrapped writer if it has a Close function.
func (w *TellWrapper) Tell() int64
func (w *TellWrapper) Write(p []byte) (n int, err error)
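A position-tracking writer of this kind takes only a few lines. The sketch below uses a hypothetical `tellWriter` type and a `writeChunks` helper for demonstration. As with TellWrapper, the position reflects only bytes written through the wrapper.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// tellWriter is a hypothetical wrapper that counts the bytes written
// through it.
type tellWriter struct {
	w   io.Writer
	pos int64
}

// Write forwards to the wrapped writer and advances the position by the
// number of bytes actually written.
func (t *tellWriter) Write(p []byte) (int, error) {
	n, err := t.w.Write(p)
	t.pos += int64(n)
	return n, err
}

// Tell reports the total number of bytes written so far.
func (t *tellWriter) Tell() int64 { return t.pos }

// writeChunks writes each chunk through a tellWriter backed by an
// in-memory buffer and returns the final position.
func writeChunks(chunks ...string) int64 {
	var sink bytes.Buffer
	tw := &tellWriter{w: &sink}
	for _, c := range chunks {
		io.WriteString(tw, c)
	}
	return tw.Tell()
}

func main() {
	fmt.Println(writeChunks("ab", "cde"))
}
```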
WriteCloserTell is an interface adding a Tell function to a WriteCloser so if the underlying writer has a Close function, it is exposed and not hidden.
type WriteCloserTell interface {
	io.WriteCloser
	Tell() int64
}
WriterAtBuffer is a convenience struct for providing a WriteAt function to a byte slice for use with things that want an io.WriterAt
type WriterAtBuffer struct {
// contains filtered or unexported fields
}
func (w *WriterAtBuffer) Len() int
Len returns the length of the underlying byte slice.
func (w *WriterAtBuffer) Reserve(nbytes int)
func (w *WriterAtBuffer) WriteAt(p []byte, off int64) (n int, err error)
WriteAt fulfills the io.WriterAt interface to write len(p) bytes from p to the underlying byte slice starting at offset off. It returns the number of bytes written from p (0 <= n <= len(p)) and any error encountered.
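For reference, a minimal io.WriterAt over a fixed-size byte slice might look like the sketch below. `sliceWriterAt` is a hypothetical type: unlike WriterAtBuffer it has no Reserve method and never grows, so a write that does not fit is truncated and reported with io.ErrShortWrite.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// sliceWriterAt is a hypothetical fixed-size io.WriterAt.
type sliceWriterAt struct{ buf []byte }

// Compile-time check that the type satisfies io.WriterAt.
var _ io.WriterAt = (*sliceWriterAt)(nil)

// WriteAt copies p into the slice at off. It returns a non-nil error
// whenever fewer than len(p) bytes are written, as io.WriterAt requires.
func (s *sliceWriterAt) WriteAt(p []byte, off int64) (int, error) {
	if off < 0 || off > int64(len(s.buf)) {
		return 0, errors.New("sliceWriterAt: offset out of range")
	}
	n := copy(s.buf[off:], p)
	if n < len(p) {
		return n, io.ErrShortWrite
	}
	return n, nil
}

func main() {
	s := &sliceWriterAt{buf: make([]byte, 4)}
	n, err := s.WriteAt([]byte{1, 2}, 1)
	fmt.Println(n, err, s.buf)
}
```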
WriterAtWithLen is an interface for an io.WriterAt with Len and Reserve functions
type WriterAtWithLen interface {
	io.WriterAt
	Len() int
	Reserve(int)
}
func NewWriterAtBuffer(buf []byte) WriterAtWithLen
NewWriterAtBuffer returns an object which fulfills the io.WriterAt interface by taking ownership of the passed in slice.
WriterTell is an interface that adds a Tell function to an io.Writer
type WriterTell interface {
	io.Writer
	Tell() int64
}