...

Text file src/github.com/dgraph-io/ristretto/README.md

Documentation: github.com/dgraph-io/ristretto

     1# Ristretto
     2[![Go Doc](https://img.shields.io/badge/godoc-reference-blue.svg)](http://godoc.org/github.com/dgraph-io/ristretto)
     3[![Go Report Card](https://img.shields.io/badge/go%20report-A%2B-brightgreen)](https://goreportcard.com/report/github.com/dgraph-io/ristretto)
     4[![Coverage](https://img.shields.io/badge/coverage-100%25-brightgreen)](https://gocover.io/github.com/dgraph-io/ristretto)
     5![Tests](https://github.com/dgraph-io/ristretto/workflows/tests/badge.svg)
     6
     7Ristretto is a fast, concurrent cache library built with a focus on performance and correctness.
     8
     9The motivation to build Ristretto comes from the need for a contention-free
    10cache in [Dgraph][].
    11
    12[Dgraph]: https://github.com/dgraph-io/dgraph
    13
    14## Features
    15
    16* **High Hit Ratios** - with our unique admission/eviction policy pairing, Ristretto's performance is best in class.
    17	* **Eviction: SampledLFU** - on par with exact LRU and better performance on Search and Database traces.
    18	* **Admission: TinyLFU** - extra performance with little memory overhead (12 bits per counter).
    19* **Fast Throughput** - we use a variety of techniques for managing contention and the result is excellent throughput.
    20* **Cost-Based Eviction** - any large new item deemed valuable can evict multiple smaller items (cost could be anything).
    21* **Fully Concurrent** - you can use as many goroutines as you want with little throughput degradation. 
    22* **Metrics** - optional performance metrics for throughput, hit ratios, and other stats.
    23* **Simple API** - just figure out your ideal `Config` values and you're off and running.
    24
    25## Status
    26
    27Ristretto is usable but still under active development. We expect it to be production ready in the near future.
    28
    29## Table of Contents
    30
    31* [Usage](#Usage)
    32	* [Example](#Example)
    33	* [Config](#Config)
    34		* [NumCounters](#Config)
    35		* [MaxCost](#Config)
    36		* [BufferItems](#Config)
    37		* [Metrics](#Config)
    38		* [OnEvict](#Config)
    39		* [KeyToHash](#Config)
    40        * [Cost](#Config)
    41* [Benchmarks](#Benchmarks)
    42	* [Hit Ratios](#Hit-Ratios)
    43		* [Search](#Search)
    44		* [Database](#Database)
    45		* [Looping](#Looping)
    46		* [CODASYL](#CODASYL)
    47	* [Throughput](#Throughput)
    48		* [Mixed](#Mixed)
    49		* [Read](#Read)
    50		* [Write](#Write)
    51* [FAQ](#FAQ)
    52
    53## Usage
    54
    55### Example
    56
    57```go
    58func main() {
    59	cache, err := ristretto.NewCache(&ristretto.Config{
    60		NumCounters: 1e7,     // number of keys to track frequency of (10M).
    61		MaxCost:     1 << 30, // maximum cost of cache (1GB).
    62		BufferItems: 64,      // number of keys per Get buffer.
    63	})
    64	if err != nil {
    65		panic(err)
    66	}
    67
    68	// set a value with a cost of 1
    69	cache.Set("key", "value", 1)
    70	
    71	// wait for value to pass through buffers
    72	time.Sleep(10 * time.Millisecond)
    73
    74	value, found := cache.Get("key")
    75	if !found {
    76		panic("missing value")
    77	}
    78	fmt.Println(value)
    79	cache.Del("key")
    80}
    81```
    82
    83### Config
    84
    85The `Config` struct is passed to `NewCache` when creating Ristretto instances (see the example above). 
    86
    87**NumCounters** `int64`
    88
    89NumCounters is the number of 4-bit access counters to keep for admission and eviction. We've seen good performance in setting this to 10x the number of items you expect to keep in the cache when full. 
    90
    91For example, if you expect each item to have a cost of 1 and MaxCost is 100, set NumCounters to 1,000. Or, if you use variable cost values but expect the cache to hold around 10,000 items when full, set NumCounters to 100,000. The important thing is the *number of unique items* in the full cache, not necessarily the MaxCost value. 
    92
    93**MaxCost** `int64`
    94
    95MaxCost is how eviction decisions are made. For example, if MaxCost is 100 and a new item with a cost of 1 increases total cache cost to 101, 1 item will be evicted. 
    96
    97MaxCost can also be used to denote the max size in bytes. For example, if MaxCost is 1,000,000 (1MB) and the cache is full with 1,000 1KB items, a new item (that's accepted) would cause 5 1KB items to be evicted. 
    98
    99MaxCost could be anything as long as it matches how you're using the cost values when calling Set. 
   100
   101**BufferItems** `int64`
   102
   103BufferItems is the size of the Get buffers. The best value we've found for this is 64. 
   104
   105If for some reason you see Get performance decreasing with lots of contention (you shouldn't), try increasing this value in increments of 64. This is a fine-tuning mechanism and you probably won't have to touch this.
   106
   107**Metrics** `bool`
   108
   109Metrics is true when you want real-time logging of a variety of stats. The reason this is a Config flag is because there's a 10% throughput performance overhead. 
   110
   111**OnEvict** `func(hashes [2]uint64, value interface{}, cost int64)`
   112
   113OnEvict is called for every eviction.
   114
   115**KeyToHash** `func(key interface{}) [2]uint64`
   116
   117KeyToHash is the hashing algorithm used for every key. If this is nil, Ristretto has a variety of [defaults depending on the underlying interface type](https://github.com/dgraph-io/ristretto/blob/master/z/z.go#L19-L41).
   118
   119Note that if you want 128bit hashes you should use the full `[2]uint64`,
   120otherwise just fill the `uint64` at the `0` position and it will behave like
   121any 64bit hash.
   122
   123**Cost** `func(value interface{}) int64`
   124
   125Cost is an optional function you can pass to the Config in order to evaluate
   126item cost at runtime, and only for the Set calls that aren't dropped (this is
   127useful if calculating item cost is particularly expensive and you don't want to
   128waste time on items that will be dropped anyways).
   129
   130To signal to Ristretto that you'd like to use this Cost function:
   131
   1321. Set the Cost field to a non-nil function.
   1332. When calling Set for new items or item updates, use a `cost` of 0.
   134
   135## Benchmarks
   136
   137The benchmarks can be found in https://github.com/dgraph-io/benchmarks/tree/master/cachebench/ristretto.
   138
   139### Hit Ratios
   140
   141#### Search
   142
   143This trace is described as "disk read accesses initiated by a large commercial
   144search engine in response to various web search requests."
   145
   146<p align="center">
   147	<img src="https://raw.githubusercontent.com/dgraph-io/ristretto/master/benchmarks/Hit%20Ratios%20-%20Search%20(ARC-S3).svg">
   148</p>
   149
   150#### Database
   151
   152This trace is described as "a database server running at a commercial site
   153running an ERP application on top of a commercial database."
   154
   155<p align="center">
   156	<img src="https://raw.githubusercontent.com/dgraph-io/ristretto/master/benchmarks/Hit%20Ratios%20-%20Database%20(ARC-DS1).svg">
   157</p>
   158
   159#### Looping
   160
   161This trace demonstrates a looping access pattern.
   162
   163<p align="center">
   164	<img src="https://raw.githubusercontent.com/dgraph-io/ristretto/master/benchmarks/Hit%20Ratios%20-%20Glimpse%20(LIRS-GLI).svg">
   165</p>
   166
   167#### CODASYL
   168
   169This trace is described as "references to a CODASYL database for a one hour
   170period."
   171
   172<p align="center">
   173	<img src="https://raw.githubusercontent.com/dgraph-io/ristretto/master/benchmarks/Hit%20Ratios%20-%20CODASYL%20(ARC-OLTP).svg">
   174</p>
   175
   176### Throughput
   177
   178All throughput benchmarks were ran on an Intel Core i7-8700K (3.7GHz) with 16gb
   179of RAM.
   180
   181#### Mixed
   182
   183<p align="center">
   184	<img src="https://raw.githubusercontent.com/dgraph-io/ristretto/master/benchmarks/Throughput%20-%20Mixed.svg">
   185</p>
   186
   187#### Read
   188
   189<p align="center">
   190	<img src="https://raw.githubusercontent.com/dgraph-io/ristretto/master/benchmarks/Throughput%20-%20Read%20(Zipfian).svg">
   191</p>
   192
   193#### Write
   194
   195<p align="center">
   196	<img src="https://raw.githubusercontent.com/dgraph-io/ristretto/master/benchmarks/Throughput%20-%20Write%20(Zipfian).svg">
   197</p>
   198
   199## FAQ
   200
   201### How are you achieving this performance? What shortcuts are you taking?
   202
   203We go into detail in the [Ristretto blog post](https://blog.dgraph.io/post/introducing-ristretto-high-perf-go-cache/), but in short: our throughput performance can be attributed to a mix of batching and eventual consistency. Our hit ratio performance is mostly due to an excellent [admission policy](https://arxiv.org/abs/1512.00727) and SampledLFU eviction policy.
   204
   205As for "shortcuts," the only thing Ristretto does that could be construed as one is dropping some Set calls. That means a Set call for a new item (updates are guaranteed) isn't guaranteed to make it into the cache. The new item could be dropped at two points: when passing through the Set buffer or when passing through the admission policy. However, this doesn't affect hit ratios much at all as we expect the most popular items to be Set multiple times and eventually make it in the cache. 
   206
   207### Is Ristretto distributed?
   208
   209No, it's just like any other Go library that you can import into your project and use in a single process. 

View as plain text