# What is diskv?

Diskv (disk-vee) is a simple, persistent key-value store written in the Go
language. It starts with an incredibly simple API for storing arbitrary data on
a filesystem by key, and builds several layers of performance-enhancing
abstraction on top. The end result is a conceptually simple, but highly
performant, disk-backed storage system.

[![Build Status][1]][2]

[1]: https://drone.io/github.com/peterbourgon/diskv/status.png
[2]: https://drone.io/github.com/peterbourgon/diskv/latest


# Installing

Install [Go 1][3], either [from source][4] or [with a prepackaged binary][5].
Then,

```bash
$ go get github.com/peterbourgon/diskv/v3
```

[3]: http://golang.org
[4]: http://golang.org/doc/install/source
[5]: http://golang.org/doc/install


# Usage

```go
package main

import (
	"fmt"
	"github.com/peterbourgon/diskv/v3"
)

func main() {
	// Simplest transform function: put all the data files into the base dir.
	flatTransform := func(s string) []string { return []string{} }

	// Initialize a new diskv store, rooted at "my-data-dir", with a 1MB cache.
	d := diskv.New(diskv.Options{
		BasePath:     "my-data-dir",
		Transform:    flatTransform,
		CacheSizeMax: 1024 * 1024,
	})

	// Write three bytes to the key "alpha".
	key := "alpha"
	d.Write(key, []byte{'1', '2', '3'})

	// Read the value back out of the store.
	value, _ := d.Read(key)
	fmt.Printf("%v\n", value)

	// Erase the key+value from the store (and the disk).
	d.Erase(key)
}
```

More complex examples can be found in the "examples" subdirectory.


# Theory

## Basic idea

At its core, diskv is a map of a key (`string`) to arbitrary data (`[]byte`).
The data is written to a single file on disk, with the same name as the key.
The key determines where that file will be stored, via a user-provided
`TransformFunc`, which takes a key and returns a slice (`[]string`)
corresponding to a path list where the key file will be stored. The simplest
TransformFunc,

```go
func SimpleTransform(key string) []string {
	return []string{}
}
```

will place all keys in the same, base directory. The design is inspired by
[Redis diskstore][6]; a TransformFunc which emulates the default diskstore
behavior is available in the content-addressable-storage example.

[6]: http://groups.google.com/group/redis-db/browse_thread/thread/d444bc786689bde9?pli=1

**Note** that your TransformFunc should ensure that the file path of one valid
key is never a directory on the path of another valid key. That is, it
shouldn't be possible to construct valid keys that resolve to directory names.
As a concrete example, if your TransformFunc splits on every 3 characters, then

```go
d.Write("abcabc", val) // OK: written to <base>/abc/abc/abcabc
d.Write("abc", val)    // Error: attempted write to <base>/abc/abc, but it's a directory
```

This will be addressed in an upcoming version of diskv.

Probably the most important design principle behind diskv is that your data is
always flatly available on the disk. diskv will never do anything that would
prevent you from accessing, copying, backing up, or otherwise interacting with
your data via common UNIX command-line tools.

## Advanced path transformation

If you need more control over the file name written to disk, or if you want to
support slashes or special characters in your keys, you can use the
AdvancedTransform property. You must supply a function that returns a special
PathKey structure, which is a breakdown of a path and a file name. The strings
returned must be free of slashes and special characters:

```go
package main

import (
	"fmt"
	"strings"

	"github.com/peterbourgon/diskv/v3"
)

func AdvancedTransformExample(key string) *diskv.PathKey {
	path := strings.Split(key, "/")
	last := len(path) - 1
	return &diskv.PathKey{
		Path:     path[:last],
		FileName: path[last] + ".txt",
	}
}

// If you provide an AdvancedTransform, you must also provide its
// inverse:

func InverseTransformExample(pathKey *diskv.PathKey) (key string) {
	if !strings.HasSuffix(pathKey.FileName, ".txt") {
		panic("Invalid file found in storage folder!")
	}
	// Rejoin the directories and the extension-less file name with "/",
	// so that "alpha/beta" + "gamma.txt" maps back to "alpha/beta/gamma".
	parts := append([]string{}, pathKey.Path...)
	parts = append(parts, strings.TrimSuffix(pathKey.FileName, ".txt"))
	return strings.Join(parts, "/")
}

func main() {
	d := diskv.New(diskv.Options{
		BasePath:          "my-data-dir",
		AdvancedTransform: AdvancedTransformExample,
		InverseTransform:  InverseTransformExample,
		CacheSizeMax:      1024 * 1024,
	})
	// Write some text to the key "alpha/beta/gamma".
	key := "alpha/beta/gamma"
	d.WriteString(key, "¡Hola!") // will be stored in "<basedir>/alpha/beta/gamma.txt"
	fmt.Println(d.ReadString("alpha/beta/gamma"))
}
```


## Adding a cache

An in-memory caching layer is provided by combining the BasicStore
functionality with a simple map structure, and keeping it up-to-date as
appropriate. Since the map structure in Go is not threadsafe, it's combined
with a RWMutex to provide safe concurrent access.
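The map-plus-RWMutex idea can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not diskv's actual code: `cachedStore`, `newCachedStore`, and the `load` callback (which stands in for a disk read) are hypothetical names, and the sketch ignores size limits such as `CacheSizeMax`.

```go
package main

import (
	"fmt"
	"sync"
)

// cachedStore is a hypothetical read-through cache, not diskv's actual type.
type cachedStore struct {
	mu    sync.RWMutex
	cache map[string][]byte
	load  func(key string) ([]byte, error) // stands in for a disk read
}

func newCachedStore(load func(string) ([]byte, error)) *cachedStore {
	return &cachedStore{cache: map[string][]byte{}, load: load}
}

// Read serves from memory under a read lock and falls back to load on a miss.
// Two goroutines missing on the same key may both call load; the sketch
// accepts that in exchange for simplicity.
func (s *cachedStore) Read(key string) ([]byte, error) {
	s.mu.RLock()
	val, ok := s.cache[key]
	s.mu.RUnlock()
	if ok {
		return val, nil
	}
	val, err := s.load(key)
	if err != nil {
		return nil, err
	}
	s.mu.Lock()
	s.cache[key] = val
	s.mu.Unlock()
	return val, nil
}

// Erase drops the key so the next Read goes back to the backing store.
func (s *cachedStore) Erase(key string) {
	s.mu.Lock()
	delete(s.cache, key)
	s.mu.Unlock()
}

func main() {
	loads := 0
	s := newCachedStore(func(key string) ([]byte, error) {
		loads++
		return []byte("value of " + key), nil
	})
	s.Read("alpha")
	s.Read("alpha")
	fmt.Println(loads) // 1: the second Read was served from the cache
}
```

Readers share the lock, so concurrent cache hits do not block one another; only inserts and erasures take the exclusive lock.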

## Adding order

diskv is a key-value store and therefore inherently unordered. An ordering
system can be injected into the store by passing something which satisfies the
diskv.Index interface. (A default implementation, using Google's
[btree][7] package, is provided.) Basically, diskv keeps an ordered (by a
user-provided Less function) index of the keys, which can be queried.

[7]: https://github.com/google/btree
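As a rough illustration of an index ordered by a user-provided Less function, the sketch below keeps keys in a sorted slice. `sliceIndex` and its method set are hypothetical and chosen for this example; it is a stand-in for the btree-based default, not a reproduction of it, and it is not claimed to satisfy the diskv.Index interface as written.

```go
package main

import (
	"fmt"
	"sort"
)

// sliceIndex is a hypothetical ordered index backed by a sorted slice.
type sliceIndex struct {
	less func(a, b string) bool
	keys []string
}

// search returns the position of the first key that does not sort before key.
func (x *sliceIndex) search(key string) int {
	return sort.Search(len(x.keys), func(i int) bool {
		return !x.less(x.keys[i], key)
	})
}

// Insert adds key at its sorted position, ignoring keys already present.
func (x *sliceIndex) Insert(key string) {
	i := x.search(key)
	if i < len(x.keys) && !x.less(key, x.keys[i]) {
		return // an equal key is already indexed
	}
	x.keys = append(x.keys, "")
	copy(x.keys[i+1:], x.keys[i:])
	x.keys[i] = key
}

// Delete removes key if it is present.
func (x *sliceIndex) Delete(key string) {
	i := x.search(key)
	if i < len(x.keys) && !x.less(key, x.keys[i]) {
		x.keys = append(x.keys[:i], x.keys[i+1:]...)
	}
}

// Keys returns up to n keys in order, starting at the first key that does
// not sort before from.
func (x *sliceIndex) Keys(from string, n int) []string {
	i := x.search(from)
	end := i + n
	if end > len(x.keys) {
		end = len(x.keys)
	}
	if end < i {
		end = i
	}
	return append([]string(nil), x.keys[i:end]...)
}

func main() {
	idx := &sliceIndex{less: func(a, b string) bool { return a < b }}
	for _, k := range []string{"c", "a", "b", "a"} {
		idx.Insert(k)
	}
	fmt.Println(idx.Keys("", 2)) // [a b]
	idx.Delete("b")
	fmt.Println(idx.Keys("b", 5)) // [c]
}
```

A sorted slice makes inserts and deletes linear in the number of keys, which is one reason a tree-based structure is a more natural default for large stores.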

## Adding compression

Something which implements the diskv.Compression interface may be passed
during store creation, so that all Writes and Reads are filtered through
a compression/decompression pipeline. Several default implementations,
using stdlib compression algorithms, are provided. Note that data is cached
compressed; the cost of decompression is borne with each Read.
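The write-then-read pipeline can be sketched with the standard library alone. The `compression` interface, `gzipCompression`, and `roundTrip` below are hypothetical names defined for this example, under the assumption that a compressor wraps a destination writer and a source reader; the sketch does not reproduce diskv's own types.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// compression is a hypothetical interface for this sketch: it wraps a
// destination in a compressing writer and a source in a decompressing reader.
type compression interface {
	Writer(dst io.Writer) (io.WriteCloser, error)
	Reader(src io.Reader) (io.ReadCloser, error)
}

// gzipCompression implements compression with the stdlib gzip package.
type gzipCompression struct{}

func (gzipCompression) Writer(dst io.Writer) (io.WriteCloser, error) {
	return gzip.NewWriter(dst), nil
}

func (gzipCompression) Reader(src io.Reader) (io.ReadCloser, error) {
	r, err := gzip.NewReader(src)
	if err != nil {
		return nil, err
	}
	return r, nil
}

// roundTrip compresses data into an in-memory buffer (standing in for the
// file on disk) and then decompresses it again, as a Read would.
func roundTrip(c compression, data []byte) ([]byte, error) {
	var stored bytes.Buffer
	w, err := c.Writer(&stored)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil { // Close flushes the compressed stream
		return nil, err
	}
	r, err := c.Reader(&stored)
	if err != nil {
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}

func main() {
	out, err := roundTrip(gzipCompression{}, []byte("hello hello hello"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out)) // hello hello hello
}
```

Keeping the stored bytes compressed, as the buffer does here, is what makes every read pay the decompression cost.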

## Streaming

diskv also now provides ReadStream and WriteStream methods, to allow very large
data to be handled efficiently.
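The benefit of streaming is that a value never has to sit in memory as one `[]byte`. The sketch below shows the general pattern with `io.Copy` and plain files; `writeStream`, `readStream`, and `streamRoundTrip` are hypothetical helpers written for this example and are not the signatures of diskv's ReadStream and WriteStream methods.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// writeStream copies r into the file at path in chunks, so the whole value
// is never held in memory at once.
func writeStream(path string, r io.Reader) (int64, error) {
	f, err := os.Create(path)
	if err != nil {
		return 0, err
	}
	n, err := io.Copy(f, r)
	if cerr := f.Close(); err == nil {
		err = cerr
	}
	return n, err
}

// readStream returns the file as an io.ReadCloser; the caller must close it.
func readStream(path string) (io.ReadCloser, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	return f, nil
}

// streamRoundTrip writes data to a temporary file and streams it back.
func streamRoundTrip(data string) (string, error) {
	dir, err := os.MkdirTemp("", "stream-sketch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "big-value")
	if _, err := writeStream(path, strings.NewReader(data)); err != nil {
		return "", err
	}
	rc, err := readStream(path)
	if err != nil {
		return "", err
	}
	defer rc.Close()

	var sb strings.Builder
	_, err = io.Copy(&sb, rc)
	return sb.String(), err
}

func main() {
	got, err := streamRoundTrip("a value that could be very large")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got)
}
```

In real use the reader would be a network body or another file rather than a string, and the output would go to a writer rather than being collected.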


# Future plans

 * Needs plenty of robust testing: huge datasets, etc...
 * More thorough benchmarking
 * Your suggestions for use-cases I haven't thought of


# Credits and contributions

Original idea, design and implementation: [Peter Bourgon](https://github.com/peterbourgon)
Other collaborations: [Javier Peletier](https://github.com/jpeletier) ([Epic Labs](https://www.epiclabs.io))