.. -*- coding: utf-8-with-signature-unix; fill-column: 77 -*-

********************************
Using Tahoe as a key-value store
********************************

There are several ways you could use Tahoe-LAFS as a key-value store.

Looking only at things that are *already implemented*, there are three
options:

1. Immutable files

   API:

   * key ← put(value)

     This is spelled "`PUT /uri`_" in the API.

     Note: the user (client code) of this API does not get to choose the key!
     The key is determined programmatically using secure hash functions and
     encryption of the value and of the optional "added convergence secret".

   * value ← get(key)

     This is spelled "`GET /uri/$FILECAP`_" in the API. "$FILECAP" is the
     key.

   For details, see "immutable files" in :doc:`performance`, but in summary:
   the performance is not great but not bad.
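
   The two calls above could be driven from Python roughly as follows. This
   is a sketch, not a reference client: the gateway address, the helper
   names, and the assumption that the response body of ``PUT /uri`` is the
   new filecap are all illustrative and should be checked against the web
   API documentation.

   .. code-block:: python

      import urllib.request
      from urllib.parse import quote

      # Assumption: a Tahoe-LAFS web gateway is listening at this address.
      GATEWAY = "http://127.0.0.1:3456"


      def put_request(value: bytes, gateway: str = GATEWAY) -> urllib.request.Request:
          """Build the request for key <- put(value), i.e. PUT /uri."""
          return urllib.request.Request(gateway + "/uri", data=value, method="PUT")


      def get_request(filecap: str, gateway: str = GATEWAY) -> urllib.request.Request:
          """Build the request for value <- get(key), i.e. GET /uri/$FILECAP."""
          # Caps contain colons, so percent-encode the cap as one path segment.
          url = gateway + "/uri/" + quote(filecap, safe="")
          return urllib.request.Request(url, method="GET")


      def put(value: bytes, gateway: str = GATEWAY) -> str:
          """Upload value; the response body is assumed to be the new filecap."""
          with urllib.request.urlopen(put_request(value, gateway)) as response:
              return response.read().decode("ascii").strip()


      def get(filecap: str, gateway: str = GATEWAY) -> bytes:
          """Download the value stored under filecap."""
          with urllib.request.urlopen(get_request(filecap, gateway)) as response:
              return response.read()

   With a running gateway, ``get(put(b"some value"))`` would be expected to
   return the same bytes. Note that the caller never supplies the key; it
   only receives one.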

   That document doesn't mention that if the size of the A-byte immutable
   file is less than or equal to `55 bytes`_ then the performance cost is
   much smaller, because the value gets packed into the key. This is tracked
   in ticket `#2226`_.
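
   The small-value case can be sketched as below. The 55-byte threshold is
   the one cited above; the cap layout used here (``URI:LIT:`` followed by
   the lowercase, unpadded base32 encoding of the value) is an assumption
   about the literal-cap format, and the function names are hypothetical.

   .. code-block:: python

      import base64

      LIT_THRESHOLD = 55  # bytes; the limit cited in the text above
      LIT_PREFIX = "URI:LIT:"  # assumed prefix of a literal cap


      def fits_in_key(value: bytes) -> bool:
          """True if the value is small enough to be packed into the key."""
          return len(value) <= LIT_THRESHOLD


      def literal_cap(value: bytes) -> str:
          """Pack a small value into a cap (assumed layout, see lead-in)."""
          if not fits_in_key(value):
              raise ValueError("value is too large to be packed into the key")
          encoded = base64.b32encode(value).decode("ascii")
          return LIT_PREFIX + encoded.rstrip("=").lower()


      def literal_value(cap: str) -> bytes:
          """Recover the value from a literal cap without any network access."""
          body = cap[len(LIT_PREFIX):].upper()
          body += "=" * (-len(body) % 8)  # restore the stripped base32 padding
          return base64.b32decode(body)

   The point of the sketch is that such a "get" needs no storage servers at
   all, which is why the cost is much smaller.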

2. Mutable files

   API:

   * key ← create()

     This is spelled "`PUT /uri?format=mdmf`_".

     Note: again, the key cannot be chosen by the user! The key is
     determined programmatically using secure hash functions and RSA public
     key pair generation.

   * set(key, value)

   * value ← get(key)

     This is spelled "`GET /uri/$FILECAP`_". Again, the "$FILECAP" is the
     key. This is the same API as for getting the value from an immutable
     file, above. Whether the value you get this way is immutable (i.e. it
     will always be the same value) or mutable (i.e. an authorized person can
     change what value you get when you read) depends on the type of the
     key.

   Again, for details, see "mutable files" in :doc:`performance` (and
   `these tickets`_ about how that doc is incomplete), but in summary, the
   performance of the create() operation is *terrible*! (It involves
   generating a 2048-bit RSA key pair.) The performance of the set and get
   operations is probably merely not great but not bad.
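
   A sketch of the three mutable-file calls follows. The text above does not
   spell out set(); the sketch assumes it is a ``PUT`` to
   ``/uri/$FILECAP`` using the write cap returned by create(). The gateway
   address and helper names are likewise assumptions.

   .. code-block:: python

      import urllib.request
      from urllib.parse import quote

      # Assumption: a Tahoe-LAFS web gateway is listening at this address.
      GATEWAY = "http://127.0.0.1:3456"


      def create_request(initial: bytes = b"", gateway: str = GATEWAY):
          """key <- create(): PUT /uri?format=mdmf (the slow, key-generating step)."""
          url = gateway + "/uri?format=mdmf"
          return urllib.request.Request(url, data=initial, method="PUT")


      def set_request(writecap: str, value: bytes, gateway: str = GATEWAY):
          """set(key, value): assumed to be PUT /uri/$FILECAP with the write cap."""
          url = gateway + "/uri/" + quote(writecap, safe="")
          return urllib.request.Request(url, data=value, method="PUT")


      def get_request(cap: str, gateway: str = GATEWAY):
          """value <- get(key): GET /uri/$FILECAP, same as for immutable files."""
          url = gateway + "/uri/" + quote(cap, safe="")
          return urllib.request.Request(url, method="GET")


      def send(request: urllib.request.Request) -> bytes:
          """Send a prepared request to the gateway and return the response body."""
          with urllib.request.urlopen(request) as response:
              return response.read()

   Because create() is the expensive call, a client that needs many keys
   would pay that cost once per key, while later ``set_request`` and
   ``get_request`` calls on the same cap avoid it.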

3. Directories

   API:

   * directory ← create()

     This is spelled "`POST /uri?t=mkdir`_".

     :doc:`performance` does not mention directories (`#2228`_), but in order
     to understand the performance of directories you have to understand how
     they are implemented. Mkdir creates a new mutable file, exactly the
     same, and with exactly the same performance, as the "create() mutable"
     above.

   * set(directory, key, value)

     This is spelled "`PUT /uri/$DIRCAP/[SUBDIRS../]FILENAME`_". "$DIRCAP"
     is the directory, "FILENAME" is the key. The value is the body of the
     HTTP PUT request. The part about "[SUBDIRS../]" in there is for
     optional nesting which you can ignore for the purposes of this
     key-value store.

     This way, you *do* get to choose the key to be whatever you want (an
     arbitrary Unicode string).

     To understand the performance of ``PUT /uri/$directory/$key``,
     understand that this proceeds in two steps: first it uploads the value
     as an immutable file, exactly the same as the "put(value)" API from the
     immutable API above. So right there you've already paid exactly the
     same cost as if you had used that API. Then after it has finished
     uploading that, and it has the immutable file cap from that operation
     in hand, it downloads the entire current directory, changes it to
     include the mapping from key to the immutable file cap, and re-uploads
     the entire directory. So that has a cost which is easy to understand:
     you have to download and re-upload the entire directory, which is the
     entire set of mappings from user-chosen keys (Unicode strings) to
     immutable file caps. Each entry in the directory occupies something on
     the order of 300 bytes.

     So the "set()" call from this directory-based API obviously has much
     worse performance than the equivalent "put()" call from the
     immutable-file-based API or the "set()" call from the
     mutable-file-based API. This is not necessarily worse overall than the
     performance of the mutable-file-based API if you take into account the
     cost of the necessary create() calls.

   * value ← get(directory, key)

     This is spelled "`GET /uri/$DIRCAP/[SUBDIRS../]FILENAME`_". As above,
     "$DIRCAP" is the directory, "FILENAME" is the key.

     The performance of this is determined by the fact that it first
     downloads the entire directory, then finds the immutable filecap for
     the given key, then does a GET on that immutable filecap. So again,
     it is strictly worse than using the immutable file API (about twice
     as bad, if the directory size is similar to the value size).
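
   The directory calls and their cost can be sketched as below. The request
   builders follow the spellings given above; the gateway address and all
   function names are assumptions. The byte counts are a deliberately crude
   model built only from the "about 300 bytes per entry" figure: they count
   plaintext bytes and ignore erasure-coding expansion, round trips, and the
   one-time cost of creating the directory.

   .. code-block:: python

      import urllib.request
      from urllib.parse import quote

      # Assumption: a Tahoe-LAFS web gateway is listening at this address.
      GATEWAY = "http://127.0.0.1:3456"
      ENTRY_BYTES = 300  # order-of-magnitude size of one directory entry


      def mkdir_request(gateway: str = GATEWAY) -> urllib.request.Request:
          """directory <- create(): POST /uri?t=mkdir."""
          url = gateway + "/uri?t=mkdir"
          return urllib.request.Request(url, data=b"", method="POST")


      def entry_url(dircap: str, key: str, gateway: str = GATEWAY) -> str:
          """URL of one key in one directory; the key is UTF-8 percent-encoded."""
          return gateway + "/uri/" + quote(dircap, safe="") + "/" + quote(key, safe="")


      def set_request(dircap: str, key: str, value: bytes, gateway: str = GATEWAY):
          """set(directory, key, value): PUT /uri/$DIRCAP/FILENAME."""
          url = entry_url(dircap, key, gateway)
          return urllib.request.Request(url, data=value, method="PUT")


      def get_request(dircap: str, key: str, gateway: str = GATEWAY):
          """value <- get(directory, key): GET /uri/$DIRCAP/FILENAME."""
          return urllib.request.Request(entry_url(dircap, key, gateway), method="GET")


      def set_transfer_bytes(entries: int, value_size: int) -> int:
          """Upload the value, then download and re-upload the whole directory."""
          directory_size = entries * ENTRY_BYTES
          return value_size + 2 * directory_size


      def get_transfer_bytes(entries: int, value_size: int) -> int:
          """Download the whole directory, then download the value."""
          return entries * ENTRY_BYTES + value_size

   Under this model a directory of 1000 keys moves about 600 kB of
   directory data on every set(), regardless of how small the value is, and
   a get() of a value as large as the directory transfers twice the value
   size.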

What about ways to use LAFS as a key-value store that are not yet
implemented? Well, Zooko has lots of ideas about ways to extend Tahoe-LAFS to
support different kinds of storage APIs or better performance. One that he
thinks is pretty promising is just the Keep It Simple, Stupid idea of "store a
sqlite db in a Tahoe-LAFS mutable". ☺
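
To make that last idea concrete, here is a minimal sketch of one way it could
look, assuming the entire SQLite database image is stored as the contents of
a single mutable file. Nothing here talks to Tahoe-LAFS: ``blob`` stands for
the bytes that would be read from and written back to the mutable file, and
the table layout and the ``kv_set``/``kv_get`` names are hypothetical.

.. code-block:: python

   import os
   import sqlite3
   import tempfile


   def kv_set(blob: bytes, key: str, value: bytes) -> bytes:
       """Return a new database image in which key maps to value."""
       with tempfile.TemporaryDirectory() as tmp:
           path = os.path.join(tmp, "kv.sqlite")
           if blob:
               # Materialize the downloaded image so SQLite can open it.
               with open(path, "wb") as f:
                   f.write(blob)
           db = sqlite3.connect(path)
           try:
               db.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v BLOB)")
               db.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))
               db.commit()
           finally:
               db.close()
           # The whole file is what would be re-uploaded to the mutable file.
           with open(path, "rb") as f:
               return f.read()


   def kv_get(blob: bytes, key: str):
       """Look up key in a database image; return None if it is absent."""
       if not blob:
           return None
       with tempfile.TemporaryDirectory() as tmp:
           path = os.path.join(tmp, "kv.sqlite")
           with open(path, "wb") as f:
               f.write(blob)
           db = sqlite3.connect(path)
           try:
               row = db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
           finally:
               db.close()
           return None if row is None else bytes(row[0])

In this naive form every set still re-uploads the whole database image, much
as the directory-based API re-uploads the whole directory, but only one
create() is ever needed and the keys are chosen by the user.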

.. _PUT /uri: https://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/docs/frontends/webapi.rst#writing-uploading-a-file

.. _GET /uri/$FILECAP: https://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/docs/frontends/webapi.rst#viewing-downloading-a-file

.. _55 bytes: https://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/src/allmydata/immutable/upload.py?rev=196bd583b6c4959c60d3f73cdcefc9edda6a38ae#L1504

.. _PUT /uri?format=mdmf: https://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/docs/frontends/webapi.rst#writing-uploading-a-file

.. _#2226: https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2226

.. _these tickets: https://tahoe-lafs.org/trac/tahoe-lafs/query?status=assigned&status=new&status=reopened&keywords=~doc&description=~performance.rst&col=id&col=summary&col=status&col=owner&col=type&col=priority&col=milestone&order=priority

.. _POST /uri?t=mkdir: https://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/docs/frontends/webapi.rst#creating-a-new-directory

.. _#2228: https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2228

.. _PUT /uri/$DIRCAP/[SUBDIRS../]FILENAME: https://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/docs/frontends/webapi.rst#creating-a-new-directory

.. _GET /uri/$DIRCAP/[SUBDIRS../]FILENAME: https://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/docs/frontends/webapi.rst#reading-a-file