(protocol proposal, work-in-progress, not authoritative)

(this document describes DSA-based mutable files, as opposed to the RSA-based
mutable files that were introduced in tahoe-0.7.0. This proposal has not yet
been implemented. Please see mutable-DSA.svg for a quick picture of the
crypto scheme described herein)

This file shows only the differences from RSA-based mutable files to
(EC)DSA-based mutable files. You have to read and understand
docs/specifications/mutable.rst before reading this file (mutable-DSA.txt).

== new design criteria ==

* provide for variable number of semiprivate sections?
* put e.g. filenames in one section, readcaps in another, writecaps in a third
  (ideally, to modify a filename you'd only have to modify one section, and
  we'd make encrypting/hashing more efficient by doing it on larger blocks of
  data, preferably one segment at a time instead of one writecap at a time)
* cleanly distinguish between "container" (leases, write-enabler) and
  "slot contents" (everything that comes from share encoding)
* sign all slot bits (to allow server-side verification)
* someone reading the whole file should be able to read the share in a single
  linear pass with just a single seek to zero
* writing the file should occur in two passes (3 seeks) in mostly linear order
   1: write version/pubkey/topbits/salt
   2: write zeros / seek+prefill where the hashchain/tree goes
   3: write blocks
   4: seek back
   5: write hashchain/tree
* storage format: consider putting container bits in a separate file
  - $SI.index (contains list of shnums, leases, other-cabal-members, WE, etc)
  - $SI-$shnum.share (actual share data)
* possible layout:
  - version
  - pubkey
  - topbits (k, N, size, segsize, etc)
  - salt? (salt tree root?)
  - share hash root
  - share hash chain
  - block hash tree
  - (salts?) (salt tree?)
  - blocks
  - signature (of [version .. share hash root])

=== SDMF slots overview ===

Each SDMF slot is created with a DSA public/private key pair, using a
system-wide common modulus and generator, in which the private key is a
random 256-bit number, and the public key is a larger value (about 2048 bits)
that can be derived with a bit of math from the private key. The public key
is known as the "verification key", while the private key is called the
"signature key".

The 256-bit signature key is used verbatim as the "write capability". This
can be converted into the 2048ish-bit verification key through a fairly cheap
set of modular exponentiation operations; this is done any time the holder of
the write-cap wants to read the data. (Note that the signature key can either
be a newly-generated random value, or the hash of something else, if we found
a need for a capability that's stronger than the write-cap.)
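
As a minimal illustration (not part of the proposal), the following Python
sketch shows the private-to-public derivation for plain DSA. The tiny group
parameters are placeholders chosen only so the example runs; a real grid
would share ~2048-bit parameters (see the TODO section below).

  import secrets

  p, q, g = 23, 11, 4   # placeholder DSA group: p prime, g has order q mod p

  def keypair():
      x = 1 + secrets.randbelow(q - 1)  # signature key (the write-cap): random, < q
      y = pow(g, x, p)                  # verification key: cheap modular exponentiation
      return x, y

  write_cap, verification_key = keypair()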

This results in a write-cap which is 256 bits long and can thus be expressed
in an ASCII/transport-safe encoded form (base62 encoding, fits in 72
characters, including a local-node http: convenience prefix).

The private key is hashed to form a 256-bit "salt". The public key is also
hashed to form a 256-bit "pubkey hash". These two values are concatenated,
hashed, and truncated to 192 bits to form the first 192 bits of the read-cap.
The pubkey hash is hashed by itself and truncated to 64 bits to form the last
64 bits of the read-cap. The full read-cap is 256 bits long, just like the
write-cap.

The first 192 bits of the read-cap are hashed and truncated to form the first
192 bits of the "traversal cap". The last 64 bits of the read-cap are hashed
to form the last 64 bits of the traversal cap. This gives us a 256-bit
traversal cap.

The first 192 bits of the traversal-cap are hashed and truncated to form the
first 64 bits of the storage index. The last 64 bits of the traversal-cap are
hashed to form the last 64 bits of the storage index. This gives us a 128-bit
storage index.

The verification-cap is the first 64 bits of the storage index plus the
pubkey hash, 320 bits total. The verification-cap doesn't need to be
expressed in a printable transport-safe form, so it's ok that it's longer.
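
A sketch of this derivation chain, assuming SHA-256 with arbitrary tag
strings (the actual hash and tags are not specified here; only the widths
come from the text above):

  from hashlib import sha256

  def H(tag, *pieces):
      h = sha256(tag)
      for piece in pieces:
          h.update(piece)
      return h.digest()

  def derive_caps(privkey_bytes, pubkey_bytes):
      salt        = H(b"salt:", privkey_bytes)            # 256-bit salt
      pubkey_hash = H(b"pubkey:", pubkey_bytes)           # 256-bit pubkey hash

      read_cap = (H(b"read-1:", salt, pubkey_hash)[:24]   # first 192 bits
                  + H(b"read-2:", pubkey_hash)[:8])       # last 64 bits

      trav_cap = (H(b"trav-1:", read_cap[:24])[:24]
                  + H(b"trav-2:", read_cap[24:])[:8])

      storage_index = (H(b"si-1:", trav_cap[:24])[:8]
                       + H(b"si-2:", trav_cap[24:])[:8])  # 128 bits total

      verify_cap = storage_index[:8] + pubkey_hash        # 64 + 256 = 320 bits
      return read_cap, trav_cap, storage_index, verify_cap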

The read-cap is hashed one way to form an AES encryption key that is used to
encrypt the salt; this key is called the "salt key". The encrypted salt is
stored in the share. The private key never changes, therefore the salt never
changes, and the salt key is only used for a single purpose, so there is no
need for an IV.

The read-cap is hashed a different way to form the master data encryption
key. A random "data salt" is generated each time the share's contents are
replaced, and the master data encryption key is concatenated with the data
salt, then hashed, to form the AES CTR-mode "read key" that will be used to
encrypt the actual file data. This is to avoid key reuse. An outstanding
issue is how to avoid key reuse when files are modified in place instead of
being replaced completely; this is not done in SDMF but might occur in MDMF.
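
A sketch of these two derivations (hash choice and tags again assumed; the
structure of a fixed salt key versus a per-update read key is from the text):

  from hashlib import sha256
  import os

  def salt_key(read_cap):
      return sha256(b"salt-key:" + read_cap).digest()   # AES key used to encrypt the salt

  def read_key(read_cap, data_salt):
      master = sha256(b"data-key:" + read_cap).digest() # master data encryption key
      return sha256(master + data_salt).digest()        # AES CTR-mode read key

  data_salt = os.urandom(32)  # regenerated every time the contents are replaced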

The master data encryption key is used to encrypt data that should be visible
to holders of a write-cap or a read-cap, but not to holders of a
traversal-cap.

The private key is hashed one way to form the salt, and a different way to
form the "write enabler master". For each storage server on which a share is
kept, the write enabler master is concatenated with the server's nodeid and
hashed, and the result is called the "write enabler" for that particular
server. Note that multiple shares of the same slot stored on the same server
will all get the same write enabler, i.e. the write enabler is associated
with the "bucket", rather than the individual shares.
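
A sketch of the write-enabler derivation (tags assumed as before):

  from hashlib import sha256

  def write_enabler_master(privkey_bytes):
      return sha256(b"we-master:" + privkey_bytes).digest()

  def write_enabler(privkey_bytes, server_nodeid):
      # identical for every share of this slot held by this server ("bucket")
      return sha256(write_enabler_master(privkey_bytes) + server_nodeid).digest()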

The private key is hashed a third way to form the "data write key", which can
be used by applications which wish to store some data in a form that is only
available to those with a write-cap, and not to those with merely a read-cap.
This is used to implement transitive read-onlyness of dirnodes.

The traversal cap is hashed to form the "traversal key", which can be used by
applications that wish to store data in a form that is available to holders
of a write-cap, read-cap, or traversal-cap.

The idea is that dirnodes will store child write-caps under the writekey,
child names and read-caps under the read-key, and verify-caps (for files) or
deep-verify-caps (for directories) under the traversal key. This would give
the holder of a root deep-verify-cap the ability to create a verify manifest
for everything reachable from the root, but not the ability to see any
plaintext or filenames. This would make it easier to delegate filechecking
and repair to a not-fully-trusted agent.

The public key is stored on the servers, as is the encrypted salt, the
(non-encrypted) data salt, the encrypted data, and a signature. The container
records the write-enabler, but of course this is not visible to readers. To
make sure that every byte of the share can be verified by a holder of the
verify-cap (and also by the storage server itself), the signature covers the
version number, the sequence number, the root hash "R" of the share Merkle
tree, the encoding parameters, and the encrypted salt. "R" itself covers the
hash trees and the share data.

The read-write URI is just the private key. The read-only URI is the
read-cap. The deep-verify URI is the traversal-cap. The verify-only URI
contains the pubkey hash and the first 64 bits of the storage index.

FMW:b2a(privatekey)
FMR:b2a(readcap)
FMT:b2a(traversalcap)
FMV:b2a(storageindex[:64])b2a(pubkey-hash)
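
A sketch of rendering these URIs, assuming a base62 alphabet of digits plus
upper/lower-case letters for b2a() (the real alphabet and padding rules are
not specified here):

  import string

  ALPHABET = string.digits + string.ascii_letters   # assumed base62 alphabet

  def b2a(data):
      # note: leading zero bytes are not preserved; a real encoder would pad
      n = int.from_bytes(data, "big")
      out = ""
      while n:
          n, r = divmod(n, 62)
          out = ALPHABET[r] + out
      return out or "0"

  def uris(privkey, read_cap, traversal_cap, storage_index, pubkey_hash):
      return {
          "read-write":  "FMW:" + b2a(privkey),
          "read-only":   "FMR:" + b2a(read_cap),
          "deep-verify": "FMT:" + b2a(traversal_cap),
          "verify-only": "FMV:" + b2a(storage_index[:8]) + b2a(pubkey_hash),
      }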

Note that this allows the read-only, deep-verify, and verify-only URIs to be
derived from the read-write URI without actually retrieving any data from the
share, but instead by regenerating the public key from the private one. Users
of the read-only, deep-verify, or verify-only caps must validate the public
key against their pubkey hash (or its derivative) the first time they
retrieve the pubkey, before trusting any signatures they see.

The SDMF slot is allocated by sending a request to the storage server with a
desired size, the storage index, and the write enabler for that server's
nodeid. If granted, the write enabler is stashed inside the slot's backing
store file. All further write requests must be accompanied by the write
enabler or they will not be honored. The storage server does not share the
write enabler with anyone else.

The SDMF slot structure will be described in more detail below. The important
pieces are:

* a sequence number
* a root hash "R"
* the data salt
* the encoding parameters (including k, N, file size, segment size)
* a signed copy of [seqnum,R,data_salt,encoding_params] (using signature key)
* the verification key (not encrypted)
* the share hash chain (part of a Merkle tree over the share hashes)
* the block hash tree (Merkle tree over blocks of share data)
* the share data itself (erasure-coding of read-key-encrypted file data)
* the salt, encrypted with the salt key

The access pattern for read (assuming we hold the write-cap) is:
* generate public key from the private one
* hash private key to get the salt, hash public key, form read-cap
* form storage-index
* use storage-index to locate 'k' shares with identical 'R' values
  * either get one share, read 'k' from it, then read k-1 shares
  * or read, say, 5 shares, discover k, either get more or be finished
  * or copy k into the URIs
* .. jump to "COMMON READ", below

To read (assuming we only hold the read-cap), do:
* hash read-cap pieces to generate storage index and salt key
* use storage-index to locate 'k' shares with identical 'R' values
* retrieve verification key and encrypted salt
* decrypt salt
* hash decrypted salt and pubkey to generate another copy of the read-cap,
  make sure they match (this validates the pubkey)
* .. jump to "COMMON READ"

* COMMON READ:
  * read seqnum, R, data salt, encoding parameters, signature
  * verify signature against verification key
  * hash data salt and read-cap to generate read-key
  * read share data, compute block-hash Merkle tree and root "r"
  * read share hash chain (leading from "r" to "R")
  * validate share hash chain up to the root "R" (see the sketch after this
    list)
  * submit share data to erasure decoding
  * decrypt decoded data with read-key
  * submit plaintext to application
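
A sketch of the chain-validation step, assuming the chain is presented as a
bottom-up list of sibling hashes for this share's leaf (the real encoding
stores (shnum, hash) pairs keyed by Merkle-node number) and an assumed tagged
SHA-256 node combiner:

  from hashlib import sha256

  def combine(left, right):
      return sha256(b"merkle:" + left + right).digest()

  def validate_chain(r, leaf_index, sibling_hashes, expected_R):
      node = r
      index = leaf_index
      for sibling in sibling_hashes:     # walk from the leaf up to the root
          if index % 2 == 0:
              node = combine(node, sibling)
          else:
              node = combine(sibling, node)
          index //= 2
      return node == expected_R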

The access pattern for write is:
* generate pubkey, salt, read-cap, storage-index as in read case
* generate data salt for this update, generate read-key
* encrypt plaintext from application with read-key
  * application can encrypt some data with the data-write-key to make it
    only available to writers (used for transitively-readonly dirnodes)
* erasure-code crypttext to form shares
* split shares into blocks
* compute Merkle tree of blocks, giving root "r" for each share
* compute Merkle tree of shares, find root "R" for the file as a whole
* create share data structures, one per server:
  * use seqnum which is one higher than the old version
  * share hash chain has log(N) hashes, different for each server
  * signed data is the same for each server
  * include pubkey, encrypted salt, data salt
* now we have N shares and need homes for them
* walk through peers
  * if share is not already present, allocate-and-set
  * otherwise, try to modify existing share:
    * send testv_and_writev operation to each one
    * testv says to accept share if their(seqnum+R) <= our(seqnum+R)
      (see the sketch after this list)
* count how many servers wind up with which versions (histogram over R)
* keep going until N servers have the same version, or we run out of servers
* if any servers wound up with a different version, report error to
  application
* if we ran out of servers, initiate recovery process (described below)
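
A sketch of that version test: a "version" is the (seqnum, R) pair, compared
lexicographically, so a write is only accepted if it would not roll back a
newer version already held by the server:

  def accept_write(existing_version, proposed_version):
      # each version is a (seqnum, root_hash_R) tuple; None means an empty slot
      return existing_version is None or existing_version <= proposed_version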

==== Cryptographic Properties ====

This scheme protects the data's confidentiality with 192 bits of key
material, since the read-cap contains 192 secret bits (derived from an
encrypted salt, which is encrypted using those same 192 bits plus some
additional public material).

The integrity of the data (assuming that the signature is valid) is protected
by the 256-bit hash which gets included in the signature. The privilege of
modifying the data (equivalent to the ability to form a valid signature) is
protected by a 256-bit random DSA private key, and the difficulty of
computing a discrete logarithm in a 2048-bit field.

There are a few weaker denial-of-service attacks possible. If N-k+1 of the
shares are damaged or unavailable, the client will be unable to recover the
file. Any coalition of more than N-k shareholders will be able to effect this
attack by merely refusing to provide the desired share. The "write enabler"
shared secret protects existing shares from being displaced by new ones,
except by the holder of the write-cap. One server cannot affect the other
shares of the same file, once those other shares are in place.

The worst DoS attack is the "roadblock attack", which must be made before
those shares get placed. Storage indexes are effectively random (being
derived from the hash of a random value), so they are not guessable before
the writer begins their upload, but there is a window of vulnerability during
the beginning of the upload, when some servers have heard about the storage
index but not all of them.

The roadblock attack we want to prevent is when the first server that the
uploader contacts quickly runs to all the other selected servers and places a
bogus share under the same storage index, before the uploader can contact
them. These shares will normally be accepted, since storage servers create
new shares on demand. The bogus shares would have randomly-generated
write-enablers, which will of course be different than the real uploader's
write-enabler, since the malicious server does not know the write-cap.

If this attack were successful, the uploader would be unable to place any of
their shares, because the slots have already been filled by the bogus shares.
The uploader would probably try for peers further and further away from the
desired location, but eventually they will hit a preconfigured distance limit
and give up. In addition, the further the writer searches, the less likely it
is that a reader will search as far. So a successful attack will either cause
the file to be uploaded but not be reachable, or it will cause the upload to
fail.

If the uploader tries again (creating a new privkey), they may get lucky and
the malicious servers will appear later in the query list, giving sufficient
honest servers a chance to see their share before the malicious one manages
to place bogus ones.

The first line of defense against this attack is timing: the attacking
server must be ready to act the moment a storage request arrives (which will
only occur for a certain percentage of all new-file uploads), and only has a
few seconds to act before the other servers will have allocated the shares
(and recorded the write-enabler, terminating the window of vulnerability).

The second line of defense is post-verification, and is possible because the
storage index is partially derived from the public key hash. A storage server
can, at any time, verify every public bit of the container as being signed by
the verification key (this operation is recommended as a continual background
process, when disk usage is minimal, to detect disk errors). The server can
also hash the verification key to derive 64 bits of the storage index. If it
detects that these 64 bits do not match (but the rest of the share validates
correctly), then the implication is that this share was stored to the wrong
storage index, either due to a bug or a roadblock attack.
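
The text above leaves open exactly which 64 bits the server recomputes; with
the (assumed) tagged hashes from the derivation sketch earlier, hashing the
pubkey hash three times reproduces the last 64 bits of the storage index, so
a check could look like:

  from hashlib import sha256

  def H(tag, *pieces):
      h = sha256(tag)
      for piece in pieces:
          h.update(piece)
      return h.digest()

  def si_tail_from_pubkey(pubkey_bytes):
      pubkey_hash = H(b"pubkey:", pubkey_bytes)
      rc_tail = H(b"read-2:", pubkey_hash)[:8]   # last 64 bits of the read-cap
      tc_tail = H(b"trav-2:", rc_tail)[:8]       # last 64 bits of the traversal-cap
      return H(b"si-2:", tc_tail)[:8]            # last 64 bits of the storage index

  def looks_roadblocked(pubkey_bytes, storage_index):
      # share validates under its own pubkey, but sits under the wrong index
      return si_tail_from_pubkey(pubkey_bytes) != storage_index[8:]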

If an uploader finds that they are unable to place their shares because of
"bad write enabler" errors (as reported by the prospective storage servers),
they can "cry foul", and ask the storage server to perform this verification
on the share in question. If the pubkey and storage index do not match, the
storage server can delete the bogus share, thus allowing the real uploader to
place their share. Of course the origin of the offending bogus share should
be logged and reported to a central authority, so corrective measures can be
taken. It may be necessary to have this "cry foul" protocol include the new
write-enabler, to close the window during which the malicious server can
re-submit the bogus share during the adjudication process.

If the problem persists, the servers can be placed into pre-verification
mode, in which this verification is performed on all potential shares before
being committed to disk. This mode is more CPU-intensive (since normally the
storage server ignores the contents of the container altogether), but would
solve the problem completely.

The mere existence of these potential defenses should be sufficient to deter
any actual attacks. Note that the storage index only has 64 bits of
pubkey-derived data in it, which is below the usual crypto guidelines for
security factors. In this case it's a pre-image attack which would be needed,
rather than a collision, and the actual attack would be to find a keypair for
which the public key can be hashed three times to produce the desired portion
of the storage index. We believe that 64 bits of material is sufficiently
resistant to this form of pre-image attack to serve as a suitable deterrent.

=== SDMF Slot Format ===

This SDMF data lives inside a server-side MutableSlot container. The server
is generally oblivious to this format, but it may look inside the container
when verification is desired.

This data is tightly packed. There are no gaps left between the different
fields, and the offset table is mainly present to allow future flexibility of
key sizes.

 #   offset   size    name
 1    0        1      version byte, \x01 for this format
 2    1        8      sequence number. 2^64-1 must be handled specially, TBD
 3    9        32     "R" (root of share hash Merkle tree)
 4    41       32     data salt (readkey is H(readcap+data_salt))
 5    73       32     encrypted salt (AESenc(key=H(readcap), salt))
 6    105      18     encoding parameters:
       105      1       k
       106      1       N
       107      8       segment size
       115      8       data length (of original plaintext)
 7    123      36     offset table:
       127      4       (9) signature
       131      4       (10) share hash chain
       135      4       (11) block hash tree
       139      4       (12) share data
       143      8       (13) EOF
 8    151      256    verification key (2048-bit DSA key)
 9    407      40     signature=DSAsig(H([1,2,3,4,5,6]))
10    447      (a)    share hash chain, encoded as:
                        "".join([pack(">H32s", shnum, hash)
                                 for (shnum,hash) in needed_hashes])
11    ??       (b)    block hash tree, encoded as:
                        "".join([pack(">32s",hash) for hash in block_hash_tree])
12    ??       LEN    share data
13    ??       --     EOF

(a) The share hash chain contains ceil(log2(N)) hashes, each 32 bytes long.
    This is the set of hashes necessary to validate this share's leaf in the
    share Merkle tree. For N=10, this is 4 hashes, i.e. 128 bytes.
(b) The block hash tree contains ceil(length/segsize) hashes, each 32 bytes
    long. This is the set of hashes necessary to validate any given block of
    share data up to the per-share root "r". Each "r" is a leaf of the share
    hash tree (with root "R"), from which a minimal subset of hashes is put
    in the share hash chain in (10).
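
A sketch of packing the fixed-layout prefix (fields 1-6, offsets 0 through
123, the region covered by the signature), using the same struct notation as
the encodings above; the field values passed in are placeholders:

  from struct import pack, calcsize

  # version, seqnum, R, data salt, encrypted salt, k, N, segsize, datalen
  PREFIX = ">B Q 32s 32s 32s B B Q Q"
  assert calcsize(PREFIX) == 123

  def pack_prefix(seqnum, R, data_salt, enc_salt, k, N, segsize, datalen):
      return pack(PREFIX, 1, seqnum, R, data_salt, enc_salt,
                  k, N, segsize, datalen)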

== TODO ==

Every node in a given tahoe grid must have the same common DSA modulus and
generator, but different grids could use different parameters. We haven't
figured out how to define a "grid id" yet, but I think the DSA parameters
should be part of that identifier. In practical terms, this might mean that
the Introducer tells each node what parameters to use, or perhaps the node
could have a config file which specifies them instead.

The shares MUST have a ciphertext hash of some sort (probably a Merkle tree
over the blocks, and/or a flat hash of the ciphertext), just like immutable
files do. Without this, a malicious publisher could produce some shares that
result in file A, and other shares that result in file B, and upload both of
them (incorporating both into the share hash tree). The result would be a
read-cap that would sometimes resolve to file A, and sometimes to file B,
depending upon which servers were used for the download. By including a
ciphertext hash in the SDMF data structure, the publisher must commit to just
a single ciphertext, closing this hole. See ticket #492 for more details.