1 | .. -*- coding: utf-8-with-signature -*- |
---|
2 | |
---|
3 | ============= |
---|
4 | File Encoding |
---|
5 | ============= |
---|
6 | |
---|
7 | When the client wishes to upload an immutable file, the first step is to |
---|
8 | decide upon an encryption key. There are two methods: convergent or random. |
---|
9 | The goal of the convergent-key method is to make sure that multiple uploads |
---|
10 | of the same file will result in only one copy on the grid, whereas the |
---|
11 | random-key method does not provide this "convergence" feature. |
---|
12 | |
---|
13 | The convergent-key method computes the SHA-256d hash of a single-purpose tag, |
---|
14 | the encoding parameters, a "convergence secret", and the contents of the |
---|
15 | file. It uses a portion of the resulting hash as the AES encryption key. |
---|
16 | There are security concerns with using convergence this approach (the |
---|
17 | "partial-information guessing attack", please see ticket #365 for some |
---|
18 | references), so Tahoe uses a separate (randomly-generated) "convergence |
---|
19 | secret" for each node, stored in NODEDIR/private/convergence . The encoding |
---|
20 | parameters (k, N, and the segment size) are included in the hash to make sure |
---|
21 | that two different encodings of the same file will get different keys. This |
---|
22 | method requires an extra IO pass over the file, to compute this key, and |
---|
23 | encryption cannot be started until the pass is complete. This means that the |
---|
24 | convergent-key method will require at least two total passes over the file. |
---|
25 | |
---|
26 | The random-key method simply chooses a random encryption key. Convergence is |
---|
27 | disabled, however this method does not require a separate IO pass, so upload |
---|
28 | can be done with a single pass. This mode makes it easier to perform |
---|
29 | streaming upload. |
---|
30 | |
---|
31 | Regardless of which method is used to generate the key, the plaintext file is |
---|
32 | encrypted (using AES in CTR mode) to produce a ciphertext. This ciphertext is |
---|
33 | then erasure-coded and uploaded to the servers. Two hashes of the ciphertext |
---|
34 | are generated as the encryption proceeds: a flat hash of the whole |
---|
35 | ciphertext, and a Merkle tree. These are used to verify the correctness of |
---|
36 | the erasure decoding step, and can be used by a "verifier" process to make |
---|
37 | sure the file is intact without requiring the decryption key. |
---|
38 | |
---|
39 | The encryption key is hashed (with SHA-256d and a single-purpose tag) to |
---|
40 | produce the "Storage Index". This Storage Index (or SI) is used to identify |
---|
41 | the shares produced by the method described below. The grid can be thought of |
---|
42 | as a large table that maps Storage Index to a ciphertext. Since the |
---|
43 | ciphertext is stored as erasure-coded shares, it can also be thought of as a |
---|
44 | table that maps SI to shares. |
---|
45 | |
---|
46 | Anybody who knows a Storage Index can retrieve the associated ciphertext: |
---|
47 | ciphertexts are not secret. |
---|
48 | |
---|
49 | .. image:: file-encoding1.svg |
---|
50 | |
---|
51 | The ciphertext file is then broken up into segments. The last segment is |
---|
52 | likely to be shorter than the rest. Each segment is erasure-coded into a |
---|
53 | number of "blocks". This takes place one segment at a time. (In fact, |
---|
54 | encryption and erasure-coding take place at the same time, once per plaintext |
---|
55 | segment). Larger segment sizes result in less overhead overall, but increase |
---|
56 | both the memory footprint and the "alacrity" (the number of bytes we have to |
---|
57 | receive before we can deliver validated plaintext to the user). The current |
---|
58 | default segment size is 128KiB. |
---|
59 | |
---|
60 | One block from each segment is sent to each shareholder (aka leaseholder, |
---|
61 | aka landlord, aka storage node, aka peer). The "share" held by each remote |
---|
62 | shareholder is nominally just a collection of these blocks. The file will |
---|
63 | be recoverable when a certain number of shares have been retrieved. |
---|
64 | |
---|
65 | .. image:: file-encoding2.svg |
---|
66 | |
---|
67 | The blocks are hashed as they are generated and transmitted. These |
---|
68 | block hashes are put into a Merkle hash tree. When the last share has been |
---|
69 | created, the merkle tree is completed and delivered to the peer. Later, when |
---|
70 | we retrieve these blocks, the peer will send many of the merkle hash tree |
---|
71 | nodes ahead of time, so we can validate each block independently. |
---|
72 | |
---|
73 | The root of this block hash tree is called the "block root hash" and |
---|
74 | used in the next step. |
---|
75 | |
---|
76 | .. image:: file-encoding3.svg |
---|
77 | |
---|
78 | There is a higher-level Merkle tree called the "share hash tree". Its leaves |
---|
79 | are the block root hashes from each share. The root of this tree is called |
---|
80 | the "share root hash" and is included in the "URI Extension Block", aka UEB. |
---|
81 | The ciphertext hash and Merkle tree are also put here, along with the |
---|
82 | original file size, and the encoding parameters. The UEB contains all the |
---|
83 | non-secret values that could be put in the URI, but would have made the URI |
---|
84 | too big. So instead, the UEB is stored with the share, and the hash of the |
---|
85 | UEB is put in the URI. |
---|
86 | |
---|
87 | The URI then contains the secret encryption key and the UEB hash. It also |
---|
88 | contains the basic encoding parameters (k and N) and the file size, to make |
---|
89 | download more efficient (by knowing the number of required shares ahead of |
---|
90 | time, sufficient download queries can be generated in parallel). |
---|
91 | |
---|
92 | The URI (also known as the immutable-file read-cap, since possessing it |
---|
93 | grants the holder the capability to read the file's plaintext) is then |
---|
94 | represented as a (relatively) short printable string like so:: |
---|
95 | |
---|
96 | URI:CHK:auxet66ynq55naiy2ay7cgrshm:6rudoctmbxsmbg7gwtjlimd6umtwrrsxkjzthuldsmo4nnfoc6fa:3:10:1000000 |
---|
97 | |
---|
98 | .. image:: file-encoding4.svg |
---|
99 | |
---|
100 | During download, when a peer begins to transmit a share, it first transmits |
---|
101 | all of the parts of the share hash tree that are necessary to validate its |
---|
102 | block root hash. Then it transmits the portions of the block hash tree |
---|
103 | that are necessary to validate the first block. Then it transmits the |
---|
104 | first block. It then continues this loop: transmitting any portions of the |
---|
105 | block hash tree to validate block#N, then sending block#N. |
---|
106 | |
---|
107 | .. image:: file-encoding5.svg |
---|
108 | |
---|
109 | So the "share" that is sent to the remote peer actually consists of three |
---|
110 | pieces, sent in a specific order as they become available, and retrieved |
---|
111 | during download in a different order according to when they are needed. |
---|
112 | |
---|
113 | The first piece is the blocks themselves, one per segment. The last |
---|
114 | block will likely be shorter than the rest, because the last segment is |
---|
115 | probably shorter than the rest. The second piece is the block hash tree, |
---|
116 | consisting of a total of two SHA-1 hashes per block. The third piece is a |
---|
117 | hash chain from the share hash tree, consisting of log2(numshares) hashes. |
---|
118 | |
---|
119 | During upload, all blocks are sent first, followed by the block hash |
---|
120 | tree, followed by the share hash chain. During download, the share hash chain |
---|
121 | is delivered first, followed by the block root hash. The client then uses |
---|
122 | the hash chain to validate the block root hash. Then the peer delivers |
---|
123 | enough of the block hash tree to validate the first block, followed by |
---|
124 | the first block itself. The block hash chain is used to validate the |
---|
125 | block, then it is passed (along with the first block from several other |
---|
126 | peers) into decoding, to produce the first segment of crypttext, which is |
---|
127 | then decrypted to produce the first segment of plaintext, which is finally |
---|
128 | delivered to the user. |
---|
129 | |
---|
130 | .. image:: file-encoding6.svg |
---|
131 | |
---|
132 | |
---|
133 | Hashes |
---|
134 | ====== |
---|
135 | |
---|
136 | All hashes use SHA-256d, as defined in Practical Cryptography (by Ferguson |
---|
137 | and Schneier). All hashes use a single-purpose tag, e.g. the hash that |
---|
138 | converts an encryption key into a storage index is defined as follows:: |
---|
139 | |
---|
140 | SI = SHA256d(netstring("allmydata_immutable_key_to_storage_index_v1") + key) |
---|
141 | |
---|
142 | When two separate values need to be combined together in a hash, we wrap each |
---|
143 | in a netstring. |
---|
144 | |
---|
145 | Using SHA-256d (instead of plain SHA-256) guards against length-extension |
---|
146 | attacks. Using the tag protects our Merkle trees against attacks in which the |
---|
147 | hash of a leaf is confused with a hash of two children (allowing an attacker |
---|
148 | to generate corrupted data that nevertheless appears to be valid), and is |
---|
149 | simply good "cryptograhic hygiene". The `“Chosen Protocol Attack” by Kelsey, |
---|
150 | Schneier, and Wagner`_ is relevant. Putting the tag in a netstring guards |
---|
151 | against attacks that seek to confuse the end of the tag with the beginning of |
---|
152 | the subsequent value. |
---|
153 | |
---|
154 | .. _“Chosen Protocol Attack” by Kelsey, Schneier, and Wagner: http://www.schneier.com/paper-chosen-protocol.html |
---|