1 | .. -*- coding: utf-8-with-signature -*- |
---|
2 | |
---|
3 | ============================================ |
---|
4 | Performance costs for some common operations |
---|
5 | ============================================ |
---|
6 | |
---|
7 | 1. `Publishing an A-byte immutable file`_ |
---|
8 | 2. `Publishing an A-byte mutable file`_ |
---|
9 | 3. `Downloading B bytes of an A-byte immutable file`_ |
---|
10 | 4. `Downloading B bytes of an A-byte mutable file`_ |
---|
11 | 5. `Modifying B bytes of an A-byte mutable file`_ |
---|
12 | 6. `Inserting/Removing B bytes in an A-byte mutable file`_ |
---|
13 | 7. `Adding an entry to an A-entry directory`_ |
---|
14 | 8. `Listing an A entry directory`_ |
---|
15 | 9. `Checking an A-byte file`_ |
---|
16 | 10. `Verifying an A-byte file (immutable)`_ |
---|
17 | 11. `Repairing an A-byte file (mutable or immutable)`_ |
---|
18 | |
---|
19 | ``K`` indicates the number of shares required to reconstruct the file |
---|
20 | (default: 3) |
---|
21 | |
---|
22 | ``N`` indicates the total number of shares produced (default: 10) |
---|
23 | |
---|
24 | ``S`` indicates the segment size (default: 128 KiB) |
---|
25 | |
---|
26 | ``A`` indicates the number of bytes in a file |
---|
27 | |
---|
28 | ``B`` indicates the number of bytes of a file that are being read or |
---|
29 | written |
---|
30 | |
---|
31 | ``G`` indicates the number of storage servers on your grid |
---|
32 | |
---|
33 | Most of these cost estimates may have a further constant multiplier: when a |
---|
34 | formula says ``N/K*S``, the cost may actually be ``2*N/K*S`` or ``3*N/K*S``. |
---|
35 | Also note that all references to mutable files are for SDMF-formatted files; |
---|
36 | this document has not yet been updated to describe the MDMF format. |
---|
37 | |
---|
38 | Publishing an ``A``-byte immutable file |
---|
39 | ======================================= |
---|
40 | |
---|
41 | when the file is already uploaded |
---|
42 | --------------------------------- |
---|
43 | |
---|
44 | If the file is already uploaded with the exact same contents, same |
---|
45 | erasure coding parameters (K, N), and same added convergence secret, |
---|
46 | then it reads the whole file from disk one time while hashing it to |
---|
47 | compute the storage index, then contacts about N servers to ask each |
---|
48 | one to store a share. All of the servers reply that they already have |
---|
49 | a copy of that share, and the upload is done. |
---|
50 | |
---|
51 | disk: A |
---|
52 | |
---|
53 | cpu: ~A |
---|
54 | |
---|
55 | network: ~N |
---|
56 | |
---|
57 | memory footprint: S |
---|
58 | |
---|
59 | when the file is not already uploaded |
---|
60 | ------------------------------------- |
---|
61 | |
---|
62 | If the file is not already uploaded with the exact same contents, same |
---|
63 | erasure coding parameters (K, N), and same added convergence secret, |
---|
64 | then it reads the whole file from disk one time while hashing it to |
---|
65 | compute the storage index, then contacts about N servers to ask each |
---|
66 | one to store a share. Then it uploads each share to a storage server. |
---|
67 | |
---|
68 | disk: 2*A |
---|
69 | |
---|
70 | cpu: 2*~A |
---|
71 | |
---|
72 | network: N/K*A |
---|
73 | |
---|
74 | memory footprint: N/K*S |
---|
75 | |
---|
76 | Publishing an ``A``-byte mutable file |
---|
77 | ===================================== |
---|
78 | |
---|
79 | cpu: ~A + a large constant for RSA keypair generation |
---|
80 | |
---|
81 | network: A |
---|
82 | |
---|
83 | memory footprint: N/K*A |
---|
84 | |
---|
85 | notes: |
---|
86 | Tahoe-LAFS generates a new RSA keypair for each mutable file that it publishes to a grid. |
---|
87 | This takes around 100 milliseconds on a relatively high-end laptop from 2021. |
---|
88 | |
---|
89 | Part of the process of encrypting, encoding, and uploading a mutable file to a |
---|
90 | Tahoe-LAFS grid requires that the entire file be in memory at once. For larger |
---|
91 | files, this may cause Tahoe-LAFS to have an unacceptably large memory footprint |
---|
92 | (at least when uploading a mutable file). |
---|
93 | |
---|
94 | Downloading ``B`` bytes of an ``A``-byte immutable file |
---|
95 | ======================================================= |
---|
96 | |
---|
97 | cpu: ~B |
---|
98 | |
---|
99 | network: B |
---|
100 | |
---|
101 | notes: When Tahoe-LAFS 1.8.0 or later is asked to read an arbitrary |
---|
102 | range of an immutable file, only the S-byte segments that overlap the |
---|
103 | requested range will be downloaded. |
---|
104 | |
---|
105 | (Earlier versions would download from the beginning of the file up |
---|
106 | until the end of the requested range, and then continue to download |
---|
107 | the rest of the file even after the request was satisfied.) |
---|
108 | |
---|
109 | Downloading ``B`` bytes of an ``A``-byte mutable file |
---|
110 | ===================================================== |
---|
111 | |
---|
112 | cpu: ~A |
---|
113 | |
---|
114 | network: A |
---|
115 | |
---|
116 | memory footprint: A |
---|
117 | |
---|
118 | notes: As currently implemented, mutable files must be downloaded in |
---|
119 | their entirety before any part of them can be read. We are |
---|
120 | exploring fixes for this; see ticket #393 for more information. |
---|
121 | |
---|
122 | Modifying ``B`` bytes of an ``A``-byte mutable file |
---|
123 | =================================================== |
---|
124 | |
---|
125 | cpu: ~A |
---|
126 | |
---|
127 | network: A |
---|
128 | |
---|
129 | memory footprint: N/K*A |
---|
130 | |
---|
131 | notes: If you upload a changed version of a mutable file that you |
---|
132 | earlier put onto your grid with, say, 'tahoe put --mutable', |
---|
133 | Tahoe-LAFS will replace the old file with the new file on the |
---|
134 | grid, rather than attempting to modify only those portions of the |
---|
135 | file that have changed. Modifying a file in this manner is |
---|
136 | essentially uploading the file over again, except that it re-uses |
---|
137 | the existing RSA keypair instead of generating a new one. |
---|
138 | |
---|
139 | Inserting/Removing ``B`` bytes in an ``A``-byte mutable file |
---|
140 | ============================================================ |
---|
141 | |
---|
142 | cpu: ~A |
---|
143 | |
---|
144 | network: A |
---|
145 | |
---|
146 | memory footprint: N/K*A |
---|
147 | |
---|
148 | notes: Modifying any part of a mutable file in Tahoe-LAFS requires that |
---|
149 | the entire file be downloaded, modified, held in memory while it is |
---|
150 | encrypted and encoded, and then re-uploaded. A future version of the |
---|
151 | mutable file layout ("LDMF") may provide efficient inserts and |
---|
152 | deletes. Note that this sort of modification is mostly used internally |
---|
153 | for directories, and isn't something that the WUI, CLI, or other |
---|
154 | interfaces will do -- instead, they will simply overwrite the file to |
---|
155 | be modified, as described in "Modifying B bytes of an A-byte mutable |
---|
156 | file". |
---|
157 | |
---|
158 | Adding an entry to an ``A``-entry directory |
---|
159 | =========================================== |
---|
160 | |
---|
161 | cpu: ~A |
---|
162 | |
---|
163 | network: ~A |
---|
164 | |
---|
165 | memory footprint: N/K*~A |
---|
166 | |
---|
167 | notes: In Tahoe-LAFS, directories are implemented as specialized mutable |
---|
168 | files. So adding an entry to a directory is essentially adding B |
---|
169 | (actually, 300-330) bytes somewhere in an existing mutable file. |
---|
170 | |
---|
171 | Listing an ``A`` entry directory |
---|
172 | ================================ |
---|
173 | |
---|
174 | cpu: ~A |
---|
175 | |
---|
176 | network: ~A |
---|
177 | |
---|
178 | memory footprint: N/K*~A |
---|
179 | |
---|
180 | notes: Listing a directory requires that the mutable file storing the |
---|
181 | directory be downloaded from the grid. So listing an A entry |
---|
182 | directory requires downloading a (roughly) 330 * A byte mutable |
---|
183 | file, since each directory entry is about 300-330 bytes in size. |
---|
184 | |
---|
185 | Checking an ``A``-byte file |
---|
186 | =========================== |
---|
187 | |
---|
188 | cpu: ~G |
---|
189 | |
---|
190 | network: ~G |
---|
191 | |
---|
192 | memory footprint: negligible |
---|
193 | |
---|
194 | notes: To check a file, Tahoe-LAFS queries all the servers that it knows |
---|
195 | about. Note that neither of these values directly depend on the size |
---|
196 | of the file. This is relatively inexpensive, compared to the verify |
---|
197 | and repair operations. |
---|
198 | |
---|
199 | Verifying an A-byte file (immutable) |
---|
200 | ==================================== |
---|
201 | |
---|
202 | cpu: ~N/K*A |
---|
203 | |
---|
204 | network: N/K*A |
---|
205 | |
---|
206 | memory footprint: N/K*S |
---|
207 | |
---|
208 | notes: To verify a file, Tahoe-LAFS downloads all of the ciphertext |
---|
209 | shares that were originally uploaded to the grid and integrity checks |
---|
210 | them. This is (for grids with good redundancy) more expensive than |
---|
211 | downloading an A-byte file, since only a fraction of these shares would |
---|
212 | be necessary to recover the file. |
---|
213 | |
---|
214 | Verifying an A-byte file (mutable) |
---|
215 | ================================== |
---|
216 | |
---|
217 | cpu: ~N/K*A |
---|
218 | |
---|
219 | network: N/K*A |
---|
220 | |
---|
221 | memory footprint: N/K*A |
---|
222 | |
---|
223 | notes: To verify a file, Tahoe-LAFS downloads all of the ciphertext |
---|
224 | shares that were originally uploaded to the grid and integrity checks |
---|
225 | them. This is (for grids with good redundancy) more expensive than |
---|
226 | downloading an A-byte file, since only a fraction of these shares would |
---|
227 | be necessary to recover the file. |
---|
228 | |
---|
229 | Repairing an ``A``-byte file (mutable or immutable) |
---|
230 | =================================================== |
---|
231 | |
---|
232 | cpu: variable, between ~A and ~N/K*A |
---|
233 | |
---|
234 | network: variable; between A and N/K*A |
---|
235 | |
---|
236 | memory footprint (immutable): (1+N/K)*S |
---|
237 | (SDMF mutable): (1+N/K)*A |
---|
238 | |
---|
239 | notes: To repair a file, Tahoe-LAFS downloads the file, and |
---|
240 | generates/uploads missing shares in the same way as when it initially |
---|
241 | uploads the file. So, depending on how many shares are missing, this |
---|
242 | can cost as little as a download or as much as a download followed by |
---|
243 | a full upload. |
---|
244 | |
---|
245 | Since SDMF files have only one segment, which must be processed in its |
---|
246 | entirety, repair requires a full-file download followed by a full-file |
---|
247 | upload. |
---|