.. -*- coding: utf-8-with-signature -*-

===========================
Garbage Collection in Tahoe
===========================

1. `Overview`_
2. `Client-side Renewal`_
3. `Server Side Expiration`_
4. `Expiration Progress`_
5. `Future Directions`_

Overview
========

When a file or directory in a Tahoe-LAFS file store is no longer referenced,
the space that its shares occupied on each storage server can be freed,
making room for other shares. Tahoe currently uses a garbage collection
("GC") mechanism to implement this space-reclamation process. Each share has
one or more "leases", which are managed by clients who want the
file/directory to be retained. The storage server accepts each share for a
pre-defined period of time, and is allowed to delete the share if all of the
leases expire.

Garbage collection is not enabled by default: storage servers will not delete
shares without being explicitly configured to do so. When GC is enabled,
clients are responsible for renewing their leases on a periodic basis at
least frequently enough to prevent any of the leases from expiring before the
next renewal pass.

There are several tradeoffs to be considered when choosing the renewal timer
and the lease duration, and there is no single optimal pair of values. See
the following diagram to get an idea of the tradeoffs involved:

.. image:: lease-tradeoffs.svg


If lease renewal occurs quickly and with 100% reliability, then any renewal
time that is shorter than the lease duration will suffice, but a larger ratio
of duration-over-renewal-time will be more robust in the face of occasional
delays or failures.

The current recommended values for a small Tahoe grid are to renew the leases
once a week, and give each lease a duration of 31 days. In the current
release, there is not yet a way to create a lease with a different duration,
but the server can use the ``expire.override_lease_duration`` configuration
setting to increase or decrease the effective duration (when the lease is
processed) to something other than 31 days.

Renewing leases can be expected to take about one second per file/directory,
depending upon the number of servers and the network speeds involved.


Client-side Renewal
===================

If all of the files and directories which you care about are reachable from a
single starting point (usually referred to as a "rootcap"), and you store
that rootcap as an alias (via "``tahoe create-alias``" for example), then the
simplest way to renew these leases is with the following CLI command::

  tahoe deep-check --add-lease ALIAS:

This will recursively walk every directory under the given alias and renew
the leases on all files and directories. (You may want to add a ``--repair``
flag to perform repair at the same time.) Simply run this command once a week
(or whatever other renewal period your grid recommends) and make sure it
completes successfully. As a side effect, a manifest of all unique files and
directories will be emitted to stdout, as well as a summary of file sizes and
counts. It may be useful to track these statistics over time.
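
For example, a weekly cron entry along the following lines could drive the
renewal automatically. This is only a sketch: the alias name ``backups:``,
the path to the ``tahoe`` executable, and the log file location are all
assumptions to be adapted to your own node::

  # Renew leases every Monday at 03:00 on everything reachable from backups:
  0 3 * * 1  /usr/local/bin/tahoe deep-check --add-lease backups: >>/var/log/tahoe-add-lease.log 2>&1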

Note that newly uploaded files (and newly created directories) get an initial
lease too: the ``--add-lease`` process is only needed to ensure that all
older objects have up-to-date leases on them.

A separate "rebalancing manager/service" is also planned -- see ticket
`#543`_. The exact details of what this service will do are not settled, but
it is likely to work by acquiring manifests from rootcaps on a periodic
basis, keeping track of checker results, managing lease-addition, and
prioritizing repair and rebalancing of shares. Eventually it may use multiple
worker nodes to perform these jobs in parallel.

.. _#543: http://tahoe-lafs.org/trac/tahoe-lafs/ticket/543


Server Side Expiration
======================

Expiration must be explicitly enabled on each storage server, since the
default behavior is to never expire shares. Expiration is enabled by adding
config keys to the ``[storage]`` section of the ``tahoe.cfg`` file (as
described below) and restarting the server node.

Each lease has two parameters: a create/renew timestamp and a duration. The
timestamp is set when the share is first uploaded (i.e. the file or
directory is created), and updated again each time the lease is renewed (i.e.
"``tahoe check --add-lease``" is performed). The duration is currently fixed
at 31 days, and the "nominal lease expiration time" is simply $duration
seconds after the $create_renew timestamp. (In a future release of Tahoe, the
client will get to request a specific duration, and the server will accept or
reject the request depending upon its local configuration, so that servers
can achieve better control over their storage obligations.)

The lease-expiration code has two modes of operation. The first is age-based:
leases are expired when their age is greater than their duration. This is the
preferred mode: as long as clients consistently update their leases on a
periodic basis, and that period is shorter than the lease duration, then all
active files and directories will be preserved, and the garbage will be
collected in a timely fashion.

Since there is not yet a way for clients to request a lease duration other
than 31 days, there is a ``tahoe.cfg`` setting to override the duration of
all leases. If, for example, this alternative duration is set to 60 days,
then clients could safely renew their leases with an add-lease operation
perhaps once every 50 days: even though nominally their leases would expire
31 days after the renewal, the server would not actually expire the leases
until 60 days after renewal.
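
As a concrete sketch (using the ``[storage]`` keys described below), a server
configured this way might carry the following in its ``tahoe.cfg``; the
60-day figure is just the example value from above::

  [storage]
  expire.enabled = true
  expire.mode = age
  expire.override_lease_duration = 60 days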

The other mode is an absolute-date-cutoff: it compares the create/renew
timestamp against some absolute date, and expires any lease which was not
created or renewed since the cutoff date. If all clients have performed an
add-lease some time after March 20th, you could tell the storage server to
expire all leases that were created or last renewed on March 19th or earlier.
This is most useful if you have a manual (non-periodic) add-lease process.
Note that there is not much point to running a storage server in this mode
for a long period of time: once the lease-checker has examined all shares and
expired whatever it is going to expire, the second and subsequent passes are
not going to find any new leases to remove.
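
Continuing the March example, a one-shot cutoff-date configuration might look
like this sketch (the year is arbitrary here; the date format is described
under ``expire.cutoff_date`` below)::

  [storage]
  expire.enabled = true
  expire.mode = cutoff-date
  expire.cutoff_date = 2009-03-19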

The ``tahoe.cfg`` file uses the following keys to control lease expiration:

``[storage]``

``expire.enabled = (boolean, optional)``

    If this is ``True``, the storage server will delete shares on which all
    leases have expired. Other controls dictate when leases are considered to
    have expired. The default is ``False``.

``expire.mode = (string, "age" or "cutoff-date", required if expiration enabled)``

    If this string is "age", the age-based expiration scheme is used, and the
    ``expire.override_lease_duration`` setting can be provided to influence
    the lease ages. If it is "cutoff-date", the absolute-date-cutoff mode is
    used, and the ``expire.cutoff_date`` setting must be provided to specify
    the cutoff date. The mode setting currently has no default: you must
    provide a value.

    In a future release, this setting is likely to default to "age", but in
    this release it was deemed safer to require an explicit mode
    specification.

``expire.override_lease_duration = (duration string, optional)``

    When age-based expiration is in use, a lease will be expired if its
    ``lease.create_renew`` timestamp plus its ``lease.duration`` time is
    earlier/older than the current time. This key, if present, overrides the
    duration value for all leases, changing the algorithm from::

        if (lease.create_renew_timestamp + lease.duration) < now:
            expire_lease()

    to::

        if (lease.create_renew_timestamp + override_lease_duration) < now:
            expire_lease()

    The value of this setting is a "duration string": a number followed by a
    units suffix (days, months, or years), optionally separated by a space,
    such as one of the following::

        7days
        31day
        60 days
        2mo
        3 month
        12 months
        2years

    This key is meant to compensate for the fact that clients do not yet have
    the ability to ask for leases that last longer than 31 days. A grid which
    wants to use faster or slower GC than a 31-day lease timer permits can
    use this parameter to implement it.

    This key is only valid when age-based expiration is in use (i.e. when
    ``expire.mode = age`` is used). It will be rejected if cutoff-date
    expiration is in use.

``expire.cutoff_date = (date string, required if mode=cutoff-date)``

    When cutoff-date expiration is in use, a lease will be expired if its
    create/renew timestamp is older than the cutoff date. This string will be
    a date in the following format::

        2009-01-16   (January 16th, 2009)
        2008-02-02
        2007-12-25

    The actual cutoff time shall be midnight UTC at the beginning of the
    given day. Lease timers should naturally be generous enough to not depend
    upon differences in timezone: there should be at least a few days between
    the last renewal time and the cutoff date.

    This key is only valid when cutoff-based expiration is in use (i.e. when
    ``expire.mode = cutoff-date``). It will be rejected if age-based
    expiration is in use.

``expire.immutable = (boolean, optional)``

    If this is ``False``, then immutable shares will never be deleted, even
    if their leases have expired. This can be used in special situations to
    perform GC on mutable files but not immutable ones. The default is
    ``True``.

``expire.mutable = (boolean, optional)``

    If this is ``False``, then mutable shares will never be deleted, even if
    their leases have expired. This can be used in special situations to
    perform GC on immutable files but not mutable ones. The default is
    ``True``.
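
Putting these knobs together, a server that wants age-based GC but also wants
to leave mutable shares untouched could use a configuration like the
following sketch; the 60-day duration is again just an example value::

  [storage]
  expire.enabled = true
  expire.mode = age
  expire.override_lease_duration = 60 days
  expire.mutable = false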

Expiration Progress
===================

In the current release, leases are stored as metadata in each share file, and
no separate database is maintained. As a result, checking and expiring leases
on a large server may require multiple reads from each of several million
share files. This process can take a long time and be very disk-intensive, so
a "share crawler" is used. The crawler limits the amount of time spent
looking at shares to a reasonable percentage of the storage server's overall
usage: by default it uses no more than 10% CPU, and yields to other code
after 100ms. A typical server with 1.1M shares was observed to take 3.5 days
to perform this rate-limited crawl through the whole set of shares, with
expiration disabled. It is expected to take perhaps 4 or 5 days to do the
crawl with expiration turned on.

The crawler's status is displayed on the "Storage Server Status Page", a web
page dedicated to the storage server. This page resides at $NODEURL/storage,
and there is a link to it from the front "welcome" page. The "Lease
Expiration crawler" section of the status page shows the progress of the
current crawler cycle, expected completion time, amount of space recovered,
and details of how many shares have been examined.

The crawler's state is persistent: restarting the node will not cause it to
lose significant progress. The state is stored in two files
($BASEDIR/storage/lease_checker.state and lease_checker.history), and the
crawler can be forcibly reset by stopping the node, deleting these two files,
then restarting the node.
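
Concretely, a forced reset might look like the following sketch (assuming
``$BASEDIR`` points at the storage node's base directory, and that your node
is started and stopped with the ``tahoe`` command; substitute whatever
start/stop mechanism your deployment actually uses)::

  tahoe stop $BASEDIR
  rm $BASEDIR/storage/lease_checker.state $BASEDIR/storage/lease_checker.history
  tahoe start $BASEDIR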

Future Directions
=================

Tahoe's GC mechanism is undergoing significant changes. The global
mark-and-sweep garbage-collection scheme can require considerable network
traffic for large grids, interfering with the bandwidth available for regular
uploads and downloads (and for non-Tahoe users of the network).

A preferable method might be to have a timer-per-client instead of a
timer-per-lease: the leases would not be expired until/unless the client had
not checked in with the server for a pre-determined duration. This would
reduce the network traffic considerably (one message per week instead of
thousands), but retain the same general failure characteristics.

In addition, using timers is not fail-safe (from the client's point of view),
in that a client which leaves the network for an extended period of time may
return to discover that all of their files have been garbage-collected. (It
*is* fail-safe from the server's point of view, in that a server is not
obligated to provide disk space in perpetuity to an unresponsive client.) It
may be useful to create a "renewal agent" to which a client can pass a list
of renewal-caps: the agent then takes the responsibility for keeping these
leases renewed, so the client can go offline safely. Of course, this requires
a certain amount of coordination: the renewal agent should not be keeping
files alive that the client has actually deleted. The client can send the
renewal agent a manifest of renewal caps, and each new manifest should
replace the previous set.

The GC mechanism is also not immediate: a client which deletes a file will
nevertheless be consuming extra disk space (and might be charged or otherwise
held accountable for it) until the ex-file's leases finally expire on their
own.

In the current release, these leases are each associated with a single "node
secret" (stored in $BASEDIR/private/secret), which is used to generate
renewal-secrets for each lease. Two nodes with different secrets will produce
separate leases, and will not be able to renew each other's leases.

Once the Accounting project is in place, leases will be scoped by a
sub-delegatable "account id" instead of a node secret, so clients will be
able to manage multiple leases per file. In addition, servers will be able to
identify which shares are leased by which clients, so that clients can safely
reconcile their idea of which files/directories are active against the
server's list, and explicitly cancel leases on objects that aren't on the
active list.

By reducing the size of the "lease scope", the coordination problem is made
easier. In general, mark-and-sweep is easier to implement (it requires mere
vigilance, rather than coordination), so the renew/expire timed-lease
approach is recommended unless the space consumed by deleted files is not
being reclaimed quickly enough.
---|