.. -*- coding: utf-8-with-signature -*-

=====================
Lease database design
=====================

The target audience for this document is developers who wish to understand
the new lease database (leasedb) planned to be added in Tahoe-LAFS v1.11.0.


Introduction
------------

14 | A "lease" is a request by an account that a share not be deleted before a |
---|
15 | specified time. Each storage server stores leases in order to know which |
---|
16 | shares to spare from garbage collection. |
---|
17 | |
---|
18 | Motivation |
---|
19 | ---------- |
---|
20 | |
---|
21 | The leasedb will replace the current design in which leases are stored in |
---|
22 | the storage server's share container files. That design has several |
---|
23 | disadvantages: |
---|
24 | |
---|
- Updating a lease requires modifying a share container file (even for
  immutable shares). This complicates the implementation of share classes.
  The mixing of share contents and lease data in share files also led to a
  security bug (ticket `#1528`_).

- When only the disk backend is supported, leases can be read and updated
  synchronously because the share files are local to the storage server.
  For the cloud backend, accessing share files requires an HTTP request,
  and so must be asynchronous. Accepting this asynchrony for lease queries
  would be both inefficient and complex. Moving lease information out of
  shares and into a local database allows lease queries to stay
  synchronous.

Also, the current cryptographic protocol for renewing and cancelling leases
(based on shared secrets derived from secure hash functions) is complex,
and the cancellation part was never used.

The leasedb solves the first two problems by storing the lease information
in a local database instead of in the share container files. The share data
itself is still held in the share container file.

At the same time as implementing leasedb, we devised a simpler protocol for
allocating and cancelling leases: a client can use a public key digital
signature to authenticate access to a foolscap object representing the
authority of an account. This protocol is not yet implemented; at the time
of writing, only an "anonymous" account is supported.

The leasedb also provides an efficient way to get summarized information,
such as total space usage of shares leased by an account, for accounting
purposes.

.. _`#1528`: https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1528
.. _`#1834`: https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1834


Design constraints
------------------

A share is stored as a collection of objects. The persistent storage may be
remote from the server (for example, cloud storage).

Writes to the persistent store objects are in general not atomic. So the
leasedb also keeps track of which shares are in an inconsistent state
because they have been partly written. (This may change in future when we
implement a protocol to improve atomicity of updates to mutable shares.)

Leases are no longer stored in shares. The same share format is used as
before, but the lease slots are ignored, and are cleared when rewriting a
mutable share. The new design also does not use lease renewal or cancel
secrets. (They are accepted as parameters in the storage protocol
interfaces for backward compatibility, but are ignored. Cancel secrets
were already ignored due to the fix for `#1528`_.)

The new design needs to be fail-safe in the sense that if the lease
database is lost or corruption is detected, no share data will be lost
(even though the metadata about leases held by particular accounts has
been lost).


Accounting crawler
------------------

A "crawler" is a long-running process that visits share container files at
a slow rate, so as not to overload the server by visiting them all at once.

The accounting crawler replaces the previous "lease crawler". It examines
each share container file and compares it with the state of the leasedb,
and may update the state of the share and/or the leasedb.

The accounting crawler may perform the following functions (but see ticket
`#1834`_ for a proposal to reduce the scope of its responsibility); a
sketch of the per-share reconciliation follows the list:

- Remove leases that are past their expiration time. (Currently, this is
  done automatically before deleting shares, but we plan to allow
  expiration to be performed separately for individual accounts in future.)

- Delete the objects containing unleased shares — that is, shares that have
  stable entries in the leasedb but no current leases (see below for the
  definition of "stable" entries).

- Discover shares that have been manually added to storage, via ``scp`` or
  some other out-of-band means.

- Discover shares that are present when a storage server is upgraded to
  a leasedb-supporting version from a previous version, and give them
  "starter leases".

- Recover from a situation where the leasedb is lost or detectably
  corrupted. This is handled in the same way as upgrading from a previous
  version.

- Detect shares that have unexpectedly disappeared from storage. The
  disappearance of a share is logged, and its entry and leases are removed
  from the leasedb.

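The following sketch shows what this per-share reconciliation might look
like. The ``leasedb`` object and all of its methods here are illustrative
assumptions, not the actual crawler or leasedb API::

  import logging

  def reconcile_share(leasedb, storage_index, shnum, found_in_storage, now):
      # Drop any leases that are past their expiration time.
      leasedb.remove_leases_expired_before(storage_index, shnum, now)
      entry = leasedb.get_share_entry(storage_index, shnum)
      if found_in_storage and entry is None:
          # A share added out-of-band (e.g. via scp), present from before
          # an upgrade, or orphaned by a lost or corrupted leasedb:
          # record it and give it a "starter lease".
          leasedb.add_share_entry(storage_index, shnum)
          leasedb.add_starter_lease(storage_index, shnum)
      elif not found_in_storage and entry is not None:
          # The share unexpectedly disappeared from storage: log the
          # loss, then remove its entry and leases from the leasedb.
          logging.info("share %s:%d disappeared", storage_index, shnum)
          leasedb.remove_share_and_leases(storage_index, shnum)
      elif entry is not None and not leasedb.has_current_leases(
              storage_index, shnum):
          # A stable entry with no current leases: initiate deletion of
          # the store objects containing the share.
          leasedb.mark_share_going(storage_index, shnum)

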
Accounts
--------

An account holds leases for some subset of shares stored by a server. The
leasedb schema can handle many distinct accounts, but for the time being we
create only two accounts: an anonymous account and a starter account. The
starter account is used for leases on shares discovered by the accounting
crawler; the anonymous account is used for all other leases.

The leasedb has at most one lease entry per account per (storage_index,
shnum) pair. This entry stores the times when the lease was last renewed
and when it is set to expire (if the expiration policy does not force it to
expire earlier), represented as Unix UTC-seconds-since-epoch timestamps.

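For illustration, such a lease table might be declared as in the following
sketch. The table and column names are assumptions made for this example,
not the actual leasedb schema::

  import sqlite3

  SCHEMA = """
  CREATE TABLE IF NOT EXISTS leases
  (
      storage_index   VARCHAR(26) NOT NULL,
      shnum           INTEGER NOT NULL,
      account_id      INTEGER NOT NULL,
      renewal_time    INTEGER NOT NULL, -- Unix UTC seconds since the epoch
      expiration_time INTEGER,          -- as above; NULL if unconstrained
      PRIMARY KEY (storage_index, shnum, account_id)
  );
  """

  def add_or_renew_lease(conn, storage_index, shnum, account_id,
                         renewal_time, expiration_time):
      # There is at most one lease entry per account per (storage_index,
      # shnum) pair, so renewal is an insert-or-replace of a single row.
      conn.execute("INSERT OR REPLACE INTO leases VALUES (?, ?, ?, ?, ?)",
                   (storage_index, shnum, account_id,
                    renewal_time, expiration_time))
      conn.commit()

  conn = sqlite3.connect(":memory:")
  conn.executescript(SCHEMA)
  add_or_renew_lease(conn, "si1", 0, account_id=1,
                     renewal_time=1300000000, expiration_time=1302678400)
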
For more on expiration policy, see :doc:`../garbage-collection`.


Share states
------------

The leasedb holds an explicit indicator of the state of each share.

The diagram and descriptions below give the possible values of the "state"
indicator, what that value means, and transitions between states, for any
(storage_index, shnum) pair on each server::

  #        STATE_STABLE -------.
  #          ^   |   ^ |       |
  #          |   v   | |       v
  #     STATE_COMING | |   STATE_GOING
  #          ^       | |       |
  #          |       | v       |
  #          '------ NONE <----'

**NONE**: There is no entry in the ``shares`` table for this
(storage_index, shnum) in this server's leasedb. This is the initial state.

**STATE_COMING**: The share is being created or (if a mutable share)
updated. The store objects may have been at least partially written, but
the storage server doesn't have confirmation that they have all been
completely written.

**STATE_STABLE**: The store objects have been completely written and are
not in the process of being modified or deleted by the storage server. (It
could have been modified or deleted behind the back of the storage server,
but if it has, the server has not noticed that yet.) The share may or may
not be leased.

**STATE_GOING**: The share is being deleted.

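In code, these states can be represented as small constants recorded in
each share's leasedb entry. The following sketch shows one way to express
the states and the set of valid transitions from the diagram above; the
concrete values, and the helper itself, are assumptions made for this
example::

  # Integer values are illustrative assumptions, not necessarily those
  # used by the actual leasedb.
  STATE_COMING, STATE_STABLE, STATE_GOING = range(3)

  # Valid transitions from the diagram above. None stands for the absence
  # of a leasedb entry for the (storage_index, shnum) pair.
  VALID_TRANSITIONS = {
      None:         {STATE_COMING, STATE_STABLE},
      STATE_COMING: {STATE_STABLE},
      STATE_STABLE: {STATE_COMING, STATE_GOING, None},
      STATE_GOING:  {None},
  }

  def check_transition(old_state, new_state):
      """Raise AssertionError on a transition the diagram does not allow."""
      assert new_state in VALID_TRANSITIONS[old_state], \
          "invalid share state transition: %r -> %r" % (old_state, new_state)
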
State transitions
-----------------

- **STATE_GOING** → **NONE**

  trigger: The storage server gains confidence that all store objects for
  the share have been removed.

  implementation:

  1. Remove the entry in the leasedb.

- **STATE_STABLE** → **NONE**

  trigger: The accounting crawler noticed that all the store objects for
  this share are gone.

  implementation:

  1. Remove the entry in the leasedb.

- **NONE** → **STATE_COMING**

  triggers: A new share is being created, as explicitly signalled by a
  client invoking a creation command, *or* the accounting crawler discovers
  an incomplete share.

  implementation:

  1. Add an entry to the leasedb with **STATE_COMING**.

  2. (In case of explicit creation) begin writing the store objects to
     hold the share.

- **STATE_STABLE** → **STATE_COMING**

  trigger: A mutable share is being modified, as explicitly signalled by a
  client invoking a modification command.

  implementation:

  1. Add an entry to the leasedb with **STATE_COMING**.

  2. Begin updating the store objects.

- **STATE_COMING** → **STATE_STABLE**

  trigger: All store objects have been written.

  implementation:

  1. Change the state value of this entry in the leasedb from
     **STATE_COMING** to **STATE_STABLE**.

- **NONE** → **STATE_STABLE**

  trigger: The accounting crawler discovers a complete share.

  implementation:

  1. Add an entry to the leasedb with **STATE_STABLE**.

- **STATE_STABLE** → **STATE_GOING**

  trigger: The share should be deleted because it is unleased.

  implementation:

  1. Change the state value of this entry in the leasedb from
     **STATE_STABLE** to **STATE_GOING**.

  2. Initiate removal of the store objects.

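As an illustration, the **STATE_COMING** → **STATE_STABLE** transition
amounts to a single update of the state value. The sketch below assumes
the state constants from the sketch above and a ``shares`` table with a
``state`` column, whose layout is an assumption made for this example::

  def mark_share_stable(conn, storage_index, shnum):
      # Flip the entry from STATE_COMING to STATE_STABLE once the storage
      # server has confirmation that all store objects for the share have
      # been completely written.
      cursor = conn.execute(
          "UPDATE shares SET state = ? "
          "WHERE storage_index = ? AND shnum = ? AND state = ?",
          (STATE_STABLE, storage_index, shnum, STATE_COMING))
      conn.commit()
      # Matching on the old state makes the update a no-op if the entry is
      # not in STATE_COMING, which guards against racing transitions.
      return cursor.rowcount == 1
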
The following constraints are needed to avoid race conditions:

- While a share is being deleted (entry in **STATE_GOING**), we do not
  accept any requests to recreate it. That would result in add and delete
  requests for store objects being sent concurrently, with undefined
  results.

- While a share is being added or modified (entry in **STATE_COMING**), we
  treat it as leased.

- Creation or modification requests for a given mutable share are
  serialized.


Unresolved design issues
------------------------

- What happens if a write to store objects for a new share fails
  permanently? If we delete the share entry, then the accounting crawler
  will eventually get to those store objects and see that their lengths
  are inconsistent with the length in the container header. This will
  cause the share to be treated as corrupted. Should we instead attempt to
  delete those objects immediately? If so, do we need a direct
  **STATE_COMING** → **STATE_GOING** transition to handle this case?

- What happens if only some store objects for a share disappear
  unexpectedly? This case is similar to only some objects having been
  written when we get an unrecoverable error during creation of a share,
  but perhaps we want to treat it differently in order to preserve
  information about the storage service having lost data.

- Does the leasedb need to track corrupted shares?


Future directions
-----------------

Clients will have key pairs identifying accounts, and will be able to add
leases for a specific account. Various space usage policies can be defined.

Better migration tools ('tahoe storage export'?) will create export files
that include both the share data and the lease data, and then an import
tool will both put the share in the right place and update the recipient
node's leasedb.