Ticket #892: goodbye-vdrive-diff.txt

File goodbye-vdrive-diff.txt, 46.1 KB (added by davidsarah at 2010-01-14T03:51:27Z)

Diff to remove references to 'vdrive' and 'virtual drive', and to make some other cleanups to architecture.txt and the command synopses.

Line 
1--- old-tahoe/docs/architecture.txt     2010-01-14 03:46:11.969000000 +0000
2+++ new-tahoe/docs/architecture.txt     2010-01-14 03:46:12.222000000 +0000
3@@ -5,14 +5,15 @@
4 
5 OVERVIEW
6 
7-At a high-level this system consists of three layers: the grid, the
8-filesystem, and the application.
9+At a high-level this system consists of three layers: the key-value store,
10+the filesystem, and the application.
11 
12-The lowest layer is the "grid", a key-value store mapping from capabilities to
13-data.  The capabilities are relatively short ascii strings, each used as a
14-reference to an arbitrary-length sequence of data bytes, and are like a URI
15-for that data. This data is encrypted and distributed across a number of
16-nodes, such that it will survive the loss of most of the nodes.
17+The lowest layer is the key-value store, which is a distributed hashtable
18+mapping from capabilities to data.  The capabilities are relatively short
19+ASCII strings, each used as a reference to an arbitrary-length sequence of
20+data bytes, and are like a URI for that data. This data is encrypted and
21+distributed across a number of nodes, such that it will survive the loss of
22+most of the nodes.
23 
24 The middle layer is the decentralized filesystem: a directed graph in which
25 the intermediate nodes are directories and the leaf nodes are files. The leaf
26@@ -31,19 +32,21 @@
27 
28 THE GRID OF STORAGE SERVERS
29 
30-The grid is composed of peer nodes -- processes running on computers.  They
31-establish TCP connections to each other using Foolscap, a secure remote
32-message passing library.
33+A key-value store is implemented by a collection of peer nodes -- processes
34+running on computers -- called a "grid". (The term "grid" is also used loosely
35+for the filesystem supported by these nodes.) The nodes in a grid establish
36+TCP connections to each other using Foolscap, a secure remote-message-passing
37+library.
38 
39-Each peer offers certain services to the others. The primary service is that
40+Each node offers certain services to the others. The primary service is that
41 of the storage server, which holds data in the form of "shares".  Shares are
42 encoded pieces of files.  There are a configurable number of shares for each
43 file, 10 by default.  Normally, each share is stored on a separate server, but
44 a single server can hold multiple shares for a single file.
45 
46-Peers learn about each other through an "introducer". Each peer connects to a
47-central introducer at startup, and receives a list of all other peers from
48-it. Each peer then connects to all other peers, creating a fully-connected
49+Nodes learn about each other through an "introducer". Each node connects to a
50+central introducer at startup, and receives a list of all other nodes from
51+it. Each node then connects to all other nodes, creating a fully-connected
52 topology.  In the current release, nodes behind NAT boxes will connect to all
53 nodes that they can open connections to, but they cannot open connections to
54 other nodes behind NAT boxes.  Therefore, the more nodes behind NAT boxes, the
55@@ -62,16 +65,17 @@
56 "gossip-based" introduction, simply knowing how to contact any one node will
57 be enough to contact all of them.
58 
59+
60 FILE ENCODING
61 
62-When a peer stores a file on the grid, it first encrypts the file, using a key
63+When a node stores a file on its grid, it first encrypts the file, using a key
64 that is optionally derived from the hash of the file itself.  It then segments
65 the encrypted file into small pieces, in order to reduce the memory footprint,
66 and to decrease the lag between initiating a download and receiving the first
67 part of the file; for example the lag between hitting "play" and a movie
68 actually starting.
69 
70-The peer then erasure-codes each segment, producing blocks such that only a
71+The node then erasure-codes each segment, producing blocks such that only a
72 subset of them are needed to reconstruct the segment. It sends one block from
73 each segment to a given server. The set of blocks on a given server
74 constitutes a "share". Only a subset of the shares (3 out of 10, by default)
75@@ -79,7 +83,7 @@
76 
77 A tagged hash of the encryption key is used to form the "storage index", which
78 is used for both server selection (described below) and to index shares within
79-the Storage Servers on the selected peers.
80+the Storage Servers on the selected nodes.
81 
82 Hashes are computed while the shares are being produced, to validate the
83 ciphertext and the shares themselves. Merkle hash trees are used to enable
84@@ -144,49 +148,49 @@
85 to retrieve a set of bytes, and then you can use it to validate ("identify")
86 that these potential bytes are indeed the ones that you were looking for.
87 
88-The "grid" layer is insufficient to provide a virtual drive: an actual
89-filesystem requires human-meaningful names.  Capabilities sit on the
90-"global+secure" edge of Zooko's Triangle[1]. They are self-authenticating,
91-meaning that nobody can trick you into using a file that doesn't match the
92-capability you used to refer to that file.
93+The "key-value store" layer is insufficient to provide a usable filesystem,
94+which requires human-meaningful names.  Capabilities sit on the "global+secure"
95+edge of Zooko's Triangle[1]. They are self-authenticating, meaning that
96+nobody can trick you into using a file that doesn't match the capability
97+you used to refer to that file.
98 
99 
100 SERVER SELECTION
101 
102-When a file is uploaded, the encoded shares are sent to other peers. But to
103+When a file is uploaded, the encoded shares are sent to other nodes. But to
104 which ones? The "server selection" algorithm is used to make this choice.
105 
106 In the current version, the storage index is used to consistently-permute the
107-set of all peers (by sorting the peers by HASH(storage_index+peerid)). Each
108-file gets a different permutation, which (on average) will evenly distribute
109+set of all peer nodes (by sorting the peer nodes by HASH(storage_index+peerid)).
110+Each file gets a different permutation, which (on average) will evenly distribute
111 shares among the grid and avoid hotspots.
112 
113-We use this permuted list of peers to ask each peer, in turn, if it will hold
114+We use this permuted list of nodes to ask each node, in turn, if it will hold
115 a share for us, by sending an 'allocate_buckets() query' to each one. Some
116-will say yes, others (those who are full) will say no: when a peer refuses our
117-request, we just take that share to the next peer on the list. We keep going
118+will say yes, others (those who are full) will say no: when a node refuses our
119+request, we just take that share to the next node on the list. We keep going
120 until we run out of shares to place. At the end of the process, we'll have a
121-table that maps each share number to a peer, and then we can begin the
122+table that maps each share number to a node, and then we can begin the
123 encode+push phase, using the table to decide where each share should be sent.
124 
125-Most of the time, this will result in one share per peer, which gives us
126+Most of the time, this will result in one share per node, which gives us
127 maximum reliability (since it disperses the failures as widely as possible).
128-If there are fewer useable peers than there are shares, we'll be forced to
129-loop around, eventually giving multiple shares to a single peer. This reduces
130+If there are fewer usable nodes than there are shares, we'll be forced to
131+loop around, eventually giving multiple shares to a single node. This reduces
132 reliability, so it isn't the sort of thing we want to happen all the time, and
133 either indicates that the default encoding parameters are set incorrectly
134-(creating more shares than you have peers), or that the grid does not have
135-enough space (many peers are full). But apart from that, it doesn't hurt. If
136-we have to loop through the peer list a second time, we accelerate the query
137-process, by asking each peer to hold multiple shares on the second pass. In
138+(creating more shares than you have nodes), or that the grid does not have
139+enough space (many nodes are full). But apart from that, it doesn't hurt. If
140+we have to loop through the node list a second time, we accelerate the query
141+process, by asking each node to hold multiple shares on the second pass. In
142 most cases, this means we'll never send more than two queries to any given
143-peer.
144+node.
145 
146-If a peer is unreachable, or has an error, or refuses to accept any of our
147+If a node is unreachable, or has an error, or refuses to accept any of our
148 shares, we remove them from the permuted list, so we won't query them a second
149-time for this file. If a peer already has shares for the file we're uploading
150+time for this file. If a node already has shares for the file we're uploading
151 (or if someone else is currently sending them shares), we add that information
152-to the share-to-peer table. This lets us do less work for files which have
153+to the share-to-peer-node table. This lets us do less work for files which have
154 been uploaded once before, while making sure we still wind up with as many
155 shares as we desire.
156 
157@@ -197,10 +201,10 @@
158 The current defaults use k=3, shares_of_happiness=7, and N=10, meaning that
159 we'll try to place 10 shares, we'll be happy if we can place 7, and we need to
160 get back any 3 to recover the file. This results in a 3.3x expansion
161-factor. In general, you should set N about equal to the number of peers in
162+factor. In general, you should set N about equal to the number of nodes in
163 your grid, then set N/k to achieve your desired availability goals.
164 
165-When downloading a file, the current release just asks all known peers for any
166+When downloading a file, the current release just asks all known nodes for any
167 shares they might have, chooses the minimal necessary subset, then starts
168 downloading and processing those shares. A later release will use the full
169 algorithm to reduce the number of queries that must be sent out. This
170@@ -209,26 +213,26 @@
171 queries that must be sent before downloading can begin.
172 
173 The actual number of queries is directly related to the availability of the
174-peers and the degree of overlap between the peerlist used at upload and at
175+nodes and the degree of overlap between the node list used at upload and at
176 download. For stable grids, this overlap is very high, and usually the first k
177 queries will result in shares. The number of queries grows as the stability
178 decreases. Some limits may be imposed in large grids to avoid querying a
179-million peers; this provides a tradeoff between the work spent to discover
180+million nodes; this provides a tradeoff between the work spent to discover
181 that a file is unrecoverable and the probability that a retrieval will fail
182 when it could have succeeded if we had just tried a little bit harder. The
183 appropriate value of this tradeoff will depend upon the size of the grid, and
184 will change over time.
185 
186-Other peer selection algorithms are possible. One earlier version (known as
187-"tahoe 3") used the permutation to place the peers around a large ring,
188+Other peer-node selection algorithms are possible. One earlier version (known
189+as "Tahoe 3") used the permutation to place the nodes around a large ring,
190 distributed shares evenly around the same ring, then walks clockwise from 0
191 with a basket: each time we encounter a share, put it in the basket, each time
192-we encounter a peer, give them as many shares from our basket as they'll
193+we encounter a node, give them as many shares from our basket as they'll
194 accept. This reduced the number of queries (usually to 1) for small grids
195-(where N is larger than the number of peers), but resulted in extremely
196+(where N is larger than the number of nodes), but resulted in extremely
197 non-uniform share distribution, which significantly hurt reliability
198 (sometimes the permutation resulted in most of the shares being dumped on a
199-single peer).
200+single node).
201 
202 Another algorithm (known as "denver airport"[2]) uses the permuted hash to
203 decide on an approximate target for each share, then sends lease requests via
204@@ -243,12 +247,12 @@
205 SWARMING DOWNLOAD, TRICKLING UPLOAD
206 
207 Because the shares being downloaded are distributed across a large number of
208-peers, the download process will pull from many of them at the same time. The
209+nodes, the download process will pull from many of them at the same time. The
210 current encoding parameters require 3 shares to be retrieved for each segment,
211-which means that up to 3 peers will be used simultaneously. For larger
212-networks, 8-of-22 encoding could be used, meaning 8 peers can be used
213+which means that up to 3 nodes will be used simultaneously. For larger
214+networks, 8-of-22 encoding could be used, meaning 8 nodes can be used
215 simultaneously. This allows the download process to use the sum of the
216-available peers' upload bandwidths, resulting in downloads that take full
217+available nodes' upload bandwidths, resulting in downloads that take full
218 advantage of the common 8x disparity between download and upload bandwith on
219 modern ADSL lines.
220 
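
As a back-of-the-envelope illustration of the swarming effect described above (a sketch, not part of the patch; the bandwidth figures are invented and assume equal, overhead-free links):

    def swarming_download_rate(k, per_node_upload_kbps):
        # k nodes stream blocks in parallel, so the receiver can approach the
        # sum of their upload rates.
        return k * per_node_upload_kbps

    print(swarming_download_rate(3, 256))   # 3-of-10 encoding:  768 kbps aggregate
    print(swarming_download_rate(8, 256))   # 8-of-22 encoding: 2048 kbps aggregate
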
221@@ -301,105 +305,25 @@
222 that are globally visible.
223 
224 
225-LEASES, REFRESHING, GARBAGE COLLECTION, QUOTAS
226+LEASES, REFRESHING, GARBAGE COLLECTION
227+
228+When a file or directory in the virtual filesystem is no longer referenced,
229+the space that its shares occupied on each storage server can be freed,
230+making room for other shares. Tahoe currently uses a garbage collection
231+("GC") mechanism to implement this space-reclamation process. Each share has
232+one or more "leases", which are managed by clients who want the
233+file/directory to be retained. The storage server accepts each share for a
234+pre-defined period of time, and is allowed to delete the share if all of the
235+leases are cancelled or allowed to expire.
236+
237+Garbage collection is not enabled by default: storage servers will not delete
238+shares without being explicitly configured to do so. When GC is enabled,
239+clients are responsible for renewing their leases on a periodic basis at
240+least frequently enough to prevent any of the leases from expiring before the
241+next renewal pass.
242 
243-THIS SECTION IS OUT OF DATE.  Since we wrote this we've changed our minds
244-about how we intend to implement these features.  Neither the old design,
245-documented below, nor the new one, documented on the tahoe-dev mailing list
246-and the wiki and the issue tracker, have actually been implemented yet.
247-
248-Shares are uploaded to a storage server, but they do not necessarily stay
249-there forever. We are anticipating three main share-lifetime management modes
250-for Tahoe: 1) per-share leases which expire, 2) per-account timers which
251-expire and cancel all leases for the account, and 3) centralized account
252-management without expiration timers.
253-
254-To be clear, none of these have been implemented yet. The
255-http://allmydata.org/trac/tahoe/wiki/QuotaManagement "Quota Management" wiki
256-page describes some of our plans for managing data lifetime and limited-space
257-user accounts.
258-
259-Multiple clients may be interested in a given share, for example if two
260-clients uploaded the same file, or if two clients are sharing a directory and
261-both want to make sure the files therein remain available. Consequently, each
262-share (technically each "bucket", which may contain multiple shares for a
263-single storage index) has a set of leases, one per client. One way to
264-visualize this is with a large table, with shares (i.e. buckets, or storage
265-indices, or files) as the rows, and accounts as columns. Each square of this
266-table might hold a lease.
267-
268-Using limited-duration leases reduces the storage consumed by clients who have
269-(for whatever reason) forgotten about the share they once cared about.
270-Clients are supposed to explicitly cancel leases for every file that they
271-remove from their vdrive, and when the last lease is removed on a share, the
272-storage server deletes that share. However, the storage server might be
273-offline when the client deletes the file, or the client might experience a bug
274-or a race condition that results in forgetting about the file. Using leases
275-that expire unless otherwise renewed ensures that these lost files will not
276-consume storage space forever. On the other hand, they require periodic
277-maintenance, which can become prohibitively expensive for large grids. In
278-addition, clients who go offline for a while are then obligated to get someone
279-else to keep their files alive for them.
280-
281-
282-In the first mode, each client holds a limited-duration lease on each share
283-(typically one month), and clients are obligated to periodically renew these
284-leases to keep them from expiring (typically once a week). In this mode, the
285-storage server does not know anything about which client is which: it only
286-knows about leases.
287-
288-In the second mode, each server maintains a list of clients and which leases
289-they hold. This is called the "account list", and each time a client wants to
290-upload a share or establish a lease, it provides credentials to allow the
291-server to know which Account it will be using. Rather than putting individual
292-timers on each lease, the server puts a timer on the Account. When the account
293-expires, all of the associated leases are cancelled.
294-
295-In this mode, clients are obligated to renew the Account periodically, but not
296-the (thousands of) individual share leases. Clients which forget about files
297-are still incurring a storage cost for those files. An occasional
298-reconcilliation process (in which the client presents the storage server with
299-a list of all the files it cares about, and the server removes leases for
300-anything that isn't on the list) can be used to free this storage, but the
301-effort involved is large, so reconcilliation must be done very infrequently.
302-
303-Our plan is to have the clients create their own Accounts, based upon the
304-possession of a private key. Clients can create as many accounts as they wish,
305-but they are responsible for their own maintenance. Servers can add up all the
306-leases for each account and present a report of usage, in bytes per
307-account. This is intended for friendnet scenarios where it would be nice to
308-know how much space your friends are consuming on your disk.
309-
310-In the third mode, the Account objects are centrally managed, and are not
311-expired by the storage servers. In this mode, the client presents credentials
312-that are issued by a central authority, such as a signed message which the
313-storage server can verify. The storage used by this account is not freed
314-unless and until the central account manager says so.
315-
316-This mode is more appropriate for a commercial offering, in which use of the
317-storage servers is contingent upon a monthly fee, or other membership
318-criteria. Being able to ask the storage usage for each account (or establish
319-limits on it) helps to enforce whatever kind of membership policy is desired.
320-
321-
322-Each lease is created with a pair of secrets: the "renew secret" and the
323-"cancel secret". These are just random-looking strings, derived by hashing
324-other higher-level secrets, starting with a per-client master secret. Anyone
325-who knows the secret is allowed to restart the expiration timer, or cancel the
326-lease altogether. Having these be individual values allows the original
327-uploading node to delegate these capabilities to others.
328-
329-In the current release, clients provide lease secrets to the storage server,
330-and each lease contains an expiration time, but there is no facility to
331-actually expire leases, nor are there explicit owners (the "ownerid" field of
332-each lease is always set to zero). In addition, many features have not been
333-implemented yet: the client should claim leases on files which are added to
334-the vdrive by linking (as opposed to uploading), and the client should cancel
335-leases on files which are removed from the vdrive, but neither has been
336-written yet. This means that shares are not ever deleted in this
337-release. (Note, however, that if read-cap to a file is deleted then it will no
338-longer be possible to decrypt that file, even if the shares which contain the
339-erasure-coded ciphertext still exist.)
340+See docs/garbage-collection.txt for further information, including how to
341+configure garbage collection.
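
To make the renewal constraint in the new text concrete, here is a tiny sketch (not part of the patch; the durations are invented, not Tahoe defaults):

    LEASE_DURATION_DAYS = 31      # how long the storage server honours a lease
    RENEWAL_INTERVAL_DAYS = 7     # how often the client runs a renewal pass
    SLACK_DAYS = 3                # allowance for a late or skipped pass

    def renewal_schedule_is_safe(lease_duration, renewal_interval, slack):
        # Every lease must be renewed again before it can expire, even if one
        # renewal pass is delayed by up to `slack` days.
        return renewal_interval + slack < lease_duration

    assert renewal_schedule_is_safe(LEASE_DURATION_DAYS,
                                    RENEWAL_INTERVAL_DAYS, SLACK_DAYS)
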
342 
343 
344 FILE REPAIRER
345@@ -423,10 +347,10 @@
346 The repairer process does not get the full capability of the file to be
347 maintained: it merely gets the "repairer capability" subset, which does not
348 include the decryption key. The File Verifier uses that data to find out which
349-peers ought to hold shares for this file, and to see if those peers are still
350+nodes ought to hold shares for this file, and to see if those nodes are still
351 around and willing to provide the data. If the file is not healthy enough, the
352 File Repairer is invoked to download the ciphertext, regenerate any missing
353-shares, and upload them to new peers. The goal of the File Repairer is to
354+shares, and upload them to new nodes. The goal of the File Repairer is to
355 finish up with a full set of "N" shares.
356 
357 There are a number of engineering issues to be resolved here. The bandwidth,
358@@ -439,13 +363,13 @@
359 performed at the same time, and repair of files can be delegated off to other
360 nodes.
361 
362-The security model we are currently using assumes that peers who claim to hold
363+The security model we are currently using assumes that nodes who claim to hold
364 a share will actually provide it when asked. (We validate the data they
365-provide before using it in any way, but if enough peers claim to hold the data
366+provide before using it in any way, but if enough nodes claim to hold the data
367 and are wrong, the file will not be repaired, and may decay beyond
368 recoverability). There are several interesting approaches to mitigate this
369 threat, ranging from challenges to provide a keyed hash of the allegedly-held
370-data (using "buddy nodes", in which two peers hold the same block, and check
371+data (using "buddy nodes", in which two nodes hold the same block, and check
372 up on each other), to reputation systems, or even the original Mojo Nation
373 economic model.
374 
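
The "challenge to provide a keyed hash of the allegedly-held data" idea mentioned above could look roughly like the following sketch (not part of the patch; the protocol details are assumptions, not an existing Tahoe interface):

    import hashlib, hmac, os

    def make_challenge():
        # fresh random nonce per challenge, so old answers cannot be replayed
        return os.urandom(16)

    def prove_possession(challenge, share_data):
        # run by the storage server, which must actually hold share_data
        return hmac.new(challenge, share_data, hashlib.sha256).digest()

    def check_possession(challenge, response, share_data):
        # run by a checker that also holds the same data, e.g. a "buddy node"
        # holding the same block
        expected = hmac.new(challenge, share_data, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)
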
375@@ -475,20 +399,20 @@
376 technique used to generate shares.
377 
378 Many of these security properties depend upon the usual cryptographic
379-assumptions: the resistance of AES and RSA to attack, the resistance of SHA256
380+assumptions: the resistance of AES and RSA to attack, the resistance of SHA-256
381 to pre-image attacks, and upon the proximity of 2^-128 and 2^-256 to zero. A
382 break in AES would allow a confidentiality violation, a pre-image break in
383-SHA256 would allow a consistency violation, and a break in RSA would allow a
384-mutability violation. The discovery of a collision in SHA256 is unlikely to
385+SHA-256 would allow a consistency violation, and a break in RSA would allow a
386+mutability violation. The discovery of a collision in SHA-256 is unlikely to
387 allow much, but could conceivably allow a consistency violation in data that
388-was uploaded by the attacker. If SHA256 is threatened, further analysis will
389+was uploaded by the attacker. If SHA-256 is threatened, further analysis will
390 be warranted.
391 
392 There is no attempt made to provide anonymity, neither of the origin of a
393 piece of data nor the identity of the subsequent downloaders. In general,
394 anyone who already knows the contents of a file will be in a strong position
395 to determine who else is uploading or downloading it. Also, it is quite easy
396-for a sufficiently-large coalition of nodes to correlate the set of peers who
397+for a sufficiently large coalition of nodes to correlate the set of nodes who
398 are all uploading or downloading the same file, even if the attacker does not
399 know the contents of the file in question.
400 
401@@ -522,18 +446,18 @@
402 
403 RELIABILITY
404 
405-File encoding and peer selection parameters can be adjusted to achieve
406+File encoding and peer-node selection parameters can be adjusted to achieve
407 different goals. Each choice results in a number of properties; there are many
408 tradeoffs.
409 
410 First, some terms: the erasure-coding algorithm is described as K-out-of-N
411 (for this release, the default values are K=3 and N=10). Each grid will have
412-some number of peers; this number will rise and fall over time as peers join,
413+some number of nodes; this number will rise and fall over time as nodes join,
414 drop out, come back, and leave forever. Files are of various sizes, some are
415-popular, others are rare. Peers have various capacities, variable
416+popular, others are rare. Nodes have various capacities, variable
417 upload/download bandwidths, and network latency. Most of the mathematical
418-models that look at peer failure assume some average (and independent)
419-probability 'P' of a given peer being available: this can be high (servers
420+models that look at node failure assume some average (and independent)
421+probability 'P' of a given node being available: this can be high (servers
422 tend to be online and available >90% of the time) or low (laptops tend to be
423 turned on for an hour then disappear for several days). Files are encoded in
424 segments of a given maximum size, which affects memory usage.
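
The effect of the per-node availability 'P' and the K-out-of-N parameters can be made concrete with a short sketch (not part of the patch), treating node failures as independent, as the text assumes:

    def binomial(n, r):
        result = 1
        for i in range(r):
            result = result * (n - i) // (i + 1)
        return result

    def prob_file_available(k, n, p_node):
        # probability that at least k of the n shares sit on reachable nodes
        return sum(binomial(n, i) * p_node**i * (1 - p_node)**(n - i)
                   for i in range(k, n + 1))

    expansion_factor = 10.0 / 3                     # N/K for the 3-of-10 default, ~3.3x
    print(prob_file_available(3, 10, 0.9))          # ~0.9999996 with reliable servers
    print(prob_file_available(3, 10, 0.5))          # ~0.945 with flaky nodes
    print(prob_file_available(1, 1, 0.9))           # 0.9: one copy on one node
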
425@@ -549,24 +473,24 @@
426 roughly 10^50 times better), because there are more shares that can be lost
427 without losing the file.
428 
429-Likewise, the total number of peers in the network affects the same
430-granularity: having only one peer means a single point of failure, no matter
431-how many copies of the file you make. Independent peers (with uncorrelated
432+Likewise, the total number of nodes in the network affects the same
433+granularity: having only one node means a single point of failure, no matter
434+how many copies of the file you make. Independent nodes (with uncorrelated
435 failures) are necessary to hit the mathematical ideals: if you have 100 nodes
436 but they are all in the same office building, then a single power failure will
437 take out all of them at once. The "Sybil Attack" is where a single attacker
438 convinces you that they are actually multiple servers, so that you think you
439-are using a large number of independent peers, but in fact you have a single
440+are using a large number of independent nodes, but in fact you have a single
441 point of failure (where the attacker turns off all their machines at
442-once). Large grids, with lots of truly-independent peers, will enable the use
443+once). Large grids, with lots of truly independent nodes, will enable the use
444 of lower expansion factors to achieve the same reliability, but will increase
445-overhead because each peer needs to know something about every other, and the
446-rate at which peers come and go will be higher (requiring network maintenance
447+overhead because each node needs to know something about every other, and the
448+rate at which nodes come and go will be higher (requiring network maintenance
449 traffic). Also, the File Repairer work will increase with larger grids,
450-although then the job can be distributed out to more peers.
451+although then the job can be distributed out to more nodes.
452 
453 Higher values of N increase overhead: more shares means more Merkle hashes
454-that must be included with the data, and more peers to contact to retrieve the
455+that must be included with the data, and more nodes to contact to retrieve the
456 shares. Smaller segment sizes reduce memory usage (since each segment must be
457 held in memory while erasure coding runs) and improves "alacrity" (since
458 downloading can validate a smaller piece of data faster, delivering it to the
459@@ -592,9 +516,9 @@
460 
461 [2]: all of these names are derived from the location where they were
462      concocted, in this case in a car ride from Boulder to DEN. To be
463-     precise, "tahoe 1" was an unworkable scheme in which everyone who holds
464+     precise, "Tahoe 1" was an unworkable scheme in which everyone who holds
465      shares for a given file would form a sort of cabal which kept track of
466-     all the others, "tahoe 2" is the first-100-peers in the permuted hash
467-     described in this document, and "tahoe 3" (or perhaps "potrero hill 1")
468+     all the others, "Tahoe 2" is the first-100-nodes in the permuted hash
469+     described in this document, and "Tahoe 3" (or perhaps "Potrero Hill 1")
470      was the abandoned ring-with-many-hands approach.
471 
472
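
For reference, the abandoned "Tahoe 3" ring-and-basket placement that footnote [2] and the SERVER SELECTION section describe can be sketched as follows (illustration only, not part of the patch; per-node acceptance limits are ignored):

    import hashlib

    RING = 2 ** 256

    def ring_position(data):
        return int(hashlib.sha256(data).hexdigest(), 16)

    def tahoe3_placement(storage_index, server_ids, num_shares):
        # Nodes sit at permuted positions on a ring, shares are spread evenly
        # around the same ring, and we walk clockwise with a basket: each share
        # we pass goes into the basket, each node we pass empties it.
        nodes = [(ring_position(storage_index + peerid), "node", peerid)
                 for peerid in server_ids]
        shares = [(i * RING // num_shares, "share", i) for i in range(num_shares)]
        events = sorted(nodes + shares)
        placement, basket = {}, []
        for _pos, kind, value in events + events:   # second pass wraps the ring
            if kind == "share":
                if value not in placement:
                    basket.append(value)
            else:
                for sharenum in basket:
                    placement.setdefault(sharenum, value)
                basket = []
        return placement

On a small grid this tends to hand many consecutive shares to the same node, which is the non-uniform distribution the text blames for hurting reliability.
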
473--- old-tahoe/src/allmydata/scripts/cli.py      2010-01-14 03:46:11.986000000 +0000
474+++ new-tahoe/src/allmydata/scripts/cli.py      2010-01-14 03:46:12.233000000 +0000
475@@ -69,10 +69,10 @@
476     def getSynopsis(self):
477         return "%s create-alias ALIAS" % (os.path.basename(sys.argv[0]),)
478 
479-    longdesc = """Creates a new directory and adds an alias for it."""
480+    longdesc = """Create a new directory and add an alias for it."""
481 
482 class ListAliasOptions(VDriveOptions):
483-    longdesc = """Displays a table of all configured aliases."""
484+    longdesc = """Display a table of all configured aliases."""
485 
486 class ListOptions(VDriveOptions):
487     optFlags = [
488@@ -85,7 +85,7 @@
489     def parseArgs(self, where=""):
490         self.where = where
491 
492-    longdesc = """List the contents of some portion of the virtual drive."""
493+    longdesc = """List the contents of some portion of the grid."""
494 
495 class GetOptions(VDriveOptions):
496     def parseArgs(self, arg1, arg2=None):
497@@ -100,11 +100,12 @@
498             self.to_file = None
499 
500     def getSynopsis(self):
501-        return "%s get VDRIVE_FILE LOCAL_FILE" % (os.path.basename(sys.argv[0]),)
502+        return "%s get REMOTE_FILE LOCAL_FILE" % (os.path.basename(sys.argv[0]),)
503 
504-    longdesc = """Retrieve a file from the virtual drive and write it to the
505-    local filesystem. If LOCAL_FILE is omitted or '-', the contents of the file
506-    will be written to stdout."""
507+    longdesc = """
508+    Retrieve a file from the grid and write it to the local filesystem. If
509+    LOCAL_FILE is omitted or '-', the contents of the file will be written to
510+    stdout."""
511 
512     def getUsage(self, width=None):
513         t = VDriveOptions.getUsage(self, width)
514@@ -123,12 +124,7 @@
515         ]
516 
517     def parseArgs(self, arg1=None, arg2=None):
518-        # cat FILE | tahoe put           # create unlinked file from stdin
519-        # cat FILE | tahoe put -         # same
520-        # tahoe put bar                  # create unlinked file from local 'bar'
521-        # cat FILE | tahoe put - FOO     # create tahoe:FOO from stdin
522-        # tahoe put bar FOO              # copy local 'bar' to tahoe:FOO
523-        # tahoe put bar tahoe:FOO        # same
524+        # see Examples below
525 
526         if arg1 is not None and arg2 is not None:
527             self.from_file = arg1
528@@ -143,13 +139,14 @@
529             self.from_file = None
530 
531     def getSynopsis(self):
532-        return "%s put LOCAL_FILE VDRIVE_FILE" % (os.path.basename(sys.argv[0]),)
533+        return "%s put LOCAL_FILE REMOTE_FILE" % (os.path.basename(sys.argv[0]),)
534 
535-    longdesc = """Put a file into the virtual drive (copying the file's
536-    contents from the local filesystem). If VDRIVE_FILE is missing, upload
537-    the file but do not link it into a directory: prints the new filecap to
538-    stdout. If LOCAL_FILE is missing or '-', data will be copied from stdin.
539-    VDRIVE_FILE is assumed to start with tahoe: unless otherwise specified."""
540+    longdesc = """
541+    Put a file into the grid, copying its contents from the local filesystem.
542+    If REMOTE_FILE is missing, upload the file but do not link it into a directory;
543+    also print the new filecap to stdout. If LOCAL_FILE is missing or '-', data
544+    will be copied from stdin. REMOTE_FILE is assumed to start with tahoe: unless
545+    otherwise specified."""
546 
547     def getUsage(self, width=None):
548         t = VDriveOptions.getUsage(self, width)
549@@ -171,7 +168,7 @@
550         ("verbose", "v", "Be noisy about what is happening."),
551         ("caps-only", None,
552          "When copying to local files, write out filecaps instead of actual "
553-         "data. (only useful for debugging and tree-comparison purposes)"),
554+         "data (only useful for debugging and tree-comparison purposes)."),
555         ]
556     def parseArgs(self, *args):
557         if len(args) < 2:
558@@ -181,12 +178,12 @@
559     def getSynopsis(self):
560         return "Usage: tahoe [options] cp FROM.. TO"
561     longdesc = """
562-    Use 'tahoe cp' to copy files between a local filesystem and a Tahoe
563-    virtual filesystem. Any FROM/TO arguments that begin with an alias
564-    indicate Tahoe-side files, and arguments which do not indicate local
565-    files. Directories will be copied recursively. New Tahoe-side directories
566-    will be created when necessary. Assuming that you have previously set up
567-    an alias 'home' with 'tahoe create-alias home', here are some examples:
568+    Use 'tahoe cp' to copy files between a local filesystem and a Tahoe grid.
569+    Any FROM/TO arguments that begin with an alias indicate Tahoe-side files;
570+    arguments that do not are local files. Directories will be copied recursively.
571+    New Tahoe-side directories will be created when necessary. Assuming that
572+    you have previously set up an alias 'home' with 'tahoe create-alias home',
573+    here are some examples:
574 
575     tahoe cp ~/foo.txt home:  # creates tahoe-side home:foo.txt
576 
577@@ -210,7 +207,7 @@
578         self.where = where
579 
580     def getSynopsis(self):
581-        return "%s rm VDRIVE_FILE" % (os.path.basename(sys.argv[0]),)
582+        return "%s rm REMOTE_FILE" % (os.path.basename(sys.argv[0]),)
583 
584 class MvOptions(VDriveOptions):
585     def parseArgs(self, frompath, topath):
586@@ -220,11 +217,15 @@
587     def getSynopsis(self):
588         return "%s mv FROM TO" % (os.path.basename(sys.argv[0]),)
589     longdesc = """
590-    Use 'tahoe mv' to move files that are already on the grid elsewhere on the grid, e.g., 'tahoe mv alias:some_file alias:new_file'.
591+    Use 'tahoe mv' to move files that are already on the grid elsewhere on the
592+    grid, e.g., 'tahoe mv alias:some_file alias:new_file'.
593 
594-    If moving a remote file into a remote directory, you'll need to append a '/' to the name of the remote directory, e.g., 'tahoe mv tahoe:file1 tahoe:dir/', not 'tahoe mv tahoe:file1 tahoe:dir'.
595+    If moving a remote file into a remote directory, you'll need to append a '/'
596+    to the name of the remote directory, e.g., 'tahoe mv tahoe:file1 tahoe:dir/',
597+    not 'tahoe mv tahoe:file1 tahoe:dir'.
598 
599-    Note that it is not possible to use this command to move local files to the grid -- use 'tahoe cp' for that.
600+    Note that it is not possible to use this command to move local files to the
601+    grid -- use 'tahoe cp' for that.
602     """
603 
604 class LnOptions(VDriveOptions):
605@@ -241,7 +242,7 @@
606 class BackupOptions(VDriveOptions):
607     optFlags = [
608         ("verbose", "v", "Be noisy about what is happening."),
609-        ("ignore-timestamps", None, "Do not use backupdb timestamps to decide if a local file is unchanged."),
610+        ("ignore-timestamps", None, "Do not use backupdb timestamps to decide whether a local file is unchanged."),
611         ]
612 
613     vcs_patterns = ('CVS', 'RCS', 'SCCS', '.git', '.gitignore', '.cvsignore', '.svn',
614@@ -298,7 +299,12 @@
615             else:
616                 yield filename
617 
618-    longdesc = """Add a versioned backup of the local FROM directory to a timestamped subdir of the (tahoe) TO/Archives directory, sharing as many files and directories as possible with the previous backup. Creates TO/Latest as a reference to the latest backup. Behaves somewhat like 'rsync -a --link-dest=TO/Archives/(previous) FROM TO/Archives/(new); ln -sf TO/Archives/(new) TO/Latest'."""
619+    longdesc = """
620+    Add a versioned backup of the local FROM directory to a timestamped
621+    subdirectory of the TO/Archives directory on the grid, sharing as many
622+    files and directories as possible with the previous backup. Create
623+    TO/Latest as a reference to the latest backup. Behaves somewhat like
624+    'rsync -a --link-dest=TO/Archives/(previous) FROM TO/Archives/(new); ln -sf TO/Archives/(new) TO/Latest'."""
625 
626 class WebopenOptions(VDriveOptions):
627     def parseArgs(self, where=''):
628@@ -307,7 +313,7 @@
629     def getSynopsis(self):
630         return "%s webopen [ALIAS:PATH]" % (os.path.basename(sys.argv[0]),)
631 
632-    longdesc = """Opens a webbrowser to the contents of some portion of the virtual drive. When called without arguments, opens to the Welcome page."""
633+    longdesc = """Open a web browser to the contents of some file or directory on the grid. When called without arguments, open the Welcome page."""
634 
635 class ManifestOptions(VDriveOptions):
636     optFlags = [
637@@ -322,7 +328,7 @@
638     def getSynopsis(self):
639         return "%s manifest [ALIAS:PATH]" % (os.path.basename(sys.argv[0]),)
640 
641-    longdesc = """Print a list of all files/directories reachable from the given starting point."""
642+    longdesc = """Print a list of all files and directories reachable from the given starting point."""
643 
644 class StatsOptions(VDriveOptions):
645     optFlags = [
646@@ -334,7 +340,7 @@
647     def getSynopsis(self):
648         return "%s stats [ALIAS:PATH]" % (os.path.basename(sys.argv[0]),)
649 
650-    longdesc = """Print statistics about of all files/directories reachable from the given starting point."""
651+    longdesc = """Print statistics about all files and directories reachable from the given starting point."""
652 
653 class CheckOptions(VDriveOptions):
654     optFlags = [
655@@ -349,7 +355,9 @@
656     def getSynopsis(self):
657         return "%s check [ALIAS:PATH]" % (os.path.basename(sys.argv[0]),)
658 
659-    longdesc = """Check a single file or directory: count how many shares are available, verify their hashes. Optionally repair the file if any problems were found."""
660+    longdesc = """
661+    Check a single file or directory: count how many shares are available and
662+    verify their hashes. Optionally repair the file if any problems were found."""
663 
664 class DeepCheckOptions(VDriveOptions):
665     optFlags = [
666@@ -365,7 +373,10 @@
667     def getSynopsis(self):
668         return "%s deep-check [ALIAS:PATH]" % (os.path.basename(sys.argv[0]),)
669 
670-    longdesc = """Check all files/directories reachable from the given starting point (which must be a directory), like 'tahoe check' but for multiple files. Optionally repair any problems found."""
671+    longdesc = """
672+    Check all files and directories reachable from the given starting point
673+    (which must be a directory), like 'tahoe check' but for multiple files.
674+    Optionally repair any problems found."""
675 
676 subCommands = [
677     ["mkdir", None, MakeDirectoryOptions, "Create a new directory"],
678@@ -373,16 +384,16 @@
679     ["create-alias", None, CreateAliasOptions, "Create a new alias cap"],
680     ["list-aliases", None, ListAliasOptions, "List all alias caps"],
681     ["ls", None, ListOptions, "List a directory"],
682-    ["get", None, GetOptions, "Retrieve a file from the virtual drive."],
683-    ["put", None, PutOptions, "Upload a file into the virtual drive."],
684+    ["get", None, GetOptions, "Retrieve a file from the grid."],
685+    ["put", None, PutOptions, "Upload a file into the grid."],
686     ["cp", None, CpOptions, "Copy one or more files."],
687-    ["rm", None, RmOptions, "Unlink a file or directory in the virtual drive."],
688-    ["mv", None, MvOptions, "Move a file within the virtual drive."],
689+    ["rm", None, RmOptions, "Unlink a file or directory on the grid."],
690+    ["mv", None, MvOptions, "Move a file within the grid."],
691     ["ln", None, LnOptions, "Make an additional link to an existing file."],
692     ["backup", None, BackupOptions, "Make target dir look like local dir."],
693-    ["webopen", None, WebopenOptions, "Open a webbrowser to the root_dir"],
694-    ["manifest", None, ManifestOptions, "List all files/dirs in a subtree"],
695-    ["stats", None, StatsOptions, "Print statistics about all files/dirs in a subtree"],
696+    ["webopen", None, WebopenOptions, "Open a web browser to a grid file or directory."],
697+    ["manifest", None, ManifestOptions, "List all files/directories in a subtree"],
698+    ["stats", None, StatsOptions, "Print statistics about all files/directories in a subtree"],
699     ["check", None, CheckOptions, "Check a single file or directory"],
700     ["deep-check", None, DeepCheckOptions, "Check all files/directories reachable from a starting point"],
701     ]
702
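
A quick way to eyeball the renamed synopses, mirroring what test_cli.py below asserts (not part of the patch; assumes an allmydata source tree on the Python path):

    from allmydata.scripts import cli

    print(str(cli.GetOptions()))   # synopsis should now read "get REMOTE_FILE LOCAL_FILE"
    print(str(cli.PutOptions()))   # synopsis should now read "put LOCAL_FILE REMOTE_FILE"
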
703--- old-tahoe/src/allmydata/provisioning.py     2010-01-14 03:46:11.998000000 +0000
704+++ new-tahoe/src/allmydata/provisioning.py     2010-01-14 03:46:12.237000000 +0000
705@@ -128,7 +128,7 @@
706                                                        files_per_user_counts,
707                                                        1000)
708         add_input("Users",
709-                  "How many files in each user's vdrive? (avg)",
710+                  "How many files for each user? (avg)",
711                   i_files_per_user)
712 
713         space_per_user_sizes = [(1e6, "1MB"),
714@@ -147,7 +147,7 @@
715                                                        space_per_user_sizes,
716                                                        200e6)
717         add_input("Users",
718-                  "How much data is in each user's vdrive? (avg)",
719+                  "How much data for each user? (avg)",
720                   i_space_per_user)
721 
722         sharing_ratios = [(1.0, "1.0x"),
723
724--- old-tahoe/src/allmydata/test/check_load.py  2010-01-14 03:46:12.013000000 +0000
725+++ new-tahoe/src/allmydata/test/check_load.py  2010-01-14 03:46:12.253000000 +0000
726@@ -97,12 +97,12 @@
727 directories_read = 0
728 directories_written = 0
729 
730-def listdir(nodeurl, root, vdrive_pathname):
731+def listdir(nodeurl, root, remote_pathname):
732     if nodeurl[-1] != "/":
733         nodeurl += "/"
734     url = nodeurl + "uri/%s/" % urllib.quote(root)
735-    if vdrive_pathname:
736-        url += urllib.quote(vdrive_pathname)
737+    if remote_pathname:
738+        url += urllib.quote(remote_pathname)
739     url += "?t=json"
740     data = urllib.urlopen(url).read()
741     try:
742@@ -203,11 +203,11 @@
743         path = "/"
744     return scheme, host, port, path
745 
746-def generate_and_put(nodeurl, root, vdrive_fname, size):
747+def generate_and_put(nodeurl, root, remote_filename, size):
748     if nodeurl[-1] != "/":
749         nodeurl += "/"
750     url = nodeurl + "uri/%s/" % urllib.quote(root)
751-    url += urllib.quote(vdrive_fname)
752+    url += urllib.quote(remote_filename)
753 
754     scheme, host, port, path = parse_url(url)
755     if scheme == "http":
756
757--- old-tahoe/src/allmydata/test/test_system.py 2010-01-14 03:46:12.046000000 +0000
758+++ new-tahoe/src/allmydata/test/test_system.py 2010-01-14 03:46:12.269000000 +0000
759@@ -28,7 +28,7 @@
760 from allmydata.test.common import SystemTestMixin
761 
762 LARGE_DATA = """
763+This is some data to publish to the remote grid, which needs to be large
764+This is some data to publish to the remote grid.., which needs to be large
765 enough to not fit inside a LIT uri.
766 """
767 
768@@ -698,8 +698,8 @@
769     # the key, which should cause the download to fail the post-download
770     # plaintext_hash check.
771 
772-    def test_vdrive(self):
773-        self.basedir = "system/SystemTest/test_vdrive"
774+    def test_filesystem(self):
775+        self.basedir = "system/SystemTest/test_filesystem"
776         self.data = LARGE_DATA
777         d = self.set_up_nodes(use_stats_gatherer=True)
778         d.addCallback(self._test_introweb)
779
780--- old-tahoe/src/allmydata/test/test_client.py 2010-01-14 03:46:12.062000000 +0000
781+++ new-tahoe/src/allmydata/test/test_client.py 2010-01-14 03:46:12.273000000 +0000
782@@ -32,20 +32,12 @@
783         basedir = "test_client.Basic.test_loadable"
784         os.mkdir(basedir)
785         open(os.path.join(basedir, "introducer.furl"), "w").write("")
786-        open(os.path.join(basedir, "vdrive.furl"), "w").write("")
787-        c = client.Client(basedir)
788-
789-    def test_loadable_without_vdrive(self):
790-        basedir = "test_client.Basic.test_loadable_without_vdrive"
791-        os.mkdir(basedir)
792-        open(os.path.join(basedir, "introducer.furl"), "w").write("")
793         c = client.Client(basedir)
794 
795     def test_loadable_old_config_bits(self):
796         basedir = "test_client.Basic.test_loadable_old_config_bits"
797         os.mkdir(basedir)
798         open(os.path.join(basedir, "introducer.furl"), "w").write("")
799-        open(os.path.join(basedir, "vdrive.furl"), "w").write("")
800         open(os.path.join(basedir, "no_storage"), "w").write("")
801         open(os.path.join(basedir, "readonly_storage"), "w").write("")
802         open(os.path.join(basedir, "debug_discard_storage"), "w").write("")
803@@ -60,7 +52,6 @@
804         basedir = "test_client.Basic.test_loadable_old_storage_config_bits"
805         os.mkdir(basedir)
806         open(os.path.join(basedir, "introducer.furl"), "w").write("")
807-        open(os.path.join(basedir, "vdrive.furl"), "w").write("")
808         open(os.path.join(basedir, "readonly_storage"), "w").write("")
809         open(os.path.join(basedir, "debug_discard_storage"), "w").write("")
810         c = client.Client(basedir)
811@@ -72,7 +63,6 @@
812         basedir = "test_client.Basic.test_secrets"
813         os.mkdir(basedir)
814         open(os.path.join(basedir, "introducer.furl"), "w").write("")
815-        open(os.path.join(basedir, "vdrive.furl"), "w").write("")
816         c = client.Client(basedir)
817         secret_fname = os.path.join(basedir, "private", "secret")
818         self.failUnless(os.path.exists(secret_fname), secret_fname)
819@@ -161,7 +151,6 @@
820         basedir = "test_client.Basic.test_versions"
821         os.mkdir(basedir)
822         open(os.path.join(basedir, "introducer.furl"), "w").write("")
823-        open(os.path.join(basedir, "vdrive.furl"), "w").write("")
824         c = client.Client(basedir)
825         ss = c.getServiceNamed("storage")
826         verdict = ss.remote_get_version()
827
828--- old-tahoe/src/allmydata/test/test_cli.py    2010-01-14 03:46:12.078000000 +0000
829+++ new-tahoe/src/allmydata/test/test_cli.py    2010-01-14 03:46:12.279000000 +0000
830@@ -376,17 +376,17 @@
831 
832     def test_get(self):
833         help = str(cli.GetOptions())
834-        self.failUnless("get VDRIVE_FILE LOCAL_FILE" in help, help)
835+        self.failUnless("get REMOTE_FILE LOCAL_FILE" in help, help)
836         self.failUnless("% tahoe get FOO |less" in help, help)
837 
838     def test_put(self):
839         help = str(cli.PutOptions())
840-        self.failUnless("put LOCAL_FILE VDRIVE_FILE" in help, help)
841+        self.failUnless("put LOCAL_FILE REMOTE_FILE" in help, help)
842         self.failUnless("% cat FILE | tahoe put" in help, help)
843 
844     def test_rm(self):
845         help = str(cli.RmOptions())
846-        self.failUnless("rm VDRIVE_FILE" in help, help)
847+        self.failUnless("rm REMOTE_FILE" in help, help)
848 
849     def test_mv(self):
850         help = str(cli.MvOptions())
851
852--- old-tahoe/src/allmydata/scripts/tahoe_put.py        2010-01-14 03:46:12.176000000 +0000
853+++ new-tahoe/src/allmydata/scripts/tahoe_put.py        2010-01-14 03:46:12.353000000 +0000
854@@ -34,6 +34,7 @@
855         #  /oops/subdir/foo : DISALLOWED
856         #  ALIAS:foo  : aliases[ALIAS]/foo
857         #  ALIAS:subdir/foo  : aliases[ALIAS]/subdir/foo
858+       
859         #  ALIAS:/oops/subdir/foo : DISALLOWED
860         #  DIRCAP:./foo        : DIRCAP/foo
861         #  DIRCAP:./subdir/foo : DIRCAP/subdir/foo
862@@ -45,7 +46,7 @@
863             rootcap, path = get_alias(aliases, to_file, DEFAULT_ALIAS)
864             if path.startswith("/"):
865                 suggestion = to_file.replace("/", "", 1)
866-                print >>stderr, "ERROR: The VDRIVE filename must not start with a slash"
867+                print >>stderr, "ERROR: The remote filename must not start with a slash"
868                 print >>stderr, "Please try again, perhaps with:", suggestion
869                 return 1
870             url = nodeurl + "uri/%s/" % urllib.quote(rootcap)