Ticket #1225: docs-txt-rst-conversion.patch
File docs-txt-rst-conversion.patch, 130.7 KB (added by p-static at 2010-10-14T07:49:17Z)
docs/architecture.txt
diff -rN -u old-tahoe-lafs/docs/architecture.txt new-tahoe-lafs/docs/architecture.txt
=======================
Tahoe-LAFS Architecture
=======================

1. `Overview`_
2. `The Key-Value Store`_
3. `File Encoding`_
4. `Capabilities`_
5. `Server Selection`_
6. `Swarming Download, Trickling Upload`_
7. `The Filesystem Layer`_
8. `Leases, Refreshing, Garbage Collection`_
9. `File Repairer`_
10. `Security`_
11. `Reliability`_


Overview
========

(See the docs/specifications directory for more details.)

[...]

copies files from the local disk onto the decentralized filesystem. We later
provide read-only access to those files, allowing users to recover them.
There are several other applications built on top of the Tahoe-LAFS
filesystem (see the `RelatedProjects
<http://tahoe-lafs.org/trac/tahoe-lafs/wiki/RelatedProjects>`_ page of the
wiki for a list).


The Key-Value Store
===================

The key-value store is implemented by a grid of Tahoe-LAFS storage servers --
user-space processes. Tahoe-LAFS storage clients communicate with the storage

[...]

server to tell a new client about all the others.


File Encoding
=============

When a client stores a file on the grid, it first encrypts the file. It then
breaks the encrypted file into small segments, in order to reduce the memory

[...]

into plaintext, then emit the plaintext bytes to the output target.


Capabilities
============

Capabilities to immutable files represent a specific set of bytes. Think of
it like a hash function: you feed in a bunch of bytes, and you get out a

[...]

that these potential bytes are indeed the ones that you were looking for.

The "key-value store" layer doesn't include human-meaningful names.
Capabilities sit on the "global+secure" edge of `Zooko's Triangle`_. They are
self-authenticating, meaning that nobody can trick you into accepting a file
that doesn't match the capability you used to refer to that file. The
filesystem layer (described below) adds human-meaningful names atop the
key-value layer.

.. _`Zooko's Triangle`: http://en.wikipedia.org/wiki/Zooko%27s_triangle


Server Selection
================

When a file is uploaded, the encoded shares are sent to some servers. But to
which ones? The "server selection" algorithm is used to make this choice.

The storage index is used to consistently-permute the set of all server nodes
(by sorting them by ``HASH(storage_index+nodeid)``). Each file gets a different
permutation, which (on average) will evenly distribute shares among the grid
and avoid hotspots. Each server has announced its available space when it
connected to the introducer, and we use that available space information to

[...]

significantly hurt reliability (sometimes the permutation resulted in most
of the shares being dumped on a single node).

Another algorithm (known as "denver airport" [#naming]_) uses the permuted hash to
decide on an approximate target for each share, then sends lease requests
via Chord routing. The request includes the contact information of the
uploading node, and asks that the node which eventually accepts the lease

[...]

the same approach. This allows nodes to avoid maintaining a large number of
long-term connections, at the expense of complexity and latency.

.. [#naming] all of these names are derived from the location where they were
   concocted, in this case in a car ride from Boulder to DEN. To be
   precise, "Tahoe 1" was an unworkable scheme in which everyone who holds
   shares for a given file would form a sort of cabal which kept track of
   all the others, "Tahoe 2" is the first-100-nodes in the permuted hash
   described in this document, and "Tahoe 3" (or perhaps "Potrero hill 1")
   was the abandoned ring-with-many-hands approach.


Swarming Download, Trickling Upload
===================================

Because the shares being downloaded are distributed across a large number of
nodes, the download process will pull from many of them at the same time. The

[...]

See "helper.txt" for details about the upload helper.


The Filesystem Layer
====================

The "filesystem" layer is responsible for mapping human-meaningful pathnames
(directories and filenames) to pieces of data. The actual bytes inside these

[...]

that are globally visible.


Leases, Refreshing, Garbage Collection
======================================

When a file or directory in the virtual filesystem is no longer referenced,
the space that its shares occupied on each storage server can be freed,

[...]

garbage collection.


File Repairer
=============

Shares may go away because the storage server hosting them has suffered a
failure: either temporary downtime (affecting availability of the file), or a

[...]

in client behavior.


Security
========

The design goal for this project is that an attacker may be able to deny
service (i.e. prevent you from recovering a file that was uploaded earlier)
but can accomplish none of the following three attacks:

1) violate confidentiality: the attacker gets to view data to which you have
   not granted them access
2) violate integrity: the attacker convinces you that the wrong data is
   actually the data you were intending to retrieve
3) violate unforgeability: the attacker gets to modify a mutable file or
   directory (either the pathnames or the file contents) to which you have
   not given them write permission

Integrity (the promise that the downloaded data will match the uploaded data)
is provided by the hashes embedded in the capability (for immutable files) or

[...]

capabilities).


Reliability
===========

File encoding and peer-node selection parameters can be adjusted to achieve
different goals. Each choice results in a number of properties; there are

[...]

view the disk consumption of each. It is also acquiring some sections with
availability/reliability numbers, as well as preliminary cost analysis data.
This tool will continue to evolve as our analysis improves.
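As a concrete illustration of the permuted-ring selection described above, the
sketch below sorts candidate servers by a hash of the storage index
concatenated with each server's nodeid, so every file sees the same servers in
a different but stable order. This is not Tahoe's actual peer-selection code;
the choice of SHA-256 and the toy byte-string nodeids are assumptions made
only for this example::

  import hashlib

  def permute_servers(storage_index, nodeids):
      # Sort nodeids by HASH(storage_index + nodeid), giving each storage
      # index its own stable permutation of the ring (a toy version of the
      # "Tahoe 2" scheme described in this document).
      return sorted(nodeids,
                    key=lambda nodeid: hashlib.sha256(storage_index + nodeid).digest())

  servers = [b"server-A", b"server-B", b"server-C", b"server-D"]
  print(permute_servers(b"storage-index-1", servers))
  print(permute_servers(b"storage-index-2", servers))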
docs/backdoors.txt
diff -rN -u old-tahoe-lafs/docs/backdoors.txt new-tahoe-lafs/docs/backdoors.txt
======================
Statement on Backdoors
======================

October 5, 2010

The New York Times has recently reported that the current U.S. administration
is proposing a bill that would apparently, if passed, require communication
systems to facilitate government wiretapping and access to encrypted data:

http://www.nytimes.com/2010/09/27/us/27wiretap.html (login required; username/password pairs available at http://www.bugmenot.com/view/nytimes.com).

Commentary by the Electronic Frontier Foundation
(https://www.eff.org/deeplinks/2010/09/government-seeks ), Peter Suderman /
Reason (http://reason.com/blog/2010/09/27/obama-administration-frustrate ),
Julian Sanchez / Cato Institute
(http://www.cato-at-liberty.org/designing-an-insecure-internet/ ).

The core Tahoe developers promise never to change Tahoe-LAFS to facilitate
government access to data stored or transmitted by it. Even if it were
desirable to facilitate such access—which it is not—we believe it would not be
technically feasible to do so without severely compromising Tahoe-LAFS'
security against other attackers. There have been many examples in which
backdoors intended for use by government have introduced vulnerabilities
exploitable by other parties (a notable example being the Greek cellphone
eavesdropping scandal in 2004/5). RFCs 1984 and 2804 elaborate on the
security case against such backdoors.

Note that since Tahoe-LAFS is open-source software, forks by people other than
the current core developers are possible. In that event, we would try to
persuade any such forks to adopt a similar policy.

The following Tahoe-LAFS developers agree with this statement:

David-Sarah Hopwood

Zooko Wilcox-O'Hearn

Brian Warner

Kevan Carstensen

Frédéric Marti

Jack Lloyd

François Deppierraz

Yu Xue

Marc Tooley
docs/backupdb.txt
diff -rN -u old-tahoe-lafs/docs/backupdb.txt new-tahoe-lafs/docs/backupdb.txt
==================
The Tahoe BackupDB
==================

1. `Overview`_
2. `Schema`_
3. `Upload Operation`_
4. `Directory Operations`_

Overview
========

To speed up backup operations, Tahoe maintains a small database known as the
"backupdb". This is used to avoid re-uploading files which have already been
uploaded recently.

[...]

as Debian etch (4.0 "oldstable") or Ubuntu Edgy (6.10) the "python-pysqlite2"
package won't work, but the "sqlite3-dev" package will.

Schema
======

The database contains the following tables::

  CREATE TABLE version
  (
   version integer  # contains one row, set to 1
  );

  CREATE TABLE local_files
  (
   path  varchar(1024),  PRIMARY KEY -- index, this is os.path.abspath(fn)
   size  integer,         -- os.stat(fn)[stat.ST_SIZE]
   mtime number,          -- os.stat(fn)[stat.ST_MTIME]
   ctime number,          -- os.stat(fn)[stat.ST_CTIME]
   fileid integer
  );

  CREATE TABLE caps
  (
   fileid integer PRIMARY KEY AUTOINCREMENT,
   filecap varchar(256) UNIQUE    -- URI:CHK:...
  );

  CREATE TABLE last_upload
  (
   fileid INTEGER PRIMARY KEY,
   last_uploaded TIMESTAMP,
   last_checked TIMESTAMP
  );

  CREATE TABLE directories
  (
   dirhash varchar(256) PRIMARY KEY,
   dircap varchar(256),
   last_uploaded TIMESTAMP,
   last_checked TIMESTAMP
  );

Upload Operation
================

The upload process starts with a pathname (like ~/.emacs) and wants to end up
with a file-cap (like URI:CHK:...).

[...]

is not present in this table, the file must be uploaded. The upload process
is:

1. record the file's size, creation time, and modification time

2. upload the file into the grid, obtaining an immutable file read-cap

3. add an entry to the 'caps' table, with the read-cap, to get a fileid

4. add an entry to the 'last_upload' table, with the current time

5. add an entry to the 'local_files' table, with the fileid, the path,
   and the local file's size/ctime/mtime

If the path *is* present in 'local_files', the easy-to-compute identifying
information is compared: file size and ctime/mtime. If these differ, the file

[...]

into the grid. The --no-timestamps option can be used to disable this
optimization, forcing every byte of the file to be hashed and encoded.

Directory Operations
====================

Once the contents of a directory are known (a filecap for each file, and a
dircap for each directory), the backup process must find or create a tahoe
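The check-then-upload flow above maps directly onto a few SQL statements. The
following sketch uses Python's sqlite3 module against a database laid out like
the schema above; it is illustrative only (it is not the code in Tahoe's
backupdb module), and the caller is assumed to perform the actual grid upload
and pass in the resulting read-cap::

  import os, stat

  def check_backupdb(db, path):
      # `db` is an sqlite3.Connection. Return the cached filecap if the file
      # looks unchanged (same size/mtime/ctime as recorded), else None.
      path = os.path.abspath(path)
      s = os.stat(path)
      size, mtime, ctime = s[stat.ST_SIZE], s[stat.ST_MTIME], s[stat.ST_CTIME]
      row = db.execute("SELECT size, mtime, ctime, fileid FROM local_files"
                       " WHERE path=?", (path,)).fetchone()
      if row is None or (row[0], row[1], row[2]) != (size, mtime, ctime):
          return None                      # unknown or changed: must upload
      cap = db.execute("SELECT filecap FROM caps WHERE fileid=?",
                       (row[3],)).fetchone()
      return cap[0] if cap else None

  def record_upload(db, path, filecap):
      # Steps 1-5 of the upload process above, in miniature.
      path = os.path.abspath(path)
      s = os.stat(path)
      fileid = db.execute("INSERT INTO caps (filecap) VALUES (?)",
                          (filecap,)).lastrowid
      db.execute("INSERT INTO last_upload VALUES (?, datetime('now'), datetime('now'))",
                 (fileid,))
      db.execute("INSERT INTO local_files VALUES (?, ?, ?, ?, ?)",
                 (path, s[stat.ST_SIZE], s[stat.ST_MTIME], s[stat.ST_CTIME], fileid))
      db.commit()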
docs/configuration.txt
diff -rN -u old-tahoe-lafs/docs/configuration.txt new-tahoe-lafs/docs/configuration.txt
old new 1 2 = Configuring a Tahoe node = 1 ======================== 2 Configuring a Tahoe node 3 ======================== 4 5 1. `Overall Node Configuration`_ 6 2. `Client Configuration`_ 7 3. `Storage Server Configuration`_ 8 4. `Running A Helper`_ 9 5. `Running An Introducer`_ 10 6. `Other Files in BASEDIR`_ 11 7. `Other files`_ 12 8. `Backwards Compatibility Files`_ 13 9. `Example`_ 3 14 4 15 A Tahoe node is configured by writing to files in its base directory. These 5 16 files are read by the node when it starts, so each time you change them, you … … 22 33 23 34 The item descriptions below use the following types: 24 35 25 boolean: one of (True, yes, on, 1, False, off, no, 0), case-insensitive 26 strports string: a Twisted listening-port specification string, like "tcp:80" 27 or "tcp:3456:interface=127.0.0.1". For a full description of 28 the format, see 29 http://twistedmatrix.com/documents/current/api/twisted.application.strports.html 30 FURL string: a Foolscap endpoint identifier, like 31 pb://soklj4y7eok5c3xkmjeqpw@192.168.69.247:44801/eqpwqtzm 36 boolean 37 one of (True, yes, on, 1, False, off, no, 0), case-insensitive 38 39 strports string 40 a Twisted listening-port specification string, like "tcp:80" 41 or "tcp:3456:interface=127.0.0.1". For a full description of 42 the format, see 43 http://twistedmatrix.com/documents/current/api/twisted.application.strports.html 44 45 FURL string 46 a Foolscap endpoint identifier, like 47 pb://soklj4y7eok5c3xkmjeqpw@192.168.69.247:44801/eqpwqtzm 32 48 33 49 34 == Overall Node Configuration == 50 Overall Node Configuration 51 ========================== 35 52 36 53 This section controls the network behavior of the node overall: which ports 37 54 and IP addresses are used, when connections are timed out, etc. This … … 43 60 that port number in the tub.port option. If behind a NAT, you *may* need to 44 61 set the tub.location option described below. 45 62 63 :: 46 64 47 [node]65 [node] 48 66 49 nickname = (UTF-8 string, optional)67 nickname = (UTF-8 string, optional) 50 68 51 This value will be displayed in management tools as this node's "nickname". 52 If not provided, the nickname will be set to "<unspecified>". This string 53 shall be a UTF-8 encoded unicode string. 54 55 web.port = (strports string, optional) 56 57 This controls where the node's webserver should listen, providing filesystem 58 access and node status as defined in webapi.txt . This file contains a 59 Twisted "strports" specification such as "3456" or 60 "tcp:3456:interface=127.0.0.1". The 'tahoe create-node' or 'tahoe create-client' 61 commands set the web.port to "tcp:3456:interface=127.0.0.1" by default; this 62 is overridable by the "--webport" option. You can make it use SSL by writing 63 "ssl:3456:privateKey=mykey.pem:certKey=cert.pem" instead. 64 65 If this is not provided, the node will not run a web server. 66 67 web.static = (string, optional) 68 69 This controls where the /static portion of the URL space is served. The 70 value is a directory name (~username is allowed, and non-absolute names are 71 interpreted relative to the node's basedir) which can contain HTML and other 72 files. This can be used to serve a javascript-based frontend to the Tahoe 73 node, or other services. 74 75 The default value is "public_html", which will serve $BASEDIR/public_html . 76 With the default settings, http://127.0.0.1:3456/static/foo.html will serve 77 the contents of $BASEDIR/public_html/foo.html . 
78 79 tub.port = (integer, optional) 80 81 This controls which port the node uses to accept Foolscap connections from 82 other nodes. If not provided, the node will ask the kernel for any available 83 port. The port will be written to a separate file (named client.port or 84 introducer.port), so that subsequent runs will re-use the same port. 85 86 tub.location = (string, optional) 87 88 In addition to running as a client, each Tahoe node also runs as a server, 89 listening for connections from other Tahoe clients. The node announces its 90 location by publishing a "FURL" (a string with some connection hints) to the 91 Introducer. The string it publishes can be found in 92 $BASEDIR/private/storage.furl . The "tub.location" configuration controls 93 what location is published in this announcement. 94 95 If you don't provide tub.location, the node will try to figure out a useful 96 one by itself, by using tools like 'ifconfig' to determine the set of IP 97 addresses on which it can be reached from nodes both near and far. It will 98 also include the TCP port number on which it is listening (either the one 99 specified by tub.port, or whichever port was assigned by the kernel when 100 tub.port is left unspecified). 101 102 You might want to override this value if your node lives behind a firewall 103 that is doing inbound port forwarding, or if you are using other proxies 104 such that the local IP address or port number is not the same one that 105 remote clients should use to connect. You might also want to control this 106 when using a Tor proxy to avoid revealing your actual IP address through the 107 Introducer announcement. 108 109 The value is a comma-separated string of host:port location hints, like 110 this: 111 112 123.45.67.89:8098,tahoe.example.com:8098,127.0.0.1:8098 113 114 A few examples: 115 116 Emulate default behavior, assuming your host has IP address 123.45.67.89 117 and the kernel-allocated port number was 8098: 118 119 tub.port = 8098 120 tub.location = 123.45.67.89:8098,127.0.0.1:8098 121 122 Use a DNS name so you can change the IP address more easily: 123 124 tub.port = 8098 125 tub.location = tahoe.example.com:8098 126 127 Run a node behind a firewall (which has an external IP address) that has 128 been configured to forward port 7912 to our internal node's port 8098: 129 130 tub.port = 8098 131 tub.location = external-firewall.example.com:7912 132 133 Run a node behind a Tor proxy (perhaps via torsocks), in client-only mode 134 (i.e. we can make outbound connections, but other nodes will not be able to 135 connect to us). The literal 'unreachable.example.org' will not resolve, but 136 will serve as a reminder to human observers that this node cannot be 137 reached. "Don't call us.. we'll call you": 138 139 tub.port = 8098 140 tub.location = unreachable.example.org:0 141 142 Run a node behind a Tor proxy, and make the server available as a Tor 143 "hidden service". (this assumes that other clients are running their node 144 with torsocks, such that they are prepared to connect to a .onion address). 145 The hidden service must first be configured in Tor, by giving it a local 146 port number and then obtaining a .onion name, using something in the torrc 147 file like: 148 149 HiddenServiceDir /var/lib/tor/hidden_services/tahoe 150 HiddenServicePort 29212 127.0.0.1:8098 151 152 once Tor is restarted, the .onion hostname will be in 153 /var/lib/tor/hidden_services/tahoe/hostname . 
Then set up your tahoe.cfg 154 like: 155 156 tub.port = 8098 157 tub.location = ualhejtq2p7ohfbb.onion:29212 158 159 Most users will not need to set tub.location . 160 161 Note that the old 'advertised_ip_addresses' file from earlier releases is no 162 longer supported. Tahoe 1.3.0 and later will ignore this file. 163 164 log_gatherer.furl = (FURL, optional) 165 166 If provided, this contains a single FURL string which is used to contact a 167 'log gatherer', which will be granted access to the logport. This can be 168 used by centralized storage meshes to gather operational logs in a single 169 place. Note that when an old-style BASEDIR/log_gatherer.furl file exists 170 (see 'Backwards Compatibility Files', below), both are used. (for most other 171 items, the separate config file overrides the entry in tahoe.cfg) 172 173 timeout.keepalive = (integer in seconds, optional) 174 timeout.disconnect = (integer in seconds, optional) 175 176 If timeout.keepalive is provided, it is treated as an integral number of 177 seconds, and sets the Foolscap "keepalive timer" to that value. For each 178 connection to another node, if nothing has been heard for a while, we will 179 attempt to provoke the other end into saying something. The duration of 180 silence that passes before sending the PING will be between KT and 2*KT. 181 This is mainly intended to keep NAT boxes from expiring idle TCP sessions, 182 but also gives TCP's long-duration keepalive/disconnect timers some traffic 183 to work with. The default value is 240 (i.e. 4 minutes). 184 185 If timeout.disconnect is provided, this is treated as an integral number of 186 seconds, and sets the Foolscap "disconnect timer" to that value. For each 187 connection to another node, if nothing has been heard for a while, we will 188 drop the connection. The duration of silence that passes before dropping the 189 connection will be between DT-2*KT and 2*DT+2*KT (please see ticket #521 for 190 more details). If we are sending a large amount of data to the other end 191 (which takes more than DT-2*KT to deliver), we might incorrectly drop the 192 connection. The default behavior (when this value is not provided) is to 193 disable the disconnect timer. 194 195 See ticket #521 for a discussion of how to pick these timeout values. Using 196 30 minutes means we'll disconnect after 22 to 68 minutes of inactivity. 197 Receiving data will reset this timeout, however if we have more than 22min 198 of data in the outbound queue (such as 800kB in two pipelined segments of 10 199 shares each) and the far end has no need to contact us, our ping might be 200 delayed, so we may disconnect them by accident. 201 202 ssh.port = (strports string, optional) 203 ssh.authorized_keys_file = (filename, optional) 204 205 This enables an SSH-based interactive Python shell, which can be used to 206 inspect the internal state of the node, for debugging. To cause the node to 207 accept SSH connections on port 8022 from the same keys as the rest of your 208 account, use: 209 210 [tub] 211 ssh.port = 8022 212 ssh.authorized_keys_file = ~/.ssh/authorized_keys 213 214 tempdir = (string, optional) 215 216 This specifies a temporary directory for the webapi server to use, for 217 holding large files while they are being uploaded. If a webapi client 218 attempts to upload a 10GB file, this tempdir will need to have at least 10GB 219 available for the upload to complete. 220 221 The default value is the "tmp" directory in the node's base directory (i.e. 
222 $NODEDIR/tmp), but it can be placed elsewhere. This directory is used for 223 files that usually (on a unix system) go into /tmp . The string will be 224 interpreted relative to the node's base directory. 225 226 == Client Configuration == 227 228 [client] 229 introducer.furl = (FURL string, mandatory) 230 231 This FURL tells the client how to connect to the introducer. Each Tahoe grid 232 is defined by an introducer. The introducer's furl is created by the 233 introducer node and written into its base directory when it starts, 234 whereupon it should be published to everyone who wishes to attach a client 235 to that grid 236 237 helper.furl = (FURL string, optional) 238 239 If provided, the node will attempt to connect to and use the given helper 240 for uploads. See docs/helper.txt for details. 241 242 key_generator.furl = (FURL string, optional) 243 244 If provided, the node will attempt to connect to and use the given 245 key-generator service, using RSA keys from the external process rather than 246 generating its own. 247 248 stats_gatherer.furl = (FURL string, optional) 249 250 If provided, the node will connect to the given stats gatherer and provide 251 it with operational statistics. 252 253 shares.needed = (int, optional) aka "k", default 3 254 shares.total = (int, optional) aka "N", N >= k, default 10 255 shares.happy = (int, optional) 1 <= happy <= N, default 7 256 257 These three values set the default encoding parameters. Each time a new file 258 is uploaded, erasure-coding is used to break the ciphertext into separate 259 pieces. There will be "N" (i.e. shares.total) pieces created, and the file 260 will be recoverable if any "k" (i.e. shares.needed) pieces are retrieved. 261 The default values are 3-of-10 (i.e. shares.needed = 3, shares.total = 10). 262 Setting k to 1 is equivalent to simple replication (uploading N copies of 263 the file). 264 265 These values control the tradeoff between storage overhead, performance, and 266 reliability. To a first approximation, a 1MB file will use (1MB*N/k) of 267 backend storage space (the actual value will be a bit more, because of other 268 forms of overhead). Up to N-k shares can be lost before the file becomes 269 unrecoverable, so assuming there are at least N servers, up to N-k servers 270 can be offline without losing the file. So large N/k ratios are more 271 reliable, and small N/k ratios use less disk space. Clearly, k must never be 272 smaller than N. 273 274 Large values of N will slow down upload operations slightly, since more 275 servers must be involved, and will slightly increase storage overhead due to 276 the hash trees that are created. Large values of k will cause downloads to 277 be marginally slower, because more servers must be involved. N cannot be 278 larger than 256, because of the 8-bit erasure-coding algorithm that Tahoe 279 uses. 280 281 shares.happy allows you control over the distribution of your immutable file. 282 For a successful upload, shares are guaranteed to be initially placed on 283 at least 'shares.happy' distinct servers, the correct functioning of any 284 k of which is sufficient to guarantee the availability of the uploaded file. 285 This value should not be larger than the number of servers on your grid. 286 287 A value of shares.happy <= k is allowed, but does not provide any redundancy 288 if some servers fail or lose shares. 289 290 (Mutable files use a different share placement algorithm that does not 291 consider this parameter.) 
292 293 294 == Storage Server Configuration == 295 296 [storage] 297 enabled = (boolean, optional) 298 299 If this is True, the node will run a storage server, offering space to other 300 clients. If it is False, the node will not run a storage server, meaning 301 that no shares will be stored on this node. Use False this for clients who 302 do not wish to provide storage service. The default value is True. 303 304 readonly = (boolean, optional) 305 306 If True, the node will run a storage server but will not accept any shares, 307 making it effectively read-only. Use this for storage servers which are 308 being decommissioned: the storage/ directory could be mounted read-only, 309 while shares are moved to other servers. Note that this currently only 310 affects immutable shares. Mutable shares (used for directories) will be 311 written and modified anyway. See ticket #390 for the current status of this 312 bug. The default value is False. 313 314 reserved_space = (str, optional) 315 316 If provided, this value defines how much disk space is reserved: the storage 317 server will not accept any share which causes the amount of free disk space 318 to drop below this value. (The free space is measured by a call to statvfs(2) 319 on Unix, or GetDiskFreeSpaceEx on Windows, and is the space available to the 320 user account under which the storage server runs.) 321 322 This string contains a number, with an optional case-insensitive scale 323 suffix like "K" or "M" or "G", and an optional "B" or "iB" suffix. So 324 "100MB", "100M", "100000000B", "100000000", and "100000kb" all mean the same 325 thing. Likewise, "1MiB", "1024KiB", and "1048576B" all mean the same thing. 326 327 expire.enabled = 328 expire.mode = 329 expire.override_lease_duration = 330 expire.cutoff_date = 331 expire.immutable = 332 expire.mutable = 333 334 These settings control garbage-collection, in which the server will delete 335 shares that no longer have an up-to-date lease on them. Please see the 336 neighboring "garbage-collection.txt" document for full details. 69 This value will be displayed in management tools as this node's "nickname". 70 If not provided, the nickname will be set to "<unspecified>". This string 71 shall be a UTF-8 encoded unicode string. 72 73 web.port = (strports string, optional) 74 75 This controls where the node's webserver should listen, providing filesystem 76 access and node status as defined in webapi.txt . This file contains a 77 Twisted "strports" specification such as "3456" or 78 "tcp:3456:interface=127.0.0.1". The 'tahoe create-node' or 'tahoe create-client' 79 commands set the web.port to "tcp:3456:interface=127.0.0.1" by default; this 80 is overridable by the "--webport" option. You can make it use SSL by writing 81 "ssl:3456:privateKey=mykey.pem:certKey=cert.pem" instead. 82 83 If this is not provided, the node will not run a web server. 84 85 web.static = (string, optional) 86 87 This controls where the /static portion of the URL space is served. The 88 value is a directory name (~username is allowed, and non-absolute names are 89 interpreted relative to the node's basedir) which can contain HTML and other 90 files. This can be used to serve a javascript-based frontend to the Tahoe 91 node, or other services. 92 93 The default value is "public_html", which will serve $BASEDIR/public_html . 94 With the default settings, http://127.0.0.1:3456/static/foo.html will serve 95 the contents of $BASEDIR/public_html/foo.html . 
96 97 tub.port = (integer, optional) 98 99 This controls which port the node uses to accept Foolscap connections from 100 other nodes. If not provided, the node will ask the kernel for any available 101 port. The port will be written to a separate file (named client.port or 102 introducer.port), so that subsequent runs will re-use the same port. 103 104 tub.location = (string, optional) 105 106 In addition to running as a client, each Tahoe node also runs as a server, 107 listening for connections from other Tahoe clients. The node announces its 108 location by publishing a "FURL" (a string with some connection hints) to the 109 Introducer. The string it publishes can be found in 110 $BASEDIR/private/storage.furl . The "tub.location" configuration controls 111 what location is published in this announcement. 112 113 If you don't provide tub.location, the node will try to figure out a useful 114 one by itself, by using tools like 'ifconfig' to determine the set of IP 115 addresses on which it can be reached from nodes both near and far. It will 116 also include the TCP port number on which it is listening (either the one 117 specified by tub.port, or whichever port was assigned by the kernel when 118 tub.port is left unspecified). 119 120 You might want to override this value if your node lives behind a firewall 121 that is doing inbound port forwarding, or if you are using other proxies 122 such that the local IP address or port number is not the same one that 123 remote clients should use to connect. You might also want to control this 124 when using a Tor proxy to avoid revealing your actual IP address through the 125 Introducer announcement. 126 127 The value is a comma-separated string of host:port location hints, like 128 this: 129 130 123.45.67.89:8098,tahoe.example.com:8098,127.0.0.1:8098 131 132 A few examples: 133 134 Emulate default behavior, assuming your host has IP address 123.45.67.89 135 and the kernel-allocated port number was 8098: 136 137 tub.port = 8098 138 tub.location = 123.45.67.89:8098,127.0.0.1:8098 139 140 Use a DNS name so you can change the IP address more easily: 141 142 tub.port = 8098 143 tub.location = tahoe.example.com:8098 144 145 Run a node behind a firewall (which has an external IP address) that has 146 been configured to forward port 7912 to our internal node's port 8098: 147 148 tub.port = 8098 149 tub.location = external-firewall.example.com:7912 150 151 Run a node behind a Tor proxy (perhaps via torsocks), in client-only mode 152 (i.e. we can make outbound connections, but other nodes will not be able to 153 connect to us). The literal 'unreachable.example.org' will not resolve, but 154 will serve as a reminder to human observers that this node cannot be 155 reached. "Don't call us.. we'll call you": 156 157 tub.port = 8098 158 tub.location = unreachable.example.org:0 159 160 Run a node behind a Tor proxy, and make the server available as a Tor 161 "hidden service". (this assumes that other clients are running their node 162 with torsocks, such that they are prepared to connect to a .onion address). 163 The hidden service must first be configured in Tor, by giving it a local 164 port number and then obtaining a .onion name, using something in the torrc 165 file like: 166 167 HiddenServiceDir /var/lib/tor/hidden_services/tahoe 168 HiddenServicePort 29212 127.0.0.1:8098 169 170 once Tor is restarted, the .onion hostname will be in 171 /var/lib/tor/hidden_services/tahoe/hostname . 
Then set up your tahoe.cfg 172 like: 173 174 tub.port = 8098 175 tub.location = ualhejtq2p7ohfbb.onion:29212 176 177 Most users will not need to set tub.location . 178 179 Note that the old 'advertised_ip_addresses' file from earlier releases is no 180 longer supported. Tahoe 1.3.0 and later will ignore this file. 181 182 log_gatherer.furl = (FURL, optional) 183 184 If provided, this contains a single FURL string which is used to contact a 185 'log gatherer', which will be granted access to the logport. This can be 186 used by centralized storage meshes to gather operational logs in a single 187 place. Note that when an old-style BASEDIR/log_gatherer.furl file exists 188 (see 'Backwards Compatibility Files', below), both are used. (for most other 189 items, the separate config file overrides the entry in tahoe.cfg) 190 191 timeout.keepalive = (integer in seconds, optional) 192 timeout.disconnect = (integer in seconds, optional) 193 194 If timeout.keepalive is provided, it is treated as an integral number of 195 seconds, and sets the Foolscap "keepalive timer" to that value. For each 196 connection to another node, if nothing has been heard for a while, we will 197 attempt to provoke the other end into saying something. The duration of 198 silence that passes before sending the PING will be between KT and 2*KT. 199 This is mainly intended to keep NAT boxes from expiring idle TCP sessions, 200 but also gives TCP's long-duration keepalive/disconnect timers some traffic 201 to work with. The default value is 240 (i.e. 4 minutes). 202 203 If timeout.disconnect is provided, this is treated as an integral number of 204 seconds, and sets the Foolscap "disconnect timer" to that value. For each 205 connection to another node, if nothing has been heard for a while, we will 206 drop the connection. The duration of silence that passes before dropping the 207 connection will be between DT-2*KT and 2*DT+2*KT (please see ticket #521 for 208 more details). If we are sending a large amount of data to the other end 209 (which takes more than DT-2*KT to deliver), we might incorrectly drop the 210 connection. The default behavior (when this value is not provided) is to 211 disable the disconnect timer. 212 213 See ticket #521 for a discussion of how to pick these timeout values. Using 214 30 minutes means we'll disconnect after 22 to 68 minutes of inactivity. 215 Receiving data will reset this timeout, however if we have more than 22min 216 of data in the outbound queue (such as 800kB in two pipelined segments of 10 217 shares each) and the far end has no need to contact us, our ping might be 218 delayed, so we may disconnect them by accident. 219 220 ssh.port = (strports string, optional) 221 ssh.authorized_keys_file = (filename, optional) 222 223 This enables an SSH-based interactive Python shell, which can be used to 224 inspect the internal state of the node, for debugging. To cause the node to 225 accept SSH connections on port 8022 from the same keys as the rest of your 226 account, use: 227 228 [tub] 229 ssh.port = 8022 230 ssh.authorized_keys_file = ~/.ssh/authorized_keys 231 232 tempdir = (string, optional) 233 234 This specifies a temporary directory for the webapi server to use, for 235 holding large files while they are being uploaded. If a webapi client 236 attempts to upload a 10GB file, this tempdir will need to have at least 10GB 237 available for the upload to complete. 238 239 The default value is the "tmp" directory in the node's base directory (i.e. 
240 $NODEDIR/tmp), but it can be placed elsewhere. This directory is used for 241 files that usually (on a unix system) go into /tmp . The string will be 242 interpreted relative to the node's base directory. 243 244 Client Configuration 245 ==================== 246 247 :: 248 249 [client] 250 introducer.furl = (FURL string, mandatory) 251 252 This FURL tells the client how to connect to the introducer. Each Tahoe grid 253 is defined by an introducer. The introducer's furl is created by the 254 introducer node and written into its base directory when it starts, 255 whereupon it should be published to everyone who wishes to attach a client 256 to that grid 257 258 helper.furl = (FURL string, optional) 259 260 If provided, the node will attempt to connect to and use the given helper 261 for uploads. See docs/helper.txt for details. 262 263 key_generator.furl = (FURL string, optional) 264 265 If provided, the node will attempt to connect to and use the given 266 key-generator service, using RSA keys from the external process rather than 267 generating its own. 268 269 stats_gatherer.furl = (FURL string, optional) 270 271 If provided, the node will connect to the given stats gatherer and provide 272 it with operational statistics. 273 274 shares.needed = (int, optional) aka "k", default 3 275 shares.total = (int, optional) aka "N", N >= k, default 10 276 shares.happy = (int, optional) 1 <= happy <= N, default 7 277 278 These three values set the default encoding parameters. Each time a new file 279 is uploaded, erasure-coding is used to break the ciphertext into separate 280 pieces. There will be "N" (i.e. shares.total) pieces created, and the file 281 will be recoverable if any "k" (i.e. shares.needed) pieces are retrieved. 282 The default values are 3-of-10 (i.e. shares.needed = 3, shares.total = 10). 283 Setting k to 1 is equivalent to simple replication (uploading N copies of 284 the file). 285 286 These values control the tradeoff between storage overhead, performance, and 287 reliability. To a first approximation, a 1MB file will use (1MB*N/k) of 288 backend storage space (the actual value will be a bit more, because of other 289 forms of overhead). Up to N-k shares can be lost before the file becomes 290 unrecoverable, so assuming there are at least N servers, up to N-k servers 291 can be offline without losing the file. So large N/k ratios are more 292 reliable, and small N/k ratios use less disk space. Clearly, k must never be 293 smaller than N. 294 295 Large values of N will slow down upload operations slightly, since more 296 servers must be involved, and will slightly increase storage overhead due to 297 the hash trees that are created. Large values of k will cause downloads to 298 be marginally slower, because more servers must be involved. N cannot be 299 larger than 256, because of the 8-bit erasure-coding algorithm that Tahoe 300 uses. 301 302 shares.happy allows you control over the distribution of your immutable file. 303 For a successful upload, shares are guaranteed to be initially placed on 304 at least 'shares.happy' distinct servers, the correct functioning of any 305 k of which is sufficient to guarantee the availability of the uploaded file. 306 This value should not be larger than the number of servers on your grid. 307 308 A value of shares.happy <= k is allowed, but does not provide any redundancy 309 if some servers fail or lose shares. 310 311 (Mutable files use a different share placement algorithm that does not 312 consider this parameter.) 
313 314 315 Storage Server Configuration 316 ============================ 317 318 :: 319 320 [storage] 321 enabled = (boolean, optional) 322 323 If this is True, the node will run a storage server, offering space to other 324 clients. If it is False, the node will not run a storage server, meaning 325 that no shares will be stored on this node. Use False this for clients who 326 do not wish to provide storage service. The default value is True. 327 328 readonly = (boolean, optional) 329 330 If True, the node will run a storage server but will not accept any shares, 331 making it effectively read-only. Use this for storage servers which are 332 being decommissioned: the storage/ directory could be mounted read-only, 333 while shares are moved to other servers. Note that this currently only 334 affects immutable shares. Mutable shares (used for directories) will be 335 written and modified anyway. See ticket #390 for the current status of this 336 bug. The default value is False. 337 338 reserved_space = (str, optional) 339 340 If provided, this value defines how much disk space is reserved: the storage 341 server will not accept any share which causes the amount of free disk space 342 to drop below this value. (The free space is measured by a call to statvfs(2) 343 on Unix, or GetDiskFreeSpaceEx on Windows, and is the space available to the 344 user account under which the storage server runs.) 345 346 This string contains a number, with an optional case-insensitive scale 347 suffix like "K" or "M" or "G", and an optional "B" or "iB" suffix. So 348 "100MB", "100M", "100000000B", "100000000", and "100000kb" all mean the same 349 thing. Likewise, "1MiB", "1024KiB", and "1048576B" all mean the same thing. 350 351 expire.enabled = 352 expire.mode = 353 expire.override_lease_duration = 354 expire.cutoff_date = 355 expire.immutable = 356 expire.mutable = 357 358 These settings control garbage-collection, in which the server will delete 359 shares that no longer have an up-to-date lease on them. Please see the 360 neighboring "garbage-collection.txt" document for full details. 337 361 338 362 339 == Running A Helper == 363 Running A Helper 364 ================ 340 365 341 366 A "helper" is a regular client node that also offers the "upload helper" 342 367 service. 343 368 344 [helper] 345 enabled = (boolean, optional) 369 :: 346 370 347 If True, the node will run a helper (see docs/helper.txt for details). The 348 helper's contact FURL will be placed in private/helper.furl, from which it 349 can be copied to any clients which wish to use it. Clearly nodes should not 350 both run a helper and attempt to use one: do not create both helper.furl and 351 run_helper in the same node. The default is False. 371 [helper] 372 enabled = (boolean, optional) 373 374 If True, the node will run a helper (see docs/helper.txt for details). The 375 helper's contact FURL will be placed in private/helper.furl, from which it 376 can be copied to any clients which wish to use it. Clearly nodes should not 377 both run a helper and attempt to use one: do not create both helper.furl and 378 run_helper in the same node. The default is False. 352 379 353 380 354 == Running An Introducer == 381 Running An Introducer 382 ===================== 355 383 356 384 The introducer node uses a different '.tac' file (named introducer.tac), and 357 385 pays attention to the "[node]" section, but not the others. … … 365 393 copied into new client nodes before they are started for the first time. 
366 394 367 395 368 == Other Files in BASEDIR == 396 Other Files in BASEDIR 397 ====================== 369 398 370 399 Some configuration is not kept in tahoe.cfg, for the following reasons: 371 400 372 373 374 401 * it is generated by the node at startup, e.g. encryption keys. The node 402 never writes to tahoe.cfg 403 * it is generated by user action, e.g. the 'tahoe create-alias' command 375 404 376 405 In addition, non-configuration persistent state is kept in the node's base 377 406 directory, next to the configuration knobs. 378 407 379 408 This section describes these other files. 380 409 381 382 private/node.pem : This contains an SSL private-key certificate. The node 383 generates this the first time it is started, and re-uses it on subsequent 384 runs. This certificate allows the node to have a cryptographically-strong 385 identifier (the Foolscap "TubID"), and to establish secure connections to 386 other nodes. 387 388 storage/ : Nodes which host StorageServers will create this directory to hold 389 shares of files on behalf of other clients. There will be a directory 390 underneath it for each StorageIndex for which this node is holding shares. 391 There is also an "incoming" directory where partially-completed shares are 392 held while they are being received. 393 394 client.tac : this file defines the client, by constructing the actual Client 395 instance each time the node is started. It is used by the 'twistd' 396 daemonization program (in the "-y" mode), which is run internally by the 397 "tahoe start" command. This file is created by the "tahoe create-node" or 398 "tahoe create-client" commands. 399 400 private/control.furl : this file contains a FURL that provides access to a 401 control port on the client node, from which files can be uploaded and 402 downloaded. This file is created with permissions that prevent anyone else 403 from reading it (on operating systems that support such a concept), to insure 404 that only the owner of the client node can use this feature. This port is 405 intended for debugging and testing use. 406 407 private/logport.furl : this file contains a FURL that provides access to a 408 'log port' on the client node, from which operational logs can be retrieved. 409 Do not grant logport access to strangers, because occasionally secret 410 information may be placed in the logs. 411 412 private/helper.furl : if the node is running a helper (for use by other 413 clients), its contact FURL will be placed here. See docs/helper.txt for more 414 details. 415 416 private/root_dir.cap (optional): The command-line tools will read a directory 417 cap out of this file and use it, if you don't specify a '--dir-cap' option or 418 if you specify '--dir-cap=root'. 419 420 private/convergence (automatically generated): An added secret for encrypting 421 immutable files. Everyone who has this same string in their 422 private/convergence file encrypts their immutable files in the same way when 423 uploading them. This causes identical files to "converge" -- to share the 424 same storage space since they have identical ciphertext -- which conserves 425 space and optimizes upload time, but it also exposes files to the possibility 426 of a brute-force attack by people who know that string. In this attack, if 427 the attacker can guess most of the contents of a file, then they can use 428 brute-force to learn the remaining contents. 410 private/node.pem 411 This contains an SSL private-key certificate. 
The node 412 generates this the first time it is started, and re-uses it on subsequent 413 runs. This certificate allows the node to have a cryptographically-strong 414 identifier (the Foolscap "TubID"), and to establish secure connections to 415 other nodes. 416 417 storage/ 418 Nodes which host StorageServers will create this directory to hold 419 shares of files on behalf of other clients. There will be a directory 420 underneath it for each StorageIndex for which this node is holding shares. 421 There is also an "incoming" directory where partially-completed shares are 422 held while they are being received. 423 424 client.tac 425 this file defines the client, by constructing the actual Client 426 instance each time the node is started. It is used by the 'twistd' 427 daemonization program (in the "-y" mode), which is run internally by the 428 "tahoe start" command. This file is created by the "tahoe create-node" or 429 "tahoe create-client" commands. 430 431 private/control.furl 432 this file contains a FURL that provides access to a 433 control port on the client node, from which files can be uploaded and 434 downloaded. This file is created with permissions that prevent anyone else 435 from reading it (on operating systems that support such a concept), to insure 436 that only the owner of the client node can use this feature. This port is 437 intended for debugging and testing use. 438 439 private/logport.furl 440 this file contains a FURL that provides access to a 441 'log port' on the client node, from which operational logs can be retrieved. 442 Do not grant logport access to strangers, because occasionally secret 443 information may be placed in the logs. 444 445 private/helper.furl 446 if the node is running a helper (for use by other 447 clients), its contact FURL will be placed here. See docs/helper.txt for more 448 details. 449 450 private/root_dir.cap (optional) 451 The command-line tools will read a directory 452 cap out of this file and use it, if you don't specify a '--dir-cap' option or 453 if you specify '--dir-cap=root'. 454 455 private/convergence (automatically generated) 456 An added secret for encrypting 457 immutable files. Everyone who has this same string in their 458 private/convergence file encrypts their immutable files in the same way when 459 uploading them. This causes identical files to "converge" -- to share the 460 same storage space since they have identical ciphertext -- which conserves 461 space and optimizes upload time, but it also exposes files to the possibility 462 of a brute-force attack by people who know that string. In this attack, if 463 the attacker can guess most of the contents of a file, then they can use 464 brute-force to learn the remaining contents. 429 465 430 466 So the set of people who know your private/convergence string is the set of 431 467 people who converge their storage space with you when you and they upload … … 439 475 possible, put the empty string (so that private/convergence is a zero-length 440 476 file). 441 477 478 Other files 479 =========== 442 480 443 == Other files == 481 logs/ 482 Each Tahoe node creates a directory to hold the log messages produced 483 as the node runs. These logfiles are created and rotated by the "twistd" 484 daemonization program, so logs/twistd.log will contain the most recent 485 messages, logs/twistd.log.1 will contain the previous ones, logs/twistd.log.2 486 will be older still, and so on. twistd rotates logfiles after they grow 487 beyond 1MB in size. 
If the space consumed by logfiles becomes troublesome, 488 they should be pruned: a cron job to delete all files that were created more 489 than a month ago in this logs/ directory should be sufficient. 490 491 my_nodeid 492 this is written by all nodes after startup, and contains a 493 base32-encoded (i.e. human-readable) NodeID that identifies this specific 494 node. This NodeID is the same string that gets displayed on the web page (in 495 the "which peers am I connected to" list), and the shortened form (the first 496 characters) is recorded in various log messages. 444 497 445 logs/ : Each Tahoe node creates a directory to hold the log messages produced 446 as the node runs. These logfiles are created and rotated by the "twistd" 447 daemonization program, so logs/twistd.log will contain the most recent 448 messages, logs/twistd.log.1 will contain the previous ones, logs/twistd.log.2 449 will be older still, and so on. twistd rotates logfiles after they grow 450 beyond 1MB in size. If the space consumed by logfiles becomes troublesome, 451 they should be pruned: a cron job to delete all files that were created more 452 than a month ago in this logs/ directory should be sufficient. 453 454 my_nodeid : this is written by all nodes after startup, and contains a 455 base32-encoded (i.e. human-readable) NodeID that identifies this specific 456 node. This NodeID is the same string that gets displayed on the web page (in 457 the "which peers am I connected to" list), and the shortened form (the first 458 characters) is recorded in various log messages. 459 460 461 == Backwards Compatibility Files == 498 Backwards Compatibility Files 499 ============================= 462 500 463 501 Tahoe releases before 1.3.0 had no 'tahoe.cfg' file, and used distinct files 464 502 for each item listed below. For each configuration knob, if the distinct file 465 exists, it will take precedence over the corresponding item in tahoe.cfg . 466 503 exists, it will take precedence over the corresponding item in tahoe.cfg. 
467 504 468 [node]nickname : BASEDIR/nickname 469 [node]web.port : BASEDIR/webport 470 [node]tub.port : BASEDIR/client.port (for Clients, not Introducers) 471 [node]tub.port : BASEDIR/introducer.port (for Introducers, not Clients) 472 (note that, unlike other keys, tahoe.cfg overrides the *.port file) 473 [node]tub.location : replaces BASEDIR/advertised_ip_addresses 474 [node]log_gatherer.furl : BASEDIR/log_gatherer.furl (one per line) 475 [node]timeout.keepalive : BASEDIR/keepalive_timeout 476 [node]timeout.disconnect : BASEDIR/disconnect_timeout 477 [client]introducer.furl : BASEDIR/introducer.furl 478 [client]helper.furl : BASEDIR/helper.furl 479 [client]key_generator.furl : BASEDIR/key_generator.furl 480 [client]stats_gatherer.furl : BASEDIR/stats_gatherer.furl 481 [storage]enabled : BASEDIR/no_storage (False if no_storage exists) 482 [storage]readonly : BASEDIR/readonly_storage (True if readonly_storage exists) 483 [storage]sizelimit : BASEDIR/sizelimit 484 [storage]debug_discard : BASEDIR/debug_discard_storage 485 [helper]enabled : BASEDIR/run_helper (True if run_helper exists) 505 =========================== =============================== ================= 506 Config setting File Comment 507 =========================== =============================== ================= 508 [node]nickname BASEDIR/nickname 509 [node]web.port BASEDIR/webport 510 [node]tub.port BASEDIR/client.port (for Clients, not Introducers) 511 [node]tub.port BASEDIR/introducer.port (for Introducers, not Clients) (note that, unlike other keys, tahoe.cfg overrides this file) 512 [node]tub.location BASEDIR/advertised_ip_addresses 513 [node]log_gatherer.furl BASEDIR/log_gatherer.furl (one per line) 514 [node]timeout.keepalive BASEDIR/keepalive_timeout 515 [node]timeout.disconnect BASEDIR/disconnect_timeout 516 [client]introducer.furl BASEDIR/introducer.furl 517 [client]helper.furl BASEDIR/helper.furl 518 [client]key_generator.furl BASEDIR/key_generator.furl 519 [client]stats_gatherer.furl BASEDIR/stats_gatherer.furl 520 [storage]enabled BASEDIR/no_storage (False if no_storage exists) 521 [storage]readonly BASEDIR/readonly_storage (True if readonly_storage exists) 522 [storage]sizelimit BASEDIR/sizelimit 523 [storage]debug_discard BASEDIR/debug_discard_storage 524 [helper]enabled BASEDIR/run_helper (True if run_helper exists) 525 =========================== =============================== ================= 486 526 487 527 Note: the functionality of [node]ssh.port and [node]ssh.authorized_keys_file 488 528 were previously combined, controlled by the presence of a … … 490 530 indicated which port the ssh server should listen on, and the contents of the 491 531 file provided the ssh public keys to accept. Support for these files has been 492 532 removed completely. To ssh into your Tahoe node, add [node]ssh.port and 493 [node].ssh_authorized_keys_file statements to your tahoe.cfg 533 [node].ssh_authorized_keys_file statements to your tahoe.cfg. 494 534 495 535 Likewise, the functionality of [node]tub.location is a variant of the 496 536 now-unsupported BASEDIR/advertised_ip_addresses . The old file was additive … … 499 539 is not (tub.location is used verbatim). 500 540 501 541 502 == Example == 542 Example 543 ======= 503 544 504 545 The following is a sample tahoe.cfg file, containing values for all keys 505 546 described above. Note that this is not a recommended configuration (most of 506 547 these are not the default values), merely a legal one. 
507 548 508 [node] 509 nickname = Bob's Tahoe Node 510 tub.port = 34912 511 tub.location = 123.45.67.89:8098,44.55.66.77:8098 512 web.port = 3456 513 log_gatherer.furl = pb://soklj4y7eok5c3xkmjeqpw@192.168.69.247:44801/eqpwqtzm 514 timeout.keepalive = 240 515 timeout.disconnect = 1800 516 ssh.port = 8022 517 ssh.authorized_keys_file = ~/.ssh/authorized_keys 518 519 [client] 520 introducer.furl = pb://ok45ssoklj4y7eok5c3xkmj@tahoe.example:44801/ii3uumo 521 helper.furl = pb://ggti5ssoklj4y7eok5c3xkmj@helper.tahoe.example:7054/kk8lhr 522 523 [storage] 524 enabled = True 525 readonly_storage = True 526 sizelimit = 10000000000 549 :: 527 550 528 [helper] 529 run_helper = True 551 [node] 552 nickname = Bob's Tahoe Node 553 tub.port = 34912 554 tub.location = 123.45.67.89:8098,44.55.66.77:8098 555 web.port = 3456 556 log_gatherer.furl = pb://soklj4y7eok5c3xkmjeqpw@192.168.69.247:44801/eqpwqtzm 557 timeout.keepalive = 240 558 timeout.disconnect = 1800 559 ssh.port = 8022 560 ssh.authorized_keys_file = ~/.ssh/authorized_keys 561 562 [client] 563 introducer.furl = pb://ok45ssoklj4y7eok5c3xkmj@tahoe.example:44801/ii3uumo 564 helper.furl = pb://ggti5ssoklj4y7eok5c3xkmj@helper.tahoe.example:7054/kk8lhr 565 566 [storage] 567 enabled = True 568 readonly_storage = True 569 sizelimit = 10000000000 570 571 [helper] 572 run_helper = True -
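The precedence rule above (a surviving per-item file wins over the corresponding tahoe.cfg entry, except for the \*.port files) can be illustrated with a short Python sketch. The ``read_nickname`` helper and the BASEDIR path are hypothetical; this is not Tahoe's actual config loader::

    import os
    try:
        from configparser import ConfigParser              # Python 3
    except ImportError:
        from ConfigParser import SafeConfigParser as ConfigParser  # Python 2

    def read_nickname(basedir):
        # The legacy BASEDIR/nickname file, if present, takes precedence
        # over [node]nickname in tahoe.cfg, per the table above.
        legacy = os.path.join(basedir, "nickname")
        if os.path.exists(legacy):
            with open(legacy) as f:
                return f.read().strip()
        cfg = ConfigParser()
        cfg.read(os.path.join(basedir, "tahoe.cfg"))
        if cfg.has_option("node", "nickname"):
            return cfg.get("node", "nickname")
        return ""  # unset

    print(read_nickname(os.path.expanduser("~/.tahoe")))  # example BASEDIR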
docs/debian.txt
diff -rN -u old-tahoe-lafs/docs/debian.txt new-tahoe-lafs/docs/debian.txt
old new 1 = Debian Support = 1 ============== 2 Debian Support 3 ============== 4 5 1. `Overview`_ 6 2. `TL;DR supporting package building instructions`_ 7 3. `TL;DR package building instructions for Tahoe`_ 8 4. `Building Debian Packages`_ 9 5. `Using Pre-Built Debian Packages`_ 10 6. `Building From Source on Debian Systems`_ 2 11 3 1. Overview 4 2. TL;DR supporting package building instructions 5 3. TL;DR package building instructions for Tahoe 6 4. Building Debian Packages 7 5. Using Pre-Built Debian Packages 8 6. Building From Source on Debian Systems 9 10 = Overview == 12 Overview 13 ======== 11 14 12 15 One convenient way to install Tahoe-LAFS is with debian packages. 13 16 This document attempts to explain how to complete a desert island build for 14 17 people in a hurry. It also attempts to explain more about our Debian packaging 15 18 for those willing to read beyond the simple pragmatic packaging exercises. 16 19 17 == TL;DR supporting package building instructions == 20 TL;DR supporting package building instructions 21 ============================================== 18 22 19 23 There are only four supporting packages that are currently not available from 20 the debian apt repositories in Debian Lenny: 24 the debian apt repositories in Debian Lenny:: 21 25 22 26 python-foolscap python-zfec argparse zbase32 23 27 24 First, we'll install some common packages for development: 28 First, we'll install some common packages for development:: 25 29 26 30 sudo apt-get install -y build-essential debhelper cdbs python-central \ 27 31 python-setuptools python python-dev python-twisted-core \ … … 31 35 sudo apt-file update 32 36 33 37 34 To create packages for Lenny, we'll also install stdeb: 38 To create packages for Lenny, we'll also install stdeb:: 35 39 36 40 sudo apt-get install python-all-dev 37 41 STDEB_VERSION="0.5.1" … … 41 45 python setup.py --command-packages=stdeb.command bdist_deb 42 46 sudo dpkg -i deb_dist/python-stdeb_$STDEB_VERSION-1_all.deb 43 47 44 Now we're ready to build and install the zfec Debian package: 48 Now we're ready to build and install the zfec Debian package:: 45 49 46 50 darcs get http://allmydata.org/source/zfec/trunk zfac 47 51 cd zfac/zfec/ … … 50 54 dpkg-buildpackage -rfakeroot -uc -us 51 55 sudo dpkg -i ../python-zfec_1.4.6-r333-1_amd64.deb 52 56 53 We need to build a pyutil package: 57 We need to build a pyutil package:: 54 58 55 59 wget http://pypi.python.org/packages/source/p/pyutil/pyutil-1.6.1.tar.gz 56 60 tar -xvzf pyutil-1.6.1.tar.gz … … 60 64 dpkg-buildpackage -rfakeroot -uc -us 61 65 sudo dpkg -i ../python-pyutil_1.6.1-1_all.deb 62 66 63 We also need to install argparse and zbase32: 67 We also need to install argparse and zbase32:: 64 68 65 69 sudo easy_install argparse # argparse won't install with stdeb (!) 
:-( 66 70 sudo easy_install zbase32 # XXX TODO: package with stdeb 67 71 68 Finally, we'll fetch, unpack, build and install foolscap: 72 Finally, we'll fetch, unpack, build and install foolscap:: 69 73 70 74 # You may not already have Brian's key: 71 75 # gpg --recv-key 0x1514A7BD … … 79 83 dpkg-buildpackage -rfakeroot -uc -us 80 84 sudo dpkg -i ../python-foolscap_0.5.0-1_all.deb 81 85 82 == TL;DR package building instructions for Tahoe == 86 TL;DR package building instructions for Tahoe 87 ============================================= 83 88 84 89 If you want to build your own Debian packages from the darcs tree or from 85 a source release, do the following: 90 a source release, do the following:: 86 91 87 92 cd ~/ 88 93 mkdir src && cd src/ … … 98 103 /etc/defaults/allmydata-tahoe file to get Tahoe started. Data is by default 99 104 stored in /var/lib/tahoelafsd/ and Tahoe runs as the 'tahoelafsd' user. 100 105 101 == Building Debian Packages == 106 Building Debian Packages 107 ======================== 102 108 103 109 The Tahoe source tree comes with limited support for building debian packages 104 110 on a variety of Debian and Ubuntu platforms. For each supported platform, … … 109 115 110 116 To create debian packages from a Tahoe tree, you will need some additional 111 117 tools installed. The canonical list of these packages is in the 112 "Build-Depends" clause of misc/sid/debian/control , and includes: 118 "Build-Depends" clause of misc/sid/debian/control , and includes:: 113 119 114 120 build-essential 115 121 debhelper … … 130 136 Note that we haven't tried to build source packages (.orig.tar.gz + dsc) yet, 131 137 and there are no such source packages in our APT repository. 132 138 133 == Using Pre-Built Debian Packages == 139 Using Pre-Built Debian Packages 140 =============================== 134 141 135 142 The allmydata.org site hosts an APT repository with debian packages that are 136 built after each checkin. The following wiki page describes this repository:137 138 http://allmydata.org/trac/tahoe/wiki/DownloadDebianPackages 143 built after each checkin. `This wiki page 144 <http://allmydata.org/trac/tahoe/wiki/DownloadDebianPackages>`_ describes this 145 repository. 139 146 140 147 The allmydata.org APT repository also includes debian packages of support 141 148 libraries, like Foolscap, zfec, pycryptopp, and everything else you need that 142 149 isn't already in debian. 143 150 144 == Building From Source on Debian Systems == 151 Building From Source on Debian Systems 152 ====================================== 145 153 146 154 Many of Tahoe's build dependencies can be satisfied by first installing 147 155 certain debian packages: simplejson is one of these. Some debian/ubuntu -
docs/filesystem-notes.txt
diff -rN -u old-tahoe-lafs/docs/filesystem-notes.txt new-tahoe-lafs/docs/filesystem-notes.txt
old new 1 ========================= 2 Filesystem-specific notes 3 ========================= 4 5 1. ext3_ 1 6 2 7 Tahoe storage servers use a large number of subdirectories to store their 3 8 shares on local disk. This format is simple and robust, but depends upon the 4 9 local filesystem to provide fast access to those directories. 5 10 6 = ext3 = 11 ext3 12 ==== 7 13 8 14 For moderate- or large-sized storage servers, you'll want to make sure the 9 15 "directory index" feature is enabled on your ext3 directories, otherwise 10 16 share lookup may be very slow. Recent versions of ext3 enable this 11 automatically, but older filesystems may not have it enabled .17 automatically, but older filesystems may not have it enabled:: 12 18 13 $ sudo tune2fs -l /dev/sda1 |grep feature14 Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file19 $ sudo tune2fs -l /dev/sda1 |grep feature 20 Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file 15 21 16 22 If "dir_index" is present in the "features:" line, then you're all set. If 17 23 not, you'll need to use tune2fs and e2fsck to enable and build the index. See 18 this page for some hints: http://wiki.dovecot.org/MailboxFormat/Maildir.24 <http://wiki.dovecot.org/MailboxFormat/Maildir> for some hints. -
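To check the dir_index feature programmatically rather than by eye, the ``tune2fs`` output shown above can be parsed with a few lines of Python. The device path is only an example, and root privileges are required::

    import subprocess

    def has_dir_index(device):
        # Run "tune2fs -l DEVICE" (as shown above) and report whether
        # "dir_index" appears on the "Filesystem features:" line.
        out = subprocess.check_output(["tune2fs", "-l", device])
        for line in out.decode("utf-8", "replace").splitlines():
            if line.startswith("Filesystem features:"):
                return "dir_index" in line.split(":", 1)[1].split()
        return False

    if __name__ == "__main__":
        print(has_dir_index("/dev/sda1"))  # example device from the text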
docs/garbage-collection.txt
diff -rN -u old-tahoe-lafs/docs/garbage-collection.txt new-tahoe-lafs/docs/garbage-collection.txt
old new 1 = Garbage Collection in Tahoe = 1 =========================== 2 Garbage Collection in Tahoe 3 =========================== 4 5 1. `Overview`_ 6 2. `Client-side Renewal`_ 7 3. `Server Side Expiration`_ 8 4. `Expiration Progress`_ 9 5. `Future Directions`_ 2 10 3 1. Overview 4 2. Client-side Renewal 5 3. Server Side Expiration 6 4. Expiration Progress 7 5. Future Directions 8 9 == Overview == 11 Overview 12 ======== 10 13 11 14 When a file or directory in the virtual filesystem is no longer referenced, 12 15 the space that its shares occupied on each storage server can be freed, … … 40 43 server can use the "expire.override_lease_duration" configuration setting to 41 44 increase or decrease the effective duration to something other than 31 days). 42 45 43 == Client-side Renewal == 46 Client-side Renewal 47 =================== 44 48 45 49 If all of the files and directories which you care about are reachable from a 46 50 single starting point (usually referred to as a "rootcap"), and you store … … 69 73 appropriate for use by individual users as well, and may be incorporated 70 74 directly into the client node. 71 75 72 == Server Side Expiration == 76 Server Side Expiration 77 ====================== 73 78 74 79 Expiration must be explicitly enabled on each storage server, since the 75 80 default behavior is to never expire shares. Expiration is enabled by adding … … 112 117 expired whatever it is going to expire, the second and subsequent passes are 113 118 not going to find any new leases to remove. 114 119 115 The tahoe.cfg file uses the following keys to control lease expiration: 120 The tahoe.cfg file uses the following keys to control lease expiration:: 116 121 117 [storage]122 [storage] 118 123 119 expire.enabled = (boolean, optional)124 expire.enabled = (boolean, optional) 120 125 121 If this is True, the storage server will delete shares on which all leases122 have expired. Other controls dictate when leases are considered to have123 expired. The default is False.126 If this is True, the storage server will delete shares on which all leases 127 have expired. Other controls dictate when leases are considered to have 128 expired. The default is False. 124 129 125 expire.mode = (string, "age" or "cutoff-date", required if expiration enabled)130 expire.mode = (string, "age" or "cutoff-date", required if expiration enabled) 126 131 127 128 129 130 131 132 If this string is "age", the age-based expiration scheme is used, and the 133 "expire.override_lease_duration" setting can be provided to influence the 134 lease ages. If it is "cutoff-date", the absolute-date-cutoff mode is used, 135 and the "expire.cutoff_date" setting must be provided to specify the cutoff 136 date. The mode setting currently has no default: you must provide a value. 132 137 133 134 138 In a future release, this setting is likely to default to "age", but in this 139 release it was deemed safer to require an explicit mode specification. 135 140 136 expire.override_lease_duration = (duration string, optional)141 expire.override_lease_duration = (duration string, optional) 137 142 138 139 140 141 143 When age-based expiration is in use, a lease will be expired if its 144 "lease.create_renew" timestamp plus its "lease.duration" time is 145 earlier/older than the current time. 
This key, if present, overrides the 146 duration value for all leases, changing the algorithm from: 142 147 143 if (lease.create_renew_timestamp + lease.duration) < now:144 expire_lease()148 if (lease.create_renew_timestamp + lease.duration) < now: 149 expire_lease() 145 150 146 to:151 to: 147 152 148 if (lease.create_renew_timestamp + override_lease_duration) < now:149 expire_lease()153 if (lease.create_renew_timestamp + override_lease_duration) < now: 154 expire_lease() 150 155 151 152 153 156 The value of this setting is a "duration string", which is a number of days, 157 months, or years, followed by a units suffix, and optionally separated by a 158 space, such as one of the following: 154 159 155 7days156 31day157 60 days158 2mo159 3 month160 12 months161 2years160 7days 161 31day 162 60 days 163 2mo 164 3 month 165 12 months 166 2years 162 167 163 164 165 166 167 168 168 This key is meant to compensate for the fact that clients do not yet have 169 the ability to ask for leases that last longer than 31 days. A grid which 170 wants to use faster or slower GC than a 31-day lease timer permits can use 171 this parameter to implement it. The current fixed 31-day lease duration 172 makes the server behave as if "lease.override_lease_duration = 31days" had 173 been passed. 169 174 170 171 172 175 This key is only valid when age-based expiration is in use (i.e. when 176 "expire.mode = age" is used). It will be rejected if cutoff-date expiration 177 is in use. 173 178 174 expire.cutoff_date = (date string, required if mode=cutoff-date)179 expire.cutoff_date = (date string, required if mode=cutoff-date) 175 180 176 177 178 181 When cutoff-date expiration is in use, a lease will be expired if its 182 create/renew timestamp is older than the cutoff date. This string will be a 183 date in the following format: 179 184 180 2009-01-16 (January 16th, 2009)181 2008-02-02182 2007-12-25185 2009-01-16 (January 16th, 2009) 186 2008-02-02 187 2007-12-25 183 188 184 185 186 187 189 The actual cutoff time shall be midnight UTC at the beginning of the given 190 day. Lease timers should naturally be generous enough to not depend upon 191 differences in timezone: there should be at least a few days between the 192 last renewal time and the cutoff date. 188 193 189 190 191 194 This key is only valid when cutoff-based expiration is in use (i.e. when 195 "expire.mode = cutoff-date"). It will be rejected if age-based expiration is 196 in use. 192 197 193 expire.immutable = (boolean, optional)198 expire.immutable = (boolean, optional) 194 199 195 196 197 200 If this is False, then immutable shares will never be deleted, even if their 201 leases have expired. This can be used in special situations to perform GC on 202 mutable files but not immutable ones. The default is True. 198 203 199 expire.mutable = (boolean, optional)204 expire.mutable = (boolean, optional) 200 205 201 202 203 206 If this is False, then mutable shares will never be deleted, even if their 207 leases have expired. This can be used in special situations to perform GC on 208 immutable files but not mutable ones. The default is True. 204 209 205 == Expiration Progress == 210 Expiration Progress 211 =================== 206 212 207 213 In the current release, leases are stored as metadata in each share file, and 208 214 no separate database is maintained. As a result, checking and expiring leases … … 229 235 crawler can be forcibly reset by stopping the node, deleting these two files, 230 236 then restarting the node. 
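A compact way to see how the expiration keys above interact is to restate the two decision rules in Python. This is only an illustration of the policy, not the storage server's lease crawler, and the duration-string parser is a simplification of what the config file accepts::

    import re, time, calendar

    def parse_duration(s):
        # Rough parser for duration strings such as "31days", "2mo",
        # "3 month", "2years" (simplified; months/years are approximated).
        m = re.match(r"\s*(\d+)\s*([a-z]+)\s*$", s.lower())
        if not m:
            raise ValueError("unparseable duration: %r" % (s,))
        n, unit = int(m.group(1)), m.group(2)
        for prefix, secs in [("day", 86400), ("mo", 30 * 86400),
                             ("month", 30 * 86400), ("year", 365 * 86400)]:
            if unit.startswith(prefix) or prefix.startswith(unit):
                return n * secs
        raise ValueError("unknown unit: %r" % (s,))

    def lease_is_expired(create_renew_timestamp, lease_duration,
                         mode="age", override_lease_duration=None,
                         cutoff_date=None, now=None):
        # Policy sketch mirroring the rules described above.
        now = time.time() if now is None else now
        if mode == "age":
            duration = lease_duration
            if override_lease_duration is not None:
                duration = parse_duration(override_lease_duration)
            return (create_renew_timestamp + duration) < now
        if mode == "cutoff-date":
            # midnight UTC at the beginning of the given day, e.g. "2009-01-16"
            cutoff = calendar.timegm(time.strptime(cutoff_date, "%Y-%m-%d"))
            return create_renew_timestamp < cutoff
        raise ValueError("unknown expire.mode: %r" % (mode,))

    # A lease renewed 40 days ago, under the fixed 31-day duration:
    print(lease_is_expired(time.time() - 40 * 86400, 31 * 86400))  # True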
231 237 232 == Future Directions == 238 Future Directions 239 ================= 233 240 234 241 Tahoe's GC mechanism is undergoing significant changes. The global 235 242 mark-and-sweep garbage-collection scheme can require considerable network -
docs/helper.txt
diff -rN -u old-tahoe-lafs/docs/helper.txt new-tahoe-lafs/docs/helper.txt
old new 1 = The Tahoe Upload Helper = 1 ======================= 2 The Tahoe Upload Helper 3 ======================= 4 5 1. `Overview`_ 6 2. `Setting Up A Helper`_ 7 3. `Using a Helper`_ 8 4. `Other Helper Modes`_ 2 9 3 1. Overview 4 2. Setting Up A Helper 5 3. Using a Helper 6 4. Other Helper Modes 7 8 == Overview == 10 Overview 11 ======== 9 12 10 13 As described in the "SWARMING DOWNLOAD, TRICKLING UPLOAD" section of 11 14 architecture.txt, Tahoe uploads require more bandwidth than downloads: you … … 45 48 other applications that are sharing the same uplink to compete more evenly 46 49 for the limited bandwidth. 47 50 48 49 50 == Setting Up A Helper == 51 Setting Up A Helper 52 =================== 51 53 52 54 Who should consider running a helper? 53 55 54 55 56 57 58 59 56 * Benevolent entities which wish to provide better upload speed for clients 57 that have slow uplinks 58 * Folks which have machines with upload bandwidth to spare. 59 * Server grid operators who want clients to connect to a small number of 60 helpers rather than a large number of storage servers (a "multi-tier" 61 architecture) 60 62 61 63 What sorts of machines are good candidates for running a helper? 62 64 63 64 65 66 67 68 69 70 71 72 65 * The Helper needs to have good bandwidth to the storage servers. In 66 particular, it needs to have at least 3.3x better upload bandwidth than 67 the client does, or the client might as well upload directly to the 68 storage servers. In a commercial grid, the helper should be in the same 69 colo (and preferably in the same rack) as the storage servers. 70 * The Helper will take on most of the CPU load involved in uploading a file. 71 So having a dedicated machine will give better results. 72 * The Helper buffers ciphertext on disk, so the host will need at least as 73 much free disk space as there will be simultaneous uploads. When an upload 74 is interrupted, that space will be used for a longer period of time. 73 75 74 76 To turn a Tahoe-LAFS node into a helper (i.e. to run a helper service in 75 77 addition to whatever else that node is doing), edit the tahoe.cfg file in your … … 82 84 helper: you will need to give this FURL to any clients that wish to use your 83 85 helper. 84 86 85 cat $BASEDIR/private/helper.furl |mail -s "helper furl" friend@example.com 87 :: 88 89 cat $BASEDIR/private/helper.furl | mail -s "helper furl" friend@example.com 86 90 87 91 You can tell if your node is running a helper by looking at its web status 88 92 page. Assuming that you've set up the 'webport' to use port 3456, point your … … 105 109 files in these directories that have not been modified for a week or two. 106 110 Future versions of tahoe will try to self-manage these files a bit better. 107 111 108 == Using a Helper == 112 Using a Helper 113 ============== 109 114 110 115 Who should consider using a Helper? 111 116 112 113 114 115 116 117 118 119 120 121 117 * clients with limited upstream bandwidth, such as a consumer ADSL line 118 * clients who believe that the helper will give them faster uploads than 119 they could achieve with a direct upload 120 * clients who experience problems with TCP connection fairness: if other 121 programs or machines in the same home are getting less than their fair 122 share of upload bandwidth. If the connection is being shared fairly, then 123 a Tahoe upload that is happening at the same time as a single FTP upload 124 should get half the bandwidth. 
125 * clients who have been given the helper.furl by someone who is running a 126 Helper and is willing to let them use it 122 127 123 128 To take advantage of somebody else's Helper, take the helper.furl file that 124 129 they give you, and copy it into your node's base directory, then restart the 125 130 node: 126 131 127 cat email >$BASEDIR/helper.furl 128 tahoe restart $BASEDIR 132 :: 133 134 cat email >$BASEDIR/helper.furl 135 tahoe restart $BASEDIR 129 136 130 137 This will signal the client to try and connect to the helper. Subsequent 131 138 uploads will use the helper rather than using direct connections to the … … 146 153 The upload/download status page (http://localhost:3456/status) will announce 147 154 the using-helper-or-not state of each upload, in the "Helper?" column. 148 155 149 == Other Helper Modes == 156 Other Helper Modes 157 ================== 150 158 151 159 The Tahoe Helper only currently helps with one kind of operation: uploading 152 160 immutable files. There are three other things it might be able to help with 153 161 in the future: 154 162 155 156 157 163 * downloading immutable files 164 * uploading mutable files (such as directories) 165 * downloading mutable files (like directories) 158 166 159 167 Since mutable files are currently limited in size, the ADSL upstream penalty 160 168 is not so severe for them. There is no ADSL penalty to downloads, but there -
docs/known_issues.txt
diff -rN -u old-tahoe-lafs/docs/known_issues.txt new-tahoe-lafs/docs/known_issues.txt
old new 1 = known issues = 1 ============ 2 Known issues 3 ============ 4 5 * `Overview`_ 6 * `Issues in Tahoe-LAFS v1.8.0, released 2010-09-23` 7 8 * `Potential unauthorized access by JavaScript in unrelated files`_ 9 * `Potential disclosure of file through embedded hyperlinks or JavaScript in that file`_ 10 * `Command-line arguments are leaked to other local users`_ 11 * `Capabilities may be leaked to web browser phishing filter / "safe browsing" servers`_ 12 * `Known issues in the FTP and SFTP frontends`_ 2 13 3 * overview 4 * issues in Tahoe-LAFS v1.8.0, released 2010-09-23 5 - potential unauthorized access by JavaScript in unrelated files 6 - potential disclosure of file through embedded hyperlinks or JavaScript in that file 7 - command-line arguments are leaked to other local users 8 - capabilities may be leaked to web browser phishing filter / "safe browsing" servers === 9 - known issues in the FTP and SFTP frontends === 10 11 == overview == 14 Overview 15 ======== 12 16 13 17 Below is a list of known issues in recent releases of Tahoe-LAFS, and how to 14 18 manage them. The current version of this file can be found at … … 21 25 22 26 http://tahoe-lafs.org/source/tahoe-lafs/trunk/docs/historical/historical_known_issues.txt 23 27 24 == issues in Tahoe-LAFS v1.8.0, released 2010-09-18 == 28 Issues in Tahoe-LAFS v1.8.0, released 2010-09-23 29 ================================================ 25 30 26 === potential unauthorized access by JavaScript in unrelated files === 31 Potential unauthorized access by JavaScript in unrelated files 32 -------------------------------------------------------------- 27 33 28 34 If you view a file stored in Tahoe-LAFS through a web user interface, 29 35 JavaScript embedded in that file might be able to access other files or … … 33 39 have the ability to modify the contents of those files or directories, 34 40 then that script could modify or delete those files or directories. 35 41 36 ==== how to manage it ==== 42 how to manage it 43 ~~~~~~~~~~~~~~~~ 37 44 38 45 For future versions of Tahoe-LAFS, we are considering ways to close off 39 46 this leakage of authority while preserving ease of use -- the discussion 40 of this issue is ticket #615.47 of this issue is ticket `#615 <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/615>`_. 41 48 42 49 For the present, either do not view files stored in Tahoe-LAFS through a 43 50 web user interface, or turn off JavaScript in your web browser before … … 45 52 malicious JavaScript. 46 53 47 54 48 === potential disclosure of file through embedded hyperlinks or JavaScript in that file === 55 Potential disclosure of file through embedded hyperlinks or JavaScript in that file 56 ----------------------------------------------------------------------------------- 49 57 50 58 If there is a file stored on a Tahoe-LAFS storage grid, and that file 51 59 gets downloaded and displayed in a web browser, then JavaScript or … … 61 69 browsers, so being careful which hyperlinks you click on is not 62 70 sufficient to prevent this from happening. 63 71 64 ==== how to manage it ==== 72 how to manage it 73 ~~~~~~~~~~~~~~~~ 65 74 66 75 For future versions of Tahoe-LAFS, we are considering ways to close off 67 76 this leakage of authority while preserving ease of use -- the discussion 68 of this issue is ticket #127.77 of this issue is ticket `#127 <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/127>`_. 
69 78 70 79 For the present, a good work-around is that if you want to store and 71 80 view a file on Tahoe-LAFS and you want that file to remain private, then … … 74 83 written to maliciously leak access. 75 84 76 85 77 === command-line arguments are leaked to other local users === 86 Command-line arguments are leaked to other local users 87 ------------------------------------------------------ 78 88 79 89 Remember that command-line arguments are visible to other users (through 80 90 the 'ps' command, or the windows Process Explorer tool), so if you are … … 83 93 arguments. This includes directory caps that you set up with the "tahoe 84 94 add-alias" command. 85 95 86 ==== how to manage it ==== 96 how to manage it 97 ~~~~~~~~~~~~~~~~ 87 98 88 99 As of Tahoe-LAFS v1.3.0 there is a "tahoe create-alias" command that does 89 100 the following technique for you. … … 91 102 Bypass add-alias and edit the NODEDIR/private/aliases file directly, by 92 103 adding a line like this: 93 104 94 fun: URI:DIR2:ovjy4yhylqlfoqg2vcze36dhde:4d4f47qko2xm5g7osgo2yyidi5m4muyo2vjjy53q4vjju2u55mfa105 fun: URI:DIR2:ovjy4yhylqlfoqg2vcze36dhde:4d4f47qko2xm5g7osgo2yyidi5m4muyo2vjjy53q4vjju2u55mfa 95 106 96 107 By entering the dircap through the editor, the command-line arguments 97 108 are bypassed, and other users will not be able to see them. Once you've … … 102 113 access to your files and directories. 103 114 104 115 105 === capabilities may be leaked to web browser phishing filter / "safe browsing" servers === 116 Capabilities may be leaked to web browser phishing filter / "safe browsing" servers 117 ----------------------------------------------------------------------------------- 106 118 107 119 Firefox, Internet Explorer, and Chrome include a "phishing filter" or 108 120 "safe browing" component, which is turned on by default, and which sends … … 134 146 version of this file stated that Firefox had abandoned their phishing 135 147 filter; this was incorrect. 136 148 137 ==== how to manage it ==== 149 how to manage it 150 ~~~~~~~~~~~~~~~~ 138 151 139 152 If you use any phishing filter or "safe browsing" feature, consider either 140 153 disabling it, or not using the WUI via that browser. Phishing filters have … … 143 156 or malware attackers have learnt how to bypass them. 144 157 145 158 To disable the filter in IE7 or IE8: 146 - Click Internet Options from the Tools menu. 147 - Click the Advanced tab. 148 - If an "Enable SmartScreen Filter" option is present, uncheck it. 149 If a "Use Phishing Filter" or "Phishing Filter" option is present, 150 set it to Disable. 151 - Confirm (click OK or Yes) out of all dialogs. 159 ```````````````````````````````````` 160 161 - Click Internet Options from the Tools menu. 162 163 - Click the Advanced tab. 164 165 - If an "Enable SmartScreen Filter" option is present, uncheck it. 166 If a "Use Phishing Filter" or "Phishing Filter" option is present, 167 set it to Disable. 168 169 - Confirm (click OK or Yes) out of all dialogs. 152 170 153 171 If you have a version of IE that splits the settings between security 154 172 zones, do this for all zones. 155 173 156 174 To disable the filter in Firefox: 157 - Click Options from the Tools menu. 158 - Click the Security tab. 159 - Uncheck both the "Block reported attack sites" and "Block reported 160 web forgeries" options. 161 - Click OK. 175 ````````````````````````````````` 176 177 - Click Options from the Tools menu. 178 179 - Click the Security tab. 
180 181 - Uncheck both the "Block reported attack sites" and "Block reported 182 web forgeries" options. 183 184 - Click OK. 162 185 163 186 To disable the filter in Chrome: 164 - Click Options from the Tools menu. 165 - Click the "Under the Hood" tab and find the "Privacy" section. 166 - Uncheck the "Enable phishing and malware protection" option. 167 - Click Close. 187 ```````````````````````````````` 188 189 - Click Options from the Tools menu. 190 191 - Click the "Under the Hood" tab and find the "Privacy" section. 192 193 - Uncheck the "Enable phishing and malware protection" option. 194 195 - Click Close. 168 196 169 197 170 === known issues in the FTP and SFTP frontends === 198 Known issues in the FTP and SFTP frontends 199 ------------------------------------------ 171 200 172 201 These are documented in docs/frontends/FTP-and-SFTP.txt and at 173 202 <http://tahoe-lafs.org/trac/tahoe-lafs/wiki/SftpFrontend>. -
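The command-line leakage issue above also suggests a programmatic variant of the work-around: append the alias line to NODEDIR/private/aliases from a script, so the dircap never appears as a process argument. The node directory below is a placeholder and the dircap is the example from the text; recent releases provide ``tahoe create-alias``, which does this for you::

    import os

    def add_alias(nodedir, alias, dircap):
        # Append "ALIAS: DIRCAP" to NODEDIR/private/aliases, as in the
        # manual work-around described above.
        path = os.path.join(nodedir, "private", "aliases")
        with open(path, "a") as f:
            f.write("%s: %s\n" % (alias, dircap))

    add_alias(os.path.expanduser("~/.tahoe"),  # placeholder node directory
              "fun",
              "URI:DIR2:ovjy4yhylqlfoqg2vcze36dhde:"
              "4d4f47qko2xm5g7osgo2yyidi5m4muyo2vjjy53q4vjju2u55mfa")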
docs/logging.txt
diff -rN -u old-tahoe-lafs/docs/logging.txt new-tahoe-lafs/docs/logging.txt
old new 1 = Tahoe Logging = 1 ============= 2 Tahoe Logging 3 ============= 4 5 1. `Overview`_ 6 2. `Realtime Logging`_ 7 3. `Incidents`_ 8 4. `Working with flogfiles`_ 9 5. `Gatherers`_ 10 11 1. `Incident Gatherer`_ 12 2. `Log Gatherer`_ 13 14 6. `Local twistd.log files`_ 15 7. `Adding log messages`_ 16 8. `Log Messages During Unit Tests`_ 2 17 3 1. Overview 4 2. Realtime Logging 5 3. Incidents 6 4. Working with flogfiles 7 5. Gatherers 8 5.1. Incident Gatherer 9 5.2. Log Gatherer 10 6. Local twistd.log files 11 7. Adding log messages 12 8. Log Messages During Unit Tests 13 14 == Overview == 18 Overview 19 ======== 15 20 16 21 Tahoe uses the Foolscap logging mechanism (known as the "flog" subsystem) to 17 22 record information about what is happening inside the Tahoe node. This is … … 26 31 /usr/bin/flogtool) which is used to get access to many foolscap logging 27 32 features. 28 33 29 == Realtime Logging == 34 Realtime Logging 35 ================ 30 36 31 37 When you are working on Tahoe code, and want to see what the node is doing, 32 38 the easiest tool to use is "flogtool tail". This connects to the tahoe node … … 37 43 BASEDIR/private/logport.furl . The following command will connect to this 38 44 port and start emitting log information: 39 45 40 flogtool tail BASEDIR/private/logport.furl46 flogtool tail BASEDIR/private/logport.furl 41 47 42 48 The "--save-to FILENAME" option will save all received events to a file, 43 49 where then can be examined later with "flogtool dump" or "flogtool … … 45 51 before subscribing to new ones (without --catch-up, you will only hear about 46 52 events that occur after the tool has connected and subscribed). 47 53 48 == Incidents == 54 Incidents 55 ========= 49 56 50 57 Foolscap keeps a short list of recent events in memory. When something goes 51 58 wrong, it writes all the history it has (and everything that gets logged in … … 72 79 parent/child relationships of log events is displayed in a nested format. 73 80 "flogtool web-viewer" is still fairly immature. 74 81 75 == Working with flogfiles == 82 Working with flogfiles 83 ====================== 76 84 77 85 The "flogtool filter" command can be used to take a large flogfile (perhaps 78 86 one created by the log-gatherer, see below) and copy a subset of events into … … 85 93 were emitted with a given facility (like foolscap.negotiation or 86 94 tahoe.upload). 87 95 88 == Gatherers == 96 Gatherers 97 ========= 89 98 90 99 In a deployed Tahoe grid, it is useful to get log information automatically 91 100 transferred to a central log-gatherer host. This offloads the (admittedly … … 101 110 The gatherer will write to files in its working directory, which can then be 102 111 examined with tools like "flogtool dump" as described above. 103 112 104 === Incident Gatherer === 113 Incident Gatherer 114 ----------------- 105 115 106 116 The "incident gatherer" only collects Incidents: records of the log events 107 117 that occurred just before and slightly after some high-level "trigger event" … … 120 130 "gatherer.tac" file should be modified to add classifier functions. 121 131 122 132 The incident gatherer writes incident names (which are simply the relative 123 pathname of the incident- *.flog.bz2 file) into classified/CATEGORY. For133 pathname of the incident-\*.flog.bz2 file) into classified/CATEGORY. 
For 124 134 example, the classified/mutable-retrieve-uncoordinated-write-error file 125 135 contains a list of all incidents which were triggered by an uncoordinated 126 136 write that was detected during mutable file retrieval (caused when somebody … … 145 155 node which generated it to the gatherer. The gatherer will automatically 146 156 catch up to any incidents which occurred while it is offline. 147 157 148 === Log Gatherer === 158 Log Gatherer 159 ------------ 149 160 150 161 The "Log Gatherer" subscribes to hear about every single event published by 151 162 the connected nodes, regardless of severity. This server writes these log … … 172 183 the outbound queue grows too large. When this occurs, there will be gaps 173 184 (non-sequential event numbers) in the log-gatherer's flogfiles. 174 185 175 == Local twistd.log files == 186 Local twistd.log files 187 ====================== 176 188 177 189 [TODO: not yet true, requires foolscap-0.3.1 and a change to allmydata.node] 178 190 … … 188 200 (i.e. not the log.NOISY debugging events). In addition, foolscap internal 189 201 events (like connection negotiation messages) are not bridged to twistd.log . 190 202 191 == Adding log messages == 203 Adding log messages 204 =================== 192 205 193 206 When adding new code, the Tahoe developer should add a reasonable number of 194 207 new log events. For details, please see the Foolscap logging documentation, 195 208 but a few notes are worth stating here: 196 209 197 210 * use a facility prefix of "tahoe.", like "tahoe.mutable.publish" 198 211 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 212 * assign each severe (log.WEIRD or higher) event a unique message 213 identifier, as the umid= argument to the log.msg() call. The 214 misc/coding_tools/make_umid script may be useful for this purpose. This will make it 215 easier to write a classification function for these messages. 216 217 * use the parent= argument whenever the event is causally/temporally 218 clustered with its parent. For example, a download process that involves 219 three sequential hash fetches could announce the send and receipt of those 220 hash-fetch messages with a parent= argument that ties them to the overall 221 download process. However, each new wapi download request should be 222 unparented. 223 224 * use the format= argument in preference to the message= argument. E.g. 225 use log.msg(format="got %(n)d shares, need %(k)d", n=n, k=k) instead of 226 log.msg("got %d shares, need %d" % (n,k)). This will allow later tools to 227 analyze the event without needing to scrape/reconstruct the structured 228 data out of the formatted string. 229 230 * Pass extra information as extra keyword arguments, even if they aren't 231 included in the format= string. This information will be displayed in the 232 "flogtool dump --verbose" output, as well as being available to other 233 tools. The umid= argument should be passed this way. 234 235 * use log.err for the catch-all addErrback that gets attached to the end of 236 any given Deferred chain. When used in conjunction with LOGTOTWISTED=1, 237 log.err() will tell Twisted about the error-nature of the log message, 238 causing Trial to flunk the test (with an "ERROR" indication that prints a 239 copy of the Failure, including a traceback). 
Don't use log.err for events 240 that are BAD but handled (like hash failures: since these are often 241 deliberately provoked by test code, they should not cause test failures): 242 use log.msg(level=BAD) for those instead. 230 243 231 244 232 == Log Messages During Unit Tests == 245 Log Messages During Unit Tests 246 ============================== 233 247 234 248 If a test is failing and you aren't sure why, start by enabling 235 249 FLOGTOTWISTED=1 like this: 236 250 237 make test FLOGTOTWISTED=1251 make test FLOGTOTWISTED=1 238 252 239 253 With FLOGTOTWISTED=1, sufficiently-important log events will be written into 240 254 _trial_temp/test.log, which may give you more ideas about why the test is … … 246 260 If that isn't enough, look at the detailed foolscap logging messages instead, 247 261 by running the tests like this: 248 262 249 make test FLOGFILE=flog.out.bz2 FLOGLEVEL=1 FLOGTOTWISTED=1263 make test FLOGFILE=flog.out.bz2 FLOGLEVEL=1 FLOGTOTWISTED=1 250 264 251 265 The first environment variable will cause foolscap log events to be written 252 266 to ./flog.out.bz2 (instead of merely being recorded in the circular buffers -
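To make the ``format=``/``parent=``/``umid=`` guidance above concrete, here is a small sketch using the Foolscap log module. The import path, level name, and keyword arguments follow the Foolscap logging API as described in this file (consult the Foolscap documentation if they have moved); the facility name and umid values are placeholders::

    from foolscap.logging import log
    from foolscap.logging.log import BAD

    def report_shares(n, k, parent=None):
        # Structured event: format= plus keyword data, so later tools can
        # analyze the fields without scraping the rendered string.
        return log.msg(format="got %(n)d shares, need %(k)d", n=n, k=k,
                       facility="tahoe.example", parent=parent, umid="xxxxxx")

    def report_hash_failure(which):
        # A BAD-but-handled event uses log.msg(level=...), not log.err().
        log.msg(format="hash failure on share %(which)d", which=which,
                level=BAD, facility="tahoe.example", umid="yyyyyy")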
docs/performance.txt
diff -rN -u old-tahoe-lafs/docs/performance.txt new-tahoe-lafs/docs/performance.txt
old new 1 = Performance costs for some common operations = 1 ============================================ 2 Performance costs for some common operations 3 ============================================ 4 5 1. `Publishing an A-byte immutable file`_ 6 2. `Publishing an A-byte mutable file`_ 7 3. `Downloading B bytes of an A-byte immutable file`_ 8 4. `Downloading B bytes of an A-byte mutable file`_ 9 5. `Modifying B bytes of an A-byte mutable file`_ 10 6. `Inserting/Removing B bytes in an A-byte mutable file`_ 11 7. `Adding an entry to an A-entry directory`_ 12 8. `Listing an A entry directory`_ 13 9. `Performing a file-check on an A-byte file`_ 14 10. `Performing a file-verify on an A-byte file`_ 15 11. `Repairing an A-byte file (mutable or immutable)`_ 2 16 3 1. Publishing an A-byte immutable file 4 2. Publishing an A-byte mutable file 5 3. Downloading B bytes of an A-byte immutable file 6 4. Downloading B bytes of an A-byte mutable file 7 5. Modifying B bytes of an A-byte mutable file 8 6. Inserting/Removing B bytes in an A-byte mutable file 9 7. Adding an entry to an A-entry directory 10 8. Listing an A entry directory 11 9. Performing a file-check on an A-byte file 12 10. Performing a file-verify on an A-byte file 13 11. Repairing an A-byte file (mutable or immutable) 14 15 == Publishing an A-byte immutable file == 17 Publishing an ``A``-byte immutable file 18 ======================================= 16 19 17 20 network: A 21 18 22 memory footprint: N/k*128KiB 19 23 20 24 notes: An immutable file upload requires an additional I/O pass over the entire 21 22 23 25 source file before the upload process can start, since convergent 26 encryption derives the encryption key in part from the contents of the 27 source file. 24 28 25 == Publishing an A-byte mutable file == 29 Publishing an ``A``-byte mutable file 30 ===================================== 26 31 27 32 network: A 33 28 34 memory footprint: N/k*A 35 29 36 cpu: O(A) + a large constant for RSA keypair generation 30 37 31 notes: Tahoe-LAFS generates a new RSA keypair for each mutable file that 32 it publishes to a grid. This takes up to 1 or 2 seconds on a 33 typical desktop PC. 34 35 Part of the process of encrypting, encoding, and uploading a 36 mutable file to a Tahoe-LAFS grid requires that the entire file 37 be in memory at once. For larger files, this may cause 38 Tahoe-LAFS to have an unacceptably large memory footprint (at 39 least when uploading a mutable file). 38 notes: Tahoe-LAFS generates a new RSA keypair for each mutable file that it 39 publishes to a grid. This takes up to 1 or 2 seconds on a typical desktop PC. 40 40 41 == Downloading B bytes of an A-byte immutable file == 41 Part of the process of encrypting, encoding, and uploading a mutable file to a 42 Tahoe-LAFS grid requires that the entire file be in memory at once. For larger 43 files, this may cause Tahoe-LAFS to have an unacceptably large memory footprint 44 (at least when uploading a mutable file). 45 46 Downloading ``B`` bytes of an ``A``-byte immutable file 47 ======================================================= 42 48 43 49 network: B 50 44 51 memory footprint: 128KiB 45 52 46 53 notes: When Tahoe-LAFS 1.8.0 or later is asked to read an arbitrary range 47 48 54 of an immutable file, only the 128-KiB segments that overlap the 55 requested range will be downloaded. 
49 56 50 51 52 57 (Earlier versions would download from the beginning of the file up 58 until the end of the requested range, and then continue to download 59 the rest of the file even after the request was satisfied.) 53 60 54 == Downloading B bytes of an A-byte mutable file == 61 Downloading ``B`` bytes of an ``A``-byte mutable file 62 ===================================================== 55 63 56 64 network: A 65 57 66 memory footprint: A 58 67 59 68 notes: As currently implemented, mutable files must be downloaded in 60 61 69 their entirety before any part of them can be read. We are 70 exploring fixes for this; see ticket #393 for more information. 62 71 63 == Modifying B bytes of an A-byte mutable file == 72 Modifying ``B`` bytes of an ``A``-byte mutable file 73 =================================================== 64 74 65 75 network: A 76 66 77 memory footprint: N/k*A 67 78 68 79 notes: If you upload a changed version of a mutable file that you 69 70 71 72 73 74 80 earlier put onto your grid with, say, 'tahoe put --mutable', 81 Tahoe-LAFS will replace the old file with the new file on the 82 grid, rather than attempting to modify only those portions of the 83 file that have changed. Modifying a file in this manner is 84 essentially uploading the file over again, except that it re-uses 85 the existing RSA keypair instead of generating a new one. 75 86 76 == Inserting/Removing B bytes in an A-byte mutable file == 87 Inserting/Removing ``B`` bytes in an ``A``-byte mutable file 88 ============================================================ 77 89 78 90 network: A 91 79 92 memory footprint: N/k*A 80 93 81 94 notes: Modifying any part of a mutable file in Tahoe-LAFS requires that 82 83 84 85 86 87 88 89 95 the entire file be downloaded, modified, held in memory while it is 96 encrypted and encoded, and then re-uploaded. A future version of the 97 mutable file layout ("LDMF") may provide efficient inserts and 98 deletes. Note that this sort of modification is mostly used internally 99 for directories, and isn't something that the WUI, CLI, or other 100 interfaces will do -- instead, they will simply overwrite the file to 101 be modified, as described in "Modifying B bytes of an A-byte mutable 102 file". 90 103 91 == Adding an entry to an A-entry directory == 104 Adding an entry to an ``A``-entry directory 105 =========================================== 92 106 93 107 network: O(A) 108 94 109 memory footprint: N/k*A 95 110 96 111 notes: In Tahoe-LAFS, directories are implemented as specialized mutable 97 98 112 files. So adding an entry to a directory is essentially adding B 113 (actually, 300-330) bytes somewhere in an existing mutable file. 99 114 100 == Listing an A entry directory == 115 Listing an ``A`` entry directory 116 ================================ 101 117 102 118 network: O(A) 119 103 120 memory footprint: N/k*A 104 121 105 122 notes: Listing a directory requires that the mutable file storing the 106 107 108 123 directory be downloaded from the grid. So listing an A entry 124 directory requires downloading a (roughly) 330 * A byte mutable 125 file, since each directory entry is about 300-330 bytes in size. 
109 126 110 == Performing a file-check on an A-byte file == 127 Performing a file-check on an ``A``-byte file 128 ============================================= 111 129 112 130 network: O(S), where S is the number of servers on your grid 131 113 132 memory footprint: negligible 114 133 115 134 notes: To check a file, Tahoe-LAFS queries all the servers that it knows 116 117 118 135 about. Note that neither of these values directly depend on the size 136 of the file. This is relatively inexpensive, compared to the verify 137 and repair operations. 119 138 120 == Performing a file-verify on an A-byte file == 139 Performing a file-verify on an ``A``-byte file 140 ============================================== 121 141 122 142 network: N/k*A 143 123 144 memory footprint: N/k*128KiB 124 145 125 146 notes: To verify a file, Tahoe-LAFS downloads all of the ciphertext 126 127 128 129 147 shares that were originally uploaded to the grid and integrity 148 checks them. This is, for well-behaved grids, likely to be more 149 expensive than downloading an A-byte file, since only a fraction 150 of these shares are necessary to recover the file. 130 151 131 == Repairing an A-byte file (mutable or immutable) == 152 Repairing an ``A``-byte file (mutable or immutable) 153 =================================================== 132 154 133 155 network: variable; up to around O(A) 156 134 157 memory footprint: from 128KiB to (1+N/k)*128KiB 135 158 136 159 notes: To repair a file, Tahoe-LAFS downloads the file, and generates/uploads 137 138 139 160 missing shares in the same way as when it initially uploads the file. 161 So, depending on how many shares are missing, this can be about as 162 expensive as initially uploading the file in the first place. -
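The entries above lend themselves to quick back-of-the-envelope estimates. Here is a sketch using the formulas from this file, assuming the default 3-of-10 encoding and the ~330-bytes-per-entry figure from the directory discussion::

    KiB = 1024

    def expansion(n=10, k=3):
        # Share expansion factor N/k (3-of-10 is the default encoding).
        return float(n) / k

    def publish_immutable(size, n=10, k=3):
        # Per the entry above: network ~= A bytes, memory ~= N/k * 128KiB.
        return {"network_bytes": size,
                "memory_bytes": expansion(n, k) * 128 * KiB}

    def list_directory(entries, n=10, k=3):
        # Listing an A-entry directory downloads a ~330*A byte mutable file,
        # which is held in memory with roughly an N/k expansion.
        dirsize = 330 * entries
        return {"network_bytes": dirsize,
                "memory_bytes": expansion(n, k) * dirsize}

    print(publish_immutable(10 * 1000 * 1000))  # a 10 MB example file
    print(list_directory(1000))                 # a 1000-entry example directory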
docs/stats.txt
diff -rN -u old-tahoe-lafs/docs/stats.txt new-tahoe-lafs/docs/stats.txt
old new 1 = Tahoe Statistics = 1 ================ 2 Tahoe Statistics 3 ================ 4 5 1. `Overview`_ 6 2. `Statistics Categories`_ 7 3. `Running a Tahoe Stats-Gatherer Service`_ 8 4. `Using Munin To Graph Stats Values`_ 2 9 3 1. Overview 4 2. Statistics Categories 5 3. Running a Tahoe Stats-Gatherer Service 6 4. Using Munin To Graph Stats Values 7 8 == Overview == 10 Overview 11 ======== 9 12 10 13 Each Tahoe node collects and publishes statistics about its operations as it 11 14 runs. These include counters of how many files have been uploaded and … … 20 23 block, along with a copy of the raw counters. To obtain just the raw counters 21 24 (in JSON format), use /statistics?t=json instead. 22 25 23 == Statistics Categories == 26 Statistics Categories 27 ===================== 24 28 25 29 The stats dictionary contains two keys: 'counters' and 'stats'. 'counters' 26 30 are strictly counters: they are reset to zero when the node is started, and … … 35 39 36 40 The currently available stats (as of release 1.6.0 or so) are described here: 37 41 38 counters.storage_server.*: this group counts inbound storage-server 39 operations. They are not provided by client-only 40 nodes which have been configured to not run a 41 storage server (with [storage]enabled=false in 42 tahoe.cfg) 43 allocate, write, close, abort: these are for immutable file uploads. 44 'allocate' is incremented when a client asks 45 if it can upload a share to the server. 46 'write' is incremented for each chunk of 47 data written. 'close' is incremented when 48 the share is finished. 'abort' is 49 incremented if the client abandons the 50 uploaed. 51 get, read: these are for immutable file downloads. 'get' is incremented 52 when a client asks if the server has a specific share. 'read' is 53 incremented for each chunk of data read. 54 readv, writev: these are for immutable file creation, publish, and 55 retrieve. 'readv' is incremented each time a client reads 56 part of a mutable share. 'writev' is incremented each time a 57 client sends a modification request. 58 add-lease, renew, cancel: these are for share lease modifications. 59 'add-lease' is incremented when an 'add-lease' 60 operation is performed (which either adds a new 61 lease or renews an existing lease). 'renew' is 62 for the 'renew-lease' operation (which can only 63 be used to renew an existing one). 'cancel' is 64 used for the 'cancel-lease' operation. 65 bytes_freed: this counts how many bytes were freed when a 'cancel-lease' 66 operation removed the last lease from a share and the share 67 was thus deleted. 68 bytes_added: this counts how many bytes were consumed by immutable share 69 uploads. It is incremented at the same time as the 'close' 70 counter. 71 72 stats.storage_server.*: 73 allocated: this counts how many bytes are currently 'allocated', which 74 tracks the space that will eventually be consumed by immutable 75 share upload operations. The stat is increased as soon as the 76 upload begins (at the same time the 'allocated' counter is 77 incremented), and goes back to zero when the 'close' or 'abort' 78 message is received (at which point the 'disk_used' stat should 79 incremented by the same amount). 80 disk_total 81 disk_used 82 disk_free_for_root 83 disk_free_for_nonroot 84 disk_avail 85 reserved_space: these all reflect disk-space usage policies and status. 86 'disk_total' is the total size of disk where the storage 87 server's BASEDIR/storage/shares directory lives, as reported 88 by /bin/df or equivalent. 
'disk_used', 'disk_free_for_root', 89 and 'disk_free_for_nonroot' show related information. 90 'reserved_space' reports the reservation configured by the 91 tahoe.cfg [storage]reserved_space value. 'disk_avail' 92 reports the remaining disk space available for the Tahoe 93 server after subtracting reserved_space from disk_avail. All 94 values are in bytes. 95 accepting_immutable_shares: this is '1' if the storage server is currently 96 accepting uploads of immutable shares. It may be 97 '0' if a server is disabled by configuration, or 98 if the disk is full (i.e. disk_avail is less 99 than reserved_space). 100 total_bucket_count: this counts the number of 'buckets' (i.e. unique 101 storage-index values) currently managed by the storage 102 server. It indicates roughly how many files are managed 103 by the server. 104 latencies.*.*: these stats keep track of local disk latencies for 105 storage-server operations. A number of percentile values are 106 tracked for many operations. For example, 107 'storage_server.latencies.readv.50_0_percentile' records the 108 median response time for a 'readv' request. All values are in 109 seconds. These are recorded by the storage server, starting 110 from the time the request arrives (post-deserialization) and 111 ending when the response begins serialization. As such, they 112 are mostly useful for measuring disk speeds. The operations 113 tracked are the same as the counters.storage_server.* counter 114 values (allocate, write, close, get, read, add-lease, renew, 115 cancel, readv, writev). The percentile values tracked are: 116 mean, 01_0_percentile, 10_0_percentile, 50_0_percentile, 117 90_0_percentile, 95_0_percentile, 99_0_percentile, 118 99_9_percentile. (the last value, 99.9 percentile, means that 119 999 out of the last 1000 operations were faster than the 120 given number, and is the same threshold used by Amazon's 121 internal SLA, according to the Dynamo paper). 122 123 counters.uploader.files_uploaded 124 counters.uploader.bytes_uploaded 125 counters.downloader.files_downloaded 126 counters.downloader.bytes_downloaded 127 128 These count client activity: a Tahoe client will increment these when it 129 uploads or downloads an immutable file. 'files_uploaded' is incremented by 130 one for each operation, while 'bytes_uploaded' is incremented by the size of 131 the file. 132 133 counters.mutable.files_published 134 counters.mutable.bytes_published 135 counters.mutable.files_retrieved 136 counters.mutable.bytes_retrieved 42 **counters.storage_server.\*** 43 44 this group counts inbound storage-server operations. They are not provided 45 by client-only nodes which have been configured to not run a storage server 46 (with [storage]enabled=false in tahoe.cfg) 47 48 allocate, write, close, abort 49 these are for immutable file uploads. 'allocate' is incremented when a 50 client asks if it can upload a share to the server. 'write' is 51 incremented for each chunk of data written. 'close' is incremented when 52 the share is finished. 'abort' is incremented if the client abandons 53 the upload. 54 55 get, read 56 these are for immutable file downloads. 'get' is incremented 57 when a client asks if the server has a specific share. 'read' is 58 incremented for each chunk of data read. 59 60 readv, writev 61 these are for immutable file creation, publish, and retrieve. 'readv' 62 is incremented each time a client reads part of a mutable share. 63 'writev' is incremented each time a client sends a modification 64 request. 
65 66 add-lease, renew, cancel 67 these are for share lease modifications. 'add-lease' is incremented 68 when an 'add-lease' operation is performed (which either adds a new 69 lease or renews an existing lease). 'renew' is for the 'renew-lease' 70 operation (which can only be used to renew an existing one). 'cancel' 71 is used for the 'cancel-lease' operation. 72 73 bytes_freed 74 this counts how many bytes were freed when a 'cancel-lease' 75 operation removed the last lease from a share and the share 76 was thus deleted. 77 78 bytes_added 79 this counts how many bytes were consumed by immutable share 80 uploads. It is incremented at the same time as the 'close' 81 counter. 82 83 **stats.storage_server.\*** 84 85 allocated 86 this counts how many bytes are currently 'allocated', which 87 tracks the space that will eventually be consumed by immutable 88 share upload operations. The stat is increased as soon as the 89 upload begins (at the same time the 'allocated' counter is 90 incremented), and goes back to zero when the 'close' or 'abort' 91 message is received (at which point the 'disk_used' stat should 92 incremented by the same amount). 93 94 disk_total, disk_used, disk_free_for_root, disk_free_for_nonroot, disk_avail, reserved_space 95 these all reflect disk-space usage policies and status. 96 'disk_total' is the total size of disk where the storage 97 server's BASEDIR/storage/shares directory lives, as reported 98 by /bin/df or equivalent. 'disk_used', 'disk_free_for_root', 99 and 'disk_free_for_nonroot' show related information. 100 'reserved_space' reports the reservation configured by the 101 tahoe.cfg [storage]reserved_space value. 'disk_avail' 102 reports the remaining disk space available for the Tahoe 103 server after subtracting reserved_space from disk_avail. All 104 values are in bytes. 105 106 accepting_immutable_shares 107 this is '1' if the storage server is currently accepting uploads of 108 immutable shares. It may be '0' if a server is disabled by 109 configuration, or if the disk is full (i.e. disk_avail is less than 110 reserved_space). 111 112 total_bucket_count 113 this counts the number of 'buckets' (i.e. unique 114 storage-index values) currently managed by the storage 115 server. It indicates roughly how many files are managed 116 by the server. 117 118 latencies.*.* 119 these stats keep track of local disk latencies for 120 storage-server operations. A number of percentile values are 121 tracked for many operations. For example, 122 'storage_server.latencies.readv.50_0_percentile' records the 123 median response time for a 'readv' request. All values are in 124 seconds. These are recorded by the storage server, starting 125 from the time the request arrives (post-deserialization) and 126 ending when the response begins serialization. As such, they 127 are mostly useful for measuring disk speeds. The operations 128 tracked are the same as the counters.storage_server.* counter 129 values (allocate, write, close, get, read, add-lease, renew, 130 cancel, readv, writev). The percentile values tracked are: 131 mean, 01_0_percentile, 10_0_percentile, 50_0_percentile, 132 90_0_percentile, 95_0_percentile, 99_0_percentile, 133 99_9_percentile. (the last value, 99.9 percentile, means that 134 999 out of the last 1000 operations were faster than the 135 given number, and is the same threshold used by Amazon's 136 internal SLA, according to the Dynamo paper). 
137 138 **counters.uploader.files_uploaded** 139 140 **counters.uploader.bytes_uploaded** 141 142 **counters.downloader.files_downloaded** 143 144 **counters.downloader.bytes_downloaded** 145 146 These count client activity: a Tahoe client will increment these when it 147 uploads or downloads an immutable file. 'files_uploaded' is incremented by 148 one for each operation, while 'bytes_uploaded' is incremented by the size of 149 the file. 150 151 **counters.mutable.files_published** 152 153 **counters.mutable.bytes_published** 154 155 **counters.mutable.files_retrieved** 156 157 **counters.mutable.bytes_retrieved** 137 158 138 159 These count client activity for mutable files. 'published' is the act of 139 160 changing an existing mutable file (or creating a brand-new mutable file). 140 161 'retrieved' is the act of reading its current contents. 141 162 142 counters.chk_upload_helper.* 163 **counters.chk_upload_helper.\*** 164 165 These count activity of the "Helper", which receives ciphertext from clients 166 and performs erasure-coding and share upload for files that are not already 167 in the grid. The code which implements these counters is in 168 src/allmydata/immutable/offloaded.py . 169 170 upload_requests 171 incremented each time a client asks to upload a file 172 upload_already_present: incremented when the file is already in the grid 173 174 upload_need_upload 175 incremented when the file is not already in the grid 176 177 resumes 178 incremented when the helper already has partial ciphertext for 179 the requested upload, indicating that the client is resuming an 180 earlier upload 181 182 fetched_bytes 183 this counts how many bytes of ciphertext have been fetched 184 from uploading clients 185 186 encoded_bytes 187 this counts how many bytes of ciphertext have been 188 encoded and turned into successfully-uploaded shares. If no 189 uploads have failed or been abandoned, encoded_bytes should 190 eventually equal fetched_bytes. 191 192 **stats.chk_upload_helper.\*** 193 194 These also track Helper activity: 195 196 active_uploads 197 how many files are currently being uploaded. 0 when idle. 198 199 incoming_count 200 how many cache files are present in the incoming/ directory, 201 which holds ciphertext files that are still being fetched 202 from the client 143 203 144 These count activity of the "Helper", which receives ciphertext from clients 145 and performs erasure-coding and share upload for files that are not already 146 in the grid. The code which implements these counters is in 147 src/allmydata/immutable/offloaded.py . 148 149 upload_requests: incremented each time a client asks to upload a file 150 upload_already_present: incremented when the file is already in the grid 151 upload_need_upload: incremented when the file is not already in the grid 152 resumes: incremented when the helper already has partial ciphertext for 153 the requested upload, indicating that the client is resuming an 154 earlier upload 155 fetched_bytes: this counts how many bytes of ciphertext have been fetched 156 from uploading clients 157 encoded_bytes: this counts how many bytes of ciphertext have been 158 encoded and turned into successfully-uploaded shares. If no 159 uploads have failed or been abandoned, encoded_bytes should 160 eventually equal fetched_bytes. 161 162 stats.chk_upload_helper.* 163 164 These also track Helper activity: 165 166 active_uploads: how many files are currently being uploaded. 0 when idle. 
167 incoming_count: how many cache files are present in the incoming/ directory,
168 which holds ciphertext files that are still being fetched
169 from the client
170 incoming_size: total size of cache files in the incoming/ directory
171 incoming_size_old: total size of 'old' cache files (more than 48 hours)
172 encoding_count: how many cache files are present in the encoding/ directory,
173 which holds ciphertext files that are being encoded and
174 uploaded
175 encoding_size: total size of cache files in the encoding/ directory
176 encoding_size_old: total size of 'old' cache files (more than 48 hours)
177
178 stats.node.uptime: how many seconds since the node process was started
179
180 stats.cpu_monitor.*:
181 .1min_avg, 5min_avg, 15min_avg: estimate of what percentage of system CPU
182 time was consumed by the node process, over
183 the given time interval. Expressed as a
184 float, 0.0 for 0%, 1.0 for 100%
185 .total: estimate of total number of CPU seconds consumed by node since
186 the process was started. Ticket #472 indicates that .total may
187 sometimes be negative due to wraparound of the kernel's counter.
188
189 stats.load_monitor.*:
190 When enabled, the "load monitor" continually schedules a one-second
191 callback, and measures how late the response is. This estimates system load
192 (if the system is idle, the response should be on time). This is only
193 enabled if a stats-gatherer is configured.
204 incoming_size
205 total size of cache files in the incoming/ directory
194 206
195 .avg_load: average "load" value (seconds late) over the last minute
196 .max_load: maximum "load" value over the last minute
207 incoming_size_old
208 total size of 'old' cache files (more than 48 hours)
197 209
210 encoding_count
211 how many cache files are present in the encoding/ directory,
212 which holds ciphertext files that are being encoded and
213 uploaded
198 214
199 == Running a Tahoe Stats-Gatherer Service ==
215 encoding_size
216 total size of cache files in the encoding/ directory
217
218 encoding_size_old
219 total size of 'old' cache files (more than 48 hours)
220
221 **stats.node.uptime**
222 how many seconds since the node process was started
223
224 **stats.cpu_monitor.\***
225
226 1min_avg, 5min_avg, 15min_avg
227 estimate of what percentage of system CPU time was consumed by the
228 node process, over the given time interval. Expressed as a float, 0.0
229 for 0%, 1.0 for 100%
230
231 total
232 estimate of total number of CPU seconds consumed by node since
233 the process was started. Ticket #472 indicates that .total may
234 sometimes be negative due to wraparound of the kernel's counter.
235
236 **stats.load_monitor.\***
237
238 When enabled, the "load monitor" continually schedules a one-second
239 callback, and measures how late the response is. This estimates system load
240 (if the system is idle, the response should be on time). This is only
241 enabled if a stats-gatherer is configured.
242
243 avg_load
244 average "load" value (seconds late) over the last minute
245
246 max_load
247 maximum "load" value over the last minute
248
249
250 Running a Tahoe Stats-Gatherer Service
251 ======================================
200 252
201 253 The "stats-gatherer" is a simple daemon that periodically collects stats from
202 254 several tahoe nodes. It could be useful, e.g., in a production environment,
…
204 256 host. It merely gathers statistics from many nodes into a single place: it
205 257 does not do any actual analysis.
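The node-level stats above lend themselves to a one-line health summary. The sketch below makes the same assumptions about key names (a ``stats`` dict from the node's statistics page); the load_monitor values are only present when a stats-gatherer is configured::

    def node_health(stats):
        # key names are assumed to mirror the dotted names in this document
        uptime = stats.get("node.uptime", 0)
        cpu_5min = stats.get("cpu_monitor.5min_avg", 0.0)   # fraction, 0.0 .. 1.0
        avg_load = stats.get("load_monitor.avg_load")       # seconds late, may be absent

        line = "up %dh%02dm, CPU %.1f%% (5 min avg)" % (
            uptime // 3600, (uptime % 3600) // 60, cpu_5min * 100.0)
        if avg_load is not None:
            line += ", callbacks %.2fs late on average" % avg_load
        return line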
206 258
207 The stats gatherer listens on a network port using the same Foolscap
259 The stats gatherer listens on a network port using the same Foolscap_
208 260 connection library that Tahoe clients use to connect to storage servers.
209 261 Tahoe nodes can be configured to connect to the stats gatherer and publish
210 their stats on a periodic basis. ( in fact, what happens is that nodes connect
262 their stats on a periodic basis. (In fact, what happens is that nodes connect
211 263 to the gatherer and offer it a second FURL which points back to the node's
212 264 "stats port", which the gatherer then uses to pull stats on a periodic basis.
213 265 The initial connection is flipped to allow the nodes to live behind NAT
214 boxes, as long as the stats-gatherer has a reachable IP address)
266 boxes, as long as the stats-gatherer has a reachable IP address.)
267
268 .. _Foolscap: http://foolscap.lothar.com/trac
215 269
216 270 The stats-gatherer is created in the same fashion as regular tahoe client
217 271 nodes and introducer nodes. Choose a base directory for the gatherer to live
218 272 in (but do not create the directory). Then run:
219 273
220 tahoe create-stats-gatherer $BASEDIR
274 ::
275
276 tahoe create-stats-gatherer $BASEDIR
221 277
222 278 and start it with "tahoe start $BASEDIR". Once running, the gatherer will
223 279 write a FURL into $BASEDIR/stats_gatherer.furl .
… …
226 282 this FURL into the node's tahoe.cfg file, in a section named "[client]",
227 283 under a key named "stats_gatherer.furl", like so:
228 284
229 [client]
230 stats_gatherer.furl = pb://qbo4ktl667zmtiuou6lwbjryli2brv6t@192.168.0.8:49997/wxycb4kaexzskubjnauxeoptympyf45y
285 ::
286
287 [client]
288 stats_gatherer.furl = pb://qbo4ktl667zmtiuou6lwbjryli2brv6t@192.168.0.8:49997/wxycb4kaexzskubjnauxeoptympyf45y
231 289
232 290 or simply copy the stats_gatherer.furl file into the node's base directory
233 291 (next to the tahoe.cfg file): it will be interpreted in the same way.
… …
256 314 total-disk-available number for the entire grid (however, the "disk watcher"
257 315 daemon, in misc/operations_helpers/spacetime/, is better suited for this specific task).
258 316
259 == Using Munin To Graph Stats Values ==
317 Using Munin To Graph Stats Values
318 =================================
260 319
261 320 The misc/munin/ directory contains various plugins to graph stats for Tahoe
262 nodes. They are intended for use with the Munin system-management tool, which
321 nodes. They are intended for use with the Munin_ system-management tool, which
263 322 typically polls target systems every 5 minutes and produces a web page with
264 323 graphs of various things over multiple time scales (last hour, last month,
265 324 last year).
266 325
326 .. _Munin: http://munin-monitoring.org/
327
267 328 Most of the plugins are designed to pull stats from a single Tahoe node, and
268 329 are configured with a URL like http://localhost:3456/statistics?t=json . The
269 330 "tahoe_stats" plugin is designed to read from the pickle file created by the