Ticket #1225: docs-txt-rst-conversion-ii.patch
File docs-txt-rst-conversion-ii.patch, 444.2 KB (added by p-static, at 2010-10-29T05:45:43Z)
new file docs/frontends/CLI.rst
diff --git a/docs/frontends/CLI.rst b/docs/frontends/CLI.rst
new file mode 100644
index 0000000..743b887
======================
The Tahoe CLI commands
======================

1. `Overview`_
2. `CLI Command Overview`_
3. `Node Management`_
4. `Filesystem Manipulation`_

   1. `Starting Directories`_
   2. `Command Syntax Summary`_
   3. `Command Examples`_

5. `Storage Grid Maintenance`_
6. `Debugging`_


Overview
========

Tahoe provides a single executable named "``tahoe``", which can be used to
create and manage client/server nodes, manipulate the filesystem, and perform
several debugging/maintenance tasks.

This executable lives in the source tree at "``bin/tahoe``". Once you've done
a build (by running "make"), ``bin/tahoe`` can be run in-place: if it
discovers that it is being run from within a Tahoe source tree, it will
modify sys.path as necessary to use all the source code and dependent
libraries contained in that tree.

If you've installed Tahoe (using "``make install``", or by installing a binary
package), then the tahoe executable will be available somewhere else, perhaps
in ``/usr/bin/tahoe``. In this case, it will use your platform's normal
PYTHONPATH search paths to find the tahoe code and other libraries.


CLI Command Overview
====================

The "``tahoe``" tool provides access to three categories of commands.

* node management: create a client/server node, start/stop/restart it
* filesystem manipulation: list files, upload, download, delete, rename
* debugging: unpack cap-strings, examine share files

To get a list of all commands, just run "``tahoe``" with no additional
arguments. "``tahoe --help``" might also provide something useful.

Running "``tahoe --version``" will display a list of version strings, starting
with the "allmydata" module (which contains the majority of the Tahoe
functionality) and including versions for a number of dependent libraries,
like Twisted, Foolscap, pycryptopp, and zfec.
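
The run-in-place behavior described above amounts to something like the
following sketch. This is illustrative only, not Tahoe's actual startup code:
the function name and the ``src``-next-to-``bin`` layout heuristic are
assumptions made here for the example.

```python
import os
import sys

def add_source_tree_to_sys_path(executable_path):
    """If the executable lives inside a source tree, prepend the tree's
    source directory to sys.path. Illustrative sketch only: assumes a
    source tree keeps a 'src' directory next to 'bin'."""
    bindir = os.path.dirname(os.path.abspath(executable_path))
    treedir = os.path.dirname(bindir)
    srcdir = os.path.join(treedir, "src")
    if os.path.isdir(srcdir) and srcdir not in sys.path:
        sys.path.insert(0, srcdir)
        return srcdir
    return None  # installed executable: rely on normal PYTHONPATH
```

An installed ``/usr/bin/tahoe`` would fail the heuristic and fall through to
the platform's normal search paths, as the text describes.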


Node Management
===============

"``tahoe create-node [NODEDIR]``" is the basic make-a-new-node command. It
creates a new directory and populates it with files that will allow the
"``tahoe start``" command to use it later on. This command creates nodes that
have client functionality (upload/download files), web API services
(controlled by the 'webport' file), and storage services (unless
"--no-storage" is specified).

NODEDIR defaults to ~/.tahoe/ , and newly-created nodes default to
publishing a web server on port 3456 (limited to the loopback interface, at
127.0.0.1, to restrict access to other programs on the same host). All of the
other "``tahoe``" subcommands use corresponding defaults.

"``tahoe create-client [NODEDIR]``" creates a node with no storage service.
That is, it behaves like "``tahoe create-node --no-storage [NODEDIR]``".
(This is a change from versions prior to 1.6.0.)

"``tahoe create-introducer [NODEDIR]``" is used to create the Introducer
node. This node provides introduction services and nothing else. When
started, this node will produce an introducer.furl, which should be
published to all clients.

"``tahoe create-key-generator [NODEDIR]``" is used to create a special
"key-generation" service, which allows a client to offload its RSA key
generation to a separate process. Since RSA key generation takes several
seconds, and must be done each time a directory is created, moving it to a
separate process allows the first process (perhaps a busy wapi server) to
continue servicing other requests. The key generator exports a FURL that can
be copied into a node to enable this functionality.

"``tahoe run [NODEDIR]``" will start a previously-created node in the
foreground.

"``tahoe start [NODEDIR]``" will launch a previously-created node. It will
launch the node into the background, using the standard Twisted "twistd"
daemon-launching tool.
On some platforms (including Windows) this command is
unable to run a daemon in the background; in that case it behaves in the
same way as "``tahoe run``".

"``tahoe stop [NODEDIR]``" will shut down a running node.

"``tahoe restart [NODEDIR]``" will stop and then restart a running node. This
is most often used by developers who have just modified the code and want to
start using their changes.


Filesystem Manipulation
=======================

These commands let you examine a Tahoe filesystem, providing basic
list/upload/download/delete/rename/mkdir functionality. They can be used as
primitives by other scripts. Most of these commands are fairly thin wrappers
around wapi calls.

By default, all filesystem-manipulation commands look in ~/.tahoe/ to figure
out which Tahoe node they should use. When the CLI command uses wapi calls,
it will use ~/.tahoe/node.url for this purpose: a running Tahoe node that
provides a wapi port will write its URL into this file. If you want to use
a node on some other host, just create ~/.tahoe/ and copy that node's wapi
URL into this file, and the CLI commands will contact that node instead of a
local one.

These commands also use a table of "aliases" to figure out which directory
they ought to use as a starting point. This is explained in more detail
below.

As of Tahoe v1.7, passing non-ASCII characters to the CLI should work,
except on Windows. The command-line arguments are assumed to use the
character encoding specified by the current locale.

Starting Directories
--------------------

As described in architecture.txt, the Tahoe distributed filesystem consists
of a collection of directories and files, each of which has a "read-cap" or a
"write-cap" (also known as a URI).
Each directory is simply a table that
maps a name to a child file or directory, and this table is turned into a
string and stored in a mutable file. The whole set of directory and file
"nodes" is connected together into a directed graph.

To use this collection of files and directories, you need to choose a
starting point: some specific directory that we will refer to as a
"starting directory". For a given starting directory, the "``ls
[STARTING_DIR]:``" command would list the contents of this directory,
the "``ls [STARTING_DIR]:dir1``" command would look inside this directory
for a child named "dir1" and list its contents, "``ls
[STARTING_DIR]:dir1/subdir2``" would look two levels deep, etc.

Note that there is no real global "root" directory, but instead each
starting directory provides a different, possibly overlapping
perspective on the graph of files and directories.

Each tahoe node remembers a list of starting points, named "aliases",
in a file named ~/.tahoe/private/aliases . These aliases are short UTF-8
encoded strings that stand in for a directory read- or write-cap. If
you use the command line "``ls``" without any "[STARTING_DIR]:" argument,
it will use the default alias, which is "tahoe", so "``tahoe
ls``" has the same effect as "``tahoe ls tahoe:``". The same goes for the
other commands which can reasonably use a default alias: get, put,
mkdir, mv, and rm.

For backwards compatibility with Tahoe-1.0, if the "tahoe:" alias is not
found in ~/.tahoe/private/aliases, the CLI will use the contents of
~/.tahoe/private/root_dir.cap instead. Tahoe-1.0 had only a single starting
point, and stored it in this root_dir.cap file, so Tahoe-1.1 will use it if
necessary. However, once you've set a "tahoe:" alias with "``tahoe
add-alias``", that will override anything in the old root_dir.cap file.
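
The lookup order just described (the aliases file first, then the Tahoe-1.0
root_dir.cap fallback for the default "tahoe" alias) can be sketched as
follows. The function name is hypothetical and this is not the CLI's actual
implementation, just an illustration of the rule.

```python
import os

def get_alias_cap(alias, nodedir):
    """Resolve an alias to a directory cap: consult
    NODEDIR/private/aliases first, then fall back to the Tahoe-1.0
    NODEDIR/private/root_dir.cap for the default "tahoe" alias.
    Illustrative sketch only."""
    aliases = {}
    aliases_file = os.path.join(nodedir, "private", "aliases")
    if os.path.exists(aliases_file):
        with open(aliases_file, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                # each line looks like:  name: CAP
                name, _, cap = line.partition(":")
                aliases[name] = cap.strip()
    if alias in aliases:
        return aliases[alias]
    if alias == "tahoe":
        # Tahoe-1.0 compatibility: single starting point in root_dir.cap
        legacy = os.path.join(nodedir, "private", "root_dir.cap")
        if os.path.exists(legacy):
            with open(legacy) as f:
                return f.read().strip()
    return None
```

Note that splitting on the first colon only is what lets the cap itself
contain colons (URI:DIR2:...).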

The Tahoe CLI commands use the same filename syntax as scp and rsync
-- an optional "alias:" prefix, followed by the pathname or filename.
Some commands (like "``tahoe cp``") use the lack of an alias to mean that
you want to refer to a local file, instead of something from the tahoe
virtual filesystem. [TODO] Another way to indicate this is to start
the pathname with a dot, slash, or tilde.

When you're dealing with a single starting directory, the "tahoe:" alias is
all you need. But when you want to refer to something that isn't yet
attached to the graph rooted at that starting directory, you need to
refer to it by its capability. The way to do that is either to use its
capability directly as an argument on the command line, or to add an
alias to it with the "``tahoe add-alias``" command. Once you've added an
alias, you can use that alias as an argument to commands.

The best way to get started with Tahoe is to create a node, start it, then
use the following command to create a new directory and set it as your
"tahoe:" alias::

  tahoe create-alias tahoe

After that you can use "``tahoe ls tahoe:``" and
"``tahoe cp local.txt tahoe:``", and both will refer to the directory that
you've just created.

SECURITY NOTE: For users of shared systems
``````````````````````````````````````````

Another way to achieve the same effect as the above "``tahoe create-alias``"
command is::

  tahoe add-alias tahoe `tahoe mkdir`

However, command-line arguments are visible to other users (through the
'ps' command, or the Windows Process Explorer tool), so if you are using a
tahoe node on a shared host, your login neighbors will be able to see (and
capture) any directory caps that you set up with the "``tahoe add-alias``"
command.

The "``tahoe create-alias``" command avoids this problem by creating a new
directory and putting the cap into your aliases file for you. Alternatively,
you can edit the NODEDIR/private/aliases file directly, by adding a line like
this::

  fun: URI:DIR2:ovjy4yhylqlfoqg2vcze36dhde:4d4f47qko2xm5g7osgo2yyidi5m4muyo2vjjy53q4vjju2u55mfa

By entering the dircap with an editor, the command-line arguments are
bypassed, and other users will not be able to see them. Once you've added the
alias, no other secrets are passed through the command line, so this
vulnerability becomes less significant: they can still see your filenames and
other arguments you type there, but not the caps that Tahoe uses to permit
access to your files and directories.


Command Syntax Summary
----------------------

tahoe add-alias alias cap

tahoe create-alias alias

tahoe list-aliases

tahoe mkdir

tahoe mkdir [alias:]path

tahoe ls [alias:][path]

tahoe webopen [alias:][path]

tahoe put [--mutable] [localfrom:-]

tahoe put [--mutable] [localfrom:-] [alias:]to

tahoe put [--mutable] [localfrom:-] [alias:]subdir/to

tahoe put [--mutable] [localfrom:-] dircap:to

tahoe put [--mutable] [localfrom:-] dircap:./subdir/to

tahoe put [localfrom:-] mutable-file-writecap

tahoe get [alias:]from [localto:-]

tahoe cp [-r] [alias:]frompath [alias:]topath

tahoe rm [alias:]what

tahoe mv [alias:]from [alias:]to

tahoe ln [alias:]from [alias:]to

tahoe backup localfrom [alias:]to

Command Examples
----------------

``tahoe mkdir``

This creates a new empty unlinked directory, and prints its write-cap to
stdout. The new directory is not attached to anything else.

``tahoe add-alias fun DIRCAP``

An example would be::

  tahoe add-alias fun URI:DIR2:ovjy4yhylqlfoqg2vcze36dhde:4d4f47qko2xm5g7osgo2yyidi5m4muyo2vjjy53q4vjju2u55mfa

This creates an alias "fun:" and configures it to use the given directory
cap. Once this is done, "``tahoe ls fun:``" will list the contents of this
directory. Use "``tahoe add-alias tahoe DIRCAP``" to set the contents of the
default "tahoe:" alias.

``tahoe create-alias fun``

This combines "``tahoe mkdir``" and "``tahoe add-alias``" into a single step.

``tahoe list-aliases``

This displays a table of all configured aliases.

``tahoe mkdir subdir``

``tahoe mkdir /subdir``

These both create a new empty directory and attach it to your root with the
name "subdir".

``tahoe ls``

``tahoe ls /``

``tahoe ls tahoe:``

``tahoe ls tahoe:/``

All four list the root directory of your personal virtual filesystem.

``tahoe ls subdir``

This lists a subdirectory of your filesystem.

``tahoe webopen``

``tahoe webopen tahoe:``

``tahoe webopen tahoe:subdir/``

``tahoe webopen subdir/``

These use the python 'webbrowser' module to cause a local web browser to
open the web page for the given directory. This page offers interfaces to
add, download, rename, and delete files in the directory. If not given an
alias or path, they open "tahoe:", the root dir of the default alias.

``tahoe put file.txt``

``tahoe put ./file.txt``

``tahoe put /tmp/file.txt``

``tahoe put ~/file.txt``

These upload the local file into the grid, and print the new read-cap to
stdout. The uploaded file is not attached to any directory. All one-argument
forms of "``tahoe put``" perform an unlinked upload.

``tahoe put -``

``tahoe put``

These also perform an unlinked upload, but the data to be uploaded is taken
from stdin.

``tahoe put file.txt uploaded.txt``

``tahoe put file.txt tahoe:uploaded.txt``

These upload the local file and add it to your root with the name
"uploaded.txt".

``tahoe put file.txt subdir/foo.txt``

``tahoe put - subdir/foo.txt``

``tahoe put file.txt tahoe:subdir/foo.txt``

``tahoe put file.txt DIRCAP:./foo.txt``

``tahoe put file.txt DIRCAP:./subdir/foo.txt``

These upload the named file and attach it to a subdirectory of the given
root directory, under the name "foo.txt". Note that to use a directory
write-cap instead of an alias, you must use ":./" as a separator, rather
than ":", to help the CLI parser figure out where the dircap ends. When the
source file is named "-", the contents are taken from stdin.

``tahoe put file.txt --mutable``

This creates a new mutable file, fills it with the contents of file.txt, and
prints the new write-cap to stdout.

``tahoe put file.txt MUTABLE-FILE-WRITECAP``

This replaces the contents of the given mutable file with the contents of
file.txt and prints the same write-cap to stdout.

``tahoe cp file.txt tahoe:uploaded.txt``

``tahoe cp file.txt tahoe:``

``tahoe cp file.txt tahoe:/``

``tahoe cp ./file.txt tahoe:``

These upload the local file and add it to your root with the name
"uploaded.txt".

``tahoe cp tahoe:uploaded.txt downloaded.txt``

``tahoe cp tahoe:uploaded.txt ./downloaded.txt``

``tahoe cp tahoe:uploaded.txt /tmp/downloaded.txt``

``tahoe cp tahoe:uploaded.txt ~/downloaded.txt``

These download the named file from your tahoe root, and put the result on
your local filesystem.
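
The way "``tahoe cp``" tells local paths apart from tahoe paths follows the
rule described under "Starting Directories": no "alias:" prefix, or a leading
dot, slash, or tilde, means a local file. A minimal sketch of that rule
(the function name is hypothetical, dircap:./ handling is omitted, and this
is not the CLI's actual parser):

```python
def parse_path(arg, default_alias="tahoe"):
    """Split a CLI path argument into (alias, path); return alias=None
    for local files. Illustrative sketch of the scp/rsync-like syntax
    described in "Starting Directories"."""
    if arg.startswith((".", "/", "~")):
        return (None, arg)          # explicit local path
    alias, sep, rest = arg.partition(":")
    if not sep:
        return (None, arg)          # no alias prefix: local file
    return (alias or default_alias, rest)
```

So "tahoe:uploaded.txt" names something in the virtual filesystem, while
"./uploaded.txt" always names a file on the local disk.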

``tahoe cp tahoe:uploaded.txt fun:stuff.txt``

This copies a file from your tahoe root to a different virtual directory,
set up earlier with "``tahoe add-alias fun DIRCAP``".

``tahoe rm uploaded.txt``

``tahoe rm tahoe:uploaded.txt``

These delete a file from your tahoe root.

``tahoe mv uploaded.txt renamed.txt``

``tahoe mv tahoe:uploaded.txt tahoe:renamed.txt``

These rename a file within your tahoe root directory.

``tahoe mv uploaded.txt fun:``

``tahoe mv tahoe:uploaded.txt fun:``

``tahoe mv tahoe:uploaded.txt fun:uploaded.txt``

These move a file from your tahoe root directory to the virtual directory
set up earlier with "``tahoe add-alias fun DIRCAP``".

``tahoe backup ~ work:backups``

This command performs a full versioned backup of every file and directory
underneath your "~" home directory, placing an immutable timestamped
snapshot in e.g. work:backups/Archives/2009-02-06_04:00:05Z/ (note that the
timestamp is in UTC, hence the "Z" suffix), and a link to the latest
snapshot in work:backups/Latest/ . This command uses a small SQLite database
known as the "backupdb", stored in ~/.tahoe/private/backupdb.sqlite, to
remember which local files have been backed up already; by comparing
timestamps and filesizes, it avoids uploading files that have already been
backed up. It also re-uses existing directories which have identical
contents. This lets it run faster and reduces the number of directories
created.

If you reconfigure your client node to switch to a different grid, you
should delete the stale backupdb.sqlite file, to force "``tahoe backup``" to
upload all files to the new grid.

``tahoe backup --exclude=*~ ~ work:backups``

Same as above, but this time the backup process will ignore any
filename that ends with '~'.
'--exclude' accepts any standard
unix shell-style wildcards; see
http://docs.python.org/library/fnmatch.html for a more detailed
reference. You may give multiple '--exclude' options. Note that each
pattern is matched against every level of the directory tree, and that
it is not possible to specify absolute path exclusions.

``tahoe backup --exclude-from=/path/to/filename ~ work:backups``

'--exclude-from' is similar to '--exclude', but reads exclusion
patterns from '/path/to/filename', one per line.

``tahoe backup --exclude-vcs ~ work:backups``

This command will ignore any known file or directory that's used by
version control systems to store metadata. The excluded names are:

* CVS
* RCS
* SCCS
* .git
* .gitignore
* .cvsignore
* .svn
* .arch-ids
* {arch}
* =RELEASE-ID
* =meta-update
* =update
* .bzr
* .bzrignore
* .bzrtags
* .hg
* .hgignore
* _darcs


Storage Grid Maintenance
========================

``tahoe manifest tahoe:``

``tahoe manifest --storage-index tahoe:``

``tahoe manifest --verify-cap tahoe:``

``tahoe manifest --repair-cap tahoe:``

``tahoe manifest --raw tahoe:``

This performs a recursive walk of the given directory, visiting every file
and directory that can be reached from that point. It then emits one line to
stdout for each object it encounters.

The default behavior is to print the access cap string (like URI:CHK:.. or
URI:DIR2:..), followed by a space, followed by the full path name.

If --storage-index is added, each line will instead contain the object's
storage index. This (string) value is useful for determining which share
files (on the server) are associated with this directory tree.
The --verify-cap
and --repair-cap options are similar, but emit a verify-cap and repair-cap,
respectively. If --raw is provided instead, the output will be a
JSON-encoded dictionary that includes keys for pathnames, storage index
strings, and cap strings. The last line of the --raw output will be a
JSON-encoded deep-stats dictionary.

``tahoe stats tahoe:``

This performs a recursive walk of the given directory, visiting every file
and directory that can be reached from that point. It gathers statistics on
the sizes of the objects it encounters, and prints a summary to stdout.


Debugging
=========

For a list of all debugging commands, use "``tahoe debug``".

"``tahoe debug find-shares STORAGEINDEX NODEDIRS..``" will look through one
or more storage nodes for the share files that are providing storage for the
given storage index.

"``tahoe debug catalog-shares NODEDIRS..``" will look through one or more
storage nodes and locate every single share they contain. It produces a
report on stdout with one line per share, describing what kind of share it
is, the storage index, the size of the file it is used for, etc. It may be
useful to concatenate these reports from all storage hosts and use them to
look for anomalies.

"``tahoe debug dump-share SHAREFILE``" will take the name of a single share
file (as found by "``tahoe debug find-shares``") and print a summary of its
contents to stdout. This includes a list of leases, summaries of the hash
tree, and information from the UEB (URI Extension Block). For mutable file
shares, it will describe which version (seqnum and root-hash) is being
stored in this share.

"``tahoe debug dump-cap CAP``" will take a URI (a file read-cap, or a
directory read- or write-cap) and unpack it into separate pieces. The most
useful aspect of this command is to reveal the storage index for any given
URI.
This
can be used to locate the share files that are holding the encoded+encrypted
data for this file.

"``tahoe debug repl``" will launch an interactive python interpreter in which
the Tahoe packages and modules are available on sys.path (e.g. by using
'import allmydata'). This is most useful from a source tree: it simply sets
the PYTHONPATH correctly and runs the 'python' executable.

"``tahoe debug corrupt-share SHAREFILE``" will flip a bit in the given
sharefile. This can be used to test the client-side verification/repair code.
Obviously, this command should not be used during normal operation.
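
A note on the '--exclude' matching described under the backup examples:
since the patterns are standard shell-style wildcards (Python's fnmatch
module) matched against every level of the directory tree, the rule can be
sketched as follows (the helper is illustrative, not part of the Tahoe CLI):

```python
import fnmatch

def is_excluded(path, patterns):
    """Return True if any component of the path matches any exclusion
    pattern, mirroring how 'tahoe backup --exclude' matches patterns
    at every level of the tree. Illustrative sketch only."""
    for component in path.split("/"):
        for pattern in patterns:
            if fnmatch.fnmatch(component, pattern):
                return True
    return False
```

This is also why absolute path exclusions cannot be expressed: each pattern
sees only one path component at a time.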
deleted file docs/frontends/CLI.txt
diff --git a/docs/frontends/CLI.txt b/docs/frontends/CLI.txt
deleted file mode 100644
index d613a38..0000000
Note that to use a directory310 write-cap instead of an alias, you must use ":./" as a separator, rather311 than ":", to help the CLI parser figure out where the dircap ends. When the312 source file is named "-", the contents are taken from stdin.313 314 tahoe put file.txt --mutable315 316 Create a new mutable file, fill it with the contents of file.txt, and print317 the new write-cap to stdout.318 319 tahoe put file.txt MUTABLE-FILE-WRITECAP320 321 Replace the contents of the given mutable file with the contents of file.txt322 and prints the same write-cap to stdout.323 324 tahoe cp file.txt tahoe:uploaded.txt325 tahoe cp file.txt tahoe:326 tahoe cp file.txt tahoe:/327 tahoe cp ./file.txt tahoe:328 329 These upload the local file and add it to your root with the name330 "uploaded.txt".331 332 tahoe cp tahoe:uploaded.txt downloaded.txt333 tahoe cp tahoe:uploaded.txt ./downloaded.txt334 tahoe cp tahoe:uploaded.txt /tmp/downloaded.txt335 tahoe cp tahoe:uploaded.txt ~/downloaded.txt336 337 This downloads the named file from your tahoe root, and puts the result on338 your local filesystem.339 340 tahoe cp tahoe:uploaded.txt fun:stuff.txt341 342 This copies a file from your tahoe root to a different virtual directory,343 set up earlier with "tahoe add-alias fun DIRCAP".344 345 tahoe rm uploaded.txt346 tahoe rm tahoe:uploaded.txt347 348 This deletes a file from your tahoe root.349 350 tahoe mv uploaded.txt renamed.txt351 tahoe mv tahoe:uploaded.txt tahoe:renamed.txt352 353 These rename a file within your tahoe root directory.354 355 tahoe mv uploaded.txt fun:356 tahoe mv tahoe:uploaded.txt fun:357 tahoe mv tahoe:uploaded.txt fun:uploaded.txt358 359 These move a file from your tahoe root directory to the virtual directory360 set up earlier with "tahoe add-alias fun DIRCAP"361 362 tahoe backup ~ work:backups363 364 This command performs a full versioned backup of every file and directory365 underneath your "~" home directory, placing an immutable timestamped366 snapshot in 
e.g. work:backups/Archives/2009-02-06_04:00:05Z/ (note that the367 timestamp is in UTC, hence the "Z" suffix), and a link to the latest368 snapshot in work:backups/Latest/ . This command uses a small SQLite database369 known as the "backupdb", stored in ~/.tahoe/private/backupdb.sqlite, to370 remember which local files have been backed up already, and will avoid371 uploading files that have already been backed up. It compares timestamps and372 filesizes when making this comparison. It also re-uses existing directories373 which have identical contents. This lets it run faster and reduces the374 number of directories created.375 376 If you reconfigure your client node to switch to a different grid, you377 should delete the stale backupdb.sqlite file, to force "tahoe backup" to378 upload all files to the new grid.379 380 tahoe backup --exclude=*~ ~ work:backups381 382 Same as above, but this time the backup process will ignore any383 filename that will end with '~'. '--exclude' will accept any standard384 unix shell-style wildcards, have a look at385 http://docs.python.org/library/fnmatch.html for a more detailed386 reference. You may give multiple '--exclude' options. Please pay387 attention that the pattern will be matched against any level of the388 directory tree, it's still impossible to specify absolute path exclusions.389 390 tahoe backup --exclude-from=/path/to/filename ~ work:backups391 392 '--exclude-from' is similar to '--exclude', but reads exclusion393 patterns from '/path/to/filename', one per line.394 395 tahoe backup --exclude-vcs ~ work:backups396 397 This command will ignore any known file or directory that's used by398 version control systems to store metadata. 
The list of the exluded399 names is:400 401 * CVS402 * RCS403 * SCCS404 * .git405 * .gitignore406 * .cvsignore407 * .svn408 * .arch-ids409 * {arch}410 * =RELEASE-ID411 * =meta-update412 * =update413 * .bzr414 * .bzrignore415 * .bzrtags416 * .hg417 * .hgignore418 * _darcs419 420 == Storage Grid Maintenance ==421 422 tahoe manifest tahoe:423 tahoe manifest --storage-index tahoe:424 tahoe manifest --verify-cap tahoe:425 tahoe manifest --repair-cap tahoe:426 tahoe manifest --raw tahoe:427 428 This performs a recursive walk of the given directory, visiting every file429 and directory that can be reached from that point. It then emits one line to430 stdout for each object it encounters.431 432 The default behavior is to print the access cap string (like URI:CHK:.. or433 URI:DIR2:..), followed by a space, followed by the full path name.434 435 If --storage-index is added, each line will instead contain the object's436 storage index. This (string) value is useful to determine which share files437 (on the server) are associated with this directory tree. The --verify-cap438 and --repair-cap options are similar, but emit a verify-cap and repair-cap,439 respectively. If --raw is provided instead, the output will be a440 JSON-encoded dictionary that includes keys for pathnames, storage index441 strings, and cap strings. The last line of the --raw output will be a JSON442 encoded deep-stats dictionary.443 444 tahoe stats tahoe:445 446 This performs a recursive walk of the given directory, visiting every file447 and directory that can be reached from that point. It gathers statistics on448 the sizes of the objects it encounters, and prints a summary to stdout.449 450 451 == Debugging ==452 453 For a list of all debugging commands, use "tahoe debug".454 455 "tahoe debug find-shares STORAGEINDEX NODEDIRS.." will look through one or456 more storage nodes for the share files that are providing storage for the457 given storage index.458 459 "tahoe debug catalog-shares NODEDIRS.." 
will look through one or more storage460 nodes and locate every single share they contain. It produces a report on461 stdout with one line per share, describing what kind of share it is, the462 storage index, the size of the file is used for, etc. It may be useful to463 concatenate these reports from all storage hosts and use it to look for464 anomalies.465 466 "tahoe debug dump-share SHAREFILE" will take the name of a single share file467 (as found by "tahoe find-shares") and print a summary of its contents to468 stdout. This includes a list of leases, summaries of the hash tree, and469 information from the UEB (URI Extension Block). For mutable file shares, it470 will describe which version (seqnum and root-hash) is being stored in this471 share.472 473 "tahoe debug dump-cap CAP" will take a URI (a file read-cap, or a directory474 read- or write- cap) and unpack it into separate pieces. The most useful475 aspect of this command is to reveal the storage index for any given URI. This476 can be used to locate the share files that are holding the encoded+encrypted477 data for this file.478 479 "tahoe debug repl" will launch an interactive python interpreter in which the480 Tahoe packages and modules are available on sys.path (e.g. by using 'import481 allmydata'). This is most useful from a source tree: it simply sets the482 PYTHONPATH correctly and runs the 'python' executable.483 484 "tahoe debug corrupt-share SHAREFILE" will flip a bit in the given sharefile.485 This can be used to test the client-side verification/repair code. Obviously486 this command should not be used during normal operation. -
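To recap the filename conventions the commands above rely on, the scp-style
"alias:" resolution rules can be sketched in Python. This is an illustrative
helper, not the actual CLI parser: it follows the "tahoe cp" convention that
a missing alias means a local file, it omits the raw-dircap ":./" form, and
the ``aliases`` dictionary (with placeholder caps) stands in for the parsed
NODEDIR/private/aliases file.

```python
def parse_path(arg, aliases):
    """Map an scp-style "alias:path" argument to (rootcap, path).

    A leading dot, slash, or tilde, or the absence of an alias prefix,
    marks a local file instead (sketch of the rules described above).
    """
    if arg.startswith((".", "/", "~")) or ":" not in arg:
        return ("local", arg)
    alias, path = arg.split(":", 1)
    if alias in aliases:
        return (aliases[alias], path)
    return ("local", arg)  # unknown prefix: treat as a local name

# hypothetical aliases table, as if read from NODEDIR/private/aliases
aliases = {"tahoe": "URI:DIR2:aaaa:bbbb", "fun": "URI:DIR2:cccc:dddd"}
parse_path("fun:stuff.txt", aliases)  # -> ("URI:DIR2:cccc:dddd", "stuff.txt")
parse_path("./file.txt", aliases)     # -> ("local", "./file.txt")
```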
new file docs/frontends/FTP-and-SFTP.rst
diff --git a/docs/frontends/FTP-and-SFTP.rst b/docs/frontends/FTP-and-SFTP.rst new file mode 100644 index 0000000..230dca3
=================================
Tahoe-LAFS FTP and SFTP Frontends
=================================

1. `FTP/SFTP Background`_
2. `Tahoe-LAFS Support`_
3. `Creating an Account File`_
4. `Configuring FTP Access`_
5. `Configuring SFTP Access`_
6. `Dependencies`_
7. `Immutable and mutable files`_
8. `Known Issues`_


FTP/SFTP Background
===================

FTP is the venerable internet file-transfer protocol, first developed in
1971. The FTP server usually listens on port 21. A separate connection is
used for the actual data transfers, either in the same direction as the
initial client-to-server connection (for PORT mode), or in the reverse
direction (for PASV mode). Connections are unencrypted, so passwords, file
names, and file contents are visible to eavesdroppers.

SFTP is the modern replacement, developed as part of the SSH "secure shell"
protocol, and runs as a subchannel of the regular SSH connection. The SSH
server usually listens on port 22. All connections are encrypted.

Both FTP and SFTP were developed assuming a UNIX-like server, with accounts
and passwords, octal file modes (user/group/other, read/write/execute), and
ctime/mtime timestamps.

Tahoe-LAFS Support
==================

All Tahoe-LAFS client nodes can run a frontend FTP server, allowing regular
FTP clients (like /usr/bin/ftp, ncftp, and countless others) to access the
virtual filesystem. They can also run an SFTP server, so SFTP clients (like
/usr/bin/sftp, the sshfs FUSE plugin, and others) can too. These frontends
sit at the same level as the webapi interface.

Since Tahoe-LAFS does not use user accounts or passwords, the FTP/SFTP
servers must be configured with a way to first authenticate a user (confirm
that a prospective client has a legitimate claim to whatever authorities we
might grant a particular user), and second to decide what root directory cap
should be granted to the authenticated username. A username and password are
used for this purpose. (The SFTP protocol is also capable of using client
RSA or DSA public keys, but this is not currently implemented.)

Tahoe-LAFS provides two mechanisms to perform this user-to-rootcap mapping.
The first is a simple flat file with one account per line. The second is an
HTTP-based login mechanism, backed by a simple PHP script and a database.
The latter form is used by allmydata.com to provide secure access to
customer rootcaps.

Creating an Account File
========================

To use the first form, create a file (probably in
BASEDIR/private/ftp.accounts) in which each non-comment/non-blank line is a
space-separated line of (USERNAME, PASSWORD, ROOTCAP), like so::

  % cat BASEDIR/private/ftp.accounts
  # This is a password line, (username, password, rootcap)
  alice password URI:DIR2:ioej8xmzrwilg772gzj4fhdg7a:wtiizszzz2rgmczv4wl6bqvbv33ag4kvbr6prz3u6w3geixa6m6a
  bob sekrit URI:DIR2:6bdmeitystckbl9yqlw7g56f4e:serp5ioqxnh34mlbmzwvkp3odehsyrr7eytt5f64we3k9hhcrcja

Future versions of Tahoe-LAFS may support using client public keys for SFTP.
The words "ssh-rsa" and "ssh-dsa" after the username are reserved to specify
the public key format, so users cannot have a password equal to either of
these strings.

Now add an 'accounts.file' directive to your tahoe.cfg file, as described
in the next sections.
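As a cross-check of the format just described, its stated rules (skip blank
and comment lines, three space-separated fields, passwords "ssh-rsa" and
"ssh-dsa" reserved) are easy to express. This is an illustrative Python
sketch, not the parser the FTP/SFTP server actually uses, and the caps below
are placeholders:

```python
RESERVED = ("ssh-rsa", "ssh-dsa")  # reserved for future public-key support

def parse_accounts(text):
    """Parse ftp.accounts-style text into {username: (password, rootcap)}."""
    accounts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # only non-comment/non-blank lines count
        username, password, rootcap = line.split(None, 2)
        if password in RESERVED:
            raise ValueError("password equal to a reserved keyword")
        accounts[username] = (password, rootcap)
    return accounts

sample = """\
# (username, password, rootcap)
alice password URI:DIR2:aaaa:bbbb
bob sekrit URI:DIR2:cccc:dddd
"""
parse_accounts(sample)["alice"]  # -> ("password", "URI:DIR2:aaaa:bbbb")
```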
Configuring FTP Access
======================

To enable the FTP server with an accounts file, add the following lines to
the BASEDIR/tahoe.cfg file::

  [ftpd]
  enabled = true
  port = tcp:8021:interface=127.0.0.1
  accounts.file = private/ftp.accounts

The FTP server will listen on the given port number and on the loopback
interface only. The "accounts.file" pathname will be interpreted
relative to the node's BASEDIR.

To enable the FTP server with an account server instead, provide the URL of
that server in an "accounts.url" directive::

  [ftpd]
  enabled = true
  port = tcp:8021:interface=127.0.0.1
  accounts.url = https://example.com/login

You can provide both accounts.file and accounts.url, although it probably
isn't very useful except for testing.

FTP provides no security, and so your password or caps could be eavesdropped
if you connect to the FTP server remotely. The examples above include
":interface=127.0.0.1" in the "port" option, which causes the server to only
accept connections from localhost.

Configuring SFTP Access
=======================

The Tahoe-LAFS SFTP server requires a host keypair, just like the regular
SSH server. It is important to give each server a distinct keypair, to
prevent one server from masquerading as a different one. The first time a
client program talks to a given server, it will store the host key it
receives, and will complain if a subsequent connection uses a different key.
This reduces the opportunity for man-in-the-middle attacks to just the first
connection.

Exercise caution when connecting to the SFTP server remotely. The AES
implementation used by the SFTP code does not have defenses against timing
attacks. The code for encrypting the SFTP connection was not written by the
Tahoe-LAFS team, and we have not reviewed it as carefully as we have reviewed
the code for encrypting files and directories in Tahoe-LAFS itself. If you
can connect to the SFTP server (which is provided by the Tahoe-LAFS gateway)
only from a client on the same host, then you would be safe from any problem
with the SFTP connection security. The examples given below enforce this
policy by including ":interface=127.0.0.1" in the "port" option, which
causes the server to only accept connections from localhost.

You will use directives in the tahoe.cfg file to tell the SFTP code where to
find these keys. To create one, use the ``ssh-keygen`` tool (which comes with
the standard openssh client distribution)::

  % cd BASEDIR
  % ssh-keygen -f private/ssh_host_rsa_key

The server private key file must not have a passphrase.

Then, to enable the SFTP server with an accounts file, add the following
lines to the BASEDIR/tahoe.cfg file::

  [sftpd]
  enabled = true
  port = tcp:8022:interface=127.0.0.1
  host_pubkey_file = private/ssh_host_rsa_key.pub
  host_privkey_file = private/ssh_host_rsa_key
  accounts.file = private/ftp.accounts

The SFTP server will listen on the given port number and on the loopback
interface only. The "accounts.file" pathname will be interpreted
relative to the node's BASEDIR.

Or, to use an account server instead, do this::

  [sftpd]
  enabled = true
  port = tcp:8022:interface=127.0.0.1
  host_pubkey_file = private/ssh_host_rsa_key.pub
  host_privkey_file = private/ssh_host_rsa_key
  accounts.url = https://example.com/login

You can provide both accounts.file and accounts.url, although it probably
isn't very useful except for testing.
For further information on SFTP compatibility and known issues with various
clients and with the sshfs filesystem, see
http://tahoe-lafs.org/trac/tahoe-lafs/wiki/SftpFrontend .

Dependencies
============

The Tahoe-LAFS SFTP server requires the Twisted "Conch" component (a "conch"
is a twisted shell, get it?). Many Linux distributions package the Conch code
separately: debian puts it in the "python-twisted-conch" package. Conch
requires the "pycrypto" package, which is a Python+C implementation of many
cryptographic functions (the debian package is named "python-crypto").

Note that "pycrypto" is different from the "pycryptopp" package that
Tahoe-LAFS uses (which is a Python wrapper around the C++ -based Crypto++
library, a library that is frequently installed as /usr/lib/libcryptopp.a,
to avoid problems with non-alphanumerics in filenames).

The FTP server requires code in Twisted that enables asynchronous closing of
file-upload operations. This code was landed to Twisted's SVN trunk in r28453
on 23-Feb-2010, slightly too late for the Twisted-10.0 release, but it should
be present in the next release after that. To use Tahoe-LAFS's FTP server
with Twisted-10.0 or earlier, you will need to apply the patch attached to
http://twistedmatrix.com/trac/ticket/3462 . The Tahoe-LAFS node will refuse
to start the FTP server unless it detects the necessary support code in
Twisted. This patch is not needed for SFTP.

Immutable and Mutable Files
===========================

All files created via SFTP (and FTP) are immutable files. However, files
can only be created in writeable directories, which allows the directory
entry to be relinked to a different file. Normally, when the path of an
immutable file is opened for writing by SFTP, the directory entry is
relinked to another file with the newly written contents when the file
handle is closed. The old file is still present on the grid, and any other
caps to it will remain valid. (See docs/garbage-collection.txt for how to
reclaim the space used by files that are no longer needed.)

The 'no-write' metadata field of a directory entry can override this
behaviour. If the 'no-write' field holds a true value, then a permission
error will occur when trying to write to the file, even if it is in a
writeable directory. This does not prevent the directory entry from being
unlinked or replaced.

When using sshfs, the 'no-write' field can be set by clearing the 'w'
bits in the Unix permissions, for example using the command
'chmod 444 path/to/file'. Note that this does not mean that arbitrary
combinations of Unix permissions are supported. If the 'w' bits are
cleared on a link to a mutable file or directory, that link will become
read-only.

If SFTP is used to write to an existing mutable file, it will publish a
new version when the file handle is closed.

Known Issues
============

Mutable files are not supported by the FTP frontend (`ticket #680
<http://tahoe-lafs.org/trac/tahoe-lafs/ticket/680>`_). Currently, a directory
containing mutable files cannot even be listed over FTP.

The FTP frontend sometimes fails to report errors, for example if an upload
fails because it does not meet the "servers of happiness" threshold (`ticket
#1081 <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1081>`_). Upload errors
may also not be reported when writing files using SFTP via sshfs (`ticket
#1059 <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1059>`_).

Non-ASCII filenames are not supported by FTP (`ticket #682
<http://tahoe-lafs.org/trac/tahoe-lafs/ticket/682>`_). They can be used
with SFTP only if the client encodes filenames as UTF-8 (`ticket #1089
<http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1089>`_).

The gateway node may incur a memory leak when accessing many files via SFTP
(`ticket #1045 <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1045>`_).

For other known issues in SFTP, see
http://tahoe-lafs.org/trac/tahoe-lafs/wiki/SftpFrontend .
deleted file docs/frontends/FTP-and-SFTP.txt
diff --git a/docs/frontends/FTP-and-SFTP.txt b/docs/frontends/FTP-and-SFTP.txt deleted file mode 100644 index 8facc09..0000000
new file docs/frontends/download-status.rst
diff --git a/docs/frontends/download-status.rst b/docs/frontends/download-status.rst new file mode 100644 index 0000000..315b6a3
===============
Download status
===============


Introduction
============

The WUI will display the "status" of uploads and downloads.

The Welcome Page has a link entitled "Recent Uploads and Downloads"
which goes to this URL::

  http://$GATEWAY/status

Each entry in the list of recent operations has a "status" link which
will take you to a page describing that operation.

For immutable downloads, the page has a lot of information, and this
document explains what it all means. It was written by Brian
Warner, who wrote the v1.8.0 downloader code and the code which
generates this status report about the v1.8.0 downloader's
behavior. Brian posted it to the trac:
http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1169#comment:1

Then Zooko lightly edited it while copying it into the docs/
directory.

What's involved in a download?
==============================

Downloads are triggered by read() calls, each with a starting offset
(defaulting to 0) and a length (defaulting to the whole file). A regular
webapi GET request will result in a whole-file read() call.

Each read() call turns into an ordered sequence of get_segment() calls. A
whole-file read will fetch all segments, in order, but partial reads or
multiple simultaneous reads will result in random access of segments.
Segment reads always return ciphertext: the layer above that (in read())
is responsible for decryption.

Before we can satisfy any segment reads, we need to find some shares. ("DYHB"
is an abbreviation for "Do You Have Block", and is the message we send to
storage servers to ask them if they have any shares for us. The name is
historical, from Mojo Nation/Mnet/Mountain View, but nicely distinctive.
Tahoe-LAFS's actual message name is remote_get_buckets().) Responses come
back eventually, or don't.
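Assuming fixed-size segments (in reality a file's final segment may be
shorter), the mapping from a read() range to the ordered get_segment() calls
it triggers is simple arithmetic. This is a sketch of that mapping, not the
actual downloader code:

```python
def segments_for_read(offset, length, segment_size):
    """Return the ordered segment numbers a read(offset, length) must
    fetch, assuming every segment holds segment_size bytes."""
    if length <= 0:
        return []
    first = offset // segment_size                 # segment with first byte
    last = (offset + length - 1) // segment_size   # segment with last byte
    return list(range(first, last + 1))

# A partial read that starts and ends mid-segment still fetches whole
# segments; read() keeps only the bytes it asked for after decryption.
segments_for_read(100, 50, 128)   # -> [0, 1]
```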
48 49 Once we get enough positive DYHB responses, we have enough shares to start 50 downloading. We send "block requests" for various pieces of the share. 51 Responses come back eventually, or don't. 52 53 When we get enough block-request responses for a given segment, we can decode 54 the data and satisfy the segment read. 55 56 When the segment read completes, some or all of the segment data is used to 57 satisfy the read() call (if the read call started or ended in the middle of a 58 segment, we'll only use part of the data, otherwise we'll use all of it). 59 60 Data on the download-status page 61 ================================ 62 63 DYHB Requests 64 ------------- 65 66 This shows every Do-You-Have-Block query sent to storage servers and their 67 results. Each line shows the following: 68 69 * the serverid to which the request was sent 70 * the time at which the request was sent. Note that all timestamps are 71 relative to the start of the first read() call and indicated with a "+" sign 72 * the time at which the response was received (if ever) 73 * the share numbers that the server has, if any 74 * the elapsed time taken by the request 75 76 Also, each line is colored according to the serverid. This color is also used 77 in the "Requests" section below. 78 79 Read Events 80 ----------- 81 82 This shows all the FileNode read() calls and their overall results. Each line 83 shows: 84 85 * the range of the file that was requested (as [OFFSET:+LENGTH]). A whole-file 86 GET will start at 0 and read the entire file. 87 * the time at which the read() was made 88 * the time at which the request finished, either because the last byte of data 89 was returned to the read() caller, or because they cancelled the read by 90 calling stopProducing (i.e. 
closing the HTTP connection) 91 * the number of bytes returned to the caller so far 92 * the time spent on the read, so far 93 * the total time spent in AES decryption 94 * total time spent paused by the client (pauseProducing), generally because the 95 HTTP connection filled up, which most streaming media players will do to 96 limit how much data they have to buffer 97 * effective speed of the read(), not including paused time 98 99 Segment Events 100 -------------- 101 102 This shows each get_segment() call and its resolution. This table is not well 103 organized, and my post-1.8.0 work will clean it up a lot. In its present form, 104 it records "request" and "delivery" events separately, indicated by the "type" 105 column. 106 107 Each request shows the segment number being requested and the time at which the 108 get_segment() call was made. 109 110 Each delivery shows: 111 112 * segment number 113 * range of file data (as [OFFSET:+SIZE]) delivered 114 * elapsed time spent doing ZFEC decoding 115 * overall elapsed time fetching the segment 116 * effective speed of the segment fetch 117 118 Requests 119 -------- 120 121 This shows every block-request sent to the storage servers. Each line shows: 122 123 * the server to which the request was sent 124 * which share number it is referencing 125 * the portion of the share data being requested (as [OFFSET:+SIZE]) 126 * the time the request was sent 127 * the time the response was received (if ever) 128 * the amount of data that was received (which might be less than SIZE if we 129 tried to read off the end of the share) 130 * the elapsed time for the request (RTT=Round-Trip-Time) 131 132 Also note that each Request line is colored according to the serverid it was 133 sent to. And all timestamps are shown relative to the start of the first 134 read() call: for example the first DYHB message was sent at +0.001393s, about 135 1.4 milliseconds after the read() call started everything off. -
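The "+"-relative timestamp convention used throughout this status page can be illustrated with a small sketch. The event names and times below are invented for illustration, not taken from a real download:

```python
# Sketch: render event times the way the download-status page does,
# relative to the start of the first read() call ("+" prefix, seconds).
# The events and times here are hypothetical examples.

def relative_timestamps(events):
    """events: list of (name, seconds) pairs -> list of (name, '+N.NNNNNNs')."""
    start = min(t for _, t in events)
    return [(name, "+%.6fs" % (t - start)) for name, t in events]

events = [
    ("read() start", 0.0),
    ("first DYHB sent", 0.001393),   # about 1.4 ms after the read() started
    ("DYHB response", 0.082),
]
for name, stamp in relative_timestamps(events):
    print(name, stamp)
```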
deleted file docs/frontends/download-status.txt
diff --git a/docs/frontends/download-status.txt b/docs/frontends/download-status.txt deleted file mode 100644 index 90aaabf..0000000
+ - 1 The WUI will display the "status" of uploads and downloads.2 3 The Welcome Page has a link entitled "Recent Uploads and Downloads"4 which goes to this URL:5 6 http://$GATEWAY/status7 8 Each entry in the list of recent operations has a "status" link which9 will take you to a page describing that operation.10 11 For immutable downloads, the page has a lot of information, and this12 document is to explain what it all means. It was written by Brian13 Warner, who wrote the v1.8.0 downloader code and the code which14 generates this status report about the v1.8.0 downloader's15 behavior. Brian posted it to the trac:16 http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1169#comment:117 18 Then Zooko lightly edited it while copying it into the docs/19 directory.20 21 -------22 23 First, what's involved in a download?:24 25 downloads are triggered by read() calls, each with a starting offset (defaults to 0) and a length (defaults to the whole file). A regular webapi GET request will result in a whole-file read() call26 each read() call turns into an ordered sequence of get_segment() calls. A whole-file read will fetch all segments, in order, but partial reads or multiple simultaneous reads will result in random-access of segments. Segment reads always return ciphertext: the layer above that (in read()) is responsible for decryption.27 before we can satisfy any segment reads, we need to find some shares. ("DYHB" is an abbreviation for "Do You Have Block", and is the message we send to storage servers to ask them if they have any shares for us. The name is historical, from Mojo Nation/Mnet/Mountain View, but nicely distinctive. Tahoe-LAFS's actual message name is remote_get_buckets().). Responses come back eventually, or don't.28 Once we get enough positive DYHB responses, we have enough shares to start downloading. We send "block requests" for various pieces of the share. 
Responses come back eventually, or don't.29 When we get enough block-request responses for a given segment, we can decode the data and satisfy the segment read.30 When the segment read completes, some or all of the segment data is used to satisfy the read() call (if the read call started or ended in the middle of a segment, we'll only use part of the data, otherwise we'll use all of it).31 32 With that background, here is the data currently on the download-status page:33 34 "DYHB Requests": this shows every Do-You-Have-Block query sent to storage servers and their results. Each line shows the following:35 the serverid to which the request was sent36 the time at which the request was sent. Note that all timestamps are relative to the start of the first read() call and indicated with a "+" sign37 the time at which the response was received (if ever)38 the share numbers that the server has, if any39 the elapsed time taken by the request40 also, each line is colored according to the serverid. This color is also used in the "Requests" section below.41 42 "Read Events": this shows all the FileNode read() calls and their overall results. Each line shows:43 the range of the file that was requested (as [OFFSET:+LENGTH]). A whole-file GET will start at 0 and read the entire file.44 the time at which the read() was made45 the time at which the request finished, either because the last byte of data was returned to the read() caller, or because they cancelled the read by calling stopProducing (i.e. closing the HTTP connection)46 the number of bytes returned to the caller so far47 the time spent on the read, so far48 the total time spent in AES decryption49 total time spend paused by the client (pauseProducing), generally because the HTTP connection filled up, which most streaming media players will do to limit how much data they have to buffer50 effective speed of the read(), not including paused time51 52 "Segment Events": this shows each get_segment() call and its resolution. 
This table is not well organized, and my post-1.8.0 work will clean it up a lot. In its present form, it records "request" and "delivery" events separately, indicated by the "type" column.53 Each request shows the segment number being requested and the time at which the get_segment() call was made54 Each delivery shows:55 segment number56 range of file data (as [OFFSET:+SIZE]) delivered57 elapsed time spent doing ZFEC decoding58 overall elapsed time fetching the segment59 effective speed of the segment fetch60 61 "Requests": this shows every block-request sent to the storage servers. Each line shows:62 the server to which the request was sent63 which share number it is referencing64 the portion of the share data being requested (as [OFFSET:+SIZE])65 the time the request was sent66 the time the response was received (if ever)67 the amount of data that was received (which might be less than SIZE if we tried to read off the end of the share)68 the elapsed time for the request (RTT=Round-Trip-Time)69 70 Also note that each Request line is colored according to the serverid it was sent to. And all timestamps are shown relative to the start of the first read() call: for example the first DYHB message was sent at +0.001393s about 1.4 milliseconds after the read() call started everything off. -
new file docs/frontends/webapi.rst
diff --git a/docs/frontends/webapi.rst b/docs/frontends/webapi.rst new file mode 100644 index 0000000..31924bc
- + 1 ========================== 2 The Tahoe REST-ful Web API 3 ========================== 4 5 1. `Enabling the web-API port`_ 6 2. `Basic Concepts: GET, PUT, DELETE, POST`_ 7 3. `URLs`_ 8 9 1. `Child Lookup`_ 10 11 4. `Slow Operations, Progress, and Cancelling`_ 12 5. `Programmatic Operations`_ 13 14 1. `Reading a file`_ 15 2. `Writing/Uploading a File`_ 16 3. `Creating a New Directory`_ 17 4. `Get Information About A File Or Directory (as JSON)`_ 18 5. `Attaching an existing File or Directory by its read- or write-cap`_ 19 6. `Adding multiple files or directories to a parent directory at once`_ 20 7. `Deleting a File or Directory`_ 21 22 6. `Browser Operations: Human-Oriented Interfaces`_ 23 24 1. `Viewing A Directory (as HTML)`_ 25 2. `Viewing/Downloading a File`_ 26 3. `Get Information About A File Or Directory (as HTML)`_ 27 4. `Creating a Directory`_ 28 5. `Uploading a File`_ 29 6. `Attaching An Existing File Or Directory (by URI)`_ 30 7. `Deleting A Child`_ 31 8. `Renaming A Child`_ 32 9. `Other Utilities`_ 33 10. `Debugging and Testing Features`_ 34 35 7. `Other Useful Pages`_ 36 8. `Static Files in /public_html`_ 37 9. `Safety and security issues -- names vs. URIs`_ 38 10. `Concurrency Issues`_ 39 40 Enabling the web-API port 41 ========================= 42 43 Every Tahoe node is capable of running a built-in HTTP server. To enable 44 this, just write a port number into the "[node]web.port" line of your node's 45 tahoe.cfg file. For example, writing "web.port = 3456" into the "[node]" 46 section of $NODEDIR/tahoe.cfg will cause the node to run a webserver on port 47 3456. 48 49 This string is actually a Twisted "strports" specification, meaning you can 50 get more control over the interface to which the server binds by supplying 51 additional arguments. For more details, see the documentation on 52 `twisted.application.strports 53 <http://twistedmatrix.com/documents/current/api/twisted.application.strports.html>`_. 
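As a concrete sketch, the relevant tahoe.cfg fragment would look like this (the port number here is arbitrary)::

  [node]
  web.port = 3456

Any Twisted strports string is accepted in place of the bare port number.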
54 55 Writing "tcp:3456:interface=127.0.0.1" into the web.port line does the same 56 but binds to the loopback interface, ensuring that only the programs on the 57 local host can connect. Using "ssl:3456:privateKey=mykey.pem:certKey=cert.pem" 58 runs an SSL server. 59 60 This webport can be set when the node is created by passing a --webport 61 option to the 'tahoe create-node' command. By default, the node listens on 62 port 3456, on the loopback (127.0.0.1) interface. 63 64 Basic Concepts: GET, PUT, DELETE, POST 65 ====================================== 66 67 As described in `architecture.rst`_, each file and directory in a Tahoe virtual 68 filesystem is referenced by an identifier that combines the designation of 69 the object with the authority to do something with it (such as read or modify 70 the contents). This identifier is called a "read-cap" or "write-cap", 71 depending upon whether it enables read-only or read-write access. These 72 "caps" are also referred to as URIs. 73 74 .. _architecture.rst: http://tahoe-lafs.org/source/tahoe-lafs/trunk/docs/architecture.rst 75 76 The Tahoe web-based API is "REST-ful", meaning it implements the concepts of 77 "REpresentational State Transfer": the original scheme by which the World 78 Wide Web was intended to work. Each object (file or directory) is referenced 79 by a URL that includes the read- or write- cap. HTTP methods (GET, PUT, and 80 DELETE) are used to manipulate these objects. You can think of the URL as a 81 noun, and the method as a verb. 82 83 In REST, the GET method is used to retrieve information about an object, or 84 to retrieve some representation of the object itself. When the object is a 85 file, the basic GET method will simply return the contents of that file. 86 Other variations (generally implemented by adding query parameters to the 87 URL) will return information about the object, such as metadata. GET 88 operations are required to have no side-effects. 
89 90 PUT is used to upload new objects into the filesystem, or to replace an 91 existing object. DELETE is used to delete objects from the filesystem. Both 92 PUT and DELETE are required to be idempotent: performing the same operation 93 multiple times must have the same side-effects as only performing it once. 94 95 POST is used for more complicated actions that cannot be expressed as a GET, 96 PUT, or DELETE. POST operations can be thought of as a method call: sending 97 some message to the object referenced by the URL. In Tahoe, POST is also used 98 for operations that must be triggered by an HTML form (including upload and 99 delete), because otherwise a regular web browser has no way to accomplish 100 these tasks. In general, everything that can be done with a PUT or DELETE can 101 also be done with a POST. 102 103 Tahoe's web API is designed for two different kinds of consumer. The first is 104 a program that needs to manipulate the virtual file system. Such programs are 105 expected to use the RESTful interface described above. The second is a human 106 using a standard web browser to work with the filesystem. This user is given 107 a series of HTML pages with links to download files, and forms that use POST 108 actions to upload, rename, and delete files. 109 110 When an error occurs, the HTTP response code will be set to an appropriate 111 400-series code (like 404 Not Found for an unknown childname, or 400 Bad Request 112 when the parameters to a webapi operation are invalid), and the HTTP response 113 body will usually contain a few lines of explanation as to the cause of the 114 error and possible responses. Unusual exceptions may result in a 500 Internal 115 Server Error as a catch-all, with a default response body containing 116 a Nevow-generated HTML-ized representation of the Python exception stack trace 117 that caused the problem.
CLI programs which want to copy the response body to 118 stderr should provide an "Accept: text/plain" header to their requests to get 119 a plain text stack trace instead. If the Accept header contains ``*/*``, or 120 ``text/*``, or ``text/html`` (or if there is no Accept header), HTML tracebacks will 121 be generated. 122 123 URLs 124 ==== 125 126 Tahoe uses a variety of read- and write- caps to identify files and 127 directories. The most common of these is the "immutable file read-cap", which 128 is used for most uploaded files. These read-caps look like the following:: 129 130 URI:CHK:ime6pvkaxuetdfah2p2f35pe54:4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a:3:10:202 131 132 The next most common is a "directory write-cap", which provides both read and 133 write access to a directory, and looks like this:: 134 135 URI:DIR2:djrdkfawoqihigoett4g6auz6a:jx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq 136 137 There are also "directory read-caps", which start with "URI:DIR2-RO:", and 138 give read-only access to a directory. Finally there are also mutable file 139 read- and write- caps, which start with "URI:SSK", and give access to mutable 140 files. 141 142 (Later versions of Tahoe will make these strings shorter, and will remove the 143 unfortunate colons, which must be escaped when these caps are embedded in 144 URLs.) 145 146 To refer to any Tahoe object through the web API, you simply need to combine 147 a prefix (which indicates the HTTP server to use) with the cap (which 148 indicates which object inside that server to access).
Since the default Tahoe 149 webport is 3456, the most common prefix is one that will use a local node 150 listening on this port:: 151 152 http://127.0.0.1:3456/uri/ + $CAP 153 154 So, to access the directory named above (which happens to be the 155 publically-writeable sample directory on the Tahoe test grid, described at 156 http://allmydata.org/trac/tahoe/wiki/TestGrid), the URL would be:: 157 158 http://127.0.0.1:3456/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/ 159 160 (note that the colons in the directory-cap are url-encoded into "%3A" 161 sequences). 162 163 Likewise, to access the file named above, use:: 164 165 http://127.0.0.1:3456/uri/URI%3ACHK%3Aime6pvkaxuetdfah2p2f35pe54%3A4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a%3A3%3A10%3A202 166 167 In the rest of this document, we'll use "$DIRCAP" as shorthand for a read-cap 168 or write-cap that refers to a directory, and "$FILECAP" to abbreviate a cap 169 that refers to a file (whether mutable or immutable). So those URLs above can 170 be abbreviated as:: 171 172 http://127.0.0.1:3456/uri/$DIRCAP/ 173 http://127.0.0.1:3456/uri/$FILECAP 174 175 The operation summaries below will abbreviate these further, by eliding the 176 server prefix. They will be displayed like this:: 177 178 /uri/$DIRCAP/ 179 /uri/$FILECAP 180 181 182 Child Lookup 183 ------------ 184 185 Tahoe directories contain named child entries, just like directories in a regular 186 local filesystem. These child entries, called "dirnodes", consist of a name, 187 metadata, a write slot, and a read slot. The write and read slots normally contain 188 a write-cap and read-cap referring to the same object, which can be either a file 189 or a subdirectory. The write slot may be empty (actually, both may be empty, 190 but that is unusual). 191 192 If you have a Tahoe URL that refers to a directory, and want to reference a 193 named child inside it, just append the child name to the URL. 
For example, if 194 our sample directory contains a file named "welcome.txt", we can refer to 195 that file with:: 196 197 http://127.0.0.1:3456/uri/$DIRCAP/welcome.txt 198 199 (or http://127.0.0.1:3456/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/welcome.txt) 200 201 Multiple levels of subdirectories can be handled this way:: 202 203 http://127.0.0.1:3456/uri/$DIRCAP/tahoe-source/docs/webapi.txt 204 205 In this document, when we need to refer to a URL that references a file using 206 this child-of-some-directory format, we'll use the following string:: 207 208 /uri/$DIRCAP/[SUBDIRS../]FILENAME 209 210 The "[SUBDIRS../]" part means that there are zero or more (optional) 211 subdirectory names in the middle of the URL. The "FILENAME" at the end means 212 that this whole URL refers to a file of some sort, rather than to a 213 directory. 214 215 When we need to refer specifically to a directory in this way, we'll write:: 216 217 /uri/$DIRCAP/[SUBDIRS../]SUBDIR 218 219 220 Note that all components of pathnames in URLs are required to be UTF-8 221 encoded, so "résumé.doc" (with an acute accent on both E's) would be accessed 222 with:: 223 224 http://127.0.0.1:3456/uri/$DIRCAP/r%C3%A9sum%C3%A9.doc 225 226 Also note that the filenames inside upload POST forms are interpreted using 227 whatever character set was provided in the conventional '_charset' field, which 228 defaults to UTF-8 if not otherwise specified. The JSON representation of each 229 directory contains native unicode strings. Tahoe directories are specified to 230 contain unicode filenames, and cannot contain binary strings that are not 231 representable as such. 232 233 All Tahoe operations that refer to existing files or directories must include 234 a suitable read- or write- cap in the URL: the webapi server won't add one 235 for you. If you don't know the cap, you can't access the file.
This allows 236 the security properties of Tahoe caps to be extended across the webapi 237 interface. 238 239 Slow Operations, Progress, and Cancelling 240 ========================================= 241 242 Certain operations can be expected to take a long time. The "t=deep-check" 243 operation, described below, will recursively visit every file and directory reachable 244 from a given starting point, which can take minutes or even hours for 245 extremely large directory structures. A single long-running HTTP request is a 246 fragile thing: proxies, NAT boxes, browsers, and users may all grow impatient 247 with waiting and give up on the connection. 248 249 For this reason, long-running operations have an "operation handle", which 250 can be used to poll for status/progress messages while the operation 251 proceeds. This handle can also be used to cancel the operation. These handles 252 are created by the client, and passed in as an "ophandle=" query argument 253 to the POST or PUT request which starts the operation. The following 254 operations can then be used to retrieve status: 255 256 ``GET /operations/$HANDLE?output=HTML (with or without t=status)`` 257 258 ``GET /operations/$HANDLE?output=JSON (same)`` 259 260 These two retrieve the current status of the given operation. Each operation 261 presents a different sort of information, but in general the page retrieved 262 will indicate: 263 264 * whether the operation is complete, or if it is still running 265 * how much of the operation is complete, and how much is left, if possible 266 267 Note that the final status output can be quite large: a deep-manifest of a 268 directory structure with 300k directories and 200k unique files is about 269 275MB of JSON, and might take two minutes to generate. For this reason, the 270 full status is not provided until the operation has completed.
271 272 The HTML form will include a meta-refresh tag, which will cause a regular 273 web browser to reload the status page about 60 seconds later. This tag will 274 be removed once the operation has completed. 275 276 There may be more status information available under 277 /operations/$HANDLE/$ETC : i.e., the handle forms the root of a URL space. 278 279 ``POST /operations/$HANDLE?t=cancel`` 280 281 This terminates the operation, and returns an HTML page explaining what was 282 cancelled. If the operation handle has already expired (see below), this 283 POST will return a 404, which indicates that the operation is no longer 284 running (either it was completed or terminated). The response body will be 285 the same as a GET /operations/$HANDLE on this operation handle, and the 286 handle will be expired immediately afterwards. 287 288 The operation handle will eventually expire, to avoid consuming an unbounded 289 amount of memory. The handle's time-to-live can be reset at any time, by 290 passing a retain-for= argument (with a count of seconds) to either the 291 initial POST that starts the operation, or the subsequent GET request which 292 asks about the operation. For example, if a 'GET 293 /operations/$HANDLE?output=JSON&retain-for=600' query is performed, the 294 handle will remain active for 600 seconds (10 minutes) after the GET was 295 received. 296 297 In addition, if the GET includes a release-after-complete=True argument, and 298 the operation has completed, the operation handle will be released 299 immediately. 300 301 If a retain-for= argument is not used, the default handle lifetimes are: 302 303 * handles will remain valid at least until their operation finishes 304 * uncollected handles for finished operations (i.e. handles for 305 operations that have finished but for which the GET page has not been 306 accessed since completion) will remain valid for four days, or for 307 the total time consumed by the operation, whichever is greater. 
308 * collected handles (i.e. the GET page has been retrieved at least once 309 since the operation completed) will remain valid for one day. 310 311 Many "slow" operations can begin to use unacceptable amounts of memory when 312 operating on large directory structures. The memory usage increases when the 313 ophandle is polled, as the results must be copied into a JSON string, sent 314 over the wire, then parsed by a client. So, as an alternative, many "slow" 315 operations have streaming equivalents. These equivalents do not use operation 316 handles. Instead, they emit line-oriented status results immediately. Client 317 code can cancel the operation by simply closing the HTTP connection. 318 319 Programmatic Operations 320 ======================= 321 322 Now that we know how to build URLs that refer to files and directories in a 323 Tahoe virtual filesystem, what sorts of operations can we do with those URLs? 324 This section contains a catalog of GET, PUT, DELETE, and POST operations that 325 can be performed on these URLs. This set of operations is aimed at programs 326 that use HTTP to communicate with a Tahoe node. A later section describes 327 operations that are intended for web browsers. 328 329 Reading A File 330 -------------- 331 332 ``GET /uri/$FILECAP`` 333 334 ``GET /uri/$DIRCAP/[SUBDIRS../]FILENAME`` 335 336 This will retrieve the contents of the given file. The HTTP response body 337 will contain the sequence of bytes that make up the file. 338 339 To view files in a web browser, you may want more control over the 340 Content-Type and Content-Disposition headers. Please see the next section 341 "Browser Operations" for details on how to modify these URLs for that 342 purpose.
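Constructing such a read URL from a client program mostly amounts to percent-encoding the cap (and any path components) before appending them to the /uri/ prefix. A minimal Python sketch, using the default gateway address and a made-up directory cap:

```python
# Sketch: build a webapi GET URL for a file. The gateway address and the
# cap below are placeholder examples, not real values.
from urllib.parse import quote

GATEWAY = "http://127.0.0.1:3456"

def read_url(cap, *path):
    # quote() with safe="" encodes the colons in the cap as %3A, and
    # UTF-8-encodes any non-ASCII path components, as the webapi requires.
    parts = "/".join(quote(seg, safe="") for seg in (cap,) + path)
    return "%s/uri/%s" % (GATEWAY, parts)

print(read_url("URI:DIR2:aaaa:bbbb", "r\u00e9sum\u00e9.doc"))
# -> http://127.0.0.1:3456/uri/URI%3ADIR2%3Aaaaa%3Abbbb/r%C3%A9sum%C3%A9.doc
```

A GET on the resulting URL returns the file's bytes as the response body.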
343 344 Writing/Uploading A File 345 ------------------------ 346 347 ``PUT /uri/$FILECAP`` 348 349 ``PUT /uri/$DIRCAP/[SUBDIRS../]FILENAME`` 350 351 Upload a file, using the data from the HTTP request body, and add whatever 352 child links and subdirectories are necessary to make the file available at 353 the given location. Once this operation succeeds, a GET on the same URL will 354 retrieve the same contents that were just uploaded. This will create any 355 necessary intermediate subdirectories. 356 357 To use the /uri/$FILECAP form, $FILECAP must be a write-cap for a mutable file. 358 359 In the /uri/$DIRCAP/[SUBDIRS../]FILENAME form, if the target file is a 360 writeable mutable file, that file's contents will be overwritten in-place. If 361 it is a read-cap for a mutable file, an error will occur. If it is an 362 immutable file, the old file will be discarded, and a new one will be put in 363 its place. 364 365 When creating a new file, if "mutable=true" is in the query arguments, the 366 operation will create a mutable file instead of an immutable one. 367 368 This returns the file-cap of the resulting file. If a new file was created 369 by this method, the HTTP response code (as dictated by RFC 2616) will be set 370 to 201 CREATED. If an existing file was replaced or modified, the response 371 code will be 200 OK. 372 373 Note that the 'curl -T localfile http://127.0.0.1:3456/uri/$DIRCAP/foo.txt' 374 command can be used to invoke this operation. 375 376 ``PUT /uri`` 377 378 This uploads a file, and produces a file-cap for the contents, but does not 379 attach the file into the filesystem. No directories will be modified by 380 this operation. The file-cap is returned as the body of the HTTP response. 381 382 If "mutable=true" is in the query arguments, the operation will create a 383 mutable file, and return its write-cap in the HTTP response. The default is 384 to create an immutable file, returning the read-cap as a response.
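The two upload forms above differ only in the PUT target: /uri for an unlinked upload, or a path under a directory cap for an attached one, with "mutable=true" selecting a mutable file in either case. A small sketch of the target-URL choice (gateway address and cap are placeholders):

```python
# Sketch: choose the PUT target for an upload. "Unlinked" uploads go to /uri
# (the response body is the new file-cap); attached uploads name a location
# under a directory cap. The cap and filename below are hypothetical.
from urllib.parse import quote

GATEWAY = "http://127.0.0.1:3456"

def put_url(dircap=None, *path, mutable=False):
    if dircap is None:
        url = GATEWAY + "/uri"          # unlinked upload
    else:
        segs = "/".join(quote(s, safe="") for s in (dircap,) + path)
        url = "%s/uri/%s" % (GATEWAY, segs)
    if mutable:
        url += "?mutable=true"          # create a mutable file, not immutable
    return url

print(put_url())                                  # PUT /uri
print(put_url("URI:DIR2:aaaa:bbbb", "foo.txt"))   # PUT /uri/$DIRCAP/foo.txt
```

The request body in either case is simply the raw file data.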
385 386 Creating A New Directory 387 ------------------------ 388 389 ``POST /uri?t=mkdir`` 390 391 ``PUT /uri?t=mkdir`` 392 393 Create a new empty directory and return its write-cap as the HTTP response 394 body. This does not make the newly created directory visible from the 395 filesystem. The "PUT" operation is provided for backwards compatibility: 396 new code should use POST. 397 398 ``POST /uri?t=mkdir-with-children`` 399 400 Create a new directory, populated with a set of child nodes, and return its 401 write-cap as the HTTP response body. The new directory is not attached to 402 any other directory: the returned write-cap is the only reference to it. 403 404 Initial children are provided as the body of the POST form (this is more 405 efficient than doing separate mkdir and set_children operations). If the 406 body is empty, the new directory will be empty. If not empty, the body will 407 be interpreted as a UTF-8 JSON-encoded dictionary of children with which the 408 new directory should be populated, using the same format as would be 409 returned in the 'children' value of the t=json GET request, described below. 410 Each dictionary key should be a child name, and each value should be a list 411 of [TYPE, PROPDICT], where PROPDICT contains "rw_uri", "ro_uri", and 412 "metadata" keys (all others are ignored). 
For example, the PUT request body 413 could be:: 414 415 { 416 "Fran\u00e7ais": [ "filenode", { 417 "ro_uri": "URI:CHK:...", 418 "size": bytes, 419 "metadata": { 420 "ctime": 1202777696.7564139, 421 "mtime": 1202777696.7564139, 422 "tahoe": { 423 "linkcrtime": 1202777696.7564139, 424 "linkmotime": 1202777696.7564139 425 } } } ], 426 "subdir": [ "dirnode", { 427 "rw_uri": "URI:DIR2:...", 428 "ro_uri": "URI:DIR2-RO:...", 429 "metadata": { 430 "ctime": 1202778102.7589991, 431 "mtime": 1202778111.2160511, 432 "tahoe": { 433 "linkcrtime": 1202777696.7564139, 434 "linkmotime": 1202777696.7564139 435 } } } ] 436 } 437 438 For forward-compatibility, a mutable directory can also contain caps in 439 a format that is unknown to the webapi server. When such caps are retrieved 440 from a mutable directory in a "ro_uri" field, they will be prefixed with 441 the string "ro.", indicating that they must not be decoded without 442 checking that they are read-only. The "ro." prefix must not be stripped 443 off without performing this check. (Future versions of the webapi server 444 will perform it where necessary.) 445 446 If both the "rw_uri" and "ro_uri" fields are present in a given PROPDICT, 447 and the webapi server recognizes the rw_uri as a write cap, then it will 448 reset the ro_uri to the corresponding read cap and discard the original 449 contents of ro_uri (in order to ensure that the two caps correspond to the 450 same object and that the ro_uri is in fact read-only). However this may not 451 happen for caps in a format unknown to the webapi server. Therefore, when 452 writing a directory the webapi client should ensure that the contents 453 of "rw_uri" and "ro_uri" for a given PROPDICT are a consistent 454 (write cap, read cap) pair if possible. If the webapi client only has 455 one cap and does not know whether it is a write cap or read cap, then 456 it is acceptable to set "rw_uri" to that cap and omit "ro_uri". 
The 457 client must not put a write cap into a "ro_uri" field. 458 459 The metadata may have a "no-write" field. If this is set to true in the 460 metadata of a link, it will not be possible to open that link for writing 461 via the SFTP frontend; see `FTP-and-SFTP.rst`_ for details. 462 Also, if the "no-write" field is set to true in the metadata of a link to 463 a mutable child, it will cause the link to be diminished to read-only. 464 465 .. _FTP-and-SFTP.rst: http://tahoe-lafs.org/source/tahoe-lafs/trunk/docs/frontends/FTP-and-SFTP.rst 466 467 Note that the webapi-using client application must not provide the 468 "Content-Type: multipart/form-data" header that usually accompanies HTML 469 form submissions, since the body is not formatted this way. Doing so will 470 cause a server error as the lower-level code misparses the request body. 471 472 Child file names should each be expressed as a unicode string, then used as 473 keys of the dictionary. The dictionary should then be converted into JSON, 474 and the resulting string encoded into UTF-8. This UTF-8 bytestring should 475 then be used as the POST body. 476 477 ``POST /uri?t=mkdir-immutable`` 478 479 Like t=mkdir-with-children above, but the new directory will be 480 deep-immutable. This means that the directory itself is immutable, and that 481 it can only contain objects that are treated as being deep-immutable, like 482 immutable files, literal files, and deep-immutable directories. 483 484 For forward-compatibility, a deep-immutable directory can also contain caps 485 in a format that is unknown to the webapi server. When such caps are retrieved 486 from a deep-immutable directory in a "ro_uri" field, they will be prefixed 487 with the string "imm.", indicating that they must not be decoded without 488 checking that they are immutable. The "imm." prefix must not be stripped 489 off without performing this check. (Future versions of the webapi server 490 will perform it where necessary.)

 The cap for each child may be given either in the "rw_uri" or "ro_uri"
 field of the PROPDICT (not both). If a cap is given in the "rw_uri" field,
 then the webapi server will check that it is an immutable read-cap of a
 *known* format, and give an error if it is not. If a cap is given in the
 "ro_uri" field, then the webapi server will still check whether known
 caps are immutable, but for unknown caps it will simply assume that the
 cap can be stored, as described above. Note that an attacker would be
 able to store any cap in an immutable directory, so this check when
 creating the directory is only to help non-malicious clients to avoid
 accidentally giving away more authority than intended.

 A non-empty request body is mandatory, since after the directory is
 created, it will not be possible to add more children to it.

``POST /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir``

``PUT /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir``

 Create new directories as necessary to make sure that the named target
 ($DIRCAP/SUBDIRS../SUBDIR) is a directory. This will create additional
 intermediate mutable directories as necessary. If the named target
 directory already exists, this will make no changes to it.

 If the final directory is created, it will be empty.

 This operation will return an error if a blocking file is present at any of
 the parent names, preventing the server from creating the necessary parent
 directory; or if it would require changing an immutable directory.

 The write-cap of the new directory will be returned as the HTTP response
 body.

``POST /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir-with-children``

 Like /uri?t=mkdir-with-children, but the final directory is created as a
 child of an existing mutable directory. This will create additional
 intermediate mutable directories as necessary.
 If the final directory is created, it will be populated with initial
 children from the POST request body, as described above.

 This operation will return an error if a blocking file is present at any of
 the parent names, preventing the server from creating the necessary parent
 directory; or if it would require changing an immutable directory; or if
 the immediate parent directory already has a child named SUBDIR.

``POST /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir-immutable``

 Like /uri?t=mkdir-immutable, but the final directory is created as a child
 of an existing mutable directory. The final directory will be
 deep-immutable, and will be populated with the children specified as a
 JSON dictionary in the POST request body.

 In Tahoe 1.6 this operation creates intermediate mutable directories if
 necessary, but that behaviour should not be relied on; see ticket #920.

 This operation will return an error if the parent directory is immutable,
 or already has a child named SUBDIR.

``POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=NAME``

 Create a new empty mutable directory and attach it to the given existing
 directory. This will create additional intermediate directories as
 necessary.

 This operation will return an error if a blocking file is present at any of
 the parent names, preventing the server from creating the necessary parent
 directory, or if it would require changing any immutable directory.

 The URL of this operation points to the parent of the bottommost new
 directory, whereas the /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir operation
 above has a URL that points directly to the bottommost new directory.

``POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir-with-children&name=NAME``

 Like /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=NAME, but the new directory
 will be populated with initial children via the POST request body.
 This command will create additional intermediate mutable directories as
 necessary.

 This operation will return an error if a blocking file is present at any of
 the parent names, preventing the server from creating the necessary parent
 directory; or if it would require changing an immutable directory; or if
 the immediate parent directory already has a child named NAME.

 Note that the name= argument must be passed as a queryarg, because the POST
 request body is used for the initial children JSON.

``POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir-immutable&name=NAME``

 Like /uri/$DIRCAP/[SUBDIRS../]?t=mkdir-with-children&name=NAME, but the
 final directory will be deep-immutable. The children are specified as a
 JSON dictionary in the POST request body. Again, the name= argument must be
 passed as a queryarg.

 In Tahoe 1.6 this operation creates intermediate mutable directories if
 necessary, but that behaviour should not be relied on; see ticket #920.

 This operation will return an error if the parent directory is immutable,
 or already has a child named NAME.

Get Information About A File Or Directory (as JSON)
---------------------------------------------------

``GET /uri/$FILECAP?t=json``

``GET /uri/$DIRCAP?t=json``

``GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=json``

``GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=json``

 This returns a machine-parseable JSON-encoded description of the given
 object. The JSON always contains a list, and the first element of the list
 is always a flag that indicates whether the referenced object is a file or
 a directory.
 If it is a capability to a file, then the information includes file size
 and URI, like this::

  GET /uri/$FILECAP?t=json :

   [ "filenode", {
     "ro_uri": file_uri,
     "verify_uri": verify_uri,
     "size": bytes,
     "mutable": false
     } ]

 If it is a capability to a directory followed by a path from that directory
 to a file, then the information also includes metadata from the link to the
 file in the parent directory, like this::

  GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=json

   [ "filenode", {
     "ro_uri": file_uri,
     "verify_uri": verify_uri,
     "size": bytes,
     "mutable": false,
     "metadata": {
       "ctime": 1202777696.7564139,
       "mtime": 1202777696.7564139,
       "tahoe": {
         "linkcrtime": 1202777696.7564139,
         "linkmotime": 1202777696.7564139
         } } } ]

 If it is a directory, then it includes information about the children of
 this directory, as a mapping from child name to a set of data about the
 child (the same data that would appear in a corresponding GET?t=json of
 the child itself). The child entries also include metadata about each
 child, including link-creation- and link-change- timestamps.
 The output looks like this::

  GET /uri/$DIRCAP?t=json :
  GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=json :

   [ "dirnode", {
     "rw_uri": read_write_uri,
     "ro_uri": read_only_uri,
     "verify_uri": verify_uri,
     "mutable": true,
     "children": {
       "foo.txt": [ "filenode", {
         "ro_uri": uri,
         "size": bytes,
         "metadata": {
           "ctime": 1202777696.7564139,
           "mtime": 1202777696.7564139,
           "tahoe": {
             "linkcrtime": 1202777696.7564139,
             "linkmotime": 1202777696.7564139
             } } } ],
       "subdir": [ "dirnode", {
         "rw_uri": rwuri,
         "ro_uri": rouri,
         "metadata": {
           "ctime": 1202778102.7589991,
           "mtime": 1202778111.2160511,
           "tahoe": {
             "linkcrtime": 1202777696.7564139,
             "linkmotime": 1202777696.7564139
           } } } ]
       } } ]

 In the above example, note how 'children' is a dictionary in which the keys
 are child names and the values depend upon whether the child is a file or a
 directory. The value is mostly the same as the JSON representation of the
 child object (except that directories do not recurse -- the "children"
 entry of the child is omitted, and the directory view includes the metadata
 that is stored on the directory edge).

 The rw_uri field will be present in the information about a directory
 if and only if you have read-write access to that directory. The verify_uri
 field will be present if and only if the object has a verify-cap
 (non-distributed LIT files do not have verify-caps).
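
A client can consume these structures mechanically. The following is a
minimal sketch (Python, standard library only) that parses a made-up
``t=json`` response of the shape shown above; the cap strings are
placeholders:

```python
import json

# A made-up t=json response for a directory, in the documented shape.
response_body = '''
[ "dirnode", {
  "rw_uri": "URI:DIR2:placeholder",
  "ro_uri": "URI:DIR2-RO:placeholder",
  "verify_uri": "URI:DIR2-Verifier:placeholder",
  "mutable": true,
  "children": {
    "foo.txt": [ "filenode", {
      "ro_uri": "URI:CHK:placeholder",
      "size": 1234,
      "metadata": {} } ] } } ]
'''

# The first element of the list is always the type flag.
nodetype, info = json.loads(response_body)
assert nodetype == "dirnode"

# rw_uri is present if and only if we have read-write access.
writable = "rw_uri" in info

# Each child entry is a (type, data) pair, keyed by child name.
for name, (childtype, childinfo) in info["children"].items():
    size = childinfo.get("size")   # may be absent, e.g. for unknown caps
```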

 If the cap is of an unknown format, then the file size and verify_uri will
 not be available::

  GET /uri/$UNKNOWNCAP?t=json :

   [ "unknown", {
     "ro_uri": unknown_read_uri
     } ]

  GET /uri/$DIRCAP/[SUBDIRS../]UNKNOWNCHILDNAME?t=json :

   [ "unknown", {
     "rw_uri": unknown_write_uri,
     "ro_uri": unknown_read_uri,
     "mutable": true,
     "metadata": {
       "ctime": 1202777696.7564139,
       "mtime": 1202777696.7564139,
       "tahoe": {
         "linkcrtime": 1202777696.7564139,
         "linkmotime": 1202777696.7564139
         } } } ]

 As in the case of file nodes, the metadata will only be present when the
 capability is to a directory followed by a path. The "mutable" field is
 also not always present; when it is absent, the mutability of the object
 is not known.

About the metadata
``````````````````

 The value of the 'tahoe':'linkmotime' key is updated whenever a link to a
 child is set. The value of the 'tahoe':'linkcrtime' key is updated whenever
 a link to a child is created -- i.e. when there was not previously a link
 under that name.

 Note, however, that if the edge in the Tahoe filesystem points to a mutable
 file and the contents of that mutable file is changed, then the
 'tahoe':'linkmotime' value on that edge will *not* be updated, since the
 edge itself wasn't updated -- only the mutable file was.

 The timestamps are represented as a number of seconds since the UNIX epoch
 (1970-01-01 00:00:00 UTC), with leap seconds not being counted in the long
 term.

 In Tahoe earlier than v1.4.0, 'mtime' and 'ctime' keys were populated
 instead of the 'tahoe':'linkmotime' and 'tahoe':'linkcrtime' keys. Starting
 in Tahoe v1.4.0, the 'linkmotime'/'linkcrtime' keys in the 'tahoe' sub-dict
 are populated.
 However, prior to Tahoe v1.7beta, a bug caused the 'tahoe' sub-dict to be
 deleted by webapi requests in which new metadata is specified, and not to
 be added to existing child links that lack it.

 From Tahoe v1.7.0 onward, the 'mtime' and 'ctime' fields are no longer
 populated or updated (see ticket #924), except by "tahoe backup" as
 explained below. For backward compatibility, when an existing link is
 updated and 'tahoe':'linkcrtime' is not present in the previous metadata
 but 'ctime' is, the old value of 'ctime' is used as the new value of
 'tahoe':'linkcrtime'.

 The reason we added the new fields in Tahoe v1.4.0 is that there is a
 "set_children" API (described below) which you can use to overwrite the
 values of the 'mtime'/'ctime' pair, and this API is used by the
 "tahoe backup" command (in Tahoe v1.3.0 and later) to set the 'mtime' and
 'ctime' values when backing up files from a local filesystem into the
 Tahoe filesystem. As of Tahoe v1.4.0, the set_children API cannot be used
 to set anything under the 'tahoe' key of the metadata dict -- if you
 include 'tahoe' keys in your 'metadata' arguments then it will silently
 ignore those keys.

 Therefore, if the 'tahoe' sub-dict is present, you can rely on the
 'linkcrtime' and 'linkmotime' values therein to have the semantics
 described above. (This is assuming that only official Tahoe clients have
 been used to write those links, and that their system clocks were set to
 what you expected -- there is nothing preventing someone from editing
 their Tahoe client or writing their own Tahoe client which would overwrite
 those values however they like, and there is nothing to constrain their
 system clock from taking any value.)
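
A client that wants a link-creation timestamp that works across these
versions can apply the same fallback rule itself. A sketch (the helper name
is ours, not part of the webapi; the timestamps are placeholders):

```python
def best_linkcrtime(metadata):
    """Return the most reliable link-creation time available in a child's
    link metadata: prefer 'tahoe':'linkcrtime' (populated from Tahoe v1.4.0
    onward), falling back to the legacy 'ctime' key; return None if neither
    is present."""
    tahoe = metadata.get("tahoe", {})
    if "linkcrtime" in tahoe:
        return tahoe["linkcrtime"]
    return metadata.get("ctime")

# A link written by a modern client:
new_style = {"tahoe": {"linkcrtime": 1202777696.75,
                       "linkmotime": 1202777696.75}}
# A link last written by Tahoe earlier than v1.4.0:
old_style = {"ctime": 1202777696.75, "mtime": 1202777696.75}
```

This mirrors the server's own backward-compatibility behaviour described
above, with all the caveats about untrusted clients and clocks.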

 When an edge is created or updated by "tahoe backup", the 'mtime' and
 'ctime' keys on that edge are set as follows:

 * 'mtime' is set to the timestamp read from the local filesystem for the
   "mtime" of the local file in question, which means the last time the
   contents of that file were changed.

 * On Windows, 'ctime' is set to the creation timestamp for the file
   read from the local filesystem. On other platforms, 'ctime' is set to
   the UNIX "ctime" of the local file, which means the last time that
   either the contents or the metadata of the local file was changed.

 There are several ways that the 'ctime' field could be confusing:

 1. You might be confused about whether it reflects the time of the
    creation of a link in the Tahoe filesystem (by a version of Tahoe
    < v1.7.0) or a timestamp copied in by "tahoe backup" from a local
    filesystem.

 2. You might be confused about whether it is a copy of the file creation
    time (if "tahoe backup" was run on a Windows system) or of the last
    contents-or-metadata change (if "tahoe backup" was run on a different
    operating system).

 3. You might be confused by the fact that changing the contents of a
    mutable file in Tahoe doesn't have any effect on any links pointing at
    that file in any directories, although "tahoe backup" sets the link
    'ctime'/'mtime' to reflect timestamps about the local file
    corresponding to the Tahoe file to which the link points.

 4. Also, quite apart from Tahoe, you might be confused about the meaning
    of the "ctime" in UNIX local filesystems, which people sometimes think
    means file creation time, but which actually means, in UNIX local
    filesystems, the most recent time that the file contents or the file
    metadata (such as owner, permission bits, extended attributes, etc.)
    has changed.
    Note that although "ctime" does not mean file creation time in UNIX,
    links created by a version of Tahoe prior to v1.7.0, and never written
    by "tahoe backup", will have 'ctime' set to the link creation time.


Attaching an existing File or Directory by its read- or write-cap
-----------------------------------------------------------------

``PUT /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=uri``

 This attaches a child object (either a file or directory) to a specified
 location in the virtual filesystem. The child object is referenced by its
 read- or write- cap, as provided in the HTTP request body. This will
 create intermediate directories as necessary.

 This is similar to a UNIX hardlink: by referencing a previously-uploaded
 file (or previously-created directory) instead of uploading/creating a
 new one, you can create two references to the same object.

 The read- or write- cap of the child is provided in the body of the HTTP
 request, and this same cap is returned in the response body.

 The default behavior is to overwrite any existing object at the same
 location. To prevent this (and make the operation return an error instead
 of overwriting), add a "replace=false" argument, as "?t=uri&replace=false".
 With replace=false, this operation will return an HTTP 409 "Conflict"
 error if there is already an object at the given location, rather than
 overwriting the existing object. To allow the operation to overwrite a
 file, but return an error when trying to overwrite a directory, use
 "replace=only-files" (this behavior is closer to the traditional UNIX "mv"
 command). Note that "true", "t", and "1" are all synonyms for "True", and
 "false", "f", and "0" are synonyms for "False", and the parameter is
 case-insensitive.

 Note that this operation does not take its child cap in the form of
 separate "rw_uri" and "ro_uri" fields.
 Therefore, it cannot accept a child cap in a format unknown to the webapi
 server, unless its URI starts with "ro." or "imm.". This restriction is
 necessary because the server is not able to attenuate an unknown write cap
 to a read cap. Unknown URIs starting with "ro." or "imm.", on the other
 hand, are assumed to represent read caps. The client should not prefix a
 write cap with "ro." or "imm." and pass it to this operation, since that
 would result in granting the cap's write authority to holders of the
 directory read cap.

Adding multiple files or directories to a parent directory at once
------------------------------------------------------------------

``POST /uri/$DIRCAP/[SUBDIRS..]?t=set_children``

``POST /uri/$DIRCAP/[SUBDIRS..]?t=set-children`` (Tahoe >= v1.6)

 This command adds multiple children to a directory in a single operation.
 It reads the request body and interprets it as a JSON-encoded description
 of the child names and read/write-caps that should be added.

 The body should be a JSON-encoded dictionary, in the same format as the
 "children" value returned by the "GET /uri/$DIRCAP?t=json" operation
 described above. In this format, each key is a child name, and the
 corresponding value is a tuple of (type, childinfo). "type" is ignored,
 and "childinfo" is a dictionary that contains "rw_uri", "ro_uri", and
 "metadata" keys. You can take the output of "GET /uri/$DIRCAP1?t=json"
 and use it as the input to "POST /uri/$DIRCAP2?t=set_children" to make
 DIR2 look very much like DIR1 (except for any existing children of DIR2
 that were not overwritten, and any existing "tahoe" metadata keys as
 described below).
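
For example, a client copying children from one directory to another might
build the request body like this sketch (Python, standard library only;
the caps are placeholders). Stripping the "tahoe" sub-dicts is optional,
since the server ignores any "tahoe" keys that clients send:

```python
import json

# "children" value as returned by GET /uri/$DIRCAP1?t=json (placeholders).
dir1_children = {
    "notes.txt": ["filenode", {
        "ro_uri": "URI:CHK:placeholder",
        "metadata": {"mtime": 1.0,
                     "tahoe": {"linkcrtime": 2.0, "linkmotime": 2.0}}}],
}

# Any "tahoe" sub-dict we send would be silently ignored by the server;
# dropping it here just keeps the request body tidy.
for name, (childtype, info) in dir1_children.items():
    info.get("metadata", {}).pop("tahoe", None)

# This bytestring is the body for POST /uri/$DIRCAP2?t=set_children.
body = json.dumps(dir1_children).encode("utf-8")
```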

 When the set_children request contains a child name that already exists in
 the target directory, this command defaults to overwriting that child with
 the new value (both child cap and metadata, but if the JSON data does not
 contain a "metadata" key, the old child's metadata is preserved). The
 command takes a boolean "overwrite=" query argument to control this
 behavior. If you use "?t=set_children&overwrite=false", then an attempt to
 replace an existing child will instead cause an error.

 Any "tahoe" key in the new child's "metadata" value is ignored. Any
 existing "tahoe" metadata is preserved. The metadata["tahoe"] value is
 reserved for metadata generated by the tahoe node itself. The only two
 keys currently placed here are "linkcrtime" and "linkmotime". For details,
 see the section above entitled "Get Information About A File Or Directory
 (as JSON)", in the "About the metadata" subsection.

 Note that this command was introduced with the name "set_children", which
 uses an underscore rather than a hyphen as other multi-word command names
 do. The variant with a hyphen is now accepted, but clients that desire
 backward compatibility should continue to use "set_children".


Deleting a File or Directory
----------------------------

``DELETE /uri/$DIRCAP/[SUBDIRS../]CHILDNAME``

 This removes the given name from its parent directory. CHILDNAME is the
 name to be removed, and $DIRCAP/SUBDIRS.. indicates the directory that
 will be modified.

 Note that this does not actually delete the file or directory that the
 name points to from the tahoe grid -- it only removes the named reference
 from this directory. If there are other names in this directory or in
 other directories that point to the resource, then it will remain
 accessible through those paths.
 Even if all names pointing to this object are removed from their parent
 directories, someone with possession of its read-cap can continue to
 access the object through that cap.

 The object will only become completely unreachable once 1: there are no
 reachable directories that reference it, and 2: nobody is holding a read-
 or write- cap to the object. (This behavior is very similar to the way
 hardlinks and anonymous files work in traditional UNIX filesystems.)

 This operation will not modify more than a single directory. Intermediate
 directories which were implicitly created by PUT or POST methods will
 *not* be automatically removed by DELETE.

 This method returns the file- or directory- cap of the object that was
 just removed.

Browser Operations: Human-oriented interfaces
=============================================

This section describes the HTTP operations that provide support for humans
running a web browser. Most of these operations use HTML forms that use POST
to drive the Tahoe node. This section is intended for HTML authors who want
to write web pages that contain forms and buttons which manipulate the
Tahoe filesystem.

Note that for all POST operations, the arguments listed can be provided
either as URL query arguments or as form body fields. URL query arguments
are separated from the main URL by "?", and from each other by "&". For
example, "POST /uri/$DIRCAP?t=upload&mutable=true". Form body fields are
usually specified by using <input type="hidden"> elements. For clarity, the
descriptions below display the most significant arguments as URL query
args.
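
As a small sketch of the query-argument form, a client could assemble such
a URL with Python's standard library (the cap is a placeholder, not a real
write-cap):

```python
from urllib.parse import urlencode

dircap = "URI:DIR2:placeholder"   # placeholder write-cap
# Query args are separated from the URL by "?" and from each other by "&".
query = urlencode({"t": "upload", "mutable": "true"})
url = "/uri/{}?{}".format(dircap, query)
```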

Viewing A Directory (as HTML)
-----------------------------

``GET /uri/$DIRCAP/[SUBDIRS../]``

 This returns an HTML page, intended to be displayed to a human by a web
 browser, which contains HREF links to all files and directories reachable
 from this directory. These HREF links do not have a t= argument, meaning
 that a human who follows them will get pages also meant for a human. It
 also contains forms to upload new files, and to delete files and
 directories. Those forms use POST methods to do their job.

Viewing/Downloading a File
--------------------------

``GET /uri/$FILECAP``

``GET /uri/$DIRCAP/[SUBDIRS../]FILENAME``

 This will retrieve the contents of the given file. The HTTP response body
 will contain the sequence of bytes that make up the file.

 If you want the HTTP response to include a useful Content-Type header,
 either use the second form (which starts with a $DIRCAP), or add a
 "filename=foo" query argument, like "GET /uri/$FILECAP?filename=foo.jpg".
 The bare "GET /uri/$FILECAP" does not give the Tahoe node enough
 information to determine a Content-Type (since Tahoe immutable files are
 merely sequences of bytes, not typed+named file objects).

 If the URL has both filename= and "save=true" in the query arguments, then
 the server will add a "Content-Disposition: attachment" header, along with
 a filename= parameter. When a user clicks on such a link, most browsers
 will offer to let the user save the file instead of displaying it inline
 (indeed, most browsers will refuse to display it inline). "true", "t",
 "1", and other case-insensitive equivalents are all treated the same.

 Character-set handling in URLs and HTTP headers is a dubious art [1]_.
 For maximum compatibility, Tahoe simply copies the bytes from the
 filename= argument into the Content-Disposition header's filename=
 parameter, without trying to interpret them in any particular way.


``GET /named/$FILECAP/FILENAME``

 This is an alternate download form which makes it easier to get the
 correct filename. The Tahoe server will provide the contents of the given
 file, with a Content-Type header derived from the given filename. This
 form is used to get browsers to use the "Save Link As" feature correctly,
 and also helps command-line tools like "wget" and "curl" use the right
 filename. Note that this form can *only* be used with file caps; it is an
 error to use a directory cap after the /named/ prefix.

Get Information About A File Or Directory (as HTML)
---------------------------------------------------

``GET /uri/$FILECAP?t=info``

``GET /uri/$DIRCAP/?t=info``

``GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR/?t=info``

``GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=info``

 This returns a human-oriented HTML page with more detail about the
 selected file or directory object. This page contains the following items:

 * object size
 * storage index
 * JSON representation
 * raw contents (text/plain)
 * access caps (URIs): verify-cap, read-cap, write-cap (for mutable objects)
 * check/verify/repair form
 * deep-check/deep-size/deep-stats/manifest (for directories)
 * replace-contents form (for mutable files)

Creating a Directory
--------------------

``POST /uri?t=mkdir``

 This creates a new empty directory, but does not attach it to the virtual
 filesystem.

 If a "redirect_to_result=true" argument is provided, then the HTTP
 response will cause the web browser to be redirected to a /uri/$DIRCAP
 page that gives access to the newly-created directory.
 If you bookmark this page, you'll be able to get back to the directory
 again in the future. This is the recommended way to start working with a
 Tahoe server: create a new unlinked directory (using
 redirect_to_result=true), then bookmark the resulting /uri/$DIRCAP page.
 There is a "create directory" button on the Welcome page to invoke this
 action.

 If "redirect_to_result=true" is not provided (or is given a value of
 "false"), then the HTTP response body will simply be the write-cap of the
 new directory.

``POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=CHILDNAME``

 This creates a new empty directory as a child of the designated SUBDIR.
 This will create additional intermediate directories as necessary.

 If a "when_done=URL" argument is provided, the HTTP response will cause
 the web browser to redirect to the given URL. This provides a convenient
 way to return the browser to the directory that was just modified. Without
 a when_done= argument, the HTTP response will simply contain the write-cap
 of the directory that was just created.


Uploading a File
----------------

``POST /uri?t=upload``

 This uploads a file, and produces a file-cap for the contents, but does
 not attach the file into the filesystem. No directories will be modified
 by this operation.

 The file must be provided as the "file" field of an HTML-encoded form
 body, produced in response to an HTML form like this::

  <form action="/uri" method="POST" enctype="multipart/form-data">
   <input type="hidden" name="t" value="upload" />
   <input type="file" name="file" />
   <input type="submit" value="Upload Unlinked" />
  </form>

 If a "when_done=URL" argument is provided, the response body will cause
 the browser to redirect to the given URL.
 If the when_done= URL has the string "%(uri)s" in it, that string will be
 replaced by a URL-escaped form of the newly created file-cap. (Note that
 without this substitution, there is no way to access the file that was
 just uploaded.)

 The default (in the absence of when_done=) is to return an HTML page that
 describes the results of the upload. This page will contain information
 about which storage servers were used for the upload, how long each
 operation took, etc.

 If a "mutable=true" argument is provided, the operation will create a
 mutable file, and the response body will contain the write-cap instead of
 the upload results page. The default is to create an immutable file,
 returning the upload results page as a response.


``POST /uri/$DIRCAP/[SUBDIRS../]?t=upload``

 This uploads a file, and attaches it as a new child of the given
 directory, which must be mutable. The file must be provided as the "file"
 field of an HTML-encoded form body, produced in response to an HTML form
 like this::

  <form action="." method="POST" enctype="multipart/form-data">
   <input type="hidden" name="t" value="upload" />
   <input type="file" name="file" />
   <input type="submit" value="Upload" />
  </form>

 A "name=" argument can be provided to specify the new child's name,
 otherwise it will be taken from the "filename" field of the upload form
 (most web browsers will copy the last component of the original file's
 pathname into this field). To avoid confusion, name= is not allowed to
 contain a slash.

 If there is already a child with that name, and it is a mutable file, then
 its contents are replaced with the data being uploaded. If it is not a
 mutable file, the default behavior is to remove the existing child before
 creating a new one.
 To prevent this (and make the operation return an error instead of
 overwriting the old child), add a "replace=false" argument, as
 "?t=upload&replace=false". With replace=false, this operation will return
 an HTTP 409 "Conflict" error if there is already an object at the given
 location, rather than overwriting the existing object. Note that "true",
 "t", and "1" are all synonyms for "True", and "false", "f", and "0" are
 synonyms for "False"; the parameter is case-insensitive.

 This will create additional intermediate directories as necessary,
 although since it is expected to be triggered by a form that was retrieved
 by "GET /uri/$DIRCAP/[SUBDIRS../]", it is likely that the parent directory
 will already exist.

 If a "mutable=true" argument is provided, any new file that is created
 will be a mutable file instead of an immutable one. <input type="checkbox"
 name="mutable" /> will give the user a way to set this option.

 If a "when_done=URL" argument is provided, the HTTP response will cause
 the web browser to redirect to the given URL. This provides a convenient
 way to return the browser to the directory that was just modified. Without
 a when_done= argument, the HTTP response will simply contain the file-cap
 of the file that was just uploaded (a write-cap for mutable files, or a
 read-cap for immutable files).

``POST /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=upload``

 This also uploads a file and attaches it as a new child of the given
 directory, which must be mutable. It is a slight variant of the previous
 operation, as the URL refers to the target file rather than the parent
 directory. It is otherwise identical: this accepts mutable= and when_done=
 arguments too.

``POST /uri/$FILECAP?t=upload``

 This modifies the contents of an existing mutable file in-place.
 An error is signalled if $FILECAP does not refer to a mutable file. It
 behaves just like the "PUT /uri/$FILECAP" form, but uses a POST for the
 benefit of HTML forms in a web browser.

Attaching An Existing File Or Directory (by URI)
------------------------------------------------

``POST /uri/$DIRCAP/[SUBDIRS../]?t=uri&name=CHILDNAME&uri=CHILDCAP``

 This attaches a given read- or write- cap "CHILDCAP" to the designated
 directory, with a specified child name. This behaves much like the PUT
 t=uri operation, and is a lot like a UNIX hardlink. It is subject to the
 same restrictions as that operation on the use of cap formats unknown to
 the webapi server.

 This will create additional intermediate directories as necessary,
 although since it is expected to be triggered by a form that was retrieved
 by "GET /uri/$DIRCAP/[SUBDIRS../]", it is likely that the parent directory
 will already exist.

 This accepts the same replace= argument as POST t=upload.

Deleting A Child
----------------

``POST /uri/$DIRCAP/[SUBDIRS../]?t=delete&name=CHILDNAME``

 This instructs the node to remove a child object (file or subdirectory)
 from the given directory, which must be mutable. Note that the entire
 subtree is unlinked from the parent. Unlike deleting a subdirectory in a
 UNIX local filesystem, the subtree need not be empty; if it isn't, then
 other references into the subtree will see that the child subdirectories
 are not modified by this operation. Only the link from the given directory
 to its child is severed.

Renaming A Child
----------------

``POST /uri/$DIRCAP/[SUBDIRS../]?t=rename&from_name=OLD&to_name=NEW``

 This instructs the node to rename a child of the given directory, which
 must be mutable.
 This has a similar effect to removing the child, then adding the same
 child-cap under the new name, except that it preserves metadata. This
 operation cannot move the child to a different directory.

 This operation will replace any existing child of the new name, making it
 behave like the UNIX "``mv -f``" command.

Other Utilities
---------------

``GET /uri?uri=$CAP``

 This causes a redirect to /uri/$CAP, and retains any additional query
 arguments (like filename= or save=). This is for the convenience of web
 forms which allow the user to paste in a read- or write- cap (obtained
 through some out-of-band channel, like IM or email).

 Note that this form merely redirects to the specific file or directory
 indicated by the $CAP: unlike the GET /uri/$DIRCAP form, you cannot
 traverse to children by appending additional path segments to the URL.

``GET /uri/$DIRCAP/[SUBDIRS../]?t=rename-form&name=$CHILDNAME``

 This provides a useful facility to browser-based user interfaces. It
 returns a page containing a form targeting the "POST $DIRCAP t=rename"
 functionality described above, with the provided $CHILDNAME present in the
 'from_name' field of that form. I.e. this presents a form offering to
 rename $CHILDNAME, requesting the new name, and submitting POST rename.

``GET /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=uri``

 This returns the file- or directory- cap for the specified object.

``GET /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=readonly-uri``

 This returns a read-only file- or directory- cap for the specified object.
 If the object is an immutable file, this will return the same value as
 t=uri.

Debugging and Testing Features
------------------------------

These URLs are less likely to be helpful to the casual Tahoe user, and are
mainly intended for developers.
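Several of the commands in this document take boolean query arguments (replace=, mutable=, verify=, add-lease=, repair=) which follow the synonym rules noted earlier: "true", "t", and "1" mean True, "false", "f", and "0" mean False, case-insensitively. A client library might normalize them with a helper along these lines (a sketch; ``webapi_bool`` is a hypothetical name, not part of the webapi):

```python
def webapi_bool(value, default=False):
    """Normalize a webapi-style boolean query argument.

    "true", "t", and "1" are synonyms for True; "false", "f", and "0"
    are synonyms for False; matching is case-insensitive. A missing
    argument (None) falls back to the given default.
    """
    if value is None:
        return default
    v = value.lower()
    if v in ("true", "t", "1"):
        return True
    if v in ("false", "f", "0"):
        return False
    raise ValueError("not a webapi boolean: %r" % (value,))
```

For example, ``webapi_bool("T")`` and ``webapi_bool("1")`` both yield True, matching the replace=false semantics described above.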
``POST $URL?t=check``

 This triggers the FileChecker to determine the current "health" of the
 given file or directory, by counting how many shares are available. The
 page that is returned will display the results. This can be used as a "show
 me detailed information about this file" page.

 If a verify=true argument is provided, the node will perform a more
 intensive check, downloading and verifying every single bit of every share.

 If an add-lease=true argument is provided, the node will also add (or
 renew) a lease to every share it encounters. Each lease will keep the share
 alive for a certain period of time (one month by default). Once the last
 lease expires or is explicitly cancelled, the storage server is allowed to
 delete the share.

 If an output=JSON argument is provided, the response will be
 machine-readable JSON instead of human-oriented HTML. The data is a
 dictionary with the following keys::

  storage-index: a base32-encoded string with the object's storage index,
                 or an empty string for LIT files
  summary: a string, with a one-line summary of the stats of the file
  results: a dictionary that describes the state of the file. For LIT files,
           this dictionary has only the 'healthy' key, which will always be
           True. For distributed files, this dictionary has the following
           keys:
   count-shares-good: the number of good shares that were found
   count-shares-needed: 'k', the number of shares required for recovery
   count-shares-expected: 'N', the number of total shares generated
   count-good-share-hosts: this was intended to be the number of distinct
                           storage servers with good shares. It is currently
                           (as of Tahoe-LAFS v1.8.0) computed incorrectly;
                           see ticket #1115.
   count-wrong-shares: for mutable files, the number of shares for
                       versions other than the 'best' one (highest
                       sequence number, highest roothash). These are
                       either old ...
   count-recoverable-versions: for mutable files, the number of
                               recoverable versions of the file. For
                               a healthy file, this will equal 1.
   count-unrecoverable-versions: for mutable files, the number of
                                 unrecoverable versions of the file.
                                 For a healthy file, this will be 0.
   count-corrupt-shares: the number of shares with integrity failures
   list-corrupt-shares: a list of "share locators", one for each share
                        that was found to be corrupt. Each share locator
                        is a list of (serverid, storage_index, sharenum).
   needs-rebalancing: (bool) True if there are multiple shares on a single
                      storage server, indicating a reduction in reliability
                      that could be resolved by moving shares to new
                      servers.
   servers-responding: list of base32-encoded storage server identifiers,
                       one for each server which responded to the share
                       query.
   healthy: (bool) True if the file is completely healthy, False otherwise.
            Healthy files have at least N good shares. Overlapping shares
            do not currently cause a file to be marked unhealthy. If there
            are at least N good shares, then corrupt shares do not cause
            the file to be marked unhealthy, although the corrupt shares
            will be listed in the results (list-corrupt-shares) and should
            be manually removed to avoid wasting time in subsequent
            downloads (as the downloader rediscovers the corruption and
            uses alternate shares).
            Future compatibility: the meaning of this field may change to
            reflect whether the servers-of-happiness criterion is met
            (see ticket #614).
   sharemap: dict mapping share identifier to list of serverids
             (base32-encoded strings). This indicates which servers are
             holding which shares.
             For immutable files, the shareid is an integer (the share
             number, from 0 to N-1). For mutable files, it is a string of
             the form 'seq%d-%s-sh%d', containing the sequence number, the
             roothash, and the share number.

``POST $URL?t=start-deep-check`` (must add &ophandle=XYZ)

 This initiates a recursive walk of all files and directories reachable from
 the target, performing a check on each one just like t=check. The result
 page will contain a summary of the results, including details on any
 file/directory that was not fully healthy.

 t=start-deep-check can only be invoked on a directory. An error (400
 BAD_REQUEST) will be signalled if it is invoked on a file. The recursive
 walker will deal with loops safely.

 This accepts the same verify= and add-lease= arguments as t=check.

 Since this operation can take a long time (perhaps a second per object),
 the ophandle= argument is required (see "Slow Operations, Progress, and
 Cancelling" above). The response to this POST will be a redirect to the
 corresponding /operations/$HANDLE page (with output=HTML or output=JSON to
 match the output= argument given to the POST). The deep-check operation
 will continue to run in the background, and the /operations page should be
 used to find out when the operation is done.

 Detailed check results for non-healthy files and directories will be
 available under /operations/$HANDLE/$STORAGEINDEX, and the HTML status will
 contain links to these detailed results.

 The HTML /operations/$HANDLE page for incomplete operations will contain a
 meta-refresh tag, set to 60 seconds, so that a browser which uses
 deep-check will automatically poll until the operation has completed.
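A non-browser client has to do this polling itself. The loop below is a minimal sketch: ``wait_for_operation`` and the injected ``fetch`` callable are hypothetical names (not part of the webapi), and the fetch function is passed in so the sketch stays testable without a running node.

```python
import json
import time

def wait_for_operation(fetch, handle, base="http://127.0.0.1:3456",
                       interval=60):
    """Poll the /operations/$HANDLE page until the operation reports
    finished=True, then return the full results dictionary.

    `fetch` is any callable taking a URL and returning the response
    body as a string. The 60-second default mirrors the meta-refresh
    interval used by the HTML status page.
    """
    url = "%s/operations/%s?output=JSON" % (base, handle)
    while True:
        results = json.loads(fetch(url))
        if results.get("finished"):
            return results
        time.sleep(interval)
```

A real client would add a timeout or retry limit; the sketch loops until the node reports completion.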
 The JSON page (/operations/$HANDLE?output=JSON) will contain a
 machine-readable JSON dictionary with the following keys::

  finished: a boolean, True if the operation is complete, else False. Some
            of the remaining keys may not be present until the operation
            is complete.
  root-storage-index: a base32-encoded string with the storage index of the
                      starting point of the deep-check operation
  count-objects-checked: count of how many objects were checked. Note that
                         non-distributed objects (i.e. small immutable LIT
                         files) are not checked, since for these objects,
                         the data is contained entirely in the URI.
  count-objects-healthy: how many of those objects were completely healthy
  count-objects-unhealthy: how many were damaged in some way
  count-corrupt-shares: how many shares were found to have corruption,
                        summed over all objects examined
  list-corrupt-shares: a list of "share identifiers", one for each share
                       that was found to be corrupt. Each share identifier
                       is a list of (serverid, storage_index, sharenum).
  list-unhealthy-files: a list of (pathname, check-results) tuples, for
                        each file that was not fully healthy. 'pathname' is
                        a list of strings (which can be joined by "/"
                        characters to turn it into a single string),
                        relative to the directory on which deep-check was
                        invoked. The 'check-results' field is the same as
                        that returned by t=check&output=JSON, described
                        above.
  stats: a dictionary with the same keys as the t=start-deep-stats command
         (described below)

``POST $URL?t=stream-deep-check``

 This initiates a recursive walk of all files and directories reachable from
 the target, performing a check on each one just like t=check. For each
 unique object (duplicates are skipped), a single line of JSON is emitted to
 the HTTP response channel (or an error indication, see below).
 When the walk is complete, a final line of JSON is emitted which contains
 the accumulated file-size/count "deep-stats" data.

 This command takes the same arguments as t=start-deep-check.

 A CLI tool can split the response stream on newlines into "response units",
 and parse each response unit as JSON. Each such parsed unit will be a
 dictionary, and will contain at least the "type" key: a string, one of
 "file", "directory", or "stats".

 For all units that have a type of "file" or "directory", the dictionary
 will contain the following keys::

  "path": a list of strings, with the path that is traversed to reach the
          object
  "cap": a write-cap URI for the file or directory, if available, else a
         read-cap URI
  "verifycap": a verify-cap URI for the file or directory
  "repaircap": a URI for the weakest cap that can still be used to repair
               the object
  "storage-index": a base32 storage index for the object
  "check-results": a copy of the dictionary which would be returned by
                   t=check&output=json, with three top-level keys:
                   "storage-index", "summary", and "results", and a variety
                   of counts and sharemaps in the "results" value.

 Note that non-distributed files (i.e. LIT files) will have values of None
 for verifycap, repaircap, and storage-index, since these files can neither
 be verified nor repaired, and are not stored on the storage servers.
 Likewise the check-results dictionary will be limited: an empty string for
 storage-index, and a results dictionary with only the "healthy" key.

 The last unit in the stream will have a type of "stats", and will contain
 the keys described in the "start-deep-stats" operation, below.
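The split-and-parse loop described above can be sketched as follows. ``parse_stream_units`` is a hypothetical helper name; it also watches for the "ERROR:" indication described below, and stops processing JSON as soon as one is seen:

```python
import json

def parse_stream_units(body):
    """Split a t=stream-deep-check (or t=stream-manifest) response body
    into response units.

    Returns (units, error): the list of parsed JSON dictionaries, and
    the "ERROR:" line (or None). The lines after an "ERROR:" indication
    are a Python exception and are deliberately ignored here.
    """
    units, error = [], None
    for line in body.splitlines():
        if line.startswith("ERROR:"):
            error = line
            break  # stop processing JSON as soon as ERROR: is seen
        if line.strip():
            units.append(json.loads(line))
    return units, error
```

Each returned unit is a dictionary carrying at least the "type" key ("file", "directory", or "stats").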
 If any errors occur during the traversal (specifically if a directory is
 unrecoverable, such that further traversal is not possible), an error
 indication is written to the response body, instead of the usual line of
 JSON. This error indication line will begin with the string "ERROR:" (in
 all caps), and contain a summary of the error on the rest of the line. The
 remaining lines of the response body will be a Python exception. The client
 application should look for the ERROR: and stop processing JSON as soon as
 it is seen. Note that neither a file being unrecoverable nor a directory
 merely being unhealthy will cause traversal to stop. The line just before
 the ERROR: will describe the directory that was untraversable, since the
 unit is emitted to the HTTP response body before the child is traversed.

``POST $URL?t=check&repair=true``

 This performs a health check of the given file or directory, and if the
 checker determines that the object is not healthy (some shares are missing
 or corrupted), it will perform a "repair". During repair, any missing
 shares will be regenerated and uploaded to new servers.

 This accepts the same verify=true and add-lease= arguments as t=check. When
 an output=JSON argument is provided, the machine-readable JSON response
 will contain the following keys::

  storage-index: a base32-encoded string with the object's storage index,
                 or an empty string for LIT files
  repair-attempted: (bool) True if repair was attempted
  repair-successful: (bool) True if repair was attempted and the file was
                     fully healthy afterwards. False if no repair was
                     attempted, or if a repair attempt failed.
  pre-repair-results: a dictionary that describes the state of the file
                      before any repair was performed. This contains
                      exactly the same keys as the 'results' value of the
                      t=check response, described above.
  post-repair-results: a dictionary that describes the state of the file
                       after any repair was performed. If no repair was
                       performed, post-repair-results and
                       pre-repair-results will be the same. This contains
                       exactly the same keys as the 'results' value of the
                       t=check response, described above.

``POST $URL?t=start-deep-check&repair=true`` (must add &ophandle=XYZ)

 This triggers a recursive walk of all files and directories, performing a
 t=check&repair=true on each one.

 Like t=start-deep-check without the repair= argument, this can only be
 invoked on a directory. An error (400 BAD_REQUEST) will be signalled if it
 is invoked on a file. The recursive walker will deal with loops safely.

 This accepts the same verify= and add-lease= arguments as
 t=start-deep-check. It uses the same ophandle= mechanism as
 start-deep-check. When an output=JSON argument is provided, the response
 will contain the following keys::

  finished: (bool) True if the operation has completed, else False
  root-storage-index: a base32-encoded string with the storage index of the
                      starting point of the deep-check operation
  count-objects-checked: count of how many objects were checked

  count-objects-healthy-pre-repair: how many of those objects were
                                    completely healthy, before any repair
  count-objects-unhealthy-pre-repair: how many were damaged in some way
  count-objects-healthy-post-repair: how many of those objects were
                                     completely healthy, after any repair
  count-objects-unhealthy-post-repair: how many were damaged in some way

  count-repairs-attempted: repairs were attempted on this many objects.
  count-repairs-successful: how many repairs resulted in healthy objects
  count-repairs-unsuccessful: how many repairs did not result in
                              completely healthy objects
  count-corrupt-shares-pre-repair: how many shares were found to have
                                   corruption, summed over all objects
                                   examined, before any repair
  count-corrupt-shares-post-repair: how many shares were found to have
                                    corruption, summed over all objects
                                    examined, after any repair
  list-corrupt-shares: a list of "share identifiers", one for each share
                       that was found to be corrupt (before any repair).
                       Each share identifier is a list of (serverid,
                       storage_index, sharenum).
  list-remaining-corrupt-shares: like list-corrupt-shares, but mutable
                                 shares that were successfully repaired
                                 are not included. These are shares that
                                 need manual processing. Since immutable
                                 shares cannot be modified by clients, all
                                 corruption in immutable shares will be
                                 listed here.
  list-unhealthy-files: a list of (pathname, check-results) tuples, for
                        each file that was not fully healthy. 'pathname' is
                        relative to the directory on which deep-check was
                        invoked. The 'check-results' field is the same as
                        that returned by t=check&repair=true&output=JSON,
                        described above.
  stats: a dictionary with the same keys as the t=start-deep-stats command
         (described below)

``POST $URL?t=stream-deep-check&repair=true``

 This triggers a recursive walk of all files and directories, performing a
 t=check&repair=true on each one. For each unique object (duplicates are
 skipped), a single line of JSON is emitted to the HTTP response channel
 (or an error indication). When the walk is complete, a final line of JSON
 is emitted which contains the accumulated file-size/count "deep-stats"
 data.
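Since the streaming form does not include the summary dictionary, a receiving client recomputes the counts itself from the per-object "check-and-repair-results" entries (described below). A minimal tally might look like this (``tally_repairs`` is a hypothetical helper name):

```python
def tally_repairs(units):
    """Recompute deep-check&repair summary counts from stream units.

    Each "file"/"directory" unit from t=stream-deep-check&repair=true
    carries a "check-and-repair-results" dictionary with, among other
    keys, repair-attempted and repair-successful booleans.
    """
    checked = attempted = successful = 0
    for unit in units:
        if unit.get("type") not in ("file", "directory"):
            continue  # skip the final "stats" unit
        checked += 1
        results = unit["check-and-repair-results"]
        if results.get("repair-attempted"):
            attempted += 1
            if results.get("repair-successful"):
                successful += 1
    return {"count-objects-checked": checked,
            "count-repairs-attempted": attempted,
            "count-repairs-successful": successful}
```

The same loop could accumulate any of the other per-object counts a client cares about.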
 This emits the same data as t=stream-deep-check (without the repair=true),
 except that the "check-results" field is replaced with a
 "check-and-repair-results" field, which contains the keys returned by
 t=check&repair=true&output=json (i.e. repair-attempted, repair-successful,
 pre-repair-results, and post-repair-results). The output does not contain
 the summary dictionary that is provided by t=start-deep-check&repair=true
 (the one with count-objects-checked and list-unhealthy-files), since the
 receiving client is expected to calculate those values itself from the
 stream of per-object check-and-repair-results.

 Note that the "ERROR:" indication will only be emitted if traversal stops,
 which will only occur if an unrecoverable directory is encountered. If a
 file or directory repair fails, the traversal will continue, and the
 repair failure will be indicated in the JSON data (in the
 "repair-successful" key).

``POST $DIRURL?t=start-manifest`` (must add &ophandle=XYZ)

 This operation generates a "manifest" of the given directory tree, mostly
 for debugging. This is a table of (path, filecap/dircap), for every object
 reachable from the starting directory. The path will be slash-joined, and
 the filecap/dircap will contain a link to the object in question. This
 page gives immediate access to every object in the virtual filesystem
 subtree.

 This operation uses the same ophandle= mechanism as deep-check. The
 corresponding /operations/$HANDLE page has three different forms. The
 default is output=HTML.

 If output=text is added to the query args, the results will be a
 text/plain list. The first line is special: it is either "finished: yes"
 or "finished: no"; if the operation is not finished, you must periodically
 reload the page until it completes.
 The rest of the results are a plaintext list, with one file/dir per line,
 slash-separated, with the filecap/dircap separated by a space.

 If output=JSON is added to the query args, then the results will be a
 JSON-formatted dictionary with six keys. Note that because large directory
 structures can result in very large JSON results, the full results will
 not be available until the operation is complete (i.e. until
 output["finished"] is True)::

  finished (bool): if False then you must reload the page until True
  origin_si (base32 str): the storage index of the starting point
  manifest: list of (path, cap) tuples, where path is a list of strings.
  verifycaps: list of (printable) verify cap strings
  storage-index: list of (base32) storage index strings
  stats: a dictionary with the same keys as the t=start-deep-stats command
         (described below)

``POST $DIRURL?t=start-deep-size`` (must add &ophandle=XYZ)

 This operation generates a number (in bytes) containing the sum of the
 filesize of all directories and immutable files reachable from the given
 directory. This is a rough lower bound of the total space consumed by this
 subtree. It does not include space consumed by mutable files, nor does it
 take expansion or encoding overhead into account. Later versions of the
 code may improve this estimate upwards.

 The /operations/$HANDLE status output consists of two lines of text::

  finished: yes
  size: 1234

``POST $DIRURL?t=start-deep-stats`` (must add &ophandle=XYZ)

 This operation performs a recursive walk of all files and directories
 reachable from the given directory, and generates a collection of
 statistics about those objects.
 The result (obtained from the /operations/$OPHANDLE page) is a
 JSON-serialized dictionary with the following keys (note that some of
 these keys may be missing until 'finished' is True)::

  finished: (bool) True if the operation has finished, else False
  count-immutable-files: count of how many CHK files are in the set
  count-mutable-files: same, for mutable files (does not include
                       directories)
  count-literal-files: same, for LIT files (data contained inside the URI)
  count-files: sum of the above three
  count-directories: count of directories
  count-unknown: count of unrecognized objects (perhaps from the future)
  size-immutable-files: total bytes for all CHK files in the set, =deep-size
  size-mutable-files (TODO): same, for current version of all mutable files
  size-literal-files: same, for LIT files
  size-directories: size of directories (includes size-literal-files)
  size-files-histogram: list of (minsize, maxsize, count) buckets,
                        with a histogram of filesizes, 5dB/bucket,
                        for both literal and immutable files
  largest-directory: number of children in the largest directory
  largest-immutable-file: number of bytes in the largest CHK file

 size-mutable-files is not implemented, because it would require extra
 queries to each mutable file to get their size. This may be implemented in
 the future.

 Assuming no sharing, the basic space consumed by a single root directory
 is the sum of size-immutable-files, size-mutable-files, and
 size-directories.
 The actual disk space used by the shares is larger, because of the
 following sources of overhead::

  integrity data
  expansion due to erasure coding
  share management data (leases)
  backend (ext3) minimum block size

``POST $URL?t=stream-manifest``

 This operation performs a recursive walk of all files and directories
 reachable from the given starting point. For each such unique object
 (duplicates are skipped), a single line of JSON is emitted to the HTTP
 response channel (or an error indication, see below). When the walk is
 complete, a final line of JSON is emitted which contains the accumulated
 file-size/count "deep-stats" data.

 A CLI tool can split the response stream on newlines into "response
 units", and parse each response unit as JSON. Each such parsed unit will
 be a dictionary, and will contain at least the "type" key: a string, one
 of "file", "directory", or "stats".

 For all units that have a type of "file" or "directory", the dictionary
 will contain the following keys::

  "path": a list of strings, with the path that is traversed to reach the
          object
  "cap": a write-cap URI for the file or directory, if available, else a
         read-cap URI
  "verifycap": a verify-cap URI for the file or directory
  "repaircap": a URI for the weakest cap that can still be used to repair
               the object
  "storage-index": a base32 storage index for the object

 Note that non-distributed files (i.e. LIT files) will have values of None
 for verifycap, repaircap, and storage-index, since these files can neither
 be verified nor repaired, and are not stored on the storage servers.

 The last unit in the stream will have a type of "stats", and will contain
 the keys described in the "start-deep-stats" operation, above.
 If any errors occur during the traversal (specifically if a directory is
 unrecoverable, such that further traversal is not possible), an error
 indication is written to the response body, instead of the usual line of
 JSON. This error indication line will begin with the string "ERROR:" (in
 all caps), and contain a summary of the error on the rest of the line. The
 remaining lines of the response body will be a Python exception. The
 client application should look for the ERROR: and stop processing JSON as
 soon as it is seen. The line just before the ERROR: will describe the
 directory that was untraversable, since the manifest entry is emitted to
 the HTTP response body before the child is traversed.


Other Useful Pages
==================

The portion of the web namespace that begins with "/uri" (and "/named") is
dedicated to giving users (both humans and programs) access to the Tahoe
virtual filesystem. The rest of the namespace provides status information
about the state of the Tahoe node.

``GET /`` (the root page)

 This is the "Welcome Page", and contains a few distinct sections::

  Node information: library versions, local nodeid, services being
                    provided.

  Filesystem Access Forms: create a new directory, view a file/directory
                           by URI, upload a file (unlinked), download a
                           file by URI.

  Grid Status: introducer information, helper information, connected
               storage servers.

``GET /status/``

 This page lists all active uploads and downloads, and contains a short
 list of recent upload/download operations. Each operation has a link to a
 page that describes file sizes, servers that were involved, and the time
 consumed in each phase of the operation.

 A GET of /status/?t=json will contain a machine-readable subset of the
 same data. It returns a JSON-encoded dictionary.
 The only key defined at this time is "active", with a value that is a list
 of operation dictionaries, one for each active operation. Once an
 operation is completed, it will no longer appear in data["active"].

 Each op-dict contains a "type" key, one of "upload", "download",
 "mapupdate", "publish", or "retrieve" (the first two are for immutable
 files, while the latter three are for mutable files and directories).

 The "upload" op-dict will contain the following keys::

  type (string): "upload"
  storage-index-string (string): a base32-encoded storage index
  total-size (int): total size of the file
  status (string): current status of the operation
  progress-hash (float): 1.0 when the file has been hashed
  progress-ciphertext (float): 1.0 when the file has been encrypted.
  progress-encode-push (float): 1.0 when the file has been encoded and
                                pushed to the storage servers. For helper
                                uploads, the ciphertext value climbs to 1.0
                                first, then encoding starts. For unassisted
                                uploads, ciphertext and encode-push
                                progress will climb at the same pace.

 The "download" op-dict will contain the following keys::

  type (string): "download"
  storage-index-string (string): a base32-encoded storage index
  total-size (int): total size of the file
  status (string): current status of the operation
  progress (float): 1.0 when the file has been fully downloaded

 Front-ends which want to report progress information are advised to simply
 average together all the progress-* indicators. A slightly more accurate
 value can be found by ignoring the progress-hash value (since the current
 implementation hashes synchronously, so clients will probably never see
 progress-hash!=1.0).

``GET /provisioning/``

 This page provides a basic tool to predict the likely storage and
 bandwidth requirements of a large Tahoe grid.
 It provides forms to input things like total number of users, number of
 files per user, average file size, number of servers, expansion ratio,
 hard drive failure rate, etc. It then provides numbers like how many disks
 per server will be needed, how many read operations per second should be
 expected, and the likely MTBF for files in the grid. This information is
 very preliminary, and the model upon which it is based still needs a lot
 of work.

``GET /helper_status/``

 If the node is running a helper (i.e. if [helper]enabled is set to True in
 tahoe.cfg), then this page will provide a list of all the helper
 operations currently in progress. If "?t=json" is added to the URL, it
 will return a JSON-formatted list of helper statistics, which can then be
 used to produce graphs to indicate how busy the helper is.

``GET /statistics/``

 This page provides "node statistics", which are collected from a variety
 of sources::

  load_monitor: every second, the node schedules a timer for one second in
                the future, then measures how late the subsequent callback
                is. The "load_average" is this tardiness, measured in
                seconds, averaged over the last minute. It is an indication
                of a busy node, one which is doing more work than can be
                completed in a timely fashion. The "max_load" value is the
                highest value that has been seen in the last 60 seconds.

  cpu_monitor: every minute, the node uses time.clock() to measure how much
               CPU time it has used, and it uses this value to produce
               1min/5min/15min moving averages. These values range from 0%
               (0.0) to 100% (1.0), and indicate what fraction of the CPU
               has been used by the Tahoe node. Not all operating systems
               provide meaningful data to time.clock(): they may report
               100% CPU usage at all times.
  uploader: this counts how many immutable files (and bytes) have been
            uploaded since the node was started

  downloader: this counts how many immutable files have been downloaded
              since the node was started

  publishes: this counts how many mutable files (including directories)
             have been modified since the node was started

  retrieves: this counts how many mutable files (including directories)
             have been read since the node was started

 There are other statistics that are tracked by the node. The "raw stats"
 section shows a formatted dump of all of them.

 By adding "?t=json" to the URL, the node will return a JSON-formatted
 dictionary of stats values, which can be used by other tools to produce
 graphs of node behavior. The misc/munin/ directory in the source
 distribution provides some tools to produce these graphs.

``GET /`` (introducer status)

 For Introducer nodes, the welcome page displays information about both
 clients and servers which are connected to the introducer. Servers make
 "service announcements", and these are listed in a table. Clients will
 subscribe to hear about service announcements, and these subscriptions are
 listed in a separate table. Both tables contain information about what
 version of Tahoe is being run by the remote node, their advertised and
 outbound IP addresses, their nodeid and nickname, and how long they have
 been available.

 By adding "?t=json" to the URL, the node will return a JSON-formatted
 dictionary of stats values, which can be used to produce graphs of
 connected clients over time.
 This dictionary has the following keys::

  ["subscription_summary"] : a dictionary mapping service name (like
                             "storage") to an integer with the number of
                             clients that have subscribed to hear about
                             that service
  ["announcement_summary"] : a dictionary mapping service name to an
                             integer with the number of servers which are
                             announcing that service
  ["announcement_distinct_hosts"] : a dictionary mapping service name to an
                                    integer which represents the number of
                                    distinct hosts that are providing that
                                    service. If two servers have announced
                                    FURLs which use the same hostnames (but
                                    different ports and tubids), they are
                                    considered to be on the same host.


Static Files in /public_html
============================

The webapi server will take any request for a URL that starts with /static
and serve it from a configurable directory which defaults to
$BASEDIR/public_html. This is configured by setting the "[node]web.static"
value in $BASEDIR/tahoe.cfg. If this is left at the default value of
"public_html", then http://localhost:3456/static/subdir/foo.html will be
served with the contents of the file $BASEDIR/public_html/subdir/foo.html.

This can be useful to serve a JavaScript application which provides a
prettier front-end to the rest of the Tahoe webapi.


Safety and security issues -- names vs. URIs
============================================

Summary: use explicit file- and dir- caps whenever possible, to reduce the
potential for surprises when the filesystem structure is changed.

Tahoe provides a mutable filesystem, but the ways that the filesystem can
change are limited.
The only thing that can change is the mapping from child names to child
objects that each directory contains: a new child name can be added,
pointing to an object; an existing child name can be removed; or an
existing child name can be changed to point to a different object.

Obviously if you query Tahoe for information about the filesystem and then
act to change the filesystem (such as by getting a listing of the contents
of a directory and then adding a file to the directory), then the
filesystem might have been changed after you queried it and before you
acted upon it. However, if you use the URI instead of the pathname of an
object when you act upon the object, then the only change that can happen
is that, if the object is a directory, the set of child names it has might
be different. If, on the other hand, you act upon the object using its
pathname, then a different object might be in that place, which can result
in more kinds of surprises.

For example, suppose you are writing code which recursively downloads the
contents of a directory. The first thing your code does is fetch the
listing of the contents of the directory. For each child that it fetched,
if that child is a file then it downloads the file, and if that child is a
directory then it recurses into that directory.
Now, if the download and the recurse actions are performed using the
child's name, then the results might be wrong, because, for example, a
child name that pointed to a sub-directory when you listed the directory
might have been changed to point to a file (in which case your attempt to
recurse into it would result in an error and the file would be skipped),
or a child name that pointed to a file when you listed the directory might
now point to a sub-directory (in which case your attempt to download the
child would result in a file containing HTML text describing the
sub-directory!).

If your recursive algorithm uses the URI of the child instead of the name
of the child, then those kinds of mistakes just can't happen. Note that
both the child's name and the child's URI are included in the results of
listing the parent directory, so it isn't any harder to use the URI for
this purpose.

The read and write caps in a given directory node are separate URIs, and
can't be assumed to point to the same object even if they were retrieved
in the same operation (although the webapi server attempts to ensure this
in most cases). If you need to rely on that property, you should
explicitly verify it. More generally, you should not make assumptions
about the internal consistency of the contents of mutable directories. As
a result of the signatures on mutable object versions, it is guaranteed
that a given version was written in a single update, but -- as in the case
of a file -- the contents may have been chosen by a malicious writer in a
way that is designed to confuse applications that rely on their
consistency.

In general, use names if you want "whatever object (whether file or
directory) is found by following this name (or sequence of names) when my
request reaches the server". Use URIs if you want "this particular
object".
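The cap-based recursion described above can be sketched as follows. This is
a minimal illustration, not part of Tahoe: the listings dictionary stands in
for ``GET /uri/$DIRCAP?t=json`` requests, and all caps are abbreviated
placeholders. Note how the child's name is used only for display, while the
recursion and downloads always follow the child's URI::

    # Hypothetical directory listings keyed by read-cap, standing in for
    # t=json GET requests. Each child maps a name to [TYPE, PROPDICT].
    SAMPLE_LISTINGS = {
        "URI:DIR2-RO:root...": {
            "welcome.txt": ["filenode", {"ro_uri": "URI:CHK:file1..."}],
            "docs": ["dirnode", {"ro_uri": "URI:DIR2-RO:dir1..."}],
        },
        "URI:DIR2-RO:dir1...": {
            "notes.txt": ["filenode", {"ro_uri": "URI:CHK:file2..."}],
        },
    }

    def collect_file_caps(dircap, listings):
        """Recurse using each child's URI (never its name): return file caps."""
        caps = []
        for name, (childtype, props) in sorted(listings[dircap].items()):
            if childtype == "filenode":
                caps.append(props["ro_uri"])   # download by cap, not by name
            elif childtype == "dirnode":
                caps.extend(collect_file_caps(props["ro_uri"], listings))
        return caps

Even if another writer renames or replaces a child between the listing and
the download, each cap collected here still refers to the exact object that
was listed.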
Concurrency Issues
==================

Tahoe uses both mutable and immutable files. Mutable files can be created
explicitly by doing an upload with ?mutable=true added, or implicitly by
creating a new directory (since a directory is just a special way to
interpret a given mutable file).

Mutable files suffer from the same consistency-vs-availability tradeoff
that all distributed data storage systems face. It is not possible to
simultaneously achieve perfect consistency and perfect availability in the
face of network partitions (servers being unreachable or faulty).

Tahoe tries to achieve a reasonable compromise, but there is a basic rule
in place, known as the Prime Coordination Directive: "Don't Do That". What
this means is that if write-access to a mutable file is available to
several parties, then those parties are responsible for coordinating their
activities to avoid multiple simultaneous updates. This could be achieved
by having these parties talk to each other and using some sort of locking
mechanism, or by serializing all changes through a single writer.

The consequences of performing uncoordinated writes can vary. Some of the
writers may lose their changes, as somebody else wins the race condition.
In many cases the file will be left in an "unhealthy" state, meaning that
there are not as many redundant shares as we would like (reducing the
reliability of the file against server failures). In the worst case, the
file can be left in such an unhealthy state that no version is
recoverable, not even the old ones. It is this small possibility of data
loss that prompts us to issue the Prime Coordination Directive.

Tahoe nodes implement internal serialization to make sure that a single
Tahoe node cannot conflict with itself.
For example, it is safe to issue two directory modification requests to a
single Tahoe node's webapi server at the same time, because the Tahoe node
will internally delay one of them until after the other has finished being
applied. (This feature was introduced in Tahoe-1.1; back with Tahoe-1.0
the web client was responsible for serializing web requests itself.)

For more details, please see the "Consistency vs Availability" and "The
Prime Coordination Directive" sections of mutable.txt, in the same
directory as this file.


.. [1] URLs and HTTP and UTF-8, Oh My

  HTTP does not provide a mechanism to specify the character set used to
  encode non-ASCII names in URLs (RFC 2396 section 2.1). We prefer the
  convention that the filename= argument shall be a URL-encoded UTF-8
  encoded unicode object. For example, suppose we want to provoke the
  server into using a filename of "f i a n c e-acute e" (i.e. F I A N C
  U+00E9 E). The UTF-8 encoding of this is 0x66 0x69 0x61 0x6e 0x63 0xc3
  0xa9 0x65 (or "fianc\xC3\xA9e", as python's repr() function would show).
  To encode this into a URL, the non-printable characters must be escaped
  with the urlencode '%XX' mechanism, giving us "fianc%C3%A9e". Thus, the
  first line of the HTTP request will be "GET
  /uri/CAP...?save=true&filename=fianc%C3%A9e HTTP/1.1". Not all browsers
  provide this: IE7 uses the Latin-1 encoding, which is fianc%E9e.

  The response header will need to indicate a non-ASCII filename. The
  actual mechanism to do this is not clear. For ASCII filenames, the
  response header would look like::

    Content-Disposition: attachment; filename="english.txt"

  If Tahoe were to enforce the UTF-8 convention, it would need to decode
  the URL argument into a unicode string, and then encode it back into a
  sequence of bytes when creating the response header.
  One possibility would be to use unencoded UTF-8. Developers suggest that
  IE7 might accept this::

    #1: Content-Disposition: attachment; filename="fianc\xC3\xA9e"

  (note, the last four bytes of that line, not including the newline, are
  0xC3 0xA9 0x65 0x22)

  RFC 2231 section 4 (dated 1997) suggests that the following might work,
  and some developers (http://markmail.org/message/dsjyokgl7hv64ig3) have
  reported that it is supported by Firefox (but not IE7)::

    #2: Content-Disposition: attachment; filename*=utf-8''fianc%C3%A9e

  My reading of RFC 2616 section 19.5.1 (which defines
  Content-Disposition) says that the filename= parameter is defined to be
  wrapped in quotes (presumably to allow spaces without breaking the
  parsing of subsequent parameters), which would give us::

    #3: Content-Disposition: attachment; filename*=utf-8''"fianc%C3%A9e"

  However this is contrary to the examples in the email thread listed
  above.

  Developers report that IE7 (when it is configured for UTF-8 URL
  encoding, which is not the default in Asian countries) will accept::

    #4: Content-Disposition: attachment; filename=fianc%C3%A9e

  However, for maximum compatibility, Tahoe simply copies bytes from the
  URL into the response header, rather than enforcing the UTF-8
  convention. This means it does not try to decode the filename from the
  URL argument, nor does it encode the filename into the response header.
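The encode-then-escape convention in the footnote can be reproduced with
Python's standard library (a modern Python 3 sketch, for illustration only;
the document's own byte-level walkthrough is authoritative)::

    from urllib.parse import quote, unquote

    # "fianc\u00e9e" encodes to UTF-8 bytes 66 69 61 6e 63 c3 a9 65; the two
    # non-ASCII bytes are then %XX-escaped. quote() uses UTF-8 by default.
    encoded = quote("fianc\u00e9e")
    assert encoded == "fianc%C3%A9e"

    # Decoding reverses both steps:
    assert unquote(encoded) == "fianc\u00e9e"

This is the form that would appear in a "filename=fianc%C3%A9e" query
argument on the request line.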
deleted file docs/frontends/webapi.txt
diff --git a/docs/frontends/webapi.txt b/docs/frontends/webapi.txt deleted file mode 100644 index bf23daf..0000000
= The Tahoe REST-ful Web API =

1. Enabling the web-API port
2. Basic Concepts: GET, PUT, DELETE, POST
3. URLs, Machine-Oriented Interfaces
4. Browser Operations: Human-Oriented Interfaces
5. Welcome / Debug / Status pages
6. Static Files in /public_html
7. Safety and security issues -- names vs. URIs
8. Concurrency Issues


== Enabling the web-API port ==

Every Tahoe node is capable of running a built-in HTTP server. To enable
this, just write a port number into the "[node]web.port" line of your
node's tahoe.cfg file. For example, writing "web.port = 3456" into the
"[node]" section of $NODEDIR/tahoe.cfg will cause the node to run a
webserver on port 3456.

This string is actually a Twisted "strports" specification, meaning you
can get more control over the interface to which the server binds by
supplying additional arguments. For more details, see the documentation on
twisted.application.strports:
http://twistedmatrix.com/documents/current/api/twisted.application.strports.html

Writing "tcp:3456:interface=127.0.0.1" into the web.port line does the
same but binds to the loopback interface, ensuring that only the programs
on the local host can connect. Using
"ssl:3456:privateKey=mykey.pem:certKey=cert.pem" runs an SSL server.

This webport can be set when the node is created by passing a --webport
option to the 'tahoe create-node' command. By default, the node listens on
port 3456, on the loopback (127.0.0.1) interface.

== Basic Concepts ==

As described in architecture.txt, each file and directory in a Tahoe
virtual filesystem is referenced by an identifier that combines the
designation of the object with the authority to do something with it (such
as read or modify the contents). This identifier is called a "read-cap" or
"write-cap", depending upon whether it enables read-only or read-write
access.
These "caps" are also referred to as URIs.

The Tahoe web-based API is "REST-ful", meaning it implements the concepts
of "REpresentational State Transfer": the original scheme by which the
World Wide Web was intended to work. Each object (file or directory) is
referenced by a URL that includes the read- or write- cap. HTTP methods
(GET, PUT, and DELETE) are used to manipulate these objects. You can think
of the URL as a noun, and the method as a verb.

In REST, the GET method is used to retrieve information about an object,
or to retrieve some representation of the object itself. When the object
is a file, the basic GET method will simply return the contents of that
file. Other variations (generally implemented by adding query parameters
to the URL) will return information about the object, such as metadata.
GET operations are required to have no side-effects.

PUT is used to upload new objects into the filesystem, or to replace an
existing object. DELETE is used to delete objects from the filesystem.
Both PUT and DELETE are required to be idempotent: performing the same
operation multiple times must have the same side-effects as performing it
only once.

POST is used for more complicated actions that cannot be expressed as a
GET, PUT, or DELETE. POST operations can be thought of as a method call:
sending some message to the object referenced by the URL. In Tahoe, POST
is also used for operations that must be triggered by an HTML form
(including upload and delete), because otherwise a regular web browser has
no way to accomplish these tasks. In general, everything that can be done
with a PUT or DELETE can also be done with a POST.

Tahoe's web API is designed for two different kinds of consumer. The first
is a program that needs to manipulate the virtual file system. Such
programs are expected to use the RESTful interface described above.
The second is a human using a standard web browser to work with the
filesystem. This user is given a series of HTML pages with links to
download files, and forms that use POST actions to upload, rename, and
delete files.

When an error occurs, the HTTP response code will be set to an appropriate
400-series code (like 404 Not Found for an unknown childname, or 400 Bad
Request when the parameters to a webapi operation are invalid), and the
HTTP response body will usually contain a few lines of explanation as to
the cause of the error and possible responses. Unusual exceptions may
result in a 500 Internal Server Error as a catch-all, with a default
response body containing a Nevow-generated HTML-ized representation of the
Python exception stack trace that caused the problem. CLI programs which
want to copy the response body to stderr should provide an "Accept:
text/plain" header to their requests to get a plain text stack trace
instead. If the Accept header contains */*, or text/*, or text/html (or if
there is no Accept header), HTML tracebacks will be generated.

== URLs ==

Tahoe uses a variety of read- and write- caps to identify files and
directories. The most common of these is the "immutable file read-cap",
which is used for most uploaded files. These read-caps look like the
following:

 URI:CHK:ime6pvkaxuetdfah2p2f35pe54:4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a:3:10:202

The next most common is a "directory write-cap", which provides both read
and write access to a directory, and looks like this:

 URI:DIR2:djrdkfawoqihigoett4g6auz6a:jx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq

There are also "directory read-caps", which start with "URI:DIR2-RO:", and
give read-only access to a directory.
Finally, there are also mutable file read- and write- caps, which start
with "URI:SSK", and give access to mutable files.

(Later versions of Tahoe will make these strings shorter, and will remove
the unfortunate colons, which must be escaped when these caps are embedded
in URLs.)

To refer to any Tahoe object through the web API, you simply need to
combine a prefix (which indicates the HTTP server to use) with the cap
(which indicates which object inside that server to access). Since the
default Tahoe webport is 3456, the most common prefix is one that will use
a local node listening on this port:

 http://127.0.0.1:3456/uri/ + $CAP

So, to access the directory named above (which happens to be the
publically-writeable sample directory on the Tahoe test grid, described at
http://allmydata.org/trac/tahoe/wiki/TestGrid), the URL would be:

 http://127.0.0.1:3456/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/

(note that the colons in the directory-cap are URL-encoded into "%3A"
sequences).

Likewise, to access the file named above, use:

 http://127.0.0.1:3456/uri/URI%3ACHK%3Aime6pvkaxuetdfah2p2f35pe54%3A4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a%3A3%3A10%3A202

In the rest of this document, we'll use "$DIRCAP" as shorthand for a
read-cap or write-cap that refers to a directory, and "$FILECAP" to
abbreviate a cap that refers to a file (whether mutable or immutable). So
those URLs above can be abbreviated as:

 http://127.0.0.1:3456/uri/$DIRCAP/
 http://127.0.0.1:3456/uri/$FILECAP

The operation summaries below will abbreviate these further, by eliding
the server prefix. They will be displayed like this:

 /uri/$DIRCAP/
 /uri/$FILECAP


=== Child Lookup ===

Tahoe directories contain named child entries, just like directories in a
regular local filesystem.
These child entries, called "dirnodes", consist of a name, metadata, a
write slot, and a read slot. The write and read slots normally contain a
write-cap and read-cap referring to the same object, which can be either a
file or a subdirectory. The write slot may be empty (actually, both may be
empty, but that is unusual).

If you have a Tahoe URL that refers to a directory, and want to reference
a named child inside it, just append the child name to the URL. For
example, if our sample directory contains a file named "welcome.txt", we
can refer to that file with:

 http://127.0.0.1:3456/uri/$DIRCAP/welcome.txt

(or http://127.0.0.1:3456/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/welcome.txt)

Multiple levels of subdirectories can be handled this way:

 http://127.0.0.1:3456/uri/$DIRCAP/tahoe-source/docs/webapi.txt

In this document, when we need to refer to a URL that references a file
using this child-of-some-directory format, we'll use the following string:

 /uri/$DIRCAP/[SUBDIRS../]FILENAME

The "[SUBDIRS../]" part means that there are zero or more (optional)
subdirectory names in the middle of the URL. The "FILENAME" at the end
means that this whole URL refers to a file of some sort, rather than to a
directory.

When we need to refer specifically to a directory in this way, we'll
write:

 /uri/$DIRCAP/[SUBDIRS../]SUBDIR


Note that all components of pathnames in URLs are required to be UTF-8
encoded, so "resume.doc" (with an acute accent on both E's) would be
accessed with:

 http://127.0.0.1:3456/uri/$DIRCAP/r%C3%A9sum%C3%A9.doc

Also note that the filenames inside upload POST forms are interpreted
using whatever character set was provided in the conventional '_charset'
field, which defaults to UTF-8 if not otherwise specified.
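The URL construction described above -- escaping the colons in the cap and
UTF-8-encoding non-ASCII path components -- can be sketched with Python's
standard library (a Python 3 illustration; the cap is the sample test-grid
directory cap shown earlier):

 from urllib.parse import quote

 BASE = "http://127.0.0.1:3456/uri/"
 dircap = ("URI:DIR2:djrdkfawoqihigoett4g6auz6a:"
           "jx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq")

 # quote() escapes the cap's colons as %3A and percent-encodes the UTF-8
 # bytes of non-ASCII child names, matching the example URLs above.
 url = BASE + quote(dircap) + "/" + quote("r\u00e9sum\u00e9.doc")

The resulting string is exactly the child-lookup URL shown above for the
accented filename.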
The JSON representation of each directory contains native unicode strings.
Tahoe directories are specified to contain unicode filenames, and cannot
contain binary strings that are not representable as such.

All Tahoe operations that refer to existing files or directories must
include a suitable read- or write- cap in the URL: the webapi server won't
add one for you. If you don't know the cap, you can't access the file.
This allows the security properties of Tahoe caps to be extended across
the webapi interface.

== Slow Operations, Progress, and Cancelling ==

Certain operations can be expected to take a long time. The
"t=deep-check", described below, will recursively visit every file and
directory reachable from a given starting point, which can take minutes or
even hours for extremely large directory structures. A single long-running
HTTP request is a fragile thing: proxies, NAT boxes, browsers, and users
may all grow impatient with waiting and give up on the connection.

For this reason, long-running operations have an "operation handle", which
can be used to poll for status/progress messages while the operation
proceeds. This handle can also be used to cancel the operation. These
handles are created by the client, and passed in as an "ophandle=" query
argument to the POST or PUT request which starts the operation. The
following operations can then be used to retrieve status:

 GET /operations/$HANDLE?output=HTML  (with or without t=status)
 GET /operations/$HANDLE?output=JSON  (same)

These two retrieve the current status of the given operation.
Each operation presents a different sort of information, but in general
the page retrieved will indicate:

 * whether the operation is complete, or if it is still running
 * how much of the operation is complete, and how much is left, if
   possible

Note that the final status output can be quite large: a deep-manifest of a
directory structure with 300k directories and 200k unique files is about
275MB of JSON, and might take two minutes to generate. For this reason,
the full status is not provided until the operation has completed.

The HTML form will include a meta-refresh tag, which will cause a regular
web browser to reload the status page about 60 seconds later. This tag
will be removed once the operation has completed.

There may be more status information available under
/operations/$HANDLE/$ETC : i.e., the handle forms the root of a URL space.

POST /operations/$HANDLE?t=cancel

 This terminates the operation, and returns an HTML page explaining what
 was cancelled. If the operation handle has already expired (see below),
 this POST will return a 404, which indicates that the operation is no
 longer running (either it was completed or terminated). The response body
 will be the same as a GET /operations/$HANDLE on this operation handle,
 and the handle will be expired immediately afterwards.

The operation handle will eventually expire, to avoid consuming an
unbounded amount of memory. The handle's time-to-live can be reset at any
time, by passing a retain-for= argument (with a count of seconds) to
either the initial POST that starts the operation, or the subsequent GET
request which asks about the operation.
For example, if a 'GET /operations/$HANDLE?output=JSON&retain-for=600'
query is performed, the handle will remain active for 600 seconds (10
minutes) after the GET was received.

In addition, if the GET includes a release-after-complete=True argument,
and the operation has completed, the operation handle will be released
immediately.

If a retain-for= argument is not used, the default handle lifetimes are:

 * handles will remain valid at least until their operation finishes
 * uncollected handles for finished operations (i.e. handles for
   operations that have finished but for which the GET page has not been
   accessed since completion) will remain valid for four days, or for the
   total time consumed by the operation, whichever is greater.
 * collected handles (i.e. the GET page has been retrieved at least once
   since the operation completed) will remain valid for one day.

Many "slow" operations can begin to use unacceptable amounts of memory
when operating on large directory structures. The memory usage increases
when the ophandle is polled, as the results must be copied into a JSON
string, sent over the wire, then parsed by a client. So, as an
alternative, many "slow" operations have streaming equivalents. These
equivalents do not use operation handles. Instead, they emit line-oriented
status results immediately. Client code can cancel the operation by simply
closing the HTTP connection.

== Programmatic Operations ==

Now that we know how to build URLs that refer to files and directories in
a Tahoe virtual filesystem, what sorts of operations can we do with those
URLs? This section contains a catalog of GET, PUT, DELETE, and POST
operations that can be performed on these URLs. This set of operations is
aimed at programs that use HTTP to communicate with a Tahoe node.
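As one illustration of such programmatic use, the ophandle status URLs
described above could be assembled like this (a Python 3 sketch; the base
address, the helper name, and the handle value are all illustrative, since
handles are arbitrary client-chosen strings):

 from urllib.parse import urlencode

 def status_url(base, handle, retain_for=None, release_after_complete=False):
     """Build a GET /operations/$HANDLE status URL with JSON output.

     retain_for resets the handle's time-to-live (in seconds), and
     release_after_complete=True frees a finished handle immediately.
     """
     params = {"output": "JSON"}
     if retain_for is not None:
         params["retain-for"] = str(retain_for)
     if release_after_complete:
         params["release-after-complete"] = "True"
     return "%s/operations/%s?%s" % (base, handle, urlencode(params))

A client would pass the same handle as the ophandle= argument of the
initial POST or PUT, then poll this URL until the status indicates
completion.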
A later section describes operations that are intended for web browsers.

=== Reading A File ===

GET /uri/$FILECAP
GET /uri/$DIRCAP/[SUBDIRS../]FILENAME

 This will retrieve the contents of the given file. The HTTP response body
 will contain the sequence of bytes that make up the file.

 To view files in a web browser, you may want more control over the
 Content-Type and Content-Disposition headers. Please see the next
 section, "Browser Operations", for details on how to modify these URLs
 for that purpose.

=== Writing/Uploading A File ===

PUT /uri/$FILECAP
PUT /uri/$DIRCAP/[SUBDIRS../]FILENAME

 Upload a file, using the data from the HTTP request body, and add
 whatever child links and subdirectories are necessary to make the file
 available at the given location. Once this operation succeeds, a GET on
 the same URL will retrieve the same contents that were just uploaded.
 This will create any necessary intermediate subdirectories.

 To use the /uri/$FILECAP form, $FILECAP must be a write-cap for a mutable
 file.

 In the /uri/$DIRCAP/[SUBDIRS../]FILENAME form, if the target file is a
 writeable mutable file, that file's contents will be overwritten
 in-place. If it is a read-cap for a mutable file, an error will occur. If
 it is an immutable file, the old file will be discarded, and a new one
 will be put in its place.

 When creating a new file, if "mutable=true" is in the query arguments,
 the operation will create a mutable file instead of an immutable one.

 This returns the file-cap of the resulting file. If a new file was
 created by this method, the HTTP response code (as dictated by RFC 2616)
 will be set to 201 CREATED.
 If an existing file was replaced or modified, the response code will be
 200 OK.

 Note that the 'curl -T localfile http://127.0.0.1:3456/uri/$DIRCAP/foo.txt'
 command can be used to invoke this operation.

PUT /uri

 This uploads a file, and produces a file-cap for the contents, but does
 not attach the file into the filesystem. No directories will be modified
 by this operation. The file-cap is returned as the body of the HTTP
 response.

 If "mutable=true" is in the query arguments, the operation will create a
 mutable file, and return its write-cap in the HTTP response. The default
 is to create an immutable file, returning the read-cap as a response.

=== Creating A New Directory ===

POST /uri?t=mkdir
PUT /uri?t=mkdir

 Create a new empty directory and return its write-cap as the HTTP
 response body. This does not make the newly created directory visible
 from the filesystem. The "PUT" operation is provided for backwards
 compatibility: new code should use POST.

POST /uri?t=mkdir-with-children

 Create a new directory, populated with a set of child nodes, and return
 its write-cap as the HTTP response body. The new directory is not
 attached to any other directory: the returned write-cap is the only
 reference to it.

 Initial children are provided as the body of the POST form (this is more
 efficient than doing separate mkdir and set_children operations). If the
 body is empty, the new directory will be empty. If not empty, the body
 will be interpreted as a UTF-8 JSON-encoded dictionary of children with
 which the new directory should be populated, using the same format as
 would be returned in the 'children' value of the t=json GET request,
 described below. Each dictionary key should be a child name, and each
 value should be a list of [TYPE, PROPDICT], where PROPDICT contains
 "rw_uri", "ro_uri", and "metadata" keys (all others are ignored).
 For example, the POST request body could be:

  {
    "Fran\u00e7ais": [ "filenode", {
        "ro_uri": "URI:CHK:...",
        "size": bytes,
        "metadata": {
          "ctime": 1202777696.7564139,
          "mtime": 1202777696.7564139,
          "tahoe": {
            "linkcrtime": 1202777696.7564139,
            "linkmotime": 1202777696.7564139
          } } } ],
    "subdir": [ "dirnode", {
        "rw_uri": "URI:DIR2:...",
        "ro_uri": "URI:DIR2-RO:...",
        "metadata": {
          "ctime": 1202778102.7589991,
          "mtime": 1202778111.2160511,
          "tahoe": {
            "linkcrtime": 1202777696.7564139,
            "linkmotime": 1202777696.7564139
          } } } ]
  }

 For forward-compatibility, a mutable directory can also contain caps in
 a format that is unknown to the webapi server. When such caps are
 retrieved from a mutable directory in a "ro_uri" field, they will be
 prefixed with the string "ro.", indicating that they must not be decoded
 without checking that they are read-only. The "ro." prefix must not be
 stripped off without performing this check. (Future versions of the
 webapi server will perform it where necessary.)

 If both the "rw_uri" and "ro_uri" fields are present in a given PROPDICT,
 and the webapi server recognizes the rw_uri as a write cap, then it will
 reset the ro_uri to the corresponding read cap and discard the original
 contents of ro_uri (in order to ensure that the two caps correspond to
 the same object and that the ro_uri is in fact read-only). However this
 may not happen for caps in a format unknown to the webapi server.
 Therefore, when writing a directory the webapi client should ensure that
 the contents of "rw_uri" and "ro_uri" for a given PROPDICT are a
 consistent (write cap, read cap) pair if possible. If the webapi client
 only has one cap and does not know whether it is a write cap or read
 cap, then it is acceptable to set "rw_uri" to that cap and omit
 "ro_uri".
The416 client must not put a write cap into a "ro_uri" field.417 418 The metadata may have a "no-write" field. If this is set to true in the419 metadata of a link, it will not be possible to open that link for writing420 via the SFTP frontend; see docs/frontends/FTP-and-SFTP.txt for details.421 Also, if the "no-write" field is set to true in the metadata of a link to422 a mutable child, it will cause the link to be diminished to read-only.423 424 Note that the webapi-using client application must not provide the425 "Content-Type: multipart/form-data" header that usually accompanies HTML426 form submissions, since the body is not formatted this way. Doing so will427 cause a server error as the lower-level code misparses the request body.428 429 Child file names should each be expressed as a unicode string, then used as430 keys of the dictionary. The dictionary should then be converted into JSON,431 and the resulting string encoded into UTF-8. This UTF-8 bytestring should432 then be used as the POST body.433 434 POST /uri?t=mkdir-immutable435 436 Like t=mkdir-with-children above, but the new directory will be437 deep-immutable. This means that the directory itself is immutable, and that438 it can only contain objects that are treated as being deep-immutable, like439 immutable files, literal files, and deep-immutable directories.440 441 For forward-compatibility, a deep-immutable directory can also contain caps442 in a format that is unknown to the webapi server. When such caps are retrieved443 from a deep-immutable directory in a "ro_uri" field, they will be prefixed444 with the string "imm.", indicating that they must not be decoded without445 checking that they are immutable. The "imm." prefix must not be stripped446 off without performing this check. (Future versions of the webapi server447 will perform it where necessary.)448 449 The cap for each child may be given either in the "rw_uri" or "ro_uri"450 field of the PROPDICT (not both). 
 If a cap is given in the "rw_uri" field, then the webapi server will
 check that it is an immutable read-cap of a *known* format, and give an
 error if it is not. If a cap is given in the "ro_uri" field, then the
 webapi server will still check whether known caps are immutable, but for
 unknown caps it will simply assume that the cap can be stored, as
 described above. Note that an attacker would be able to store any cap in
 an immutable directory, so this check when creating the directory is
 only to help non-malicious clients to avoid accidentally giving away
 more authority than intended.

 A non-empty request body is mandatory, since after the directory is
 created, it will not be possible to add more children to it.

POST /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir
PUT /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir

 Create new directories as necessary to make sure that the named target
 ($DIRCAP/SUBDIRS../SUBDIR) is a directory. This will create additional
 intermediate mutable directories as necessary. If the named target
 directory already exists, this will make no changes to it.

 If the final directory is created, it will be empty.

 This operation will return an error if a blocking file is present at any
 of the parent names, preventing the server from creating the necessary
 parent directory; or if it would require changing an immutable directory.

 The write-cap of the new directory will be returned as the HTTP response
 body.

POST /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir-with-children

 Like /uri?t=mkdir-with-children, but the final directory is created as a
 child of an existing mutable directory. This will create additional
 intermediate mutable directories as necessary.
If the final directory is created, it will be populated with initial
children from the POST request body, as described above.

This operation will return an error if a blocking file is present at any of
the parent names, preventing the server from creating the necessary parent
directory; or if it would require changing an immutable directory; or if
the immediate parent directory already has a child named SUBDIR.

POST /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir-immutable

Like /uri?t=mkdir-immutable, but the final directory is created as a child
of an existing mutable directory. The final directory will be
deep-immutable, and will be populated with the children specified as a
JSON dictionary in the POST request body.

In Tahoe 1.6 this operation creates intermediate mutable directories if
necessary, but that behaviour should not be relied on; see ticket #920.

This operation will return an error if the parent directory is immutable,
or already has a child named SUBDIR.

POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=NAME

Create a new empty mutable directory and attach it to the given existing
directory. This will create additional intermediate directories as
necessary.

This operation will return an error if a blocking file is present at any of
the parent names, preventing the server from creating the necessary parent
directory, or if it would require changing any immutable directory.

The URL of this operation points to the parent of the bottommost new
directory, whereas the /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir operation
above has a URL that points directly to the bottommost new directory.

POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir-with-children&name=NAME

Like /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=NAME, but the new directory
will be populated with initial children via the POST request body.
This command will create additional intermediate mutable directories as
necessary.

This operation will return an error if a blocking file is present at any of
the parent names, preventing the server from creating the necessary parent
directory; or if it would require changing an immutable directory; or if
the immediate parent directory already has a child named NAME.

Note that the name= argument must be passed as a queryarg, because the POST
request body is used for the initial children JSON.

POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir-immutable&name=NAME

Like /uri/$DIRCAP/[SUBDIRS../]?t=mkdir-with-children&name=NAME, but the
final directory will be deep-immutable. The children are specified as a
JSON dictionary in the POST request body. Again, the name= argument must
be passed as a queryarg.

In Tahoe 1.6 this operation creates intermediate mutable directories if
necessary, but that behaviour should not be relied on; see ticket #920.

This operation will return an error if the parent directory is immutable,
or already has a child named NAME.

=== Get Information About A File Or Directory (as JSON) ===

GET /uri/$FILECAP?t=json
GET /uri/$DIRCAP?t=json
GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=json
GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=json

This returns a machine-parseable JSON-encoded description of the given
object. The JSON always contains a list, and the first element of the list
is always a flag that indicates whether the referenced object is a file or
a directory.
If it is a capability to a file, then the information includes file size
and URI, like this:

 GET /uri/$FILECAP?t=json :

  [ "filenode", {
    "ro_uri": file_uri,
    "verify_uri": verify_uri,
    "size": bytes,
    "mutable": false
  } ]

If it is a capability to a directory followed by a path from that directory
to a file, then the information also includes metadata from the link to the
file in the parent directory, like this:

 GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=json :

  [ "filenode", {
    "ro_uri": file_uri,
    "verify_uri": verify_uri,
    "size": bytes,
    "mutable": false,
    "metadata": {
      "ctime": 1202777696.7564139,
      "mtime": 1202777696.7564139,
      "tahoe": {
        "linkcrtime": 1202777696.7564139,
        "linkmotime": 1202777696.7564139
      } } } ]

If it is a directory, then it includes information about the children of
this directory, as a mapping from child name to a set of data about the
child (the same data that would appear in a corresponding GET?t=json of
the child itself). The child entries also include metadata about each
child, including link-creation- and link-change- timestamps.
The output looks like this:

 GET /uri/$DIRCAP?t=json :
 GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=json :

  [ "dirnode", {
    "rw_uri": read_write_uri,
    "ro_uri": read_only_uri,
    "verify_uri": verify_uri,
    "mutable": true,
    "children": {
      "foo.txt": [ "filenode", {
        "ro_uri": uri,
        "size": bytes,
        "metadata": {
          "ctime": 1202777696.7564139,
          "mtime": 1202777696.7564139,
          "tahoe": {
            "linkcrtime": 1202777696.7564139,
            "linkmotime": 1202777696.7564139
          } } } ],
      "subdir": [ "dirnode", {
        "rw_uri": rwuri,
        "ro_uri": rouri,
        "metadata": {
          "ctime": 1202778102.7589991,
          "mtime": 1202778111.2160511,
          "tahoe": {
            "linkcrtime": 1202777696.7564139,
            "linkmotime": 1202777696.7564139
          } } } ]
    } } ]

In the above example, note how 'children' is a dictionary in which the keys
are child names and the values depend upon whether the child is a file or a
directory. The value is mostly the same as the JSON representation of the
child object (except that directories do not recurse -- the "children"
entry of the child is omitted, and the directory view includes the metadata
that is stored on the directory edge).

The rw_uri field will be present in the information about a directory
if and only if you have read-write access to that directory.
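One plausible way for a client to walk a response in this format (a
sketch; resp_text stands in for the body of a t=json response that has
already been fetched by some means, and the caps are placeholders):

```python
import json

# A miniature dirnode response in the format shown above (placeholder caps).
resp_text = '''
[ "dirnode", { "rw_uri": "URI:DIR2:fakerw", "ro_uri": "URI:DIR2-RO:fakero",
  "mutable": true,
  "children": {
    "foo.txt": [ "filenode", { "ro_uri": "URI:CHK:fake", "size": 1024,
                               "metadata": {} } ]
  } } ]
'''

# The top-level value is a two-element list: a type flag, then a dict.
nodetype, info = json.loads(resp_text)
if nodetype == "dirnode":
    writeable = "rw_uri" in info   # present only with read-write access
    for name, (childtype, childinfo) in info["children"].items():
        print(name, childtype, childinfo.get("ro_uri"))
```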
The verify_uri field will be present if and only if the object has a
verify-cap (non-distributed LIT files do not have verify-caps).

If the cap is of an unknown format, then the file size and verify_uri will
not be available:

 GET /uri/$UNKNOWNCAP?t=json :

  [ "unknown", {
    "ro_uri": unknown_read_uri
  } ]

 GET /uri/$DIRCAP/[SUBDIRS../]UNKNOWNCHILDNAME?t=json :

  [ "unknown", {
    "rw_uri": unknown_write_uri,
    "ro_uri": unknown_read_uri,
    "mutable": true,
    "metadata": {
      "ctime": 1202777696.7564139,
      "mtime": 1202777696.7564139,
      "tahoe": {
        "linkcrtime": 1202777696.7564139,
        "linkmotime": 1202777696.7564139
      } } } ]

As in the case of file nodes, the metadata will only be present when the
capability is to a directory followed by a path. The "mutable" field is
also not always present; when it is absent, the mutability of the object
is not known.

==== About the metadata ====

The value of the 'tahoe':'linkmotime' key is updated whenever a link to a
child is set. The value of the 'tahoe':'linkcrtime' key is updated whenever
a link to a child is created -- i.e. when there was not previously a link
under that name.

Note, however, that if the edge in the Tahoe filesystem points to a
mutable file and the contents of that mutable file is changed, then the
'tahoe':'linkmotime' value on that edge will *not* be updated, since the
edge itself wasn't updated -- only the mutable file was.

The timestamps are represented as a number of seconds since the UNIX epoch
(1970-01-01 00:00:00 UTC), with leap seconds not being counted in the long
term.

In Tahoe earlier than v1.4.0, 'mtime' and 'ctime' keys were populated
instead of the 'tahoe':'linkmotime' and 'tahoe':'linkcrtime' keys. Starting
in Tahoe v1.4.0, the 'linkmotime'/'linkcrtime' keys in the 'tahoe' sub-dict
are populated.
However, prior to Tahoe v1.7beta, a bug caused the 'tahoe' sub-dict to be
deleted by webapi requests in which new metadata is specified, and not to
be added to existing child links that lack it.

From Tahoe v1.7.0 onward, the 'mtime' and 'ctime' fields are no longer
populated or updated (see ticket #924), except by "tahoe backup" as
explained below. For backward compatibility, when an existing link is
updated and 'tahoe':'linkcrtime' is not present in the previous metadata
but 'ctime' is, the old value of 'ctime' is used as the new value of
'tahoe':'linkcrtime'.

The reason we added the new fields in Tahoe v1.4.0 is that there is a
"set_children" API (described below) which you can use to overwrite the
values of the 'mtime'/'ctime' pair, and this API is used by the
"tahoe backup" command (in Tahoe v1.3.0 and later) to set the 'mtime' and
'ctime' values when backing up files from a local filesystem into the
Tahoe filesystem. As of Tahoe v1.4.0, the set_children API cannot be used
to set anything under the 'tahoe' key of the metadata dict -- if you
include 'tahoe' keys in your 'metadata' arguments then it will silently
ignore those keys.

Therefore, if the 'tahoe' sub-dict is present, you can rely on the
'linkcrtime' and 'linkmotime' values therein to have the semantics
described above.
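For instance, a client can decode these epoch-seconds values into calendar
time with the standard library (using a 'linkmotime' value like the ones
shown in the examples above):

```python
from datetime import datetime, timezone

# Timestamps are seconds since the UNIX epoch (1970-01-01 00:00:00 UTC).
linkmotime = 1202777696.7564139
when = datetime.fromtimestamp(linkmotime, timezone.utc)
print(when.isoformat())
```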
(This is assuming that only official Tahoe clients have been used to write
those links, and that their system clocks were set to what you expected --
there is nothing preventing someone from editing their Tahoe client or
writing their own Tahoe client which would overwrite those values however
they like, and there is nothing to constrain their system clock from taking
any value.)

When an edge is created or updated by "tahoe backup", the 'mtime' and
'ctime' keys on that edge are set as follows:

* 'mtime' is set to the timestamp read from the local filesystem for the
  "mtime" of the local file in question, which means the last time the
  contents of that file were changed.

* On Windows, 'ctime' is set to the creation timestamp for the file
  read from the local filesystem. On other platforms, 'ctime' is set to
  the UNIX "ctime" of the local file, which means the last time that
  either the contents or the metadata of the local file was changed.

There are several ways that the 'ctime' field could be confusing:

1. You might be confused about whether it reflects the time of the
   creation of a link in the Tahoe filesystem (by a version of Tahoe
   < v1.7.0) or a timestamp copied in by "tahoe backup" from a local
   filesystem.

2. You might be confused about whether it is a copy of the file creation
   time (if "tahoe backup" was run on a Windows system) or of the last
   contents-or-metadata change (if "tahoe backup" was run on a different
   operating system).

3. You might be confused by the fact that changing the contents of a
   mutable file in Tahoe doesn't have any effect on any links pointing at
   that file in any directories, although "tahoe backup" sets the link
   'ctime'/'mtime' to reflect timestamps about the local file
   corresponding to the Tahoe file to which the link points.

4. Also, quite apart from Tahoe, you might be confused about the meaning
   of the "ctime" in UNIX local filesystems, which people sometimes think
   means file creation time, but which actually means, in UNIX local
   filesystems, the most recent time that the file contents or the file
   metadata (such as owner, permission bits, extended attributes, etc.)
   has changed. Note that although "ctime" does not mean file creation
   time in UNIX, links created by a version of Tahoe prior to v1.7.0, and
   never written by "tahoe backup", will have 'ctime' set to the link
   creation time.


=== Attaching an existing File or Directory by its read- or write- cap ===

PUT /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=uri

This attaches a child object (either a file or directory) to a specified
location in the virtual filesystem. The child object is referenced by its
read- or write- cap, as provided in the HTTP request body. This will
create intermediate directories as necessary.

This is similar to a UNIX hardlink: by referencing a previously-uploaded
file (or previously-created directory) instead of uploading/creating a new
one, you can create two references to the same object.

The read- or write- cap of the child is provided in the body of the HTTP
request, and this same cap is returned in the response body.

The default behavior is to overwrite any existing object at the same
location. To prevent this (and make the operation return an error instead
of overwriting), add a "replace=false" argument, as "?t=uri&replace=false".
With replace=false, this operation will return an HTTP 409 "Conflict"
error if there is already an object at the given location, rather than
overwriting the existing object. To allow the operation to overwrite a
file, but return an error when trying to overwrite a directory, use
"replace=only-files" (this behavior is closer to the traditional UNIX "mv"
command).
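Such a request might be assembled as follows (a sketch only: the gateway
URL, directory cap, and child cap are all made-up placeholders):

```python
import urllib.request

# Placeholder values; a real client substitutes its own gateway URL and caps.
gateway = "http://127.0.0.1:3456"
dircap = "URI:DIR2:fakewritecap:fakeextension"
childcap = b"URI:CHK:fakereadcap:fakeextension:3:10:1024"

# Attach childcap under the name "backup.txt", refusing to overwrite.
url = "%s/uri/%s/backup.txt?t=uri&replace=false" % (gateway, dircap)
req = urllib.request.Request(url, data=childcap, method="PUT")
# urllib.request.urlopen(req) would send it; the response body echoes
# the cap that was attached.
```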
Note that "true", "t", and "1" are all synonyms for "True", and "false",
"f", and "0" are synonyms for "False", and the parameter is
case-insensitive.

Note that this operation does not take its child cap in the form of
separate "rw_uri" and "ro_uri" fields. Therefore, it cannot accept a
child cap in a format unknown to the webapi server, unless its URI
starts with "ro." or "imm.". This restriction is necessary because the
server is not able to attenuate an unknown write cap to a read cap.
Unknown URIs starting with "ro." or "imm.", on the other hand, are
assumed to represent read caps. The client should not prefix a write
cap with "ro." or "imm." and pass it to this operation, since that
would result in granting the cap's write authority to holders of the
directory read cap.

=== Adding multiple files or directories to a parent directory at once ===

POST /uri/$DIRCAP/[SUBDIRS..]?t=set_children
POST /uri/$DIRCAP/[SUBDIRS..]?t=set-children    (Tahoe >= v1.6)

This command adds multiple children to a directory in a single operation.
It reads the request body and interprets it as a JSON-encoded description
of the child names and read/write-caps that should be added.

The body should be a JSON-encoded dictionary, in the same format as the
"children" value returned by the "GET /uri/$DIRCAP?t=json" operation
described above. In this format, each key is a child name, and the
corresponding value is a tuple of (type, childinfo). "type" is ignored,
and "childinfo" is a dictionary that contains "rw_uri", "ro_uri", and
"metadata" keys.
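Sketched as a request body (the caps are fabricated placeholders):

```python
import json

# Two children to add in one set_children operation (placeholder caps).
new_children = {
    "notes.txt": ["filenode", {
        "ro_uri": "URI:CHK:fakereadcap:fakeextension:3:10:512",
        "metadata": {"mtime": 1202777696.0},
    }],
    "archive": ["dirnode", {
        "rw_uri": "URI:DIR2:fakewritecap:fakeextension",
        "metadata": {},
    }],
}

# This becomes the body of POST /uri/$DIRCAP?t=set_children; the "type"
# element of each pair is ignored by the server.
body = json.dumps(new_children).encode("utf-8")
```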
You can take the output of "GET /uri/$DIRCAP1?t=json" and use it as the
input to "POST /uri/$DIRCAP2?t=set_children" to make DIR2 look very much
like DIR1 (except for any existing children of DIR2 that were not
overwritten, and any existing "tahoe" metadata keys as described below).

When the set_children request contains a child name that already exists in
the target directory, this command defaults to overwriting that child with
the new value (both child cap and metadata, but if the JSON data does not
contain a "metadata" key, the old child's metadata is preserved). The
command takes a boolean "overwrite=" query argument to control this
behavior. If you use "?t=set_children&overwrite=false", then an attempt to
replace an existing child will instead cause an error.

Any "tahoe" key in the new child's "metadata" value is ignored. Any
existing "tahoe" metadata is preserved. The metadata["tahoe"] value is
reserved for metadata generated by the tahoe node itself. The only two
keys currently placed here are "linkcrtime" and "linkmotime". For details,
see the section above entitled "Get Information About A File Or Directory
(as JSON)", in the "About the metadata" subsection.

Note that this command was introduced with the name "set_children", which
uses an underscore rather than a hyphen as other multi-word command names
do. The variant with a hyphen is now accepted, but clients that desire
backward compatibility should continue to use "set_children".


=== Deleting a File or Directory ===

DELETE /uri/$DIRCAP/[SUBDIRS../]CHILDNAME

This removes the given name from its parent directory. CHILDNAME is the
name to be removed, and $DIRCAP/SUBDIRS.. indicates the directory that
will be modified.

Note that this does not actually delete the file or directory that the
name points to from the tahoe grid -- it only removes the named reference
from this directory.
If there are other names in this directory or in other directories that
point to the resource, then it will remain accessible through those paths.
Even if all names pointing to this object are removed from their parent
directories, then someone with possession of its read-cap can continue to
access the object through that cap.

The object will only become completely unreachable once 1: there are no
reachable directories that reference it, and 2: nobody is holding a read-
or write- cap to the object. (This behavior is very similar to the way
hardlinks and anonymous files work in traditional UNIX filesystems).

This operation will not modify more than a single directory. Intermediate
directories which were implicitly created by PUT or POST methods will
*not* be automatically removed by DELETE.

This method returns the file- or directory- cap of the object that was
just removed.


== Browser Operations ==

This section describes the HTTP operations that provide support for humans
running a web browser. Most of these operations use HTML forms that use
POST to drive the Tahoe node. This section is intended for HTML authors
who want to write web pages that contain forms and buttons which
manipulate the Tahoe filesystem.

Note that for all POST operations, the arguments listed can be provided
either as URL query arguments or as form body fields. URL query arguments
are separated from the main URL by "?", and from each other by "&". For
example, "POST /uri/$DIRCAP?t=upload&mutable=true". Form body fields are
usually specified by using <input type="hidden"> elements.
For clarity, the descriptions below display the most significant arguments
as URL query args.

=== Viewing A Directory (as HTML) ===

GET /uri/$DIRCAP/[SUBDIRS../]

This returns an HTML page, intended to be displayed to a human by a web
browser, which contains HREF links to all files and directories reachable
from this directory. These HREF links do not have a t= argument, meaning
that a human who follows them will get pages also meant for a human. It
also contains forms to upload new files, and to delete files and
directories. Those forms use POST methods to do their job.

=== Viewing/Downloading a File ===

GET /uri/$FILECAP
GET /uri/$DIRCAP/[SUBDIRS../]FILENAME

This will retrieve the contents of the given file. The HTTP response body
will contain the sequence of bytes that make up the file.

If you want the HTTP response to include a useful Content-Type header,
either use the second form (which starts with a $DIRCAP), or add a
"filename=foo" query argument, like "GET /uri/$FILECAP?filename=foo.jpg".
The bare "GET /uri/$FILECAP" does not give the Tahoe node enough
information to determine a Content-Type (since Tahoe immutable files are
merely sequences of bytes, not typed+named file objects).

If the URL has both filename= and "save=true" in the query arguments, then
the server will add a "Content-Disposition: attachment" header, along with
a filename= parameter. When a user clicks on such a link, most browsers
will offer to let the user save the file instead of displaying it inline
(indeed, most browsers will refuse to display it inline). "true", "t",
"1", and other case-insensitive equivalents are all treated the same.

Character-set handling in URLs and HTTP headers is a dubious art[1].
For maximum compatibility, Tahoe simply copies the bytes from the
filename= argument into the Content-Disposition header's filename=
parameter, without trying to interpret them in any particular way.


GET /named/$FILECAP/FILENAME

This is an alternate download form which makes it easier to get the
correct filename. The Tahoe server will provide the contents of the given
file, with a Content-Type header derived from the given filename. This
form is used to get browsers to use the "Save Link As" feature correctly,
and also helps command-line tools like "wget" and "curl" use the right
filename. Note that this form can *only* be used with file caps; it is an
error to use a directory cap after the /named/ prefix.

=== Get Information About A File Or Directory (as HTML) ===

GET /uri/$FILECAP?t=info
GET /uri/$DIRCAP/?t=info
GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR/?t=info
GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=info

This returns a human-oriented HTML page with more detail about the
selected file or directory object. This page contains the following items:

 object size
 storage index
 JSON representation
 raw contents (text/plain)
 access caps (URIs): verify-cap, read-cap, write-cap (for mutable objects)
 check/verify/repair form
 deep-check/deep-size/deep-stats/manifest (for directories)
 replace-contents form (for mutable files)

=== Creating a Directory ===

POST /uri?t=mkdir

This creates a new empty directory, but does not attach it to the virtual
filesystem.

If a "redirect_to_result=true" argument is provided, then the HTTP
response will cause the web browser to be redirected to a /uri/$DIRCAP
page that gives access to the newly-created directory. If you bookmark
this page, you'll be able to get back to the directory again in the
future.
This is the recommended way to start working with a Tahoe server: create a
new unlinked directory (using redirect_to_result=true), then bookmark the
resulting /uri/$DIRCAP page. There is a "create directory" button on the
Welcome page to invoke this action.

If "redirect_to_result=true" is not provided (or is given a value of
"false"), then the HTTP response body will simply be the write-cap of the
new directory.

POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=CHILDNAME

This creates a new empty directory as a child of the designated SUBDIR.
This will create additional intermediate directories as necessary.

If a "when_done=URL" argument is provided, the HTTP response will cause
the web browser to redirect to the given URL. This provides a convenient
way to return the browser to the directory that was just modified. Without
a when_done= argument, the HTTP response will simply contain the write-cap
of the directory that was just created.


=== Uploading a File ===

POST /uri?t=upload

This uploads a file, and produces a file-cap for the contents, but does
not attach the file into the filesystem. No directories will be modified
by this operation.

The file must be provided as the "file" field of an HTML-encoded form
body, produced in response to an HTML form like this:

 <form action="/uri" method="POST" enctype="multipart/form-data">
  <input type="hidden" name="t" value="upload" />
  <input type="file" name="file" />
  <input type="submit" value="Upload Unlinked" />
 </form>

If a "when_done=URL" argument is provided, the response body will cause
the browser to redirect to the given URL. If the when_done= URL has the
string "%(uri)s" in it, that string will be replaced by a URL-escaped form
of the newly created file-cap.
(Note that without this substitution, there is no way to access the file
that was just uploaded).

The default (in the absence of when_done=) is to return an HTML page that
describes the results of the upload. This page will contain information
about which storage servers were used for the upload, how long each
operation took, etc.

If a "mutable=true" argument is provided, the operation will create a
mutable file, and the response body will contain the write-cap instead of
the upload results page. The default is to create an immutable file,
returning the upload results page as a response.


POST /uri/$DIRCAP/[SUBDIRS../]?t=upload

This uploads a file, and attaches it as a new child of the given
directory, which must be mutable. The file must be provided as the "file"
field of an HTML-encoded form body, produced in response to an HTML form
like this:

 <form action="." method="POST" enctype="multipart/form-data">
  <input type="hidden" name="t" value="upload" />
  <input type="file" name="file" />
  <input type="submit" value="Upload" />
 </form>

A "name=" argument can be provided to specify the new child's name,
otherwise it will be taken from the "filename" field of the upload form
(most web browsers will copy the last component of the original file's
pathname into this field). To avoid confusion, name= is not allowed to
contain a slash.

If there is already a child with that name, and it is a mutable file, then
its contents are replaced with the data being uploaded. If it is not a
mutable file, the default behavior is to remove the existing child before
creating a new one. To prevent this (and make the operation return an
error instead of overwriting the old child), add a "replace=false"
argument, as "?t=upload&replace=false".
With replace=false, this operation will return an HTTP 409 "Conflict"
error if there is already an object at the given location, rather than
overwriting the existing object. Note that "true", "t", and "1" are all
synonyms for "True", and "false", "f", and "0" are synonyms for "False";
the parameter is case-insensitive.

This will create additional intermediate directories as necessary,
although since it is expected to be triggered by a form that was retrieved
by "GET /uri/$DIRCAP/[SUBDIRS../]", it is likely that the parent directory
will already exist.

If a "mutable=true" argument is provided, any new file that is created
will be a mutable file instead of an immutable one. <input type="checkbox"
name="mutable" /> will give the user a way to set this option.

If a "when_done=URL" argument is provided, the HTTP response will cause
the web browser to redirect to the given URL. This provides a convenient
way to return the browser to the directory that was just modified. Without
a when_done= argument, the HTTP response will simply contain the file-cap
of the file that was just uploaded (a write-cap for mutable files, or a
read-cap for immutable files).

POST /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=upload

This also uploads a file and attaches it as a new child of the given
directory, which must be mutable. It is a slight variant of the previous
operation, as the URL refers to the target file rather than the parent
directory. It is otherwise identical: this accepts mutable= and when_done=
arguments too.

POST /uri/$FILECAP?t=upload

This modifies the contents of an existing mutable file in-place. An error
is signalled if $FILECAP does not refer to a mutable file.
It behaves just like the "PUT /uri/$FILECAP" form, but uses a POST for the
benefit of HTML forms in a web browser.

=== Attaching An Existing File Or Directory (by URI) ===

POST /uri/$DIRCAP/[SUBDIRS../]?t=uri&name=CHILDNAME&uri=CHILDCAP

This attaches a given read- or write- cap "CHILDCAP" to the designated
directory, with a specified child name. This behaves much like the PUT
t=uri operation, and is a lot like a UNIX hardlink. It is subject to the
same restrictions as that operation on the use of cap formats unknown to
the webapi server.

This will create additional intermediate directories as necessary,
although since it is expected to be triggered by a form that was retrieved
by "GET /uri/$DIRCAP/[SUBDIRS../]", it is likely that the parent directory
will already exist.

This accepts the same replace= argument as POST t=upload.

=== Deleting A Child ===

POST /uri/$DIRCAP/[SUBDIRS../]?t=delete&name=CHILDNAME

This instructs the node to remove a child object (file or subdirectory)
from the given directory, which must be mutable. Note that the entire
subtree is unlinked from the parent. Unlike deleting a subdirectory in a
UNIX local filesystem, the subtree need not be empty; if it isn't, then
other references into the subtree will see that the child subdirectories
are not modified by this operation. Only the link from the given directory
to its child is severed.

=== Renaming A Child ===

POST /uri/$DIRCAP/[SUBDIRS../]?t=rename&from_name=OLD&to_name=NEW

This instructs the node to rename a child of the given directory, which
must be mutable. This has a similar effect to removing the child, then
adding the same child-cap under the new name, except that it preserves
metadata.
This operation cannot move the child to a different directory.

This operation will replace any existing child of the new name, making it
behave like the UNIX "mv -f" command.

=== Other Utilities ===

GET /uri?uri=$CAP

This causes a redirect to /uri/$CAP, and retains any additional query
arguments (like filename= or save=). This is for the convenience of web
forms which allow the user to paste in a read- or write- cap (obtained
through some out-of-band channel, like IM or email).

Note that this form merely redirects to the specific file or directory
indicated by the $CAP: unlike the GET /uri/$DIRCAP form, you cannot
traverse to children by appending additional path segments to the URL.

GET /uri/$DIRCAP/[SUBDIRS../]?t=rename-form&name=$CHILDNAME

This provides a useful facility to browser-based user interfaces. It
returns a page containing a form targeting the "POST $DIRCAP t=rename"
functionality described above, with the provided $CHILDNAME present in the
'from_name' field of that form. I.e. this presents a form offering to
rename $CHILDNAME, requesting the new name, and submitting POST rename.

GET /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=uri

This returns the file- or directory- cap for the specified object.

GET /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=readonly-uri

This returns a read-only file- or directory- cap for the specified object.
If the object is an immutable file, this will return the same value as
t=uri.

=== Debugging and Testing Features ===

These URLs are less likely to be helpful to the casual Tahoe user, and are
mainly intended for developers.

POST $URL?t=check

This triggers the FileChecker to determine the current "health" of the
given file or directory, by counting how many shares are available. The
page that is returned will display the results.
This can be used as a "show me detailed information about this file" page.

If a verify=true argument is provided, the node will perform a more
intensive check, downloading and verifying every single bit of every
share.

If an add-lease=true argument is provided, the node will also add (or
renew) a lease to every share it encounters. Each lease will keep the
share alive for a certain period of time (one month by default). Once the
last lease expires or is explicitly cancelled, the storage server is
allowed to delete the share.

If an output=JSON argument is provided, the response will be
machine-readable JSON instead of human-oriented HTML. The data is a
dictionary with the following keys:

 storage-index: a base32-encoded string with the object's storage index,
                or an empty string for LIT files
 summary: a string, with a one-line summary of the stats of the file
 results: a dictionary that describes the state of the file. For LIT
          files, this dictionary has only the 'healthy' key, which will
          always be True. For distributed files, this dictionary has the
          following keys:
  count-shares-good: the number of good shares that were found
  count-shares-needed: 'k', the number of shares required for recovery
  count-shares-expected: 'N', the number of total shares generated
  count-good-share-hosts: this was intended to be the number of distinct
                          storage servers with good shares. It is
                          currently (as of Tahoe-LAFS v1.8.0) computed
                          incorrectly; see ticket #1115.
  count-wrong-shares: for mutable files, the number of shares for versions
                      other than the 'best' one (highest sequence number,
                      highest roothash). These are either old ...
  count-recoverable-versions: for mutable files, the number of
                              recoverable versions of the file.
                               For a healthy file, this will equal 1.
   count-unrecoverable-versions: for mutable files, the number of
                                 unrecoverable versions of the file.
                                 For a healthy file, this will be 0.
   count-corrupt-shares: the number of shares with integrity failures
   list-corrupt-shares: a list of "share locators", one for each share
                        that was found to be corrupt. Each share locator
                        is a list of (serverid, storage_index, sharenum).
   needs-rebalancing: (bool) True if there are multiple shares on a single
                      storage server, indicating a reduction in reliability
                      that could be resolved by moving shares to new
                      servers.
   servers-responding: list of base32-encoded storage server identifiers,
                       one for each server which responded to the share
                       query.
   healthy: (bool) True if the file is completely healthy, False otherwise.
            Healthy files have at least N good shares. Overlapping shares
            do not currently cause a file to be marked unhealthy. If there
            are at least N good shares, then corrupt shares do not cause
            the file to be marked unhealthy, although the corrupt shares
            will be listed in the results (list-corrupt-shares) and should
            be manually removed to avoid wasting time in subsequent
            downloads (as the downloader rediscovers the corruption and
            uses alternate shares).
            Future compatibility: the meaning of this field may change to
            reflect whether the servers-of-happiness criterion is met
            (see ticket #614).
   sharemap: dict mapping share identifier to list of serverids
             (base32-encoded strings). This indicates which servers are
             holding which shares. For immutable files, the shareid is
             an integer (the share number, from 0 to N-1).
             For mutable
             files, it is a string of the form 'seq%d-%s-sh%d', containing
             the sequence number, the roothash, and the share number.

POST $URL?t=start-deep-check (must add &ophandle=XYZ)

 This initiates a recursive walk of all files and directories reachable
 from the target, performing a check on each one just like t=check. The
 result page will contain a summary of the results, including details on
 any file/directory that was not fully healthy.

 t=start-deep-check can only be invoked on a directory. An error (400
 BAD_REQUEST) will be signalled if it is invoked on a file. The recursive
 walker will deal with loops safely.

 This accepts the same verify= and add-lease= arguments as t=check.

 Since this operation can take a long time (perhaps a second per object),
 the ophandle= argument is required (see "Slow Operations, Progress, and
 Cancelling" above). The response to this POST will be a redirect to the
 corresponding /operations/$HANDLE page (with output=HTML or output=JSON to
 match the output= argument given to the POST). The deep-check operation
 will continue to run in the background, and the /operations page should be
 used to find out when the operation is done.

 Detailed check results for non-healthy files and directories will be
 available under /operations/$HANDLE/$STORAGEINDEX, and the HTML status
 will contain links to these detailed results.

 The HTML /operations/$HANDLE page for incomplete operations will contain a
 meta-refresh tag, set to 60 seconds, so that a browser which uses
 deep-check will automatically poll until the operation has completed.

 The JSON page (/operations/$HANDLE?output=JSON) will contain a
 machine-readable JSON dictionary with the following keys:

  finished: a boolean, True if the operation is complete, else False.
            Some of the remaining keys may not be present until the
            operation is complete.
  root-storage-index: a base32-encoded string with the storage index of the
                      starting point of the deep-check operation
  count-objects-checked: count of how many objects were checked. Note that
                         non-distributed objects (i.e. small immutable LIT
                         files) are not checked, since for these objects,
                         the data is contained entirely in the URI.
  count-objects-healthy: how many of those objects were completely healthy
  count-objects-unhealthy: how many were damaged in some way
  count-corrupt-shares: how many shares were found to have corruption,
                        summed over all objects examined
  list-corrupt-shares: a list of "share identifiers", one for each share
                       that was found to be corrupt. Each share identifier
                       is a list of (serverid, storage_index, sharenum).
  list-unhealthy-files: a list of (pathname, check-results) tuples, for
                        each file that was not fully healthy. 'pathname' is
                        a list of strings (which can be joined by "/"
                        characters to turn it into a single string),
                        relative to the directory on which deep-check was
                        invoked. The 'check-results' field is the same as
                        that returned by t=check&output=JSON, described
                        above.
  stats: a dictionary with the same keys as the t=start-deep-stats command
         (described below)

POST $URL?t=stream-deep-check

 This initiates a recursive walk of all files and directories reachable
 from the target, performing a check on each one just like t=check. For
 each unique object (duplicates are skipped), a single line of JSON is
 emitted to the HTTP response channel (or an error indication, see below).
 When the walk is complete, a final line of JSON is emitted which contains
 the accumulated file-size/count "deep-stats" data.

 This command takes the same arguments as t=start-deep-check.

 A CLI tool can split the response stream on newlines into "response
 units", and parse each response unit as JSON. Each such parsed unit will
 be a dictionary, and will contain at least the "type" key: a string, one
 of "file", "directory", or "stats".

 For all units that have a type of "file" or "directory", the dictionary
 will contain the following keys:

  "path": a list of strings, with the path that is traversed to reach the
          object
  "cap": a write-cap URI for the file or directory, if available, else a
         read-cap URI
  "verifycap": a verify-cap URI for the file or directory
  "repaircap": a URI for the weakest cap that can still be used to repair
               the object
  "storage-index": a base32 storage index for the object
  "check-results": a copy of the dictionary which would be returned by
                   t=check&output=json, with three top-level keys:
                   "storage-index", "summary", and "results", and a variety
                   of counts and sharemaps in the "results" value.

 Note that non-distributed files (i.e.
LIT files) will have values of None
 for verifycap, repaircap, and storage-index, since these files can neither
 be verified nor repaired, and are not stored on the storage servers.
 Likewise the check-results dictionary will be limited: an empty string for
 storage-index, and a results dictionary with only the "healthy" key.

 The last unit in the stream will have a type of "stats", and will contain
 the keys described in the "start-deep-stats" operation, below.

 If any errors occur during the traversal (specifically if a directory is
 unrecoverable, such that further traversal is not possible), an error
 indication is written to the response body, instead of the usual line of
 JSON. This error indication line will begin with the string "ERROR:" (in
 all caps), and contain a summary of the error on the rest of the line. The
 remaining lines of the response body will be a Python exception. The
 client application should look for the ERROR: and stop processing JSON as
 soon as it is seen. Note that neither a file being unrecoverable nor a
 directory merely being unhealthy will cause traversal to stop. The line
 just before the ERROR: will describe the directory that was untraversable,
 since the unit is emitted to the HTTP response body before the child is
 traversed.

POST $URL?t=check&repair=true

 This performs a health check of the given file or directory, and if the
 checker determines that the object is not healthy (some shares are missing
 or corrupted), it will perform a "repair". During repair, any missing
 shares will be regenerated and uploaded to new servers.

 This accepts the same verify=true and add-lease= arguments as t=check.
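As a hedged sketch, this is how a client might interpret the repair-related keys of a t=check&repair=true JSON response. The key names follow this document; the sample values are invented for illustration:

```python
import json

def summarize_repair(response: dict) -> str:
    """Summarize a t=check&repair=true JSON response (keys as documented)."""
    if not response["repair-attempted"]:
        return "no repair needed"
    if response["repair-successful"]:
        return "repaired"
    # Repair ran but the file is still not fully healthy.
    post = response["post-repair-results"]
    return "repair failed (%d good shares)" % post["count-shares-good"]

# Invented sample response, shaped like the keys documented in this section.
sample = json.loads("""{
  "storage-index": "",
  "repair-attempted": true,
  "repair-successful": true,
  "pre-repair-results": {"count-shares-good": 7},
  "post-repair-results": {"count-shares-good": 10}
}""")
print(summarize_repair(sample))
```

Note that a real client would obtain this dictionary by POSTing to $URL?t=check&repair=true&output=JSON; the HTTP layer is omitted here.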
 When an output=JSON argument is provided, the machine-readable JSON
 response will contain the following keys:

  storage-index: a base32-encoded string with the object's storage index,
                 or an empty string for LIT files
  repair-attempted: (bool) True if repair was attempted
  repair-successful: (bool) True if repair was attempted and the file was
                     fully healthy afterwards. False if no repair was
                     attempted, or if a repair attempt failed.
  pre-repair-results: a dictionary that describes the state of the file
                      before any repair was performed. This contains
                      exactly the same keys as the 'results' value of the
                      t=check response, described above.
  post-repair-results: a dictionary that describes the state of the file
                       after any repair was performed. If no repair was
                       performed, post-repair-results and
                       pre-repair-results will be the same. This contains
                       exactly the same keys as the 'results' value of the
                       t=check response, described above.

POST $URL?t=start-deep-check&repair=true (must add &ophandle=XYZ)

 This triggers a recursive walk of all files and directories, performing a
 t=check&repair=true on each one.

 Like t=start-deep-check without the repair= argument, this can only be
 invoked on a directory. An error (400 BAD_REQUEST) will be signalled if it
 is invoked on a file. The recursive walker will deal with loops safely.

 This accepts the same verify= and add-lease= arguments as
 t=start-deep-check. It uses the same ophandle= mechanism as
 start-deep-check.
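The ophandle= mechanism described in this section amounts to a small polling loop. A non-authoritative sketch follows; the fetch function is injected, so the actual HTTP layer (urllib.request, for example) is an assumption left to the caller:

```python
import json
import time

def poll_operation(fetch, base_url: str, handle: str,
                   interval: float = 1.0) -> dict:
    """Poll /operations/$HANDLE?output=JSON until 'finished' is True.

    'fetch' is any callable mapping a URL to a JSON response body;
    a real client might wrap urllib.request.urlopen here.
    """
    url = "%s/operations/%s?output=JSON" % (base_url.rstrip("/"), handle)
    while True:
        results = json.loads(fetch(url))
        if results.get("finished"):
            return results
        time.sleep(interval)

# Fake fetch for illustration: reports "running" once, then "finished".
_responses = iter([
    '{"finished": false}',
    '{"finished": true, "count-objects-checked": 3}',
])
done = poll_operation(lambda url: next(_responses),
                      "http://127.0.0.1:3456", "XYZ", interval=0.0)
print(done["count-objects-checked"])
```

The node URL and handle value above are placeholders; an HTML client would instead rely on the meta-refresh behavior described for /operations/$HANDLE.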
 When an output=JSON argument is provided, the response will contain the
 following keys:

  finished: (bool) True if the operation has completed, else False
  root-storage-index: a base32-encoded string with the storage index of the
                      starting point of the deep-check operation
  count-objects-checked: count of how many objects were checked

  count-objects-healthy-pre-repair: how many of those objects were
                                    completely healthy, before any repair
  count-objects-unhealthy-pre-repair: how many were damaged in some way
  count-objects-healthy-post-repair: how many of those objects were
                                     completely healthy, after any repair
  count-objects-unhealthy-post-repair: how many were damaged in some way

  count-repairs-attempted: repairs were attempted on this many objects
  count-repairs-successful: how many repairs resulted in healthy objects
  count-repairs-unsuccessful: how many repairs did not result in
                              completely healthy objects
  count-corrupt-shares-pre-repair: how many shares were found to have
                                   corruption, summed over all objects
                                   examined, before any repair
  count-corrupt-shares-post-repair: how many shares were found to have
                                    corruption, summed over all objects
                                    examined, after any repair
  list-corrupt-shares: a list of "share identifiers", one for each share
                       that was found to be corrupt (before any repair).
                       Each share identifier is a list of (serverid,
                       storage_index, sharenum).
  list-remaining-corrupt-shares: like list-corrupt-shares, but mutable
                                 shares that were successfully repaired are
                                 not included. These are shares that need
                                 manual processing. Since immutable shares
                                 cannot be modified by clients, all
                                 corruption in immutable shares will be
                                 listed here.
  list-unhealthy-files: a list of (pathname, check-results) tuples, for
                        each file that was not fully healthy. 'pathname' is
                        relative to the directory on which deep-check was
                        invoked.
                        The 'check-results' field is the same as
                        that returned by t=check&repair=true&output=JSON,
                        described above.
  stats: a dictionary with the same keys as the t=start-deep-stats command
         (described below)

POST $URL?t=stream-deep-check&repair=true

 This triggers a recursive walk of all files and directories, performing a
 t=check&repair=true on each one. For each unique object (duplicates are
 skipped), a single line of JSON is emitted to the HTTP response channel
 (or an error indication). When the walk is complete, a final line of JSON
 is emitted which contains the accumulated file-size/count "deep-stats"
 data.

 This emits the same data as t=stream-deep-check (without the repair=true),
 except that the "check-results" field is replaced with a
 "check-and-repair-results" field, which contains the keys returned by
 t=check&repair=true&output=json (i.e. repair-attempted, repair-successful,
 pre-repair-results, and post-repair-results). The output does not contain
 the summary dictionary that is provided by t=start-deep-check&repair=true
 (the one with count-objects-checked and list-unhealthy-files), since the
 receiving client is expected to calculate those values itself from the
 stream of per-object check-and-repair-results.

 Note that the "ERROR:" indication will only be emitted if traversal stops,
 which will only occur if an unrecoverable directory is encountered. If a
 file or directory repair fails, the traversal will continue, and the
 repair failure will be indicated in the JSON data (in the
 "repair-successful" key).

POST $DIRURL?t=start-manifest (must add &ophandle=XYZ)

 This operation generates a "manifest" of the given directory tree, mostly
 for debugging. This is a table of (path, filecap/dircap), for every object
 reachable from the starting directory.
 The path will be slash-joined,
 and the filecap/dircap will contain a link to the object in question. This
 page gives immediate access to every object in the virtual filesystem
 subtree.

 This operation uses the same ophandle= mechanism as deep-check. The
 corresponding /operations/$HANDLE page has three different forms. The
 default is output=HTML.

 If output=text is added to the query args, the results will be a
 text/plain list. The first line is special: it is either "finished: yes"
 or "finished: no"; if the operation is not finished, you must periodically
 reload the page until it completes. The rest of the results are a
 plaintext list, with one file/dir per line, slash-separated, with the
 filecap/dircap separated by a space.

 If output=JSON is added to the query args, then the results will be a
 JSON-formatted dictionary with six keys. Note that because large directory
 structures can result in very large JSON results, the full results will
 not be available until the operation is complete (i.e. until
 output["finished"] is True):

  finished (bool): if False then you must reload the page until True
  origin_si (base32 str): the storage index of the starting point
  manifest: list of (path, cap) tuples, where path is a list of strings.
  verifycaps: list of (printable) verify cap strings
  storage-index: list of (base32) storage index strings
  stats: a dictionary with the same keys as the t=start-deep-stats command
         (described below)

POST $DIRURL?t=start-deep-size (must add &ophandle=XYZ)

 This operation generates a number (in bytes) containing the sum of the
 filesize of all directories and immutable files reachable from the given
 directory. This is a rough lower bound of the total space consumed by this
 subtree. It does not include space consumed by mutable files, nor does it
 take expansion or encoding overhead into account.
 Later versions of the
 code may improve this estimate upwards.

 The /operations/$HANDLE status output consists of two lines of text:

  finished: yes
  size: 1234

POST $DIRURL?t=start-deep-stats (must add &ophandle=XYZ)

 This operation performs a recursive walk of all files and directories
 reachable from the given directory, and generates a collection of
 statistics about those objects.

 The result (obtained from the /operations/$OPHANDLE page) is a
 JSON-serialized dictionary with the following keys (note that some of
 these keys may be missing until 'finished' is True):

  finished: (bool) True if the operation has finished, else False
  count-immutable-files: count of how many CHK files are in the set
  count-mutable-files: same, for mutable files (does not include
                       directories)
  count-literal-files: same, for LIT files (data contained inside the URI)
  count-files: sum of the above three
  count-directories: count of directories
  count-unknown: count of unrecognized objects (perhaps from the future)
  size-immutable-files: total bytes for all CHK files in the set,
                        =deep-size
  size-mutable-files (TODO): same, for current version of all mutable files
  size-literal-files: same, for LIT files
  size-directories: size of directories (includes size-literal-files)
  size-files-histogram: list of (minsize, maxsize, count) buckets,
                        with a histogram of filesizes, 5dB/bucket,
                        for both literal and immutable files
  largest-directory: number of children in the largest directory
  largest-immutable-file: number of bytes in the largest CHK file

 size-mutable-files is not implemented, because it would require extra
 queries to each mutable file to get their size.
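The "sum of the above three" relationship among the deep-stats counts can be checked mechanically. A hedged sketch, using invented sample numbers shaped like the documented keys:

```python
def check_deep_stats(stats: dict) -> bool:
    """Verify that count-files equals the sum of the three per-kind counts."""
    expected = (stats["count-immutable-files"]
                + stats["count-mutable-files"]
                + stats["count-literal-files"])
    return stats["count-files"] == expected

# Invented sample, shaped like the t=start-deep-stats result documented here.
sample_stats = {
    "finished": True,
    "count-immutable-files": 10,
    "count-mutable-files": 2,
    "count-literal-files": 3,
    "count-files": 15,
    "count-directories": 4,
}
print(check_deep_stats(sample_stats))
```

A client consuming the /operations/$OPHANDLE JSON could use a check like this as a sanity test before trusting the per-kind breakdowns.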
 This may be implemented in
 the future.

 Assuming no sharing, the basic space consumed by a single root directory
 is the sum of size-immutable-files, size-mutable-files, and
 size-directories. The actual disk space used by the shares is larger,
 because of the following sources of overhead:

  integrity data
  expansion due to erasure coding
  share management data (leases)
  backend (ext3) minimum block size

POST $URL?t=stream-manifest

 This operation performs a recursive walk of all files and directories
 reachable from the given starting point. For each such unique object
 (duplicates are skipped), a single line of JSON is emitted to the HTTP
 response channel (or an error indication, see below). When the walk is
 complete, a final line of JSON is emitted which contains the accumulated
 file-size/count "deep-stats" data.

 A CLI tool can split the response stream on newlines into "response
 units", and parse each response unit as JSON. Each such parsed unit will
 be a dictionary, and will contain at least the "type" key: a string, one
 of "file", "directory", or "stats".

 For all units that have a type of "file" or "directory", the dictionary
 will contain the following keys:

  "path": a list of strings, with the path that is traversed to reach the
          object
  "cap": a write-cap URI for the file or directory, if available, else a
         read-cap URI
  "verifycap": a verify-cap URI for the file or directory
  "repaircap": a URI for the weakest cap that can still be used to repair
               the object
  "storage-index": a base32 storage index for the object

 Note that non-distributed files (i.e.
LIT files) will have values of None
 for verifycap, repaircap, and storage-index, since these files can neither
 be verified nor repaired, and are not stored on the storage servers.

 The last unit in the stream will have a type of "stats", and will contain
 the keys described in the "start-deep-stats" operation, above.

 If any errors occur during the traversal (specifically if a directory is
 unrecoverable, such that further traversal is not possible), an error
 indication is written to the response body, instead of the usual line of
 JSON. This error indication line will begin with the string "ERROR:" (in
 all caps), and contain a summary of the error on the rest of the line. The
 remaining lines of the response body will be a Python exception. The
 client application should look for the ERROR: and stop processing JSON as
 soon as it is seen. The line just before the ERROR: will describe the
 directory that was untraversable, since the manifest entry is emitted to
 the HTTP response body before the child is traversed.

== Other Useful Pages ==

The portion of the web namespace that begins with "/uri" (and "/named") is
dedicated to giving users (both humans and programs) access to the Tahoe
virtual filesystem. The rest of the namespace provides status information
about the state of the Tahoe node.

GET / (the root page)

 This is the "Welcome Page", and contains a few distinct sections:

  Node information: library versions, local nodeid, services being
                    provided.

  Filesystem Access Forms: create a new directory, view a file/directory
                           by URI, upload a file (unlinked), download a
                           file by URI.

  Grid Status: introducer information, helper information, connected
               storage servers.

GET /status/

 This page lists all active uploads and downloads, and contains a short
 list of recent upload/download operations.
 Each operation has a link to
 a page that describes file sizes, servers that were involved, and the time
 consumed in each phase of the operation.

 A GET of /status/?t=json will contain a machine-readable subset of the
 same data. It returns a JSON-encoded dictionary. The only key defined at
 this time is "active", with a value that is a list of operation
 dictionaries, one for each active operation. Once an operation is
 completed, it will no longer appear in data["active"] .

 Each op-dict contains a "type" key, one of "upload", "download",
 "mapupdate", "publish", or "retrieve" (the first two are for immutable
 files, while the latter three are for mutable files and directories).

 The "upload" op-dict will contain the following keys:

  type (string): "upload"
  storage-index-string (string): a base32-encoded storage index
  total-size (int): total size of the file
  status (string): current status of the operation
  progress-hash (float): 1.0 when the file has been hashed
  progress-ciphertext (float): 1.0 when the file has been encrypted.
  progress-encode-push (float): 1.0 when the file has been encoded and
                                pushed to the storage servers. For helper
                                uploads, the ciphertext value climbs to 1.0
                                first, then encoding starts. For unassisted
                                uploads, ciphertext and encode-push
                                progress will climb at the same pace.

 The "download" op-dict will contain the following keys:

  type (string): "download"
  storage-index-string (string): a base32-encoded storage index
  total-size (int): total size of the file
  status (string): current status of the operation
  progress (float): 1.0 when the file has been fully downloaded

 Front-ends which want to report progress information are advised to simply
 average together all the progress-* indicators.
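That averaging advice can be sketched as follows; the key names come from the op-dicts documented in this section, and the sample dictionary is invented for illustration:

```python
def overall_progress(op: dict) -> float:
    """Average all progress-* values in an op-dict.

    Works for "upload" op-dicts (three progress-* keys) and for
    "download" op-dicts (a single "progress" key).
    """
    values = [v for k, v in op.items() if k.startswith("progress")]
    return sum(values) / len(values) if values else 0.0

# Invented "upload" op-dict using the documented keys.
upload_op = {
    "type": "upload",
    "status": "encoding",
    "progress-hash": 1.0,
    "progress-ciphertext": 0.6,
    "progress-encode-push": 0.2,
}
print(overall_progress(upload_op))  # averages the three indicators
```
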
 A slightly more accurate
 value can be found by ignoring the progress-hash value (since the current
 implementation hashes synchronously, so clients will probably never see
 progress-hash!=1.0).

GET /provisioning/

 This page provides a basic tool to predict the likely storage and
 bandwidth requirements of a large Tahoe grid. It provides forms to input
 things like total number of users, number of files per user, average file
 size, number of servers, expansion ratio, hard drive failure rate, etc. It
 then provides numbers like how many disks per server will be needed, how
 many read operations per second should be expected, and the likely MTBF
 for files in the grid. This information is very preliminary, and the model
 upon which it is based still needs a lot of work.

GET /helper_status/

 If the node is running a helper (i.e. if [helper]enabled is set to True in
 tahoe.cfg), then this page will provide a list of all the helper
 operations currently in progress. If "?t=json" is added to the URL, it
 will return a JSON-formatted list of helper statistics, which can then be
 used to produce graphs to indicate how busy the helper is.

GET /statistics/

 This page provides "node statistics", which are collected from a variety
 of sources.

  load_monitor: every second, the node schedules a timer for one second in
                the future, then measures how late the subsequent callback
                is. The "load_average" is this tardiness, measured in
                seconds, averaged over the last minute. It is an indication
                of a busy node, one which is doing more work than can be
                completed in a timely fashion. The "max_load" value is the
                highest value that has been seen in the last 60 seconds.

  cpu_monitor: every minute, the node uses time.clock() to measure how much
               CPU time it has used, and it uses this value to produce
               1min/5min/15min moving averages.
               These values range
               from 0% (0.0) to 100% (1.0), and indicate what fraction of
               the CPU has been used by the Tahoe node. Not all operating
               systems provide meaningful data to time.clock(): they may
               report 100% CPU usage at all times.

  uploader: this counts how many immutable files (and bytes) have been
            uploaded since the node was started

  downloader: this counts how many immutable files have been downloaded
              since the node was started

  publishes: this counts how many mutable files (including directories)
             have been modified since the node was started

  retrieves: this counts how many mutable files (including directories)
             have been read since the node was started

 There are other statistics that are tracked by the node. The "raw stats"
 section shows a formatted dump of all of them.

 By adding "?t=json" to the URL, the node will return a JSON-formatted
 dictionary of stats values, which can be used by other tools to produce
 graphs of node behavior. The misc/munin/ directory in the source
 distribution provides some tools to produce these graphs.

GET / (introducer status)

 For Introducer nodes, the welcome page displays information about both
 clients and servers which are connected to the introducer. Servers make
 "service announcements", and these are listed in a table. Clients will
 subscribe to hear about service announcements, and these subscriptions are
 listed in a separate table. Both tables contain information about what
 version of Tahoe is being run by the remote node, their advertised and
 outbound IP addresses, their nodeid and nickname, and how long they have
 been available.

 By adding "?t=json" to the URL, the node will return a JSON-formatted
 dictionary of stats values, which can be used to produce graphs of
 connected clients over time.
 This dictionary has the
 following keys:

  ["subscription_summary"] : a dictionary mapping service name (like
                             "storage") to an integer with the number of
                             clients that have subscribed to hear about
                             that service
  ["announcement_summary"] : a dictionary mapping service name to an
                             integer with the number of servers which are
                             announcing that service
  ["announcement_distinct_hosts"] : a dictionary mapping service name to an
                                    integer which represents the number of
                                    distinct hosts that are providing that
                                    service. If two servers have announced
                                    FURLs which use the same hostnames (but
                                    different ports and tubids), they are
                                    considered to be on the same host.

== Static Files in /public_html ==

The webapi server will take any request for a URL that starts with /static
and serve it from a configurable directory which defaults to
$BASEDIR/public_html . This is configured by setting the "[node]web.static"
value in $BASEDIR/tahoe.cfg . If this is left at the default value of
"public_html", then http://localhost:3456/static/subdir/foo.html will be
served with the contents of the file $BASEDIR/public_html/subdir/foo.html .

This can be useful to serve a javascript application which provides a
prettier front-end to the rest of the Tahoe webapi.

== Safety and security issues -- names vs. URIs ==

Summary: use explicit file- and dir- caps whenever possible, to reduce the
potential for surprises when the filesystem structure is changed.

Tahoe provides a mutable filesystem, but the ways that the filesystem can
change are limited.
The only thing that can change
is the mapping from child names to child objects that each directory
contains: a new child name can be added pointing to an object, an existing
child name can be removed, or an existing child name can be changed to
point to a different object.

Obviously if you query Tahoe for information about the filesystem and then
act to change the filesystem (such as by getting a listing of the contents
of a directory and then adding a file to the directory), then the
filesystem might have been changed after you queried it and before you
acted upon it. However, if you use the URI instead of the pathname of an
object when you act upon the object, then the only change that can happen
is that if the object is a directory, the set of child names it has might
be different. If, on the other hand, you act upon the object using its
pathname, then a different object might be in that place, which can result
in more kinds of surprises.

For example, suppose you are writing code which recursively downloads the
contents of a directory. The first thing your code does is fetch the
listing of the contents of the directory. For each child that it fetched,
if that child is a file then it downloads the file, and if that child is a
directory then it recurses into that directory.
Now, if the download and the
recurse actions are performed using the child's name, then the results
might be wrong, because for example a child name that pointed to a
sub-directory when you listed the directory might have been changed to
point to a file (in which case your attempt to recurse into it would result
in an error and the file would be skipped), or a child name that pointed to
a file when you listed the directory might now point to a sub-directory (in
which case your attempt to download the child would result in a file
containing HTML text describing the sub-directory!).

If your recursive algorithm uses the URI of the child instead of the name
of the child, then those kinds of mistakes just can't happen. Note that
both the child's name and the child's URI are included in the results of
listing the parent directory, so it isn't any harder to use the URI for
this purpose.

The read and write caps in a given directory node are separate URIs, and
can't be assumed to point to the same object even if they were retrieved in
the same operation (although the webapi server attempts to ensure this in
most cases). If you need to rely on that property, you should explicitly
verify it. More generally, you should not make assumptions about the
internal consistency of the contents of mutable directories. As a result of
the signatures on mutable object versions, it is guaranteed that a given
version was written in a single update, but -- as in the case of a file --
the contents may have been chosen by a malicious writer in a way that is
designed to confuse applications that rely on their consistency.

In general, use names if you want "whatever object (whether file or
directory) is found by following this name (or sequence of names) when my
request reaches the server".
Use URIs if you want "this particular object".1794 1795 == Concurrency Issues ==1796 1797 Tahoe uses both mutable and immutable files. Mutable files can be created1798 explicitly by doing an upload with ?mutable=true added, or implicitly by1799 creating a new directory (since a directory is just a special way to1800 interpret a given mutable file).1801 1802 Mutable files suffer from the same consistency-vs-availability tradeoff that1803 all distributed data storage systems face. It is not possible to1804 simultaneously achieve perfect consistency and perfect availability in the1805 face of network partitions (servers being unreachable or faulty).1806 1807 Tahoe tries to achieve a reasonable compromise, but there is a basic rule in1808 place, known as the Prime Coordination Directive: "Don't Do That". What this1809 means is that if write-access to a mutable file is available to several1810 parties, then those parties are responsible for coordinating their activities1811 to avoid multiple simultaneous updates. This could be achieved by having1812 these parties talk to each other and using some sort of locking mechanism, or1813 by serializing all changes through a single writer.1814 1815 The consequences of performing uncoordinated writes can vary. Some of the1816 writers may lose their changes, as somebody else wins the race condition. In1817 many cases the file will be left in an "unhealthy" state, meaning that there1818 are not as many redundant shares as we would like (reducing the reliability1819 of the file against server failures). In the worst case, the file can be left1820 in such an unhealthy state that no version is recoverable, even the old ones.1821 It is this small possibility of data loss that prompts us to issue the Prime1822 Coordination Directive.1823 1824 Tahoe nodes implement internal serialization to make sure that a single Tahoe1825 node cannot conflict with itself. 
For example, it is safe to issue two1826 directory modification requests to a single tahoe node's webapi server at the1827 same time, because the Tahoe node will internally delay one of them until1828 after the other has finished being applied. (This feature was introduced in1829 Tahoe-1.1; back with Tahoe-1.0 the web client was responsible for serializing1830 web requests themselves).1831 1832 For more details, please see the "Consistency vs Availability" and "The Prime1833 Coordination Directive" sections of mutable.txt, in the same directory as1834 this file.1835 1836 1837 [1]: URLs and HTTP and UTF-8, Oh My1838 1839 HTTP does not provide a mechanism to specify the character set used to1840 encode non-ascii names in URLs (rfc2396#2.1). We prefer the convention that1841 the filename= argument shall be a URL-encoded UTF-8 encoded unicode object.1842 For example, suppose we want to provoke the server into using a filename of1843 "f i a n c e-acute e" (i.e. F I A N C U+00E9 E). The UTF-8 encoding of this1844 is 0x66 0x69 0x61 0x6e 0x63 0xc3 0xa9 0x65 (or "fianc\xC3\xA9e", as python's1845 repr() function would show). To encode this into a URL, the non-printable1846 characters must be escaped with the urlencode '%XX' mechansim, giving us1847 "fianc%C3%A9e". Thus, the first line of the HTTP request will be "GET1848 /uri/CAP...?save=true&filename=fianc%C3%A9e HTTP/1.1". Not all browsers1849 provide this: IE7 uses the Latin-1 encoding, which is fianc%E9e.1850 1851 The response header will need to indicate a non-ASCII filename. The actual1852 mechanism to do this is not clear. For ASCII filenames, the response header1853 would look like:1854 1855 Content-Disposition: attachment; filename="english.txt"1856 1857 If Tahoe were to enforce the utf-8 convention, it would need to decode the1858 URL argument into a unicode string, and then encode it back into a sequence1859 of bytes when creating the response header. One possibility would be to use1860 unencoded utf-8. 
Developers suggest that IE7 might accept this:1861 1862 #1: Content-Disposition: attachment; filename="fianc\xC3\xA9e"1863 (note, the last four bytes of that line, not including the newline, are1864 0xC3 0xA9 0x65 0x22)1865 1866 RFC2231#4 (dated 1997): suggests that the following might work, and some1867 developers (http://markmail.org/message/dsjyokgl7hv64ig3) have reported that1868 it is supported by firefox (but not IE7):1869 1870 #2: Content-Disposition: attachment; filename*=utf-8''fianc%C3%A9e1871 1872 My reading of RFC2616#19.5.1 (which defines Content-Disposition) says that1873 the filename= parameter is defined to be wrapped in quotes (presumeably to1874 allow spaces without breaking the parsing of subsequent parameters), which1875 would give us:1876 1877 #3: Content-Disposition: attachment; filename*=utf-8''"fianc%C3%A9e"1878 1879 However this is contrary to the examples in the email thread listed above.1880 1881 Developers report that IE7 (when it is configured for UTF-8 URL encoding,1882 which is not the default in asian countries), will accept:1883 1884 #4: Content-Disposition: attachment; filename=fianc%C3%A9e1885 1886 However, for maximum compatibility, Tahoe simply copies bytes from the URL1887 into the response header, rather than enforcing the utf-8 convention. This1888 means it does not try to decode the filename from the URL argument, nor does1889 it encode the filename into the response header. -
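The UTF-8-plus-percent-encoding convention described above can be
reproduced in a few lines of Python. This sketch uses the Python 3
`urllib.parse` API (in the Python 2 of the era, `urllib.quote` and
`urllib.unquote` played the same roles):

```python
from urllib.parse import quote, unquote_to_bytes

name = "fianc\u00e9e"                # F I A N C U+00E9 E
utf8 = name.encode("utf-8")
assert utf8 == b"fianc\xc3\xa9e"     # 0x66 0x69 0x61 0x6e 0x63 0xc3 0xa9 0x65
encoded = quote(utf8)                # escape bytes with the '%XX' mechanism
assert encoded == "fianc%C3%A9e"
# A client following the convention would then send:
#   GET /uri/CAP...?save=true&filename=fianc%C3%A9e HTTP/1.1
# and a server enforcing the convention would decode with:
assert unquote_to_bytes(encoded).decode("utf-8") == name
```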
new file docs/specifications/URI-extension.rst
diff --git a/docs/specifications/URI-extension.rst b/docs/specifications/URI-extension.rst new file mode 100644 index 0000000..6d40652
===================
URI Extension Block
===================

This block is a serialized dictionary with string keys and string values
(some of which represent numbers, some of which are SHA-256 hashes). All
buckets hold an identical copy. The hash of the serialized data is kept in
the URI.

The download process must obtain a valid copy of this data before any
decoding can take place. The download process must also obtain other data
before incremental validation can be performed. Full-file validation (for
clients who do not wish to do incremental validation) can be performed
solely with the data from this block.

At the moment, this data block contains the following keys (and an
estimate on their sizes)::

 size                5
 segment_size        7
 num_segments        2
 needed_shares       2
 total_shares        3

 codec_name          3
 codec_params        5+1+2+1+3=12
 tail_codec_params   12

 share_root_hash     32 (binary) or 52 (base32-encoded) each
 plaintext_hash
 plaintext_root_hash
 crypttext_hash
 crypttext_root_hash

Some pieces are needed elsewhere (size should be visible without pulling
the block, the Tahoe3 algorithm needs total_shares to find the right
peers, all peer selection algorithms need needed_shares to ask a minimal
set of peers). Some pieces are arguably redundant but are convenient to
have present (test_encode.py makes use of num_segments).

The rule for this data block is that it should be a constant size for all
files, regardless of file size. Therefore hash trees (which have a size
that depends linearly upon the number of segments) are stored elsewhere in
the bucket, with only the hash tree root stored in this data block.

This block will be serialized as follows::

 assert that all keys match ^[a-zA-Z_\-]+$
 sort all the keys lexicographically
 for k in keys:
   write("%s:" % k)
   write(netstring(data[k]))

Serialized size::

 dense binary (but decimal) packing: 160+46=206
 including 'key:' (185) and netstring (6*3+7*4=46) on values: 231
 including 'key:%d\n' (185+13=198) and printable values (46+5*52=306): 504

We'll go with the 231-sized block, and provide a tool to dump it as text
if we really want one.
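The serialization pseudocode above can be turned into a few lines of
Python. This is an illustrative sketch, not the shipped implementation; it
uses Tahoe's comma-terminated netstring format and takes the values as
already-encoded byte strings:

```python
import re

def netstring(s):
    # Tahoe netstrings are "<len>:<bytes>," -- note the trailing comma.
    return b"%d:%s," % (len(s), s)

def pack_extension(data):
    """Serialize a {str: bytes} dict per the rules above (a sketch)."""
    for k in data:
        assert re.match(r"^[a-zA-Z_\-]+$", k), k
    # sort lexicographically, then write "key:" + netstring(value)
    return b"".join(k.encode("ascii") + b":" + netstring(v)
                    for k, v in sorted(data.items()))

pack_extension({"size": b"12345", "codec_name": b"crs"})
# -> b'codec_name:3:crs,size:5:12345,'
```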
deleted file docs/specifications/URI-extension.txt
diff --git a/docs/specifications/URI-extension.txt b/docs/specifications/URI-extension.txt deleted file mode 100644 index 8ec383e..0000000
new file docs/specifications/dirnodes.rst
diff --git a/docs/specifications/dirnodes.rst b/docs/specifications/dirnodes.rst new file mode 100644 index 0000000..129e499
==========================
Tahoe-LAFS Directory Nodes
==========================

As explained in the architecture docs, Tahoe-LAFS can be roughly viewed as
a collection of three layers. The lowest layer is the key-value store: it
provides operations that accept files and upload them to the grid,
creating a URI in the process which securely references the file's
contents. The middle layer is the filesystem, creating a structure of
directories and filenames resembling the traditional unix/windows
filesystems. The top layer is the application layer, which uses the lower
layers to provide useful services to users, like a backup application, or
a way to share files with friends.

This document examines the middle layer, the "filesystem".

1. `Key-value Store Primitives`_
2. `Filesystem goals`_
3. `Dirnode goals`_
4. `Dirnode secret values`_
5. `Dirnode storage format`_
6. `Dirnode sizes, mutable-file initial read sizes`_
7. `Design Goals, redux`_

   1. `Confidentiality leaks in the storage servers`_
   2. `Integrity failures in the storage servers`_
   3. `Improving the efficiency of dirnodes`_
   4. `Dirnode expiration and leases`_

8. `Starting Points: root dirnodes`_
9. `Mounting and Sharing Directories`_
10. `Revocation`_

Key-value Store Primitives
==========================

In the lowest layer (key-value store), there are two operations that
reference immutable data (which we refer to as "CHK URIs" or "CHK
read-capabilities" or "CHK read-caps"). One puts data into the grid (but
only if it doesn't exist already), the other retrieves it::

 chk_uri = put(data)
 data = get(chk_uri)

We also have three operations which reference mutable data (which we refer
to as "mutable slots", or "mutable write-caps and read-caps", or sometimes
"SSK slots"). One creates a slot with some initial contents, a second
replaces the contents of a pre-existing slot, and the third retrieves the
contents::

 mutable_uri = create(initial_data)
 replace(mutable_uri, new_data)
 data = get(mutable_uri)

Filesystem Goals
================

The main goal for the middle (filesystem) layer is to give users a way to
organize the data that they have uploaded into the grid. The traditional
way to do this in computer filesystems is to put this data into files,
give those files names, and collect these names into directories.

Each directory is a set of name-entry pairs, each of which maps a "child
name" to a directory entry pointing to an object of some kind. Those child
objects might be files, or they might be other directories. Each directory
entry also contains metadata.

The directory structure is therefore a directed graph of nodes, in which
each node might be a directory node or a file node. All file nodes are
terminal nodes.

Dirnode Goals
=============

What properties might be desirable for these directory nodes? In no
particular order:

1. functional. Code which does not work doesn't count.
2. easy to document, explain, and understand
3. confidential: it should not be possible for others to see the contents
   of a directory
4. integrity: it should not be possible for others to modify the contents
   of a directory
5. available: directories should survive host failure, just like files do
6. efficient: in storage, communication bandwidth, number of round-trips
7. easy to delegate individual directories in a flexible way
8. updateness: everybody looking at a directory should see the same
   contents
9. monotonicity: everybody looking at a directory should see the same
   sequence of updates

Some of these goals are mutually exclusive. For example, availability and
consistency are opposing, so it is not possible to achieve #5 and #8 at
the same time. Moreover, it takes a more complex architecture to get close
to the available-and-consistent ideal, so #2/#6 is in opposition to
#5/#8.

Tahoe-LAFS v0.7.0 introduced distributed mutable files, which use
public-key cryptography for integrity, and erasure coding for
availability. These achieve roughly the same properties as immutable CHK
files, but their contents can be replaced without changing their identity.
Dirnodes are then just a special way of interpreting the contents of a
specific mutable file. Earlier releases used a "vdrive server": this
server was abolished in the v0.7.0 release.

For details of how mutable files work, please see "mutable.txt" in this
directory.

For releases since v0.7.0, we achieve most of our desired properties. The
integrity and availability of dirnodes is equivalent to that of regular
(immutable) files, with the exception that there are more
simultaneous-update failure modes for mutable slots. Delegation is quite
strong: you can give read-write or read-only access to any subtree, and
the data format used for dirnodes is such that read-only access is
transitive: i.e. if you grant Bob read-only access to a parent directory,
then Bob will get read-only access (and *not* read-write access) to its
children.

Relative to the previous "vdrive-server" based scheme, the current
distributed dirnode approach gives better availability, but cannot
guarantee updateness quite as well, and requires far more network traffic
for each retrieval and update. Mutable files are somewhat less available
than immutable files, simply because of the increased number of
combinations (shares of an immutable file are either present or not,
whereas there are multiple versions of each mutable file, and you might
have some shares of version 1 and other shares of version 2). In extreme
cases of simultaneous update, mutable files might suffer from
non-monotonicity.


Dirnode secret values
=====================

As mentioned before, dirnodes are simply a special way to interpret the
contents of a mutable file, so the secret keys and capability strings
described in "mutable.txt" are all the same. Each dirnode contains an RSA
public/private keypair, and the holder of the "write capability" will be
able to retrieve the private key (as well as the AES encryption key used
for the data itself). The holder of the "read capability" will be able to
obtain the public key and the AES data key, but not the RSA private key
needed to modify the data.

The "write capability" for a dirnode grants read-write access to its
contents. This is expressed in concrete form as the "dirnode write cap": a
printable string which contains the necessary secrets to grant this
access. Likewise, the "read capability" grants read-only access to a
dirnode, and can be represented by a "dirnode read cap" string.

For example,
URI:DIR2:swdi8ge1s7qko45d3ckkyw1aac%3Aar8r5j99a4mezdojejmsfp4fj1zeky9gjigyrid4urxdimego68o
is a write-capability URI, while
URI:DIR2-RO:buxjqykt637u61nnmjg7s8zkny:ar8r5j99a4mezdojejmsfp4fj1zeky9gjigyrid4urxdimego68o
is a read-capability URI, both for the same dirnode.


Dirnode storage format
======================

Each dirnode is stored in a single mutable file, distributed in the
Tahoe-LAFS grid. The contents of this file are a serialized list of
netstrings, one per child. Each child is a list of four netstrings: (name,
rocap, rwcap, metadata). (Remember that the contents of the mutable file
are encrypted by the read-cap, so this section describes the plaintext
contents of the mutable file, *after* it has been decrypted by the
read-cap.)

The name is simply the UTF-8-encoded child name. The 'rocap' is a
read-only capability URI to that child, either an immutable (CHK) file, a
mutable file, or a directory. It is also possible to store 'unknown' URIs
that are not recognized by the current version of Tahoe-LAFS. The 'rwcap'
is a read-write capability URI for that child, encrypted with the
dirnode's write-cap: this enables the "transitive readonlyness" property,
described further below. The 'metadata' is a JSON-encoded dictionary of
type,value metadata pairs. Some metadata keys are pre-defined, the rest
are left up to the application.

Each rwcap is stored as IV + ciphertext + MAC. The IV is a 16-byte random
value. The ciphertext is obtained by using AES in CTR mode on the rwcap
URI string, using a key that is formed from a tagged hash of the IV and
the dirnode's writekey. The MAC is written only for compatibility with
older Tahoe-LAFS versions and is no longer verified.

If Bob has read-only access to the 'bar' directory, and he adds it as a
child to the 'foo' directory, then he will put the read-only cap for 'bar'
in both the rwcap and rocap slots (encrypting the rwcap contents as
described above). If he has full read-write access to 'bar', then he will
put the read-write cap in the 'rwcap' slot, and the read-only cap in the
'rocap' slot. Since other users who have read-only access to 'foo' will be
unable to decrypt its rwcap slot, this limits those users to read-only
access to 'bar' as well, thus providing the transitive readonlyness that
we desire.
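The outer layer of this format (one netstring per child, each wrapping
four inner netstrings) can be sketched in Python. This is illustrative
only: it skips the rwcap encryption step, so the IV + ciphertext + MAC
bytes are passed in as an opaque placeholder.

```python
def netstring(s):
    # comma-terminated netstring, as used by Tahoe
    return b"%d:%s," % (len(s), s)

def pack_children(children):
    """children: iterable of (name, rocap, enc_rwcap, metadata_json)
    tuples, where name is unicode and the rest are bytes.  Each child
    becomes one netstring wrapping four inner netstrings."""
    out = []
    for name, rocap, enc_rwcap, metadata in children:
        entry = (netstring(name.encode("utf-8")) + netstring(rocap) +
                 netstring(enc_rwcap) + netstring(metadata))
        out.append(netstring(entry))
    return b"".join(out)

pack_children([("kids.jpg", b"URI:CHK:abc", b"<iv+ct+mac>", b"{}")])
# -> b'46:8:kids.jpg,11:URI:CHK:abc,11:<iv+ct+mac>,2:{},,'
```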
Dirnode sizes, mutable-file initial read sizes
==============================================

How big are dirnodes? When reading dirnode data out of mutable files, how
large should our initial read be? If we guess exactly, we can read a
dirnode in a single round-trip, and update one in two RTT. If we guess too
high, we'll waste some amount of bandwidth. If we guess low, we need to
make a second pass to get the data (or the encrypted privkey, for writes),
which will cost us at least another RTT.

Assuming child names are between 10 and 99 characters long, how long are
the various pieces of a dirnode?

::

 netstring(name) ~= 4+len(name)
 chk-cap = 97 (for 4-char filesizes)
 dir-rw-cap = 88
 dir-ro-cap = 91
 netstring(cap) = 4+len(cap)
 encrypted(cap) = 16+cap+32
 JSON({}) = 2
 JSON({ctime=float,mtime=float,'tahoe':{linkcrtime=float,linkmotime=float}}): 137
 netstring(metadata) = 4+137 = 141

so a CHK entry is::

 5+ 4+len(name) + 4+97 + 5+16+97+32 + 4+137

And a 15-byte filename gives a 416-byte entry. When the entry points at a
subdirectory instead of a file, the entry is a little bit smaller. So an
empty directory uses 0 bytes, a directory with one child uses about 416
bytes, a directory with two children uses about 832, etc.

When the dirnode data is encoded using our default 3-of-10, that means we
get 139ish bytes of data in each share per child.

The pubkey, signature, and hashes form the first 935ish bytes of the
container, then comes our data, then about 1216 bytes of encprivkey. So if
we read the first::

 1kB: we get 65bytes of dirnode data : only empty directories
 2kB: 1065bytes: about 8 entries
 3kB: 2065bytes: about 15 entries, or 6 entries plus the encprivkey
 4kB: 3065bytes: about 22 entries, or about 13 plus the encprivkey

So we've written the code to do an initial read of 4kB from each share
when we read the mutable file, which should give good performance (one
RTT) for small directories.


Design Goals, redux
===================

How well does this design meet the goals?

1. functional: YES: the code works and has extensive unit tests
2. documentable: YES: this document is the existence proof
3. confidential: YES: see below
4. integrity: MOSTLY: a coalition of storage servers can rollback
   individual mutable files, but not a single one. No server can
   substitute fake data as genuine.
5. availability: YES: as long as 'k' storage servers are present and have
   the same version of the mutable file, the dirnode will be available.
6. efficient: MOSTLY:

   network: single dirnode lookup is very efficient, since clients can
   fetch specific keys rather than being required to get or set the
   entire dirnode each time. Traversing many directories takes a lot of
   roundtrips, and these can't be collapsed with promise-pipelining
   because the intermediate values must only be visible to the client.
   Modifying many dirnodes at once (e.g. importing a large pre-existing
   directory tree) is pretty slow, since each graph edge must be created
   independently.

   storage: each child has a separate IV, which makes them larger than if
   all children were aggregated into a single encrypted string.

7. delegation: VERY: each dirnode is a completely independent object, to
   which clients can be granted separate read-write or read-only access
8. updateness: VERY: with only a single point of access, and no caching,
   each client operation starts by fetching the current value, so there
   are no opportunities for staleness
9. monotonicity: VERY: the single point of access also protects against
   retrograde motion


Confidentiality leaks in the storage servers
--------------------------------------------

Dirnodes (and the mutable files upon which they are based) are very
private against other clients: traffic between the client and the storage
servers is protected by the Foolscap SSL connection, so they can observe
very little. Storage index values are hashes of secrets and thus
unguessable, and they are not made public, so other clients cannot snoop
through encrypted dirnodes that they have not been told about.

Storage servers can observe access patterns and see ciphertext, but they
cannot see the plaintext (of child names, metadata, or URIs). If an
attacker operates a significant number of storage servers, they can infer
the shape of the directory structure by assuming that directories are
usually accessed from root to leaf in rapid succession. Since filenames
are usually much shorter than read-caps and write-caps, the attacker can
use the length of the ciphertext to guess the number of children of each
node, and might be able to guess the length of the child names (or at
least their sum). From this, the attacker may be able to build up a graph
with the same shape as the plaintext filesystem, but with unlabeled edges
and unknown file contents.


Integrity failures in the storage servers
-----------------------------------------

The mutable file's integrity mechanism (RSA signature on the hash of the
file contents) prevents the storage server from modifying the dirnode's
contents without detection.
Therefore the storage servers can make the dirnode unavailable, but not
corrupt it.

A sufficient number of colluding storage servers can perform a rollback
attack: replace all shares of the whole mutable file with an earlier
version. To prevent this, when retrieving the contents of a mutable file,
the client queries more servers than necessary and uses the highest
available version number. This ensures that one or two misbehaving storage
servers cannot cause this rollback on their own.


Improving the efficiency of dirnodes
------------------------------------

The current mutable-file-based dirnode scheme suffers from certain
inefficiencies. A very large directory (with thousands or millions of
children) will take a significant time to extract any single entry,
because the whole file must be downloaded first, then parsed and searched
to find the desired child entry. Likewise, modifying a single child will
require the whole file to be re-uploaded.

The current design assumes (and in some cases, requires) that dirnodes
remain small. The mutable files on which dirnodes are based are currently
using "SDMF" ("Small Distributed Mutable File") design rules, which state
that the size of the data shall remain below one megabyte. More advanced
forms of mutable files (MDMF and LDMF) are in the design phase to allow
efficient manipulation of larger mutable files. This would reduce the work
needed to modify a single entry in a large directory.

Judicious caching may help improve the reading-large-directory case. Some
form of mutable index at the beginning of the dirnode might help as well.
The MDMF design rules allow for efficient random-access reads from the
middle of the file, which would give the index something useful to point
at.

The current SDMF design generates a new RSA public/private keypair for
each directory. This takes considerable time and CPU effort, generally one
or two seconds per directory. We have designed (but not yet built) a
DSA-based mutable file scheme which will use shared parameters to reduce
the directory-creation effort to a bare minimum (picking a random number
instead of generating two random primes).

When a backup program is run for the first time, it needs to copy a large
amount of data from a pre-existing filesystem into reliable storage. This
means that a large and complex directory structure needs to be duplicated
in the dirnode layer. With the one-object-per-dirnode approach described
here, this requires as many operations as there are edges in the imported
filesystem graph.

Another approach would be to aggregate multiple directories into a single
storage object. This object would contain a serialized graph rather than a
single name-to-child dictionary. Most directory operations would fetch the
whole block of data (and presumably cache it for a while to avoid lots of
re-fetches), and modification operations would need to replace the whole
thing at once. This "realm" approach would have the added benefit of
combining more data into a single encrypted bundle (perhaps hiding the
shape of the graph from a determined attacker), and would reduce
round-trips when performing deep directory traversals (assuming the realm
was already cached). It would also prevent fine-grained rollback attacks
from working: a coalition of storage servers could change the entire realm
to look like an earlier state, but it could not independently roll back
individual directories.

The drawbacks of this aggregation would be that small accesses (adding a
single child, looking up a single child) would require pulling or pushing
a lot of unrelated data, increasing network overhead (and necessitating
test-and-set semantics for the modification side, which increases the
chances that a user operation will fail, making it more challenging to
provide promises of atomicity to the user).

It would also make it much more difficult to enable the delegation
("sharing") of specific directories. Since each aggregate "realm" provides
all-or-nothing access control, the act of delegating any directory from
the middle of the realm would require the realm first be split into the
upper piece that isn't being shared and the lower piece that is. This
splitting would have to be done in response to what is essentially a read
operation, which is not traditionally supposed to be a high-effort action.
On the other hand, it may be possible to aggregate the ciphertext, but use
distinct encryption keys for each component directory, to get the benefits
of both schemes at once.


Dirnode expiration and leases
-----------------------------

Dirnodes are created any time a client wishes to add a new directory. How
long do they live? What's to keep them from sticking around forever,
taking up space that nobody can reach any longer?

Mutable files are created with limited-time "leases", which keep the
shares alive until the last lease has expired or been cancelled. Clients
which know and care about specific dirnodes can ask to keep them alive for
a while, by renewing a lease on them (with a typical period of one month).
Clients are expected to assist in the deletion of dirnodes by canceling
their leases as soon as they are done with them. This means that when a
client deletes a directory, it should also cancel its lease on that
directory.
When the lease 393 count on a given share goes to zero, the storage server can delete the 394 related storage. Multiple clients may all have leases on the same dirnode: 395 the server may delete the shares only after all of the leases have gone away. 396 397 We expect that clients will periodically create a "manifest": a list of 398 so-called "refresh capabilities" for all of the dirnodes and files that they 399 can reach. They will give this manifest to the "repairer", which is a service 400 that keeps files (and dirnodes) alive on behalf of clients who cannot take on 401 this responsibility for themselves. These refresh capabilities include the 402 storage index, but do *not* include the readkeys or writekeys, so the 403 repairer does not get to read the files or directories that it is helping to 404 keep alive. 405 406 After each change to the user's vdrive, the client creates a manifest and 407 looks for differences from their previous version. Anything which was removed 408 prompts the client to send out lease-cancellation messages, allowing the data 409 to be deleted. 410 411 412 Starting Points: root dirnodes 413 ============================== 414 415 Any client can record the URI of a directory node in some external form (say, 416 in a local file) and use it as the starting point of later traversal. Each 417 Tahoe-LAFS user is expected to create a new (unattached) dirnode when they first 418 start using the grid, and record its URI for later use. 419 420 Mounting and Sharing Directories 421 ================================ 422 423 The biggest benefit of this dirnode approach is that sharing individual 424 directories is almost trivial. Alice creates a subdirectory that she wants to 425 use to share files with Bob. This subdirectory is attached to Alice's 426 filesystem at "~alice/share-with-bob". She asks her filesystem for the 427 read-write directory URI for that new directory, and emails it to Bob. 
When 428 Bob receives the URI, he asks his own local vdrive to attach the given URI, 429 perhaps at a place named "~bob/shared-with-alice". Every time either party 430 writes a file into this directory, the other will be able to read it. If 431 Alice prefers, she can give a read-only URI to Bob instead, and then Bob will 432 be able to read files but not change the contents of the directory. Neither 433 Alice nor Bob will get access to any files above the mounted directory: there 434 are no 'parent directory' pointers. If Alice creates a nested set of 435 directories, "~alice/share-with-bob/subdir2", and gives a read-only URI to 436 share-with-bob to Bob, then Bob will be unable to write to either 437 share-with-bob/ or subdir2/. 438 439 A suitable UI needs to be created to allow users to easily perform this 440 sharing action: dragging a folder their vdrive to an IM or email user icon, 441 for example. The UI will need to give the sending user an opportunity to 442 indicate whether they want to grant read-write or read-only access to the 443 recipient. The recipient then needs an interface to drag the new folder into 444 their vdrive and give it a home. 445 446 Revocation 447 ========== 448 449 When Alice decides that she no longer wants Bob to be able to access the 450 shared directory, what should she do? Suppose she's shared this folder with 451 both Bob and Carol, and now she wants Carol to retain access to it but Bob to 452 be shut out. Ideally Carol should not have to do anything: her access should 453 continue unabated. 454 455 The current plan is to have her client create a deep copy of the folder in 456 question, delegate access to the new folder to the remaining members of the 457 group (Carol), asking the lucky survivors to replace their old reference with 458 the new one. Bob may still have access to the old folder, but he is now the 459 only one who cares: everyone else has moved on, and he will no longer be able 460 to see their new changes. 
In a strict sense, this is the strongest form of 461 revocation that can be accomplished: there is no point trying to force Bob to 462 forget about the files that he read a moment before being kicked out. In 463 addition it must be noted that anyone who can access the directory can proxy 464 for Bob, reading files to him and accepting changes whenever he wants. 465 Preventing delegation between communication parties is just as pointless as 466 asking Bob to forget previously accessed files. However, there may be value 467 to configuring the UI to ask Carol to not share files with Bob, or to 468 removing all files from Bob's view at the same time his access is revoked. 469 -
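The lease-maintenance cycle described under "Dirnode expiration and leases"
above (build a manifest after each change, diff it against the previous one,
and cancel leases on anything no longer reachable) can be sketched as
follows. This is our own minimal illustration, not Tahoe's actual API; the
names ``update_leases`` and ``cancel_lease`` are hypothetical::

```python
def update_leases(old_manifest, new_manifest, cancel_lease):
    """Toy sketch (not Tahoe's code) of the client's lease-cancellation step.

    Each manifest is modeled as a set of refresh-capability strings for
    everything reachable from the client's root dirnode. Any capability
    present in the old manifest but absent from the new one is no longer
    reachable, so its lease can be cancelled, letting the storage servers
    reclaim the space once every other client's lease is gone too.
    """
    removed = set(old_manifest) - set(new_manifest)
    for refresh_cap in removed:
        cancel_lease(refresh_cap)  # hypothetical lease-cancellation message
    return removed

# Example: after deleting one directory, exactly one lease is cancelled.
cancelled = []
update_leases({"cap-a", "cap-b", "cap-c"}, {"cap-a", "cap-c"},
              cancelled.append)
```

Note that the server side is just reference counting: a share is deletable
only once *all* clients' leases on it have expired or been cancelled.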
deleted file docs/specifications/dirnodes.txt
diff --git a/docs/specifications/dirnodes.txt b/docs/specifications/dirnodes.txt deleted file mode 100644 index fad7641..0000000
new file docs/specifications/file-encoding.rst
diff --git a/docs/specifications/file-encoding.rst b/docs/specifications/file-encoding.rst new file mode 100644 index 0000000..1f2ee74
- + 1 ============= 2 File Encoding 3 ============= 4 5 When the client wishes to upload an immutable file, the first step is to 6 decide upon an encryption key. There are two methods: convergent or random. 7 The goal of the convergent-key method is to make sure that multiple uploads 8 of the same file will result in only one copy on the grid, whereas the 9 random-key method does not provide this "convergence" feature. 10 11 The convergent-key method computes the SHA-256d hash of a single-purpose tag, 12 the encoding parameters, a "convergence secret", and the contents of the 13 file. It uses a portion of the resulting hash as the AES encryption key. 14 There are security concerns with using convergence this approach (the 15 "partial-information guessing attack", please see ticket #365 for some 16 references), so Tahoe uses a separate (randomly-generated) "convergence 17 secret" for each node, stored in NODEDIR/private/convergence . The encoding 18 parameters (k, N, and the segment size) are included in the hash to make sure 19 that two different encodings of the same file will get different keys. This 20 method requires an extra IO pass over the file, to compute this key, and 21 encryption cannot be started until the pass is complete. This means that the 22 convergent-key method will require at least two total passes over the file. 23 24 The random-key method simply chooses a random encryption key. Convergence is 25 disabled, however this method does not require a separate IO pass, so upload 26 can be done with a single pass. This mode makes it easier to perform 27 streaming upload. 28 29 Regardless of which method is used to generate the key, the plaintext file is 30 encrypted (using AES in CTR mode) to produce a ciphertext. This ciphertext is 31 then erasure-coded and uploaded to the servers. Two hashes of the ciphertext 32 are generated as the encryption proceeds: a flat hash of the whole 33 ciphertext, and a Merkle tree. 
These are used to verify the correctness of the erasure decoding step, and can
be used by a "verifier" process to make sure the file is intact without
requiring the decryption key.

The encryption key is hashed (with SHA-256d and a single-purpose tag) to
produce the "Storage Index". This Storage Index (or SI) is used to identify
the shares produced by the method described below. The grid can be thought of
as a large table that maps Storage Index to a ciphertext. Since the
ciphertext is stored as erasure-coded shares, it can also be thought of as a
table that maps SI to shares.

Anybody who knows a Storage Index can retrieve the associated ciphertext:
ciphertexts are not secret.

.. image:: file-encoding1.svg

The ciphertext file is then broken up into segments. The last segment is
likely to be shorter than the rest. Each segment is erasure-coded into a
number of "blocks". This takes place one segment at a time. (In fact,
encryption and erasure-coding take place at the same time, once per plaintext
segment.) Larger segment sizes result in less overhead overall, but increase
both the memory footprint and the "alacrity" (the number of bytes we have to
receive before we can deliver validated plaintext to the user). The current
default segment size is 128KiB.

One block from each segment is sent to each shareholder (aka leaseholder,
aka landlord, aka storage node, aka peer). The "share" held by each remote
shareholder is nominally just a collection of these blocks. The file will
be recoverable when a certain number of shares have been retrieved.

.. image:: file-encoding2.svg

The blocks are hashed as they are generated and transmitted. These
block hashes are put into a Merkle hash tree. When the last share has been
created, the Merkle tree is completed and delivered to the peer. Later, when
we retrieve these blocks, the peer will send many of the Merkle hash tree
nodes ahead of time, so we can validate each block independently.

The root of this block hash tree is called the "block root hash" and is
used in the next step.

.. image:: file-encoding3.svg

There is a higher-level Merkle tree called the "share hash tree". Its leaves
are the block root hashes from each share. The root of this tree is called
the "share root hash" and is included in the "URI Extension Block", aka UEB.
The ciphertext hash and Merkle tree are also put here, along with the
original file size and the encoding parameters. The UEB contains all the
non-secret values that could be put in the URI, but would have made the URI
too big. So instead, the UEB is stored with the share, and the hash of the
UEB is put in the URI.

The URI then contains the secret encryption key and the UEB hash. It also
contains the basic encoding parameters (k and N) and the file size, to make
download more efficient (by knowing the number of required shares ahead of
time, sufficient download queries can be generated in parallel).

The URI (also known as the immutable-file read-cap, since possessing it
grants the holder the capability to read the file's plaintext) is then
represented as a (relatively) short printable string like so::

    URI:CHK:auxet66ynq55naiy2ay7cgrshm:6rudoctmbxsmbg7gwtjlimd6umtwrrsxkjzthuldsmo4nnfoc6fa:3:10:1000000

.. image:: file-encoding4.svg

During download, when a peer begins to transmit a share, it first transmits
all of the parts of the share hash tree that are necessary to validate its
block root hash. Then it transmits the portions of the block hash tree
that are necessary to validate the first block. Then it transmits the
first block. It then continues this loop: transmitting any portions of the
block hash tree needed to validate block#N, then sending block#N.

.. image:: file-encoding5.svg

So the "share" that is sent to the remote peer actually consists of three
pieces, sent in a specific order as they become available, and retrieved
during download in a different order according to when they are needed.

The first piece is the blocks themselves, one per segment. The last
block will likely be shorter than the rest, because the last segment is
probably shorter than the rest. The second piece is the block hash tree,
consisting of a total of two SHA-256d hashes per block. The third piece is a
hash chain from the share hash tree, consisting of log2(numshares) hashes.

During upload, all blocks are sent first, followed by the block hash
tree, followed by the share hash chain. During download, the share hash chain
is delivered first, followed by the block root hash. The client then uses
the hash chain to validate the block root hash. Then the peer delivers
enough of the block hash tree to validate the first block, followed by
the first block itself. The block hash chain is used to validate the
block, then it is passed (along with the first block from several other
peers) into decoding, to produce the first segment of crypttext, which is
then decrypted to produce the first segment of plaintext, which is finally
delivered to the user.

.. image:: file-encoding6.svg

Hashes
======

All hashes use SHA-256d, as defined in *Practical Cryptography* (by Ferguson
and Schneier). All hashes use a single-purpose tag, e.g. the hash that
converts an encryption key into a storage index is defined as follows::

    SI = SHA256d(netstring("allmydata_immutable_key_to_storage_index_v1") + key)

When two separate values need to be combined together in a hash, we wrap each
in a netstring.

Using SHA-256d (instead of plain SHA-256) guards against length-extension
attacks. Using the tag protects our Merkle trees against attacks in which the
hash of a leaf is confused with a hash of two children (allowing an attacker
to generate corrupted data that nevertheless appears to be valid), and is
simply good "cryptographic hygiene". The `"Chosen Protocol Attack" by Kelsey,
Schneier, and Wagner <http://www.schneier.com/paper-chosen-protocol.html>`_ is
relevant. Putting the tag in a netstring guards against attacks that seek to
confuse the end of the tag with the beginning of the subsequent value.
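The tagged-hash construction above can be sketched with Python's standard
``hashlib`` module. This is a minimal illustration of the scheme as described
here, not Tahoe's actual implementation (which, for example, truncates some
derived values to 16 bytes):

```python
import hashlib

def netstring(s: bytes) -> bytes:
    # netstring(b"abc") == b"3:abc,"
    return b"%d:%s," % (len(s), s)

def sha256d(data: bytes) -> bytes:
    # SHA-256d: SHA-256 applied twice, which guards against
    # length-extension attacks
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def tagged_hash(tag: bytes, value: bytes) -> bytes:
    # the single-purpose tag is wrapped in a netstring so that the end
    # of the tag cannot be confused with the start of the value
    return sha256d(netstring(tag) + value)

# a stand-in 16-byte AES key, for illustration only
key = b"\x00" * 16
si = tagged_hash(b"allmydata_immutable_key_to_storage_index_v1", key)
```

When two values are combined in one hash, each would be wrapped in its own
netstring before concatenation, so neither value's end can be confused with
the other's beginning.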
deleted file docs/specifications/file-encoding.txt
diff --git a/docs/specifications/file-encoding.txt b/docs/specifications/file-encoding.txt deleted file mode 100644 index 23862ea..0000000
new file docs/specifications/mutable.rst
diff --git a/docs/specifications/mutable.rst b/docs/specifications/mutable.rst new file mode 100644 index 0000000..0d7e71e
=============
Mutable Files
=============

This describes the "RSA-based mutable files" which were shipped in Tahoe v0.8.0.

1. `Consistency vs. Availability`_
2. `The Prime Coordination Directive: "Don't Do That"`_
3. `Small Distributed Mutable Files`_

   1. `SDMF slots overview`_
   2. `Server Storage Protocol`_
   3. `Code Details`_
   4. `SMDF Slot Format`_
   5. `Recovery`_

4. `Medium Distributed Mutable Files`_
5. `Large Distributed Mutable Files`_
6. `TODO`_

Mutable File Slots are places with a stable identifier that can hold data
that changes over time. In contrast to CHK slots, for which the
URI/identifier is derived from the contents themselves, the Mutable File Slot
URI remains fixed for the life of the slot, regardless of what data is placed
inside it.

Each mutable slot is referenced by two different URIs. The "read-write" URI
grants read-write access to its holder, allowing them to put whatever
contents they like into the slot. The "read-only" URI is less powerful, only
granting read access, and not enabling modification of the data. The
read-write URI can be turned into the read-only URI, but not the other way
around.

The data in these slots is distributed over a number of servers, using the
same erasure coding that CHK files use, with 3-of-10 being a typical choice
of encoding parameters. The data is encrypted and signed in such a way that
only the holders of the read-write URI will be able to set the contents of
the slot, and only the holders of the read-only URI will be able to read
those contents. Holders of either URI will be able to validate the contents
as being written by someone with the read-write URI. The servers who hold the
shares cannot read or modify them: the worst they can do is deny service (by
deleting or corrupting the shares), or attempt a rollback attack (which can
only succeed with the cooperation of at least k servers).

Consistency vs. Availability
============================

There is an age-old battle between consistency and availability. Epic papers
have been written, elaborate proofs have been established, and generations of
theorists have learned that you cannot simultaneously achieve guaranteed
consistency with guaranteed availability. In addition, the closer to 0 you
get on either axis, the more the cost and complexity of the design go up.

Tahoe's design goals are to largely favor design simplicity, then slightly
favor read availability, over the other criteria.

As we develop more sophisticated mutable slots, the API may expose multiple
read versions to the application layer. The Tahoe philosophy is to defer most
consistency recovery logic to the higher layers. Some applications have
effective ways to merge multiple versions, so inconsistency is not
necessarily a problem (i.e. directory nodes can usually merge multiple "add
child" operations).

The Prime Coordination Directive: "Don't Do That"
=================================================

The current rule for applications which run on top of Tahoe is "do not
perform simultaneous uncoordinated writes". That means you need non-Tahoe
means to make sure that two parties are not trying to modify the same mutable
slot at the same time. For example:

* don't give the read-write URI to anyone else. Dirnodes in a private
  directory generally satisfy this case, as long as you don't use two
  clients on the same account at the same time
* if you give a read-write URI to someone else, stop using it yourself. An
  inbox would be a good example of this.
* if you give a read-write URI to someone else, call them on the phone
  before you write into it
* build an automated mechanism to have your agents coordinate writes.
  For example, we expect a future release to include a FURL for a
  "coordination server" in the dirnodes. The rule can be that you must
  contact the coordination server and obtain a lock/lease on the file
  before you're allowed to modify it.

If you do not follow this rule, Bad Things will happen. The worst-case Bad
Thing is that the entire file will be lost. A less-bad Bad Thing is that one
or more of the simultaneous writers will lose their changes. An observer of
the file may not see monotonically-increasing changes to the file, i.e. they
may see version 1, then version 2, then 3, then 2 again.

Tahoe takes some amount of care to reduce the badness of these Bad Things.
One way you can help nudge it from the "lose your file" case into the "lose
some changes" case is to reduce the number of competing versions: multiple
versions of the file that different parties are trying to establish as the
one true current contents. Each simultaneous writer counts as a "competing
version", as does the previous version of the file. If the count "S" of these
competing versions is larger than N/k, then the file runs the risk of being
lost completely. [TODO] If at least one of the writers remains running after
the collision is detected, it will attempt to recover, but if S>(N/k) and all
writers crash after writing a few shares, the file will be lost.

Note that Tahoe uses serialization internally to make sure that a single
Tahoe node will not perform simultaneous modifications to a mutable file. It
accomplishes this by using a weakref cache of the MutableFileNode (so that
there will never be two distinct MutableFileNodes for the same file), and by
forcing all mutable file operations to obtain a per-node lock before they
run. The Prime Coordination Directive therefore applies to inter-node
conflicts, not intra-node ones.


Small Distributed Mutable Files
===============================

SDMF slots are suitable for small (<1MB) files that are edited by rewriting
the entire file. The three operations are:

* allocate (with initial contents)
* set (with new contents)
* get (old contents)

The first use of SDMF slots will be to hold directories (dirnodes), which map
encrypted child names to rw-URI/ro-URI pairs.

SDMF slots overview
-------------------

Each SDMF slot is created with a public/private key pair. The public key is
known as the "verification key", while the private key is called the
"signature key". The private key is hashed and truncated to 16 bytes to form
the "write key" (an AES symmetric key). The write key is then hashed and
truncated to form the "read key". The read key is hashed and truncated to
form the 16-byte "storage index" (a unique string used as an index to locate
stored data).

The public key is hashed by itself to form the "verification key hash".

The write key is hashed a different way to form the "write enabler master".
For each storage server on which a share is kept, the write enabler master is
concatenated with the server's nodeid and hashed, and the result is called
the "write enabler" for that particular server. Note that multiple shares of
the same slot stored on the same server will all get the same write enabler,
i.e. the write enabler is associated with the "bucket", rather than the
individual shares.

The private key is encrypted (using AES in counter mode) by the write key,
and the resulting crypttext is stored on the servers, so it will be
retrievable by anyone who knows the write key. The write key is not used to
encrypt anything else, and the private key never changes, so we do not need
an IV for this purpose.

The actual data is encrypted (using AES in counter mode) with a key derived
by concatenating the readkey with the IV, then hashing the result and
truncating to 16 bytes.
The IV is randomly generated each time the slot is
updated, and stored next to the encrypted data.

The read-write URI consists of the write key and the verification key hash.
The read-only URI contains the read key and the verification key hash. The
verify-only URI contains the storage index and the verification key hash.

::

    URI:SSK-RW:b2a(writekey):b2a(verification_key_hash)
    URI:SSK-RO:b2a(readkey):b2a(verification_key_hash)
    URI:SSK-Verify:b2a(storage_index):b2a(verification_key_hash)

Note that this allows the read-only and verify-only URIs to be derived from
the read-write URI without actually retrieving the public keys. Also note
that it means the read-write agent must validate both the private key and the
public key when they are first fetched. All users validate the public key in
exactly the same way.

The SDMF slot is allocated by sending a request to the storage server with a
desired size, the storage index, and the write enabler for that server's
nodeid. If granted, the write enabler is stashed inside the slot's backing
store file. All further write requests must be accompanied by the write
enabler or they will not be honored. The storage server does not share the
write enabler with anyone else.

The SDMF slot structure will be described in more detail below. The important
pieces are:

* a sequence number
* a root hash "R"
* the encoding parameters (including k, N, file size, segment size)
* a signed copy of [seqnum,R,encoding_params], using the signature key
* the verification key (not encrypted)
* the share hash chain (part of a Merkle tree over the share hashes)
* the block hash tree (Merkle tree over blocks of share data)
* the share data itself (erasure-coding of read-key-encrypted file data)
* the signature key, encrypted with the write key

The access pattern for read is:

* hash read-key to get storage index
* use storage index to locate 'k' shares with identical 'R' values

  * either get one share, read 'k' from it, then read k-1 shares
  * or read, say, 5 shares, discover k, either get more or be finished
  * or copy k into the URIs

* read verification key
* hash verification key, compare against verification key hash
* read seqnum, R, encoding parameters, signature
* verify signature against verification key
* read share data, compute block-hash Merkle tree and root "r"
* read share hash chain (leading from "r" to "R")
* validate share hash chain up to the root "R"
* submit share data to erasure decoding
* decrypt decoded data with read-key
* submit plaintext to application

The access pattern for write is:

* hash write-key to get read-key, hash read-key to get storage index
* use the storage index to locate at least one share
* read verification key and encrypted signature key
* decrypt signature key using write-key
* hash signature key, compare against write-key
* hash verification key, compare against verification key hash
* encrypt plaintext from application with read-key

  * application can encrypt some data with the write-key to make it only
    available to writers (use this for transitive read-onlyness of dirnodes)

* erasure-code crypttext to form shares
* split shares into blocks
* compute Merkle tree of blocks, giving root "r" for each share
* compute Merkle tree of shares, find root "R" for the file as a whole
* create share data structures, one per server:

  * use seqnum which is one higher than the old version
  * share hash chain has log(N) hashes, different for each server
  * signed data is the same for each server

* now we have N shares and need homes for them
* walk through peers:

  * if share is not already present, allocate-and-set
  * otherwise, try to modify existing share:

    * send testv_and_writev operation to each one
    * testv says to accept share if their(seqnum+R) <= our(seqnum+R)
    * count how many servers wind up with which versions (histogram over R)
    * keep going until N servers have the same version, or we run out of
      servers

* if any servers wound up with a different version, report error to
  application
* if we ran out of servers, initiate recovery process (described below)

Server Storage Protocol
-----------------------

The storage servers will provide a mutable slot container which is oblivious
to the details of the data being contained inside it. Each storage index
refers to a "bucket", and each bucket has one or more shares inside it. (In a
well-provisioned network, each bucket will have only one share.) The bucket
is stored as a directory, using the base32-encoded storage index as the
directory name. Each share is stored in a single file, using the share number
as the filename.
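
The bucket layout just described can be sketched as a simple path
computation. This is a hypothetical helper, not the actual storage-server
code: the real server's on-disk scheme may add further directory levels, and
Tahoe's base32 alphabet is approximated here by lowercasing the standard
library's output.

```python
import base64
import os.path

def base32(data: bytes) -> str:
    # unpadded lowercase base32, approximating Tahoe's encoding
    return base64.b32encode(data).decode("ascii").rstrip("=").lower()

def share_path(storagedir: str, storage_index: bytes, sharenum: int) -> str:
    # each bucket is a directory named by the base32-encoded storage
    # index; each share inside it is a file named by its share number
    return os.path.join(storagedir, base32(storage_index), str(sharenum))

p = share_path("/var/tahoe/storage", b"\x00" * 16, 0)
```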

The container holds space for a container magic number (for versioning), the
write enabler, the nodeid which accepted the write enabler (used for share
migration, described below), a small number of lease structures, the embedded
data itself, and expansion space for additional lease structures::

    #   offset   size    name
    1   0        32      magic verstr "tahoe mutable container v1" plus binary
    2   32       20      write enabler's nodeid
    3   52       32      write enabler
    4   84       8       data size (actual share data present) (a)
    5   92       8       offset of (8) count of extra leases (after data)
    6   100      368     four leases, 92 bytes each
                           0    4   ownerid (0 means "no lease here")
                           4    4   expiration timestamp
                           8   32   renewal token
                          40   32   cancel token
                          72   20   nodeid which accepted the tokens
    7   468      (a)     data
    8   ??       4       count of extra leases
    9   ??       n*92    extra leases

The "extra leases" field must be copied and rewritten each time the size of
the enclosed data changes. The hope is that most buckets will have four or
fewer leases and this extra copying will not usually be necessary.

The (4) "data size" field contains the actual number of bytes of data present
in field (7), such that a client request to read beyond 504+(a) will result
in an error. This allows the client to (one day) read relative to the end of
the file. The container size (that is, (8)-(7)) might be larger, especially
if extra size was pre-allocated in anticipation of filling the container with
a lot of data.

The offset in (5) points at the *count* of extra leases, at (8). The actual
leases (at (9)) begin 4 bytes later. If the container size changes, both (8)
and (9) must be relocated by copying.

The server will honor any write commands that provide the write token and do
not exceed the server-wide storage size limitations. Read and write commands
MUST be restricted to the 'data' portion of the container: the implementation
of those commands MUST perform correct bounds-checking to make sure other
portions of the container are inaccessible to the clients.

The two methods provided by the storage server on these "MutableSlot" share
objects are:

* readv(ListOf(offset=int, length=int))

  * returns a list of bytestrings, of the various requested lengths
  * offset < 0 is interpreted relative to the end of the data
  * spans which hit the end of the data will return truncated data

* testv_and_writev(write_enabler, test_vector, write_vector)

  * this is a test-and-set operation which performs the given tests and only
    applies the desired writes if all tests succeed. This is used to detect
    simultaneous writers, and to reduce the chance that an update will lose
    data recently written by some other party (written after the last time
    this slot was read).
  * test_vector=ListOf(TupleOf(offset, length, opcode, specimen))
  * the opcode is a string, from the set [gt, ge, eq, le, lt, ne]
  * each element of the test vector is read from the slot's data and
    compared against the specimen using the desired (in)equality. If all
    tests evaluate True, the write is performed
  * write_vector=ListOf(TupleOf(offset, newdata))

    * offset < 0 is not yet defined; it probably means relative to the
      end of the data, which probably means append, but we haven't nailed
      it down quite yet
    * write vectors are executed in order, which specifies the results of
      overlapping writes

  * return value:

    * error: OutOfSpace
    * error: something else (io error, out of memory, whatever)
    * (True, old_test_data): the write was accepted (test_vector passed)
    * (False, old_test_data): the write was rejected (test_vector failed)

      * both 'accepted' and 'rejected' return the old data that was used
        for the test_vector comparison. This can be used by the client
        to detect write collisions, including collisions for which the
        desired behavior was to overwrite the old version.

In addition, the storage server provides several methods to access these
share objects:

* allocate_mutable_slot(storage_index, sharenums=SetOf(int))

  * returns DictOf(int, MutableSlot)

* get_mutable_slot(storage_index)

  * returns DictOf(int, MutableSlot)
  * or raises KeyError

We intend to add an interface which allows small slots to allocate-and-write
in a single call, as well as do update or read in a single call. The goal is
to allow a reasonably-sized dirnode to be created (or updated, or read) in
just one round trip (to all N shareholders in parallel).

migrating shares
````````````````

If a share must be migrated from one server to another, two values become
invalid: the write enabler (since it was computed for the old server), and
the lease renew/cancel tokens.

Suppose that a slot was first created on nodeA, and was thus initialized with
WE(nodeA) (= H(WEM+nodeA)). Later, for provisioning reasons, the share is
moved from nodeA to nodeB.

Readers may still be able to find the share in its new home, depending upon
how many servers are present in the grid, where the new nodeid lands in the
permuted index for this particular storage index, and how many servers the
reading client is willing to contact.

When a client attempts to write to this migrated share, it will get a "bad
write enabler" error, since the WE it computes for nodeB will not match the
WE(nodeA) that was embedded in the share. When this occurs, the "bad write
enabler" message must include the old nodeid (e.g. nodeA) that was in the
share.

The client then computes H(nodeB+H(WEM+nodeA)), which is the same as
H(nodeB+WE(nodeA)). The client sends this along with the new WE(nodeB), which
is H(WEM+nodeB). Note that the client only sends WE(nodeB) to nodeB, never to
anyone else. Also note that the client does not send a value to nodeB that
would allow the node to impersonate the client to a third node: everything
sent to nodeB will include something specific to nodeB in it.

The server locally computes H(nodeB+WE(nodeA)), using its own node id and the
old write enabler from the share. It compares this against the value supplied
by the client. If they match, this serves as proof that the client was able
to compute the old write enabler. The server then accepts the client's new
WE(nodeB) and writes it into the container.

This WE-fixup process requires an extra round trip, and requires the error
message to include the old nodeid, but does not require any public key
operations on either client or server.

Migrating the leases will require a similar protocol. This protocol will be
defined concretely at a later date.

Code Details
------------

The MutableFileNode class is used to manipulate mutable files (as opposed to
ImmutableFileNodes). These are initially generated with
client.create_mutable_file(), and later recreated from URIs with
client.create_node_from_uri(). Instances of this class will contain a URI and
a reference to the client (for peer selection and connection).

NOTE: this section is out of date. Please see src/allmydata/interfaces.py
(the section on IMutableFilesystemNode) for more accurate information.

The methods of MutableFileNode are:

* download_to_data() -> [deferred] newdata, NotEnoughSharesError

  * if there are multiple retrievable versions in the grid, get() returns
    the first version it can reconstruct, and silently ignores the others.
    In the future, a more advanced API will signal and provide access to
    the multiple heads.

* update(newdata) -> OK, UncoordinatedWriteError, NotEnoughSharesError
* overwrite(newdata) -> OK, UncoordinatedWriteError, NotEnoughSharesError

download_to_data() causes a new retrieval to occur, pulling the current
contents from the grid and returning them to the caller. At the same time,
this call caches information about the current version of the file. This
information will be used in a subsequent call to update(), and if another
change has occurred between the two, this information will be out of date,
triggering the UncoordinatedWriteError.

update() is therefore intended to be used just after a download_to_data(), in
the following pattern::

    d = mfn.download_to_data()
    d.addCallback(apply_delta)
    d.addCallback(mfn.update)

If the update() call raises UCW, then the application can simply return an
error to the user ("you violated the Prime Coordination Directive"), and they
can try again later. Alternatively, the application can attempt to retry on
its own. To accomplish this, the app needs to pause, download the new
(post-collision and post-recovery) form of the file, reapply their delta,
then submit the update request again. A randomized pause is necessary to
reduce the chances of colliding a second time with another client that is
doing exactly the same thing::

    d = mfn.download_to_data()
    d.addCallback(apply_delta)
    d.addCallback(mfn.update)
    def _retry(f):
        f.trap(UncoordinatedWriteError)
        d1 = pause(random.uniform(5, 20))
        d1.addCallback(lambda res: mfn.download_to_data())
        d1.addCallback(apply_delta)
        d1.addCallback(mfn.update)
        return d1
    d.addErrback(_retry)

Enthusiastic applications can retry multiple times, using a randomized
exponential backoff between each. A particularly enthusiastic application can
retry forever, but such apps are encouraged to provide a means for the user
to give up after a while.

UCW does not mean that the update was not applied, so it is also a good idea
to skip the retry-update step if the delta was already applied::

    d = mfn.download_to_data()
    d.addCallback(apply_delta)
    d.addCallback(mfn.update)
    def _retry(f):
        f.trap(UncoordinatedWriteError)
        d1 = pause(random.uniform(5, 20))
        d1.addCallback(lambda res: mfn.download_to_data())
        def _maybe_apply_delta(contents):
            new_contents = apply_delta(contents)
            if new_contents != contents:
                return mfn.update(new_contents)
        d1.addCallback(_maybe_apply_delta)
        return d1
    d.addErrback(_retry)

update() is the right interface to use for delta-application situations, like
directory nodes (in which apply_delta might be adding or removing child
entries from a serialized table).

Note that any uncoordinated write has the potential to lose data. We must do
more analysis to be sure, but it appears that two clients who write to the
same mutable file at the same time (even if both eventually retry) will, with
high probability, result in one client observing UCW and the other silently
losing their changes. It is also possible for both clients to observe UCW.
The moral of the story is that the Prime Coordination Directive is there for
a reason, and that recovery/UCW/retry is not a substitute for write
coordination.

overwrite() tells the client to ignore this cached version information, and
to unconditionally replace the mutable file's contents with the new data.
This should not be used in delta application, but rather in situations where
you want to replace the file's contents with completely unrelated ones. When
raw files are uploaded into a mutable slot through the tahoe webapi (using
POST and the ?mutable=true argument), they are put in place with overwrite().

The peer-selection and data-structure manipulation (and signing/verification)
steps will be implemented in a separate class in allmydata/mutable.py.

SMDF Slot Format
----------------

This SMDF data lives inside a server-side MutableSlot container. The server
is oblivious to this format.

This data is tightly packed. In particular, the share data is defined to run
all the way to the beginning of the encrypted private key (the encprivkey
offset is used both to terminate the share data and to begin the encprivkey).

::

    #  offset   size   name
    1   0        1     version byte, \x00 for this format
    2   1        8     sequence number.
2^64-1 must be handled specially, TBD 522 3 9 32 "R" (root of share hash Merkle tree) 523 4 41 16 IV (share data is AES(H(readkey+IV)) ) 524 5 57 18 encoding parameters: 525 57 1 k 526 58 1 N 527 59 8 segment size 528 67 8 data length (of original plaintext) 529 6 75 32 offset table: 530 75 4 (8) signature 531 79 4 (9) share hash chain 532 83 4 (10) block hash tree 533 87 4 (11) share data 534 91 8 (12) encrypted private key 535 99 8 (13) EOF 536 7 107 436ish verification key (2048 RSA key) 537 8 543ish 256ish signature=RSAenc(sigkey, H(version+seqnum+r+IV+encparm)) 538 9 799ish (a) share hash chain, encoded as: 539 "".join([pack(">H32s", shnum, hash) 540 for (shnum,hash) in needed_hashes]) 541 10 (927ish) (b) block hash tree, encoded as: 542 "".join([pack(">32s",hash) for hash in block_hash_tree]) 543 11 (935ish) LEN share data (no gap between this and encprivkey) 544 12 ?? 1216ish encrypted private key= AESenc(write-key, RSA-key) 545 13 ?? -- EOF 546 547 (a) The share hash chain contains ceil(log(N)) hashes, each 32 bytes long. 548 This is the set of hashes necessary to validate this share's leaf in the 549 share Merkle tree. For N=10, this is 4 hashes, i.e. 128 bytes. 550 (b) The block hash tree contains ceil(length/segsize) hashes, each 32 bytes 551 long. This is the set of hashes necessary to validate any given block of 552 share data up to the per-share root "r". Each "r" is a leaf of the share 553 has tree (with root "R"), from which a minimal subset of hashes is put in 554 the share hash chain in (8). 555 556 Recovery 557 -------- 558 559 The first line of defense against damage caused by colliding writes is the 560 Prime Coordination Directive: "Don't Do That". 561 562 The second line of defense is to keep "S" (the number of competing versions) 563 lower than N/k. If this holds true, at least one competing version will have 564 k shares and thus be recoverable. 
Note that server unavailability counts 565 against us here: the old version stored on the unavailable server must be 566 included in the value of S. 567 568 The third line of defense is our use of testv_and_writev() (described below), 569 which increases the convergence of simultaneous writes: one of the writers 570 will be favored (the one with the highest "R"), and that version is more 571 likely to be accepted than the others. This defense is least effective in the 572 pathological situation where S simultaneous writers are active, the one with 573 the lowest "R" writes to N-k+1 of the shares and then dies, then the one with 574 the next-lowest "R" writes to N-2k+1 of the shares and dies, etc, until the 575 one with the highest "R" writes to k-1 shares and dies. Any other sequencing 576 will allow the highest "R" to write to at least k shares and establish a new 577 revision. 578 579 The fourth line of defense is the fact that each client keeps writing until 580 at least one version has N shares. This uses additional servers, if 581 necessary, to make sure that either the client's version or some 582 newer/overriding version is highly available. 583 584 The fifth line of defense is the recovery algorithm, which seeks to make sure 585 that at least *one* version is highly available, even if that version is 586 somebody else's. 587 588 The write-shares-to-peers algorithm is as follows: 589 590 * permute peers according to storage index 591 * walk through peers, trying to assign one share per peer 592 * for each peer: 593 594 * send testv_and_writev, using "old(seqnum+R) <= our(seqnum+R)" as the test 595 596 * this means that we will overwrite any old versions, and we will 597 overwrite simultaenous writers of the same version if our R is higher. 598 We will not overwrite writers using a higher seqnum. 599 600 * record the version that each share winds up with. If the write was 601 accepted, this is our own version. 
If it was rejected, read the 602 old_test_data to find out what version was retained. 603 * if old_test_data indicates the seqnum was equal or greater than our 604 own, mark the "Simultanous Writes Detected" flag, which will eventually 605 result in an error being reported to the writer (in their close() call). 606 * build a histogram of "R" values 607 * repeat until the histogram indicate that some version (possibly ours) 608 has N shares. Use new servers if necessary. 609 * If we run out of servers: 610 611 * if there are at least shares-of-happiness of any one version, we're 612 happy, so return. (the close() might still get an error) 613 * not happy, need to reinforce something, goto RECOVERY 614 615 Recovery: 616 617 * read all shares, count the versions, identify the recoverable ones, 618 discard the unrecoverable ones. 619 * sort versions: locate max(seqnums), put all versions with that seqnum 620 in the list, sort by number of outstanding shares. Then put our own 621 version. (TODO: put versions with seqnum <max but >us ahead of us?). 622 * for each version: 623 624 * attempt to recover that version 625 * if not possible, remove it from the list, go to next one 626 * if recovered, start at beginning of peer list, push that version, 627 continue until N shares are placed 628 * if pushing our own version, bump up the seqnum to one higher than 629 the max seqnum we saw 630 * if we run out of servers: 631 632 * schedule retry and exponential backoff to repeat RECOVERY 633 634 * admit defeat after some period? presumeably the client will be shut down 635 eventually, maybe keep trying (once per hour?) until then. 
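The version-comparison test and the share-placement histogram above can be
sketched in Python. This is an illustrative model only: ``place_shares``,
``version_key``, and the per-peer dicts are hypothetical stand-ins, not
Tahoe's actual classes, and the real test vector is evaluated server-side by
testv_and_writev()::

```python
from collections import Counter

def version_key(seqnum, R):
    """Order versions by sequence number, breaking ties with the root
    hash "R". This mirrors the "old(seqnum+R) <= our(seqnum+R)" test."""
    return (seqnum, R)

def accept_write(old_version, our_version):
    # Overwrite any older version; overwrite a simultaneous writer of the
    # same seqnum only if our "R" is at least as high. Never overwrite a
    # writer that is using a higher seqnum.
    return version_key(*old_version) <= version_key(*our_version)

def place_shares(peers, our_version, N):
    """Walk through the (already permuted) peer list, record which
    version each share winds up with, and stop once some version
    (possibly ours) has N shares."""
    histogram = Counter()
    for peer in peers:
        old_version = peer.get("version")  # hypothetical per-peer state
        if old_version is None or accept_write(old_version, our_version):
            peer["version"] = our_version
        histogram[peer["version"]] += 1
        if histogram.most_common(1)[0][1] >= N:
            break  # some version now has N shares
    return histogram
```

With our version at seqnum 2 and root hash "bb", a peer holding seqnum 1 is
overwritten, while a peer already holding seqnum 2 with a higher root hash
"cc" rejects the write and keeps its version.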
Medium Distributed Mutable Files
================================

These are just like the SDMF case, but:

* we actually take advantage of the Merkle hash tree over the blocks, by
  reading a single segment of data at a time (and its necessary hashes), to
  reduce the read-time alacrity
* we allow arbitrary writes to the file (i.e. seek() is provided, and
  O_TRUNC is no longer required)
* we write more code on the client side (in the MutableFileNode class), to
  first read each segment that a write must modify. This looks exactly like
  the way a normal filesystem uses a block device, or how a CPU must perform
  a cache-line fill before modifying a single word.
* we might implement some sort of copy-based atomic update server call,
  to allow multiple writev() calls to appear atomic to any readers.

MDMF slots provide fairly efficient in-place edits of very large files (a few
GB). Appending data is also fairly efficient, although each time a power of 2
boundary is crossed, the entire file must effectively be re-uploaded (because
the size of the block hash tree changes), so if the filesize is known in
advance, that space ought to be pre-allocated (by leaving extra space between
the block hash tree and the actual data).

MDMF1 uses the Merkle tree to enable low-alacrity random-access reads. MDMF2
adds cache-line reads to allow random-access writes.

Large Distributed Mutable Files
===============================

LDMF slots use a fundamentally different way to store the file, inspired by
Mercurial's "revlog" format. They enable very efficient insert/remove/replace
editing of arbitrary spans. Multiple versions of the file can be retained, in
a revision graph that can have multiple heads. Each revision can be
referenced by a cryptographic identifier. There are two forms of the URI, one
that means "most recent version", and a longer one that points to a specific
revision.

Metadata can be attached to the revisions, like timestamps, to enable rolling
back an entire tree to a specific point in history.

LDMF1 provides deltas but tries to avoid dealing with multiple heads. LDMF2
provides explicit support for revision identifiers and branching.

TODO
====

improve allocate-and-write or get-writer-buckets API to allow one-call (or
maybe two-call) updates. The challenge is in figuring out which shares are on
which machines. First cut will have lots of round trips.

(eventually) define behavior when seqnum wraps. At the very least make sure
it can't cause a security problem. "the slot is worn out" is acceptable.

(eventually) define share-migration lease update protocol. Including the
nodeid who accepted the lease is useful, we can use the same protocol as we
do for updating the write enabler. However we need to know which lease to
update.. maybe send back a list of all old nodeids that we find, then try all
of them when we accept the update?

We now do this in a specially-formatted IndexError exception:
"UNABLE to renew non-existent lease. I have leases accepted by " +
"nodeids: '12345','abcde','44221' ."

confirm that a repairer can regenerate shares without the private key. Hmm,
without the write-enabler they won't be able to write those shares to the
servers.. although they could add immutable new shares to new servers.
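As a sketch of how a client might consume that specially-formatted IndexError
text, the quoted nodeid strings can be pulled back out with a small parser.
``extract_nodeids`` is a hypothetical helper written against the exact
example message quoted above, not an API that exists in Tahoe, and a real
client should treat this as a best-effort parse::

```python
import re

def extract_nodeids(message):
    """Best-effort parse of the "UNABLE to renew non-existent lease"
    message, returning the single-quoted nodeid strings it lists."""
    m = re.search(r"nodeids: (.*?) \.", message)
    if not m:
        return []  # not the specially-formatted lease error
    # each nodeid appears as a single-quoted token, comma-separated
    return re.findall(r"'([^']+)'", m.group(1))
```

The client could then retry the lease update against each returned nodeid in
turn, per the protocol sketched in the TODO item.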
deleted file docs/specifications/mutable.txt
diff --git a/docs/specifications/mutable.txt b/docs/specifications/mutable.txt
deleted file mode 100644
index 40a5374..0000000
+ - 1 2 This describes the "RSA-based mutable files" which were shipped in Tahoe v0.8.0.3 4 = Mutable Files =5 6 Mutable File Slots are places with a stable identifier that can hold data7 that changes over time. In contrast to CHK slots, for which the8 URI/identifier is derived from the contents themselves, the Mutable File Slot9 URI remains fixed for the life of the slot, regardless of what data is placed10 inside it.11 12 Each mutable slot is referenced by two different URIs. The "read-write" URI13 grants read-write access to its holder, allowing them to put whatever14 contents they like into the slot. The "read-only" URI is less powerful, only15 granting read access, and not enabling modification of the data. The16 read-write URI can be turned into the read-only URI, but not the other way17 around.18 19 The data in these slots is distributed over a number of servers, using the20 same erasure coding that CHK files use, with 3-of-10 being a typical choice21 of encoding parameters. The data is encrypted and signed in such a way that22 only the holders of the read-write URI will be able to set the contents of23 the slot, and only the holders of the read-only URI will be able to read24 those contents. Holders of either URI will be able to validate the contents25 as being written by someone with the read-write URI. The servers who hold the26 shares cannot read or modify them: the worst they can do is deny service (by27 deleting or corrupting the shares), or attempt a rollback attack (which can28 only succeed with the cooperation of at least k servers).29 30 == Consistency vs Availability ==31 32 There is an age-old battle between consistency and availability. Epic papers33 have been written, elaborate proofs have been established, and generations of34 theorists have learned that you cannot simultaneously achieve guaranteed35 consistency with guaranteed reliability. 
In addition, the closer to 0 you get36 on either axis, the cost and complexity of the design goes up.37 38 Tahoe's design goals are to largely favor design simplicity, then slightly39 favor read availability, over the other criteria.40 41 As we develop more sophisticated mutable slots, the API may expose multiple42 read versions to the application layer. The tahoe philosophy is to defer most43 consistency recovery logic to the higher layers. Some applications have44 effective ways to merge multiple versions, so inconsistency is not45 necessarily a problem (i.e. directory nodes can usually merge multiple "add46 child" operations).47 48 == The Prime Coordination Directive: "Don't Do That" ==49 50 The current rule for applications which run on top of Tahoe is "do not51 perform simultaneous uncoordinated writes". That means you need non-tahoe52 means to make sure that two parties are not trying to modify the same mutable53 slot at the same time. For example:54 55 * don't give the read-write URI to anyone else. Dirnodes in a private56 directory generally satisfy this case, as long as you don't use two57 clients on the same account at the same time58 * if you give a read-write URI to someone else, stop using it yourself. An59 inbox would be a good example of this.60 * if you give a read-write URI to someone else, call them on the phone61 before you write into it62 * build an automated mechanism to have your agents coordinate writes.63 For example, we expect a future release to include a FURL for a64 "coordination server" in the dirnodes. The rule can be that you must65 contact the coordination server and obtain a lock/lease on the file66 before you're allowed to modify it.67 68 If you do not follow this rule, Bad Things will happen. The worst-case Bad69 Thing is that the entire file will be lost. A less-bad Bad Thing is that one70 or more of the simultaneous writers will lose their changes. 
An observer of71 the file may not see monotonically-increasing changes to the file, i.e. they72 may see version 1, then version 2, then 3, then 2 again.73 74 Tahoe takes some amount of care to reduce the badness of these Bad Things.75 One way you can help nudge it from the "lose your file" case into the "lose76 some changes" case is to reduce the number of competing versions: multiple77 versions of the file that different parties are trying to establish as the78 one true current contents. Each simultaneous writer counts as a "competing79 version", as does the previous version of the file. If the count "S" of these80 competing versions is larger than N/k, then the file runs the risk of being81 lost completely. [TODO] If at least one of the writers remains running after82 the collision is detected, it will attempt to recover, but if S>(N/k) and all83 writers crash after writing a few shares, the file will be lost.84 85 Note that Tahoe uses serialization internally to make sure that a single86 Tahoe node will not perform simultaneous modifications to a mutable file. It87 accomplishes this by using a weakref cache of the MutableFileNode (so that88 there will never be two distinct MutableFileNodes for the same file), and by89 forcing all mutable file operations to obtain a per-node lock before they90 run. The Prime Coordination Directive therefore applies to inter-node91 conflicts, not intra-node ones.92 93 94 == Small Distributed Mutable Files ==95 96 SDMF slots are suitable for small (<1MB) files that are editing by rewriting97 the entire file. The three operations are:98 99 * allocate (with initial contents)100 * set (with new contents)101 * get (old contents)102 103 The first use of SDMF slots will be to hold directories (dirnodes), which map104 encrypted child names to rw-URI/ro-URI pairs.105 106 === SDMF slots overview ===107 108 Each SDMF slot is created with a public/private key pair. 
The public key is109 known as the "verification key", while the private key is called the110 "signature key". The private key is hashed and truncated to 16 bytes to form111 the "write key" (an AES symmetric key). The write key is then hashed and112 truncated to form the "read key". The read key is hashed and truncated to113 form the 16-byte "storage index" (a unique string used as an index to locate114 stored data).115 116 The public key is hashed by itself to form the "verification key hash".117 118 The write key is hashed a different way to form the "write enabler master".119 For each storage server on which a share is kept, the write enabler master is120 concatenated with the server's nodeid and hashed, and the result is called121 the "write enabler" for that particular server. Note that multiple shares of122 the same slot stored on the same server will all get the same write enabler,123 i.e. the write enabler is associated with the "bucket", rather than the124 individual shares.125 126 The private key is encrypted (using AES in counter mode) by the write key,127 and the resulting crypttext is stored on the servers. so it will be128 retrievable by anyone who knows the write key. The write key is not used to129 encrypt anything else, and the private key never changes, so we do not need130 an IV for this purpose.131 132 The actual data is encrypted (using AES in counter mode) with a key derived133 by concatenating the readkey with the IV, the hashing the results and134 truncating to 16 bytes. The IV is randomly generated each time the slot is135 updated, and stored next to the encrypted data.136 137 The read-write URI consists of the write key and the verification key hash.138 The read-only URI contains the read key and the verification key hash. 
The139 verify-only URI contains the storage index and the verification key hash.140 141 URI:SSK-RW:b2a(writekey):b2a(verification_key_hash)142 URI:SSK-RO:b2a(readkey):b2a(verification_key_hash)143 URI:SSK-Verify:b2a(storage_index):b2a(verification_key_hash)144 145 Note that this allows the read-only and verify-only URIs to be derived from146 the read-write URI without actually retrieving the public keys. Also note147 that it means the read-write agent must validate both the private key and the148 public key when they are first fetched. All users validate the public key in149 exactly the same way.150 151 The SDMF slot is allocated by sending a request to the storage server with a152 desired size, the storage index, and the write enabler for that server's153 nodeid. If granted, the write enabler is stashed inside the slot's backing154 store file. All further write requests must be accompanied by the write155 enabler or they will not be honored. The storage server does not share the156 write enabler with anyone else.157 158 The SDMF slot structure will be described in more detail below. 
The important159 pieces are:160 161 * a sequence number162 * a root hash "R"163 * the encoding parameters (including k, N, file size, segment size)164 * a signed copy of [seqnum,R,encoding_params], using the signature key165 * the verification key (not encrypted)166 * the share hash chain (part of a Merkle tree over the share hashes)167 * the block hash tree (Merkle tree over blocks of share data)168 * the share data itself (erasure-coding of read-key-encrypted file data)169 * the signature key, encrypted with the write key170 171 The access pattern for read is:172 * hash read-key to get storage index173 * use storage index to locate 'k' shares with identical 'R' values174 * either get one share, read 'k' from it, then read k-1 shares175 * or read, say, 5 shares, discover k, either get more or be finished176 * or copy k into the URIs177 * read verification key178 * hash verification key, compare against verification key hash179 * read seqnum, R, encoding parameters, signature180 * verify signature against verification key181 * read share data, compute block-hash Merkle tree and root "r"182 * read share hash chain (leading from "r" to "R")183 * validate share hash chain up to the root "R"184 * submit share data to erasure decoding185 * decrypt decoded data with read-key186 * submit plaintext to application187 188 The access pattern for write is:189 * hash write-key to get read-key, hash read-key to get storage index190 * use the storage index to locate at least one share191 * read verification key and encrypted signature key192 * decrypt signature key using write-key193 * hash signature key, compare against write-key194 * hash verification key, compare against verification key hash195 * encrypt plaintext from application with read-key196 * application can encrypt some data with the write-key to make it only197 available to writers (use this for transitive read-onlyness of dirnodes)198 * erasure-code crypttext to form shares199 * split shares into blocks200 * compute 
Merkle tree of blocks, giving root "r" for each share201 * compute Merkle tree of shares, find root "R" for the file as a whole202 * create share data structures, one per server:203 * use seqnum which is one higher than the old version204 * share hash chain has log(N) hashes, different for each server205 * signed data is the same for each server206 * now we have N shares and need homes for them207 * walk through peers208 * if share is not already present, allocate-and-set209 * otherwise, try to modify existing share:210 * send testv_and_writev operation to each one211 * testv says to accept share if their(seqnum+R) <= our(seqnum+R)212 * count how many servers wind up with which versions (histogram over R)213 * keep going until N servers have the same version, or we run out of servers214 * if any servers wound up with a different version, report error to215 application216 * if we ran out of servers, initiate recovery process (described below)217 218 === Server Storage Protocol ===219 220 The storage servers will provide a mutable slot container which is oblivious221 to the details of the data being contained inside it. Each storage index222 refers to a "bucket", and each bucket has one or more shares inside it. (In a223 well-provisioned network, each bucket will have only one share). The bucket224 is stored as a directory, using the base32-encoded storage index as the225 directory name. 
Each share is stored in a single file, using the share number226 as the filename.227 228 The container holds space for a container magic number (for versioning), the229 write enabler, the nodeid which accepted the write enabler (used for share230 migration, described below), a small number of lease structures, the embedded231 data itself, and expansion space for additional lease structures.232 233 # offset size name234 1 0 32 magic verstr "tahoe mutable container v1" plus binary235 2 32 20 write enabler's nodeid236 3 52 32 write enabler237 4 84 8 data size (actual share data present) (a)238 5 92 8 offset of (8) count of extra leases (after data)239 6 100 368 four leases, 92 bytes each240 0 4 ownerid (0 means "no lease here")241 4 4 expiration timestamp242 8 32 renewal token243 40 32 cancel token244 72 20 nodeid which accepted the tokens245 7 468 (a) data246 8 ?? 4 count of extra leases247 9 ?? n*92 extra leases248 249 The "extra leases" field must be copied and rewritten each time the size of250 the enclosed data changes. The hope is that most buckets will have four or251 fewer leases and this extra copying will not usually be necessary.252 253 The (4) "data size" field contains the actual number of bytes of data present254 in field (7), such that a client request to read beyond 504+(a) will result255 in an error. This allows the client to (one day) read relative to the end of256 the file. The container size (that is, (8)-(7)) might be larger, especially257 if extra size was pre-allocated in anticipation of filling the container with258 a lot of data.259 260 The offset in (5) points at the *count* of extra leases, at (8). The actual261 leases (at (9)) begin 4 bytes later. If the container size changes, both (8)262 and (9) must be relocated by copying.263 264 The server will honor any write commands that provide the write token and do265 not exceed the server-wide storage size limitations. 
Read and write commands266 MUST be restricted to the 'data' portion of the container: the implementation267 of those commands MUST perform correct bounds-checking to make sure other268 portions of the container are inaccessible to the clients.269 270 The two methods provided by the storage server on these "MutableSlot" share271 objects are:272 273 * readv(ListOf(offset=int, length=int))274 * returns a list of bytestrings, of the various requested lengths275 * offset < 0 is interpreted relative to the end of the data276 * spans which hit the end of the data will return truncated data277 278 * testv_and_writev(write_enabler, test_vector, write_vector)279 * this is a test-and-set operation which performs the given tests and only280 applies the desired writes if all tests succeed. This is used to detect281 simultaneous writers, and to reduce the chance that an update will lose282 data recently written by some other party (written after the last time283 this slot was read).284 * test_vector=ListOf(TupleOf(offset, length, opcode, specimen))285 * the opcode is a string, from the set [gt, ge, eq, le, lt, ne]286 * each element of the test vector is read from the slot's data and287 compared against the specimen using the desired (in)equality. If all288 tests evaluate True, the write is performed289 * write_vector=ListOf(TupleOf(offset, newdata))290 * offset < 0 is not yet defined, it probably means relative to the291 end of the data, which probably means append, but we haven't nailed292 it down quite yet293 * write vectors are executed in order, which specifies the results of294 overlapping writes295 * return value:296 * error: OutOfSpace297 * error: something else (io error, out of memory, whatever)298 * (True, old_test_data): the write was accepted (test_vector passed)299 * (False, old_test_data): the write was rejected (test_vector failed)300 * both 'accepted' and 'rejected' return the old data that was used301 for the test_vector comparison. 
This can be used by the client302 to detect write collisions, including collisions for which the303 desired behavior was to overwrite the old version.304 305 In addition, the storage server provides several methods to access these306 share objects:307 308 * allocate_mutable_slot(storage_index, sharenums=SetOf(int))309 * returns DictOf(int, MutableSlot)310 * get_mutable_slot(storage_index)311 * returns DictOf(int, MutableSlot)312 * or raises KeyError313 314 We intend to add an interface which allows small slots to allocate-and-write315 in a single call, as well as do update or read in a single call. The goal is316 to allow a reasonably-sized dirnode to be created (or updated, or read) in317 just one round trip (to all N shareholders in parallel).318 319 ==== migrating shares ====320 321 If a share must be migrated from one server to another, two values become322 invalid: the write enabler (since it was computed for the old server), and323 the lease renew/cancel tokens.324 325 Suppose that a slot was first created on nodeA, and was thus initialized with326 WE(nodeA) (= H(WEM+nodeA)). Later, for provisioning reasons, the share is327 moved from nodeA to nodeB.328 329 Readers may still be able to find the share in its new home, depending upon330 how many servers are present in the grid, where the new nodeid lands in the331 permuted index for this particular storage index, and how many servers the332 reading client is willing to contact.333 334 When a client attempts to write to this migrated share, it will get a "bad335 write enabler" error, since the WE it computes for nodeB will not match the336 WE(nodeA) that was embedded in the share. When this occurs, the "bad write337 enabler" message must include the old nodeid (e.g. nodeA) that was in the338 share.339 340 The client then computes H(nodeB+H(WEM+nodeA)), which is the same as341 H(nodeB+WE(nodeA)). The client sends this along with the new WE(nodeB), which342 is H(WEM+nodeB). 
Note that the client only sends WE(nodeB) to nodeB, never to343 anyone else. Also note that the client does not send a value to nodeB that344 would allow the node to impersonate the client to a third node: everything345 sent to nodeB will include something specific to nodeB in it.346 347 The server locally computes H(nodeB+WE(nodeA)), using its own node id and the348 old write enabler from the share. It compares this against the value supplied349 by the client. If they match, this serves as proof that the client was able350 to compute the old write enabler. The server then accepts the client's new351 WE(nodeB) and writes it into the container.352 353 This WE-fixup process requires an extra round trip, and requires the error354 message to include the old nodeid, but does not require any public key355 operations on either client or server.356 357 Migrating the leases will require a similar protocol. This protocol will be358 defined concretely at a later date.359 360 === Code Details ===361 362 The MutableFileNode class is used to manipulate mutable files (as opposed to363 ImmutableFileNodes). These are initially generated with364 client.create_mutable_file(), and later recreated from URIs with365 client.create_node_from_uri(). Instances of this class will contain a URI and366 a reference to the client (for peer selection and connection).367 368 NOTE: this section is out of date. 
Please see src/allmydata/interfaces.py369 (the section on IMutableFilesystemNode) for more accurate information.370 371 The methods of MutableFileNode are:372 373 * download_to_data() -> [deferred] newdata, NotEnoughSharesError374 * if there are multiple retrieveable versions in the grid, get() returns375 the first version it can reconstruct, and silently ignores the others.376 In the future, a more advanced API will signal and provide access to377 the multiple heads.378 * update(newdata) -> OK, UncoordinatedWriteError, NotEnoughSharesError379 * overwrite(newdata) -> OK, UncoordinatedWriteError, NotEnoughSharesError380 381 download_to_data() causes a new retrieval to occur, pulling the current382 contents from the grid and returning them to the caller. At the same time,383 this call caches information about the current version of the file. This384 information will be used in a subsequent call to update(), and if another385 change has occured between the two, this information will be out of date,386 triggering the UncoordinatedWriteError.387 388 update() is therefore intended to be used just after a download_to_data(), in389 the following pattern:390 391 d = mfn.download_to_data()392 d.addCallback(apply_delta)393 d.addCallback(mfn.update)394 395 If the update() call raises UCW, then the application can simply return an396 error to the user ("you violated the Prime Coordination Directive"), and they397 can try again later. Alternatively, the application can attempt to retry on398 its own. To accomplish this, the app needs to pause, download the new399 (post-collision and post-recovery) form of the file, reapply their delta,400 then submit the update request again. 
A randomized pause is necessary to reduce the chances of colliding a second
time with another client that is doing exactly the same thing:

 d = mfn.download_to_data()
 d.addCallback(apply_delta)
 d.addCallback(mfn.update)
 def _retry(f):
     f.trap(UncoordinatedWriteError)
     d1 = pause(random.uniform(5, 20))
     d1.addCallback(lambda res: mfn.download_to_data())
     d1.addCallback(apply_delta)
     d1.addCallback(mfn.update)
     return d1
 d.addErrback(_retry)

Enthusiastic applications can retry multiple times, using a randomized
exponential backoff between each attempt. A particularly enthusiastic
application can retry forever, but such apps are encouraged to provide a
means for the user to give up after a while.

UCW does not mean that the update was not applied, so it is also a good
idea to skip the retry-update step if the delta was already applied:

 d = mfn.download_to_data()
 d.addCallback(apply_delta)
 d.addCallback(mfn.update)
 def _retry(f):
     f.trap(UncoordinatedWriteError)
     d1 = pause(random.uniform(5, 20))
     d1.addCallback(lambda res: mfn.download_to_data())
     def _maybe_apply_delta(contents):
         new_contents = apply_delta(contents)
         if new_contents != contents:
             return mfn.update(new_contents)
     d1.addCallback(_maybe_apply_delta)
     return d1
 d.addErrback(_retry)

update() is the right interface to use for delta-application situations,
like directory nodes (in which apply_delta might be adding or removing
child entries from a serialized table).

Note that any uncoordinated write has the potential to lose data. We must
do more analysis to be sure, but it appears that two clients who write to
the same mutable file at the same time (even if both eventually retry)
will, with high probability, result in one client observing UCW and the
other silently losing their changes.
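The randomized exponential backoff suggested above can be expressed as a plain delay schedule. This is a sketch only; the base delay, growth factor, and cap here are arbitrary choices, not values taken from Tahoe:

```python
import random

def backoff_delays(base=5.0, factor=2.0, cap=3600.0):
    """Yield randomized, exponentially growing pause lengths in seconds.

    The jitter (picking uniformly inside the current window) reduces the
    chance of colliding again with a client on the same schedule.
    """
    window = base
    while True:
        yield random.uniform(window / 2.0, window)
        window = min(window * factor, cap)
```

An application would feed each successive delay to its pause() call before retrying, and give up (or fall back to occasional polling) after enough attempts.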
It is also possible for both clients to observe UCW. The moral of the story
is that the Prime Coordination Directive is there for a reason, and that
recovery/UCW/retry is not a substitute for write coordination.

overwrite() tells the client to ignore this cached version information, and
to unconditionally replace the mutable file's contents with the new data.
This should not be used in delta application, but rather in situations
where you want to replace the file's contents with completely unrelated
ones. When raw files are uploaded into a mutable slot through the tahoe
webapi (using POST and the ?mutable=true argument), they are put in place
with overwrite().

The peer-selection and data-structure manipulation (and
signing/verification) steps will be implemented in a separate class in
allmydata/mutable.py.

=== SMDF Slot Format ===

This SMDF data lives inside a server-side MutableSlot container. The server
is oblivious to this format.

This data is tightly packed. In particular, the share data is defined to
run all the way to the beginning of the encrypted private key (the
encprivkey offset is used both to terminate the share data and to begin the
encprivkey).

 #    offset    size      name
 1    0         1         version byte, \x00 for this format
 2    1         8         sequence number.
                          2^64-1 must be handled specially, TBD
 3    9         32        "R" (root of share hash Merkle tree)
 4    41        16        IV (share data is AES(H(readkey+IV)) )
 5    57        18        encoding parameters:
      57        1          k
      58        1          N
      59        8          segment size
      67        8          data length (of original plaintext)
 6    75        32        offset table:
      75        4          (8) signature
      79        4          (9) share hash chain
      83        4          (10) block hash tree
      87        4          (11) share data
      91        8          (12) encrypted private key
      99        8          (13) EOF
 7    107       436ish    verification key (2048-bit RSA key)
 8    543ish    256ish    signature = RSAenc(sigkey, H(version+seqnum+r+IV+encparm))
 9    799ish    (a)       share hash chain, encoded as:
                           "".join([pack(">H32s", shnum, hash)
                                    for (shnum,hash) in needed_hashes])
10    (927ish)  (b)       block hash tree, encoded as:
                           "".join([pack(">32s", hash)
                                    for hash in block_hash_tree])
11    (935ish)  LEN       share data (no gap between this and encprivkey)
12    ??        1216ish   encrypted private key = AESenc(write-key, RSA-key)
13    ??        --        EOF

(a) The share hash chain contains ceil(log(N)) hashes, each 32 bytes long.
    This is the set of hashes necessary to validate this share's leaf in
    the share Merkle tree. For N=10, this is 4 hashes, i.e. 128 bytes.
(b) The block hash tree contains ceil(length/segsize) hashes, each 32
    bytes long. This is the set of hashes necessary to validate any given
    block of share data up to the per-share root "r". Each "r" is a leaf
    of the share hash tree (with root "R"), from which a minimal subset of
    hashes is put in the share hash chain in (8).

=== Recovery ===

The first line of defense against damage caused by colliding writes is the
Prime Coordination Directive: "Don't Do That".

The second line of defense is to keep "S" (the number of competing
versions) lower than N/k. If this holds true, at least one competing
version will have k shares and thus be recoverable.
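That rule is just the pigeonhole principle, and can be checked mechanically (an illustrative sketch, not Tahoe code):

```python
import math

def worst_case_best_version(N, S):
    """N shares divided among S competing versions: by pigeonhole, the
    best-represented version holds at least ceil(N/S) shares."""
    return math.ceil(N / S)

def some_version_guaranteed_recoverable(N, k, S):
    """True when at least one version must remain recoverable (>= k
    shares), no matter how the N shares are split among the S versions."""
    return worst_case_best_version(N, S) >= k
```

For N=10 and k=3, three competing versions still guarantee a recoverable version (ceil(10/3) = 4 >= 3), while five competing versions do not (ceil(10/5) = 2 < 3).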
Note that server unavailability counts against us here: the old version
stored on the unavailable server must be included in the value of S.

The third line of defense is our use of testv_and_writev() (described
below), which increases the convergence of simultaneous writes: one of the
writers will be favored (the one with the highest "R"), and that version is
more likely to be accepted than the others. This defense is least effective
in the pathological situation where S simultaneous writers are active, the
one with the lowest "R" writes to N-k+1 of the shares and then dies, then
the one with the next-lowest "R" writes to N-2k+1 of the shares and dies,
etc, until the one with the highest "R" writes to k-1 shares and dies. Any
other sequencing will allow the highest "R" to write to at least k shares
and establish a new revision.

The fourth line of defense is the fact that each client keeps writing until
at least one version has N shares. This uses additional servers, if
necessary, to make sure that either the client's version or some
newer/overriding version is highly available.

The fifth line of defense is the recovery algorithm, which seeks to make
sure that at least *one* version is highly available, even if that version
is somebody else's.

The write-shares-to-peers algorithm is as follows:

* permute peers according to storage index
* walk through peers, trying to assign one share per peer
* for each peer:

  * send testv_and_writev, using "old(seqnum+R) <= our(seqnum+R)" as the
    test

    * this means that we will overwrite any old versions, and we will
      overwrite simultaneous writers of the same version if our R is
      higher. We will not overwrite writers using a higher seqnum.

  * record the version that each share winds up with. If the write was
    accepted, this is our own version.
    If it was rejected, read the old_test_data to find out what version
    was retained.
  * if old_test_data indicates the seqnum was equal or greater than our
    own, mark the "Simultaneous Writes Detected" flag, which will
    eventually result in an error being reported to the writer (in their
    close() call).

* build a histogram of "R" values
* repeat until the histogram indicates that some version (possibly ours)
  has N shares. Use new servers if necessary.
* If we run out of servers:

  * if there are at least shares-of-happiness of any one version, we're
    happy, so return. (the close() might still get an error)
  * not happy, need to reinforce something, goto RECOVERY

RECOVERY:

* read all shares, count the versions, identify the recoverable ones,
  discard the unrecoverable ones.
* sort versions: locate max(seqnums), put all versions with that seqnum
  in the list, sort by number of outstanding shares. Then put our own
  version. (TODO: put versions with seqnum <max but >us ahead of us?)
* for each version:

  * attempt to recover that version
  * if not possible, remove it from the list, go to next one
  * if recovered, start at beginning of peer list, push that version,
    continue until N shares are placed
  * if pushing our own version, bump up the seqnum to one higher than
    the max seqnum we saw

* if we run out of servers:

  * schedule retry and exponential backoff to repeat RECOVERY
  * admit defeat after some period? presumably the client will be shut
    down eventually; maybe keep trying (once per hour?) until then.

== Medium Distributed Mutable Files ==

These are just like the SDMF case, but:

* we actually take advantage of the Merkle hash tree over the blocks, by
  reading a single segment of data at a time (and its necessary hashes), to
  reduce the read-time alacrity
* we allow arbitrary writes to the file (i.e.
  seek() is provided, and O_TRUNC is no longer required)
* we write more code on the client side (in the MutableFileNode class), to
  first read each segment that a write must modify. This looks exactly like
  the way a normal filesystem uses a block device, or how a CPU must
  perform a cache-line fill before modifying a single word.
* we might implement some sort of copy-based atomic update server call,
  to allow multiple writev() calls to appear atomic to any readers.

MDMF slots provide fairly efficient in-place edits of very large files (a
few GB). Appending data is also fairly efficient, although each time a
power of 2 boundary is crossed, the entire file must effectively be
re-uploaded (because the size of the block hash tree changes), so if the
filesize is known in advance, that space ought to be pre-allocated (by
leaving extra space between the block hash tree and the actual data).

MDMF1 uses the Merkle tree to enable low-alacrity random-access reads.
MDMF2 adds cache-line reads to allow random-access writes.

== Large Distributed Mutable Files ==

LDMF slots use a fundamentally different way to store the file, inspired by
Mercurial's "revlog" format. They enable very efficient
insert/remove/replace editing of arbitrary spans. Multiple versions of the
file can be retained, in a revision graph that can have multiple heads.
Each revision can be referenced by a cryptographic identifier. There are
two forms of the URI, one that means "most recent version", and a longer
one that points to a specific revision.

Metadata can be attached to the revisions, like timestamps, to enable
rolling back an entire tree to a specific point in history.

LDMF1 provides deltas but tries to avoid dealing with multiple heads.
LDMF2 provides explicit support for revision identifiers and branching.

== TODO ==

Improve the allocate-and-write or get-writer-buckets API to allow one-call
(or maybe two-call) updates. The challenge is in figuring out which shares
are on which machines. The first cut will have lots of round trips.

(eventually) Define behavior when seqnum wraps. At the very least make sure
it can't cause a security problem. "The slot is worn out" is acceptable.

(eventually) Define the share-migration lease update protocol. Including
the nodeid who accepted the lease is useful; we can use the same protocol
as we do for updating the write enabler. However we need to know which
lease to update... maybe send back a list of all old nodeids that we find,
then try all of them when we accept the update?

We now do this in a specially-formatted IndexError exception:
 "UNABLE to renew non-existent lease. I have leases accepted by " +
 "nodeids: '12345','abcde','44221' ."

Confirm that a repairer can regenerate shares without the private key. Hmm,
without the write-enabler they won't be able to write those shares to the
servers... although they could add immutable new shares to new servers.
new file docs/specifications/outline.rst
diff --git a/docs/specifications/outline.rst b/docs/specifications/outline.rst
new file mode 100644
index 0000000..9ec69bf
==============================
Specification Document Outline
==============================

While we do not yet have a clear set of specification documents for Tahoe
(explaining the file formats, so that others can write interoperable
implementations), this document is intended to lay out an outline for what
these specs ought to contain. Think of this as the ISO 7-Layer Model for
Tahoe.

We currently imagine 4 documents.

1. `#1: Share Format, Encoding Algorithm`_
2. `#2: Share Exchange Protocol`_
3. `#3: Server Selection Algorithm, filecap format`_
4. `#4: Directory Format`_

#1: Share Format, Encoding Algorithm
====================================

This document will describe the way that files are encrypted and encoded
into shares. It will include a specification of the share format, and
explain both the encoding and decoding algorithms. It will cover both
mutable and immutable files.

The immutable encoding algorithm, as described by this document, will start
with a plaintext series of bytes, encoding parameters "k" and "N", and
either an encryption key or a mechanism for deterministically deriving the
key from the plaintext (the CHK specification). The algorithm will end with
a set of N shares, and a set of values that must be included in the filecap
to provide confidentiality (the encryption key) and integrity (the UEB
hash).

The immutable decoding algorithm will start with the filecap values (key
and UEB hash) and "k" shares. It will explain how to validate the shares
against the integrity information, how to reverse the erasure-coding, and
how to decrypt the resulting ciphertext. It will result in the original
plaintext bytes (or some subrange thereof).

The sections on mutable files will contain similar information.
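As a toy illustration of the k-of-N contract this document will pin down (deliberately *not* Tahoe's real zfec encoding): with k=2 and N=3, an XOR parity share lets any two of the three shares reconstruct both data blocks:

```python
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode_2_of_3(block_a, block_b):
    """Produce three shares; any two suffice to recover both blocks."""
    assert len(block_a) == len(block_b)
    return [block_a, block_b, xor_bytes(block_a, block_b)]

def decode_2_of_3(shares):
    """shares: dict mapping share number (0, 1, or 2) to share data."""
    if 0 in shares and 1 in shares:
        return shares[0], shares[1]
    if 0 in shares:  # recover block_b from block_a and the parity share
        return shares[0], xor_bytes(shares[0], shares[2])
    return xor_bytes(shares[1], shares[2]), shares[1]  # recover block_a
```

Real erasure coding (zfec) generalizes this to arbitrary k and N, but the contract is the same: any k of the N shares reconstruct the encoded data.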
This document is *not* responsible for explaining the filecap format, since
full filecaps may need to contain additional information as described in
document #3. Likewise it is not responsible for explaining where to put the
generated shares or where to find them again later.

It is also not responsible for explaining the access control mechanisms
surrounding share upload, download, or modification ("Accounting" is the
business of controlling share upload to conserve space, and mutable file
shares require some sort of access control to prevent non-writecap holders
from destroying shares). We don't yet have a document dedicated to
explaining these, but let's call it "Access Control" for now.

#2: Share Exchange Protocol
===========================

This document explains the wire protocol used to upload, download, and
modify shares on the various storage servers.

Given the N shares created by the algorithm described in document #1, and a
set of servers who are willing to accept those shares, the protocols in
this document will be sufficient to get the shares onto the servers.
Likewise, given a set of servers who hold at least k shares, these
protocols will be enough to retrieve the shares necessary to begin the
decoding process described in document #1. The notion of a "storage index"
is used to reference a particular share: the storage index is generated by
the encoding process described in document #1.

This document does *not* describe how to identify or choose those servers;
rather it explains what to do once they have been selected (by the
mechanisms in document #3).

This document also explains the protocols that a client uses to ask a
server whether or not it is willing to accept an uploaded share, and
whether it has a share available for download.
These protocols will be used by the mechanisms in document #3 to help
decide where the shares should be placed.

Where cryptographic mechanisms are necessary to implement access-control
policy, this document will explain those mechanisms.

In the future, Tahoe will be able to use multiple protocols to speak to
storage servers. There will be alternative forms of this document, one for
each protocol. The first one to be written will describe the Foolscap-based
protocol that tahoe currently uses, but we anticipate a subsequent one to
describe a more HTTP-based protocol.

#3: Server Selection Algorithm, filecap format
==============================================

This document has two interrelated purposes. With a deeper understanding of
the issues, we may be able to separate these more cleanly in the future.

The first purpose is to explain the server selection algorithm. Given a set
of N shares, where should those shares be uploaded? Given some information
stored about a previously-uploaded file, how should a downloader locate and
recover at least k shares? Given a previously-uploaded mutable file, how
should a modifier locate all (or most of) the shares with a reasonable
amount of work?
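For a concrete flavor of one possible answer, the permuted-list approach used by the current release can be sketched as follows (illustrative only; the real implementation's hash and server-identifier formats differ):

```python
import hashlib

def permuted_servers(storage_index, server_ids):
    """Sort servers by SHA-256(storage_index + server_id).  Every client
    computes the same order for a given file, and different files see
    different orders, spreading load across the grid."""
    return sorted(server_ids,
                  key=lambda sid: hashlib.sha256(storage_index + sid).digest())
```

An uploader walks this list offering shares, and a downloader walks the same list asking for them, so the two tend to meet near the front of the list.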
This question implies many things, all of which should be explained in this
document:

* the notion of a "grid", nominally a set of servers who could potentially
  hold shares, which might change over time
* a way to configure which grid should be used
* a way to discover which servers are a part of that grid
* a way to decide which servers are reliable enough to be worth sending
  shares to
* an algorithm to handle servers which refuse shares
* a way for a downloader to locate which servers have shares
* a way to choose which shares should be used for download

The server-selection algorithm has several obviously competing goals:

* minimize the amount of work that must be done during upload
* minimize the total storage resources used
* avoid "hot spots", balance load among multiple servers
* maximize the chance that enough shares will be downloadable later, by
  uploading lots of shares, and by placing them on reliable servers
* minimize the work that the future downloader must do
* tolerate temporary server failures, permanent server departure, and new
  server insertions
* minimize the amount of information that must be added to the filecap

The server-selection algorithm is defined in some context: some set of
expectations about the servers or grid with which it is expected to
operate. Different algorithms are appropriate for different situations, so
there will be multiple alternatives of this document.

The first version of this document will describe the algorithm that the
current (1.3.0) release uses, which is heavily weighted towards the two
main use case scenarios for which Tahoe has been designed: the small,
stable friendnet, and the allmydata.com managed grid.
In both cases, we assume that the storage servers are online most of the
time, that they are uniformly highly reliable, and that the set of servers
does not change very rapidly. The server-selection algorithm for this
environment uses a permuted server list to achieve load-balancing, uses all
servers identically, and derives the permutation key from the storage index
to avoid adding a new field to the filecap.

An alternative algorithm could give clients more precise control over share
placement, for example by a user who wished to make sure that k+1 shares
are located in each datacenter (to allow downloads to take place using only
local bandwidth). This algorithm could skip the permuted list and use other
mechanisms to accomplish load-balancing (or ignore the issue altogether).
It could add additional information to the filecap (like a list of which
servers received the shares) in lieu of performing a search at download
time, perhaps at the expense of allowing a repairer to move shares to a new
server after the initial upload. It might make up for this by storing
"location hints" next to each share, to indicate where other shares are
likely to be found, and obligating the repairer to update these hints.

The second purpose of this document is to explain the format of the file
capability string (or "filecap" for short). There are multiple kinds of
capabilities (read-write, read-only, verify-only, repaircap, lease-renewal
cap, traverse-only, etc). There are multiple ways to represent the filecap
(compressed binary, human-readable, clickable-HTTP-URL, "tahoe:" URL, etc),
but they must all contain enough information to reliably retrieve a file
(given some context, of course). It must at least contain the
confidentiality and integrity information from document #1 (i.e. the
encryption key and the UEB hash).
It must also contain whatever additional information the upload-time
server-selection algorithm generated that will be required by the
downloader.

For some server-selection algorithms, the additional information will be
minimal. For example, the 1.3.0 release uses the hash of the encryption key
as a storage index, uses the storage index to permute the server list, and
uses an Introducer to learn the current list of servers. This allows a
"close-enough" list of servers to be compressed into a filecap field that
is already required anyway (the encryption key). It also adds k and N to
the filecap, to speed up the downloader's search (the downloader knows how
many shares it needs, so it can send out multiple queries in parallel).

But other server-selection algorithms might require more information. Each
variant of this document will explain how to encode that additional
information into the filecap, and how to extract and use that information
at download time.

These two purposes are interrelated. A filecap that is interpreted in the
context of the allmydata.com commercial grid, which uses tahoe-1.3.0,
implies a specific peer-selection algorithm, a specific Introducer, and
therefore a fairly specific set of servers to query for shares. A filecap
which is meant to be interpreted on a different sort of grid would need
different information.

Some filecap formats can be designed to contain more information (and
depend less upon context), such as the way an HTTP URL implies the
existence of a single global DNS system. Ideally a tahoe filecap should be
able to specify which "grid" it lives in, with enough information to allow
a compatible implementation of Tahoe to locate that grid and retrieve the
file (regardless of which server-selection algorithm was used for upload).
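To make the "enough information" requirement concrete: a CHK-style filecap is, roughly, a colon-separated string carrying the key, the UEB hash, and the k/N/size hints. A hypothetical parser (the field encoding here is simplified; real caps base32-encode the binary fields):

```python
def parse_chk_cap(cap):
    """Split a filecap of the rough form
    URI:CHK:<key>:<UEB-hash>:<k>:<N>:<size> into its fields."""
    parts = cap.split(":")
    if parts[:2] != ["URI", "CHK"]:
        raise ValueError("not a CHK filecap: %r" % cap)
    return {"key": parts[2], "ueb_hash": parts[3],
            "k": int(parts[4]), "N": int(parts[5]), "size": int(parts[6])}
```

A downloader needs nothing beyond these fields plus grid context: the key decrypts, the UEB hash verifies, and k/N bound the share search.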
This more-universal format might come at the expense of reliability,
however. Tahoe-1.3.0 filecaps do not contain hostnames, because the failure
of DNS or an individual host might then impact file availability (however,
the Introducer contains DNS names or IP addresses).

#4: Directory Format
====================

Tahoe directories are a special way of interpreting and managing the
contents of a file (either mutable or immutable). These "dirnode" files are
basically serialized tables that map child name to filecap/dircap. This
document describes the format of these files.

Tahoe-1.3.0 directories are "transitively readonly", which is accomplished
by applying an additional layer of encryption to the list of child
writecaps. The key for this encryption is derived from the containing
file's writecap. This document must explain how to derive this key and
apply it to the appropriate portion of the table.

Future versions of the directory format are expected to contain
"deep-traversal caps", which allow verification/repair of files without
exposing their plaintext to the repair agent. This document will be
responsible for explaining traversal caps too.

Future versions of the directory format will probably contain an index and
more advanced data structures (for efficiency and fast lookups), instead of
a simple flat list of (childname, childcap). This document will also need
to describe metadata formats, including what access-control policies are
defined for the metadata.
deleted file docs/specifications/outline.txt
diff --git a/docs/specifications/outline.txt b/docs/specifications/outline.txt
deleted file mode 100644
index 204878e..0000000
+ - 1 = Specification Document Outline =2 3 While we do not yet have a clear set of specification documents for Tahoe4 (explaining the file formats, so that others can write interoperable5 implementations), this document is intended to lay out an outline for what6 these specs ought to contain. Think of this as the ISO 7-Layer Model for7 Tahoe.8 9 We currently imagine 4 documents.10 11 == #1: Share Format, Encoding Algorithm ==12 13 This document will describe the way that files are encrypted and encoded into14 shares. It will include a specification of the share format, and explain both15 the encoding and decoding algorithms. It will cover both mutable and16 immutable files.17 18 The immutable encoding algorithm, as described by this document, will start19 with a plaintext series of bytes, encoding parameters "k" and "N", and either20 an encryption key or a mechanism for deterministically deriving the key from21 the plaintext (the CHK specification). The algorithm will end with a set of N22 shares, and a set of values that must be included in the filecap to provide23 confidentiality (the encryption key) and integrity (the UEB hash).24 25 The immutable decoding algorithm will start with the filecap values (key and26 UEB hash) and "k" shares. It will explain how to validate the shares against27 the integrity information, how to reverse the erasure-coding, and how to28 decrypt the resulting ciphertext. It will result in the original plaintext29 bytes (or some subrange thereof).30 31 The sections on mutable files will contain similar information.32 33 This document is *not* responsible for explaining the filecap format, since34 full filecaps may need to contain additional information as described in35 document #3. 
Likewise it it not responsible for explaining where to put the36 generated shares or where to find them again later.37 38 It is also not responsible for explaining the access control mechanisms39 surrounding share upload, download, or modification ("Accounting" is the40 business of controlling share upload to conserve space, and mutable file41 shares require some sort of access control to prevent non-writecap holders42 from destroying shares). We don't yet have a document dedicated to explaining43 these, but let's call it "Access Control" for now.44 45 46 == #2: Share Exchange Protocol ==47 48 This document explains the wire-protocol used to upload, download, and modify49 shares on the various storage servers.50 51 Given the N shares created by the algorithm described in document #1, and a52 set of servers who are willing to accept those shares, the protocols in this53 document will be sufficient to get the shares onto the servers. Likewise,54 given a set of servers who hold at least k shares, these protocols will be55 enough to retrieve the shares necessary to begin the decoding process56 described in document #1. The notion of a "storage index" is used to57 reference a particular share: the storage index is generated by the encoding58 process described in document #1.59 60 This document does *not* describe how to identify or choose those servers,61 rather it explains what to do once they have been selected (by the mechanisms62 in document #3).63 64 This document also explains the protocols that a client uses to ask a server65 whether or not it is willing to accept an uploaded share, and whether it has66 a share available for download. 
These protocols will be used by the67 mechanisms in document #3 to help decide where the shares should be placed.68 69 Where cryptographic mechanisms are necessary to implement access-control70 policy, this document will explain those mechanisms.71 72 In the future, Tahoe will be able to use multiple protocols to speak to73 storage servers. There will be alternative forms of this document, one for74 each protocol. The first one to be written will describe the Foolscap-based75 protocol that tahoe currently uses, but we anticipate a subsequent one to76 describe a more HTTP-based protocol.77 78 == #3: Server Selection Algorithm, filecap format ==79 80 This document has two interrelated purposes. With a deeper understanding of81 the issues, we may be able to separate these more cleanly in the future.82 83 The first purpose is to explain the server selection algorithm. Given a set84 of N shares, where should those shares be uploaded? Given some information85 stored about a previously-uploaded file, how should a downloader locate and86 recover at least k shares? 
Given a previously-uploaded mutable file, how87 should a modifier locate all (or most of) the shares with a reasonable amount88 of work?89 90 This question implies many things, all of which should be explained in this91 document:92 93 * the notion of a "grid", nominally a set of servers who could potentially94 hold shares, which might change over time95 * a way to configure which grid should be used96 * a way to discover which servers are a part of that grid97 * a way to decide which servers are reliable enough to be worth sending98 shares99 * an algorithm to handle servers which refuse shares100 * a way for a downloader to locate which servers have shares101 * a way to choose which shares should be used for download102 103 The server-selection algorithm has several obviously competing goals:104 105 * minimize the amount of work that must be done during upload106 * minimize the total storage resources used107 * avoid "hot spots", balance load among multiple servers108 * maximize the chance that enough shares will be downloadable later, by109 uploading lots of shares, and by placing them on reliable servers110 * minimize the work that the future downloader must do111 * tolerate temporary server failures, permanent server departure, and new112 server insertions113 * minimize the amount of information that must be added to the filecap114 115 The server-selection algorithm is defined in some context: some set of116 expectations about the servers or grid with which it is expected to operate.117 Different algorithms are appropriate for different situtations, so there will118 be multiple alternatives of this document.119 120 The first version of this document will describe the algorithm that the121 current (1.3.0) release uses, which is heavily weighted towards the two main122 use case scenarios for which Tahoe has been designed: the small, stable123 friendnet, and the allmydata.com managed grid. 
In both cases, we assume that124 the storage servers are online most of the time, they are uniformly highly125 reliable, and that the set of servers does not change very rapidly. The126 server-selection algorithm for this environment uses a permuted server list127 to achieve load-balancing, uses all servers identically, and derives the128 permutation key from the storage index to avoid adding a new field to the129 filecap.130 131 An alternative algorithm could give clients more precise control over share132 placement, for example by a user who wished to make sure that k+1 shares are133 located in each datacenter (to allow downloads to take place using only local134 bandwidth). This algorithm could skip the permuted list and use other135 mechanisms to accomplish load-balancing (or ignore the issue altogether). It136 could add additional information to the filecap (like a list of which servers137 received the shares) in lieu of performing a search at download time, perhaps138 at the expense of allowing a repairer to move shares to a new server after139 the initial upload. It might make up for this by storing "location hints"140 next to each share, to indicate where other shares are likely to be found,141 and obligating the repairer to update these hints.142 143 The second purpose of this document is to explain the format of the file144 capability string (or "filecap" for short). There are multiple kinds of145 capabilties (read-write, read-only, verify-only, repaircap, lease-renewal146 cap, traverse-only, etc). There are multiple ways to represent the filecap147 (compressed binary, human-readable, clickable-HTTP-URL, "tahoe:" URL, etc),148 but they must all contain enough information to reliably retrieve a file149 (given some context, of course). It must at least contain the confidentiality150 and integrity information from document #1 (i.e. the encryption key and the151 UEB hash). 
It must also contain whatever additional information the upload-time server-selection algorithm generated that will be required by the downloader.

For some server-selection algorithms, the additional information will be minimal. For example, the 1.3.0 release uses the hash of the encryption key as a storage index, uses the storage index to permute the server list, and uses an Introducer to learn the current list of servers. This allows a "close-enough" list of servers to be compressed into a filecap field that is already required anyway (the encryption key). It also adds k and N to the filecap, to speed up the downloader's search (the downloader knows how many shares it needs, so it can send out multiple queries in parallel).

But other server-selection algorithms might require more information. Each variant of this document will explain how to encode that additional information into the filecap, and how to extract and use that information at download time.

These two purposes are interrelated. A filecap that is interpreted in the context of the allmydata.com commercial grid, which uses tahoe-1.3.0, implies a specific peer-selection algorithm, a specific Introducer, and therefore a fairly-specific set of servers to query for shares. A filecap which is meant to be interpreted on a different sort of grid would need different information.

Some filecap formats can be designed to contain more information (and depend less upon context), such as the way an HTTP URL implies the existence of a single global DNS system.
Ideally a tahoe filecap should be able to specify which "grid" it lives in, with enough information to allow a compatible implementation of Tahoe to locate that grid and retrieve the file (regardless of which server-selection algorithm was used for upload).

This more-universal format might come at the expense of reliability, however. Tahoe-1.3.0 filecaps do not contain hostnames, because the failure of DNS or an individual host might then impact file availability (however, the Introducer contains DNS names or IP addresses).

== #4: Directory Format ==

Tahoe directories are a special way of interpreting and managing the contents of a file (either mutable or immutable). These "dirnode" files are basically serialized tables that map child name to filecap/dircap. This document describes the format of these files.

Tahoe-1.3.0 directories are "transitively readonly", which is accomplished by applying an additional layer of encryption to the list of child writecaps. The key for this encryption is derived from the containing file's writecap. This document must explain how to derive this key and apply it to the appropriate portion of the table.

Future versions of the directory format are expected to contain "deep-traversal caps", which allow verification/repair of files without exposing their plaintext to the repair agent. This document will be responsible for explaining traversal caps too.

Future versions of the directory format will probably contain an index and more advanced data structures (for efficiency and fast lookups), instead of a simple flat list of (childname, childcap). This document will also need to describe metadata formats, including what access-control policies are defined for the metadata.
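The table-plus-extra-encryption design described above can be sketched in a few lines. Everything concrete here is a placeholder: the tag string, the XOR "cipher" (standing in for AES), and the row layout are invented for illustration and do not match Tahoe's real dirnode format.

```python
import hashlib

def derive_writecap_key(container_writecap):
    """Derive the key protecting the child-writecap column from the
    containing directory's writecap (placeholder tag, not Tahoe's)."""
    return hashlib.sha256(b"hypothetical-dirnode-tag:" + container_writecap).digest()

def toy_encrypt(key, childname, plaintext):
    """XOR with a hash-derived keystream; a stand-in for real AES."""
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + childname + bytes([counter])).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(plaintext, stream))

def pack_dirnode(children, container_writecap):
    """Serialize a {childname: (rocap, writecap)} table. Only the
    writecap column is encrypted, so holders of the directory's
    read-cap (who cannot derive the key) get transitively-readonly
    access, while writecap holders can recover child writecaps."""
    key = derive_writecap_key(container_writecap)
    return [(name, rocap, toy_encrypt(key, name.encode(), wcap.encode()))
            for name, (rocap, wcap) in sorted(children.items())]
```

Since the toy cipher is an XOR, applying it a second time with the same key recovers the child writecap; a read-cap holder, lacking the container's writecap, can read only the rocap column.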
new file docs/specifications/servers-of-happiness.rst
diff --git a/docs/specifications/servers-of-happiness.rst b/docs/specifications/servers-of-happiness.rst new file mode 100644 index 0000000..7f0029b
====================
Servers of Happiness
====================

When you upload a file to a Tahoe-LAFS grid, you expect that it will stay there for a while, and that it will do so even if a few of the peers on the grid stop working, or if something else goes wrong. An upload health metric helps to make sure that this actually happens. An upload health metric is a test that looks at a file on a Tahoe-LAFS grid and says whether or not that file is healthy; that is, whether it is distributed on the grid in such a way as to ensure that it will probably survive in good enough shape to be recoverable, even if a few things go wrong between the time of the test and the time that it is recovered. Our current upload health metric for immutable files is called 'servers-of-happiness'; its predecessor was called 'shares-of-happiness'.

shares-of-happiness used the number of encoded shares generated by a file upload to say whether or not it was healthy. If there were more shares than a user-configurable threshold, the file was reported to be healthy; otherwise, it was reported to be unhealthy. In normal situations, the upload process would distribute shares fairly evenly over the peers in the grid, and in that case shares-of-happiness worked fine. However, because it only considered the number of shares, and not where they were on the grid, it could not detect situations where a file was unhealthy because most or all of the shares generated from the file were stored on one or two peers.

servers-of-happiness addresses this by extending the share-focused upload health metric to also consider the location of the shares on the grid. servers-of-happiness looks at the mapping of peers to the shares that they hold, and compares the cardinality of the largest happy subset of those to a user-configurable threshold.
A happy subset of peers has the property that any k (where k is as in k-of-n encoding) peers within the subset can reconstruct the source file. This definition of file health provides a stronger assurance of file availability over time; with 3-of-10 encoding, and happy=7, a healthy file is still guaranteed to be available even if 4 peers fail.

Measuring Servers of Happiness
==============================

We calculate servers-of-happiness by computing a matching on a bipartite graph that is related to the layout of shares on the grid. One set of vertices is the peers on the grid, and one set of vertices is the shares. An edge connects a peer and a share if the peer will (or does, for existing shares) hold the share. The size of a maximum matching on this graph is used as the servers-of-happiness value for the upload.

First, note that a bipartite matching of size n corresponds to a happy subset of size n. This is because a bipartite matching of size n implies that there are n peers such that each peer holds a share that no other peer holds. Then any k of those peers collectively hold k distinct shares, and can restore the file.

A bipartite matching of size n is not necessary for a happy subset of size n, however (so it is not correct to say that the size of the maximum matching on this graph is the size of the largest happy subset of peers that exists for the upload). For example, consider a file with k = 3, and suppose that each peer holds all three of those pieces. Then, since any peer from the original upload can restore the file, if there are 10 peers holding shares, and the happiness threshold is 7, the upload should be declared happy, because there is a happy subset of size 10, and 10 > 7. However, since a maximum matching on the bipartite graph related to this layout has only 3 edges, Tahoe-LAFS declares the upload unhealthy.
Though it is not unhealthy, a share layout like this example is inefficient; for k = 3, with 10 peers each holding every share, it corresponds to an expansion factor of 10x. Layouts that are declared healthy by the bipartite graph matching approach have the property that they correspond to uploads that are either already relatively efficient in their utilization of space, or can be made to be so by deleting shares; and that place all of the shares that they generate, enabling redistribution of shares later without having to re-encode the file. Also, it is computationally reasonable to compute a maximum matching in a bipartite graph, and there are well-studied algorithms to do that.

Issues
======

The uploader is good at detecting unhealthy upload layouts, but it doesn't always know how to make an unhealthy upload into a healthy upload if it is possible to do so; it attempts to redistribute shares to achieve happiness, but only in certain circumstances. The redistribution algorithm isn't optimal, either, so even in these cases it will not always find a happy layout if one can be arrived at through redistribution. We are investigating improvements to address these issues.

We don't use servers-of-happiness for mutable files yet; this fix will likely come in Tahoe-LAFS version 1.8.
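The matching computation described above can be sketched with the classic augmenting-path algorithm for maximum bipartite matching. This is an illustration, not the uploader's actual code; the names `happiness` and `share_map` are invented here.

```python
def happiness(share_map):
    """Size of a maximum matching between peers and shares.

    `share_map` maps each peer to the set of share numbers it holds.
    A sketch of the servers-of-happiness computation, using the
    augmenting-path algorithm for maximum bipartite matching.
    """
    match = {}  # share -> peer currently matched to it

    def try_assign(peer, shares, seen):
        # Try to match `peer` to some share, recursively evicting
        # previously-matched peers onto alternative shares.
        for s in shares:
            if s in seen:
                continue
            seen.add(s)
            other = match.get(s)
            if other is None or try_assign(other, share_map[other], seen):
                match[s] = peer
                return True
        return False

    count = 0
    for peer in share_map:
        if try_assign(peer, share_map[peer], set()):
            count += 1
    return count
```

For the example above, where 10 peers each hold all 3 shares, this returns 3; with happy=7 the upload is therefore declared unhealthy even though 10 peers hold shares.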
deleted file docs/specifications/servers-of-happiness.txt
diff --git a/docs/specifications/servers-of-happiness.txt b/docs/specifications/servers-of-happiness.txt deleted file mode 100644 index 67c6d71..0000000
new file docs/specifications/uri.rst
diff --git a/docs/specifications/uri.rst b/docs/specifications/uri.rst new file mode 100644 index 0000000..91f8cc2
==========
Tahoe URIs
==========

1. `File URIs`_

   1. `CHK URIs`_
   2. `LIT URIs`_
   3. `Mutable File URIs`_

2. `Directory URIs`_
3. `Internal Usage of URIs`_

Each file and directory in a Tahoe filesystem is described by a "URI". There are different kinds of URIs for different kinds of objects, and there are different kinds of URIs to provide different kinds of access to those objects. Each URI is a string representation of a "capability" or "cap", and there are read-caps, write-caps, verify-caps, and others.

Each URI provides both ``location`` and ``identification`` properties. ``location`` means that holding the URI is sufficient to locate the data it represents (this means it contains a storage index or a lookup key, whatever is necessary to find the place or places where the data is being kept). ``identification`` means that the URI also serves to validate the data: an attacker who wants to trick you into using the wrong data will be limited in their abilities by the identification properties of the URI.

Some URIs are subsets of others. In particular, if you know a URI which allows you to modify some object, you can produce a weaker read-only URI and give it to someone else, and they will be able to read that object but not modify it. Directories, for example, have a read-cap which is derived from the write-cap: anyone with read/write access to the directory can produce a limited URI that grants read-only access, but not the other way around.

src/allmydata/uri.py is the main place where URIs are processed. It is the authoritative definition point for all the URI types described herein.

File URIs
=========

The lowest layer of the Tahoe architecture (the "grid") is responsible for mapping URIs to data. This is basically a distributed hash table, in which the URI is the key, and some sequence of bytes is the value.
There are two kinds of entries in this table: immutable and mutable. For immutable entries, the URI represents a fixed chunk of data. The URI itself is derived from the data when it is uploaded into the grid, and can be used to locate and download that data from the grid at some time in the future.

For mutable entries, the URI identifies a "slot" or "container", which can be filled with different pieces of data at different times.

It is important to note that the "files" described by these URIs are just a bunch of bytes, and that **no** filenames or other metadata is retained at this layer. The vdrive layer (which sits above the grid layer) is entirely responsible for directories and filenames and the like.

CHK URIs
--------

CHK (Content Hash Keyed) files are immutable sequences of bytes. They are uploaded in a distributed fashion using a "storage index" (for the "location" property), and encrypted using a "read key". A secure hash of the data is computed to help validate the data afterwards (providing the "identification" property). All of these pieces, plus information about the file's size and the number of shares into which it has been distributed, are put into the "CHK" URI. The storage index is derived by hashing the read key (using a tagged SHA-256d hash, then truncated to 128 bits), so it does not need to be physically present in the URI.
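The read-key-to-storage-index derivation can be sketched as follows. The tag string below is a placeholder, not Tahoe's actual value; the sketch reproduces only the shape of the computation (a tagged SHA-256d hash, i.e. a hash of a hash, truncated to 128 bits).

```python
import hashlib

def storage_index_from_readkey(readkey):
    """Sketch: derive the 128-bit storage index from the AES read key
    with a tagged SHA-256d hash. The tag is a placeholder; Tahoe's
    real implementation uses its own tag value."""
    tagged = hashlib.sha256(b"hypothetical-storage-index-tag:" + readkey).digest()
    return hashlib.sha256(tagged).digest()[:16]  # truncate to 128 bits
```

Because anyone holding the read key can recompute this, the storage index (the "location" half of the cap) rides along for free inside the "identification" half, and needs no field of its own in the URI.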
The current format for CHK URIs is the concatenation of the following strings::

    URI:CHK:(key):(hash):(needed-shares):(total-shares):(size)

Where (key) is the base32 encoding of the 16-byte AES read key, (hash) is the base32 encoding of the SHA-256 hash of the URI Extension Block, (needed-shares) is an ASCII decimal representation of the number of shares required to reconstruct this file, (total-shares) is the same representation of the total number of shares created, and (size) is an ASCII decimal representation of the size of the data represented by this URI. All base32 encodings are expressed in lower-case, with the trailing '=' signs removed.

For example, the following is a CHK URI, generated from the contents of the architecture.txt document that lives next to this one in the source tree::

    URI:CHK:ihrbeov7lbvoduupd4qblysj7a:bg5agsdt62jb34hxvxmdsbza6do64f4fg5anxxod2buttbo6udzq:3:10:28733

Historical note: The name "CHK" is somewhat inaccurate and continues to be used for historical reasons. "Content Hash Key" means that the encryption key is derived by hashing the contents, which gives the useful property that encoding the same file twice will result in the same URI. However, this is an optional step: by passing a different flag to the appropriate API call, Tahoe will generate a random encryption key instead of hashing the file: this gives the useful property that the URI or storage index does not reveal anything about the file's contents (except filesize), which improves privacy. The URI:CHK: prefix really indicates that an immutable file is in use, without saying anything about how the key was derived.

LIT URIs
--------

LITeral files are also an immutable sequence of bytes, but they are so short that the data is stored inside the URI itself.
These are used for files of 55 bytes or shorter, which is the point at which the LIT URI is the same length as a CHK URI would be.

LIT URIs do not require an upload or download phase, as their data is stored directly in the URI.

The format of a LIT URI is simply a fixed prefix concatenated with the base32 encoding of the file's data::

    URI:LIT:bjuw4y3movsgkidbnrwg26lemf2gcl3xmvrc6kropbuhi3lmbi

The LIT URI for an empty file is "URI:LIT:", and the LIT URI for a 5-byte file that contains the string "hello" is "URI:LIT:nbswy3dp".

Mutable File URIs
-----------------

The other kind of DHT entry is the "mutable slot", in which the URI names a container into which data can be placed and retrieved without changing the identity of the container.

These slots have write-caps (which allow read/write access), read-caps (which only allow read-access), and verify-caps (which allow a file checker/repairer to confirm that the contents exist, but do not let it decrypt the contents).

Mutable slots use public key technology to provide data integrity, and put a hash of the public key in the URI. As a result, the data validation is limited to confirming that the data retrieved matches *some* data that was uploaded in the past, but not *which* version of that data.

The format of the write-cap for mutable files is::

    URI:SSK:(writekey):(fingerprint)

Where (writekey) is the base32 encoding of the 16-byte AES encryption key that is used to encrypt the RSA private key, and (fingerprint) is the base32 encoding of the 32-byte SHA-256 hash of the RSA public key. For more details about the way these keys are used, please see docs/mutable.txt .
The format for mutable read-caps is::

    URI:SSK-RO:(readkey):(fingerprint)

The read-cap is just like the write-cap except that it contains the other AES encryption key: the one used for encrypting the mutable file's contents. This second key is derived by hashing the writekey, which allows the holder of a write-cap to produce a read-cap, but not the other way around. The fingerprint is the same in both caps.

Historical note: the "SSK" prefix is a perhaps-inaccurate reference to "Sub-Space Keys" from the Freenet project, which uses a vaguely similar structure to provide mutable file access.

Directory URIs
==============

The grid layer provides a mapping from URI to data. To turn this into a graph of directories and files, the "vdrive" layer (which sits on top of the grid layer) needs to keep track of "directory nodes", or "dirnodes" for short. docs/dirnodes.txt describes how these work.

Dirnodes are contained inside mutable files, and are thus simply a particular way to interpret the contents of those files. As a result, a directory write-cap looks a lot like a mutable-file write-cap::

    URI:DIR2:(writekey):(fingerprint)

Likewise, directory read-caps (which provide read-only access to the directory) look much like mutable-file read-caps::

    URI:DIR2-RO:(readkey):(fingerprint)

Historical note: the "DIR2" prefix is used because the non-distributed dirnodes in earlier Tahoe releases had already claimed the "DIR" prefix.

Internal Usage of URIs
======================

The classes in source:src/allmydata/uri.py are used to pack and unpack these various kinds of URIs. Three Interfaces are defined (IURI, IFileURI, and IDirnodeURI) which are implemented by these classes, and string-to-URI-class conversion routines have been registered as adapters, so that code which wants to extract e.g.
the size of a CHK or LIT URI can do::

    print IFileURI(uri).get_size()

If the URI does not represent a CHK or LIT URI (for example, if it was for a directory instead), the adaptation will fail, raising a TypeError inside the IFileURI() call.

Several utility methods are provided on these objects. The most important is ``to_string()``, which returns the string form of the URI. Therefore ``IURI(uri).to_string() == uri`` is true for any valid URI. See the IURI class in source:src/allmydata/interfaces.py for more details.
deleted file docs/specifications/uri.txt
diff --git a/docs/specifications/uri.txt b/docs/specifications/uri.txt deleted file mode 100644 index 5599fa1..0000000