Ticket #1225: docs-txt-rst-conversion-ii.patch

File docs-txt-rst-conversion-ii.patch, 444.2 KB (added by p-static at 2010-10-29T05:45:43Z)

conversion of specifications/ and frontends/

  • new file docs/frontends/CLI.rst

    diff --git a/docs/frontends/CLI.rst b/docs/frontends/CLI.rst
    new file mode 100644
    index 0000000..743b887
     1======================
     2The Tahoe CLI commands
     3======================
     4
     51.  `Overview`_
     62.  `CLI Command Overview`_
     73.  `Node Management`_
     84.  `Filesystem Manipulation`_
     9
     10    1.  `Starting Directories`_
     11    2.  `Command Syntax Summary`_
     12    3.  `Command Examples`_
     13
     145.  `Storage Grid Maintenance`_
     156.  `Debugging`_
     16
     17
     18Overview
     19========
     20
     21Tahoe provides a single executable named "``tahoe``", which can be used to
     22create and manage client/server nodes, manipulate the filesystem, and perform
     23several debugging/maintenance tasks.
     24
     25This executable lives in the source tree at "``bin/tahoe``". Once you've done a
     26build (by running "make"), ``bin/tahoe`` can be run in-place: if it discovers
     27that it is being run from within a Tahoe source tree, it will modify sys.path
     28as necessary to use all the source code and dependent libraries contained in
     29that tree.
     30
     31If you've installed Tahoe (using "``make install``", or by installing a binary
     32package), then the tahoe executable will be available somewhere else, perhaps
     33in ``/usr/bin/tahoe``. In this case, it will use your platform's normal
     34PYTHONPATH search paths to find the tahoe code and other libraries.
     35
     36
     37CLI Command Overview
     38====================
     39
     40The "``tahoe``" tool provides access to three categories of commands.
     41
     42* node management: create a client/server node, start/stop/restart it
     43* filesystem manipulation: list files, upload, download, delete, rename
     44* debugging: unpack cap-strings, examine share files
     45
     46To get a list of all commands, just run "``tahoe``" with no additional
     47arguments. "``tahoe --help``" might also provide something useful.
     48
     49Running "``tahoe --version``" will display a list of version strings, starting
     50with the "allmydata" module (which contains the majority of the Tahoe
     51functionality) and including versions for a number of dependent libraries,
     52like Twisted, Foolscap, pycryptopp, and zfec.
     53
     54
     55Node Management
     56===============
     57
     58"``tahoe create-node [NODEDIR]``" is the basic make-a-new-node command. It
     59creates a new directory and populates it with files that will allow the
     60"``tahoe start``" command to use it later on. This command creates nodes that
     61have client functionality (upload/download files), web API services
     62(controlled by the 'webport' file), and storage services (unless
     63"--no-storage" is specified).
     64
     65NODEDIR defaults to ~/.tahoe/ , and newly-created nodes default to
     66publishing a web server on port 3456 (limited to the loopback interface, at
     67127.0.0.1, to restrict access to other programs on the same host). All of the
     68other "``tahoe``" subcommands use corresponding defaults.
     69
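For instance, a minimal sketch of the whole node lifecycle, using the
default NODEDIR (output omitted)::

 % tahoe create-node
 % tahoe start
 % tahoe stop
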
     70"``tahoe create-client [NODEDIR]``" creates a node with no storage service.
     71That is, it behaves like "``tahoe create-node --no-storage [NODEDIR]``".
     72(This is a change from versions prior to 1.6.0.)
     73
     74"``tahoe create-introducer [NODEDIR]``" is used to create the Introducer node.
     75This node provides introduction services and nothing else. When started, this
     76node will produce an introducer.furl, which should be published to all
     77clients.
     78
     79"``tahoe create-key-generator [NODEDIR]``" is used to create a special
     80"key-generation" service, which allows a client to offload their RSA key
     81generation to a separate process. Since RSA key generation takes several
     82seconds, and must be done each time a directory is created, moving it to a
     83separate process allows the first process (perhaps a busy wapi server) to
     84continue servicing other requests. The key generator exports a FURL that can
     85be copied into a node to enable this functionality.
     86
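A sketch of setting one up (the directory name here is hypothetical): create
and start the key generator, then copy the FURL it exports into the client
node as described above::

 % tahoe create-key-generator ~/.tahoe-keygen
 % tahoe start ~/.tahoe-keygen
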
     87"``tahoe run [NODEDIR]``" will start a previously-created node in the foreground.
     88
     89"``tahoe start [NODEDIR]``" will launch a previously-created node. It will launch
     90the node into the background, using the standard Twisted "twistd"
     91daemon-launching tool. On some platforms (including Windows) this command is
     92unable to run a daemon in the background; in that case it behaves in the
     93same way as "``tahoe run``".
     94
     95"``tahoe stop [NODEDIR]``" will shut down a running node.
     96
     97"``tahoe restart [NODEDIR]``" will stop and then restart a running node. This is
     98most often used by developers who have just modified the code and want to
     99start using their changes.
     100
     101
     102Filesystem Manipulation
     103=======================
     104
     105These commands let you examine a Tahoe filesystem, providing basic
     106list/upload/download/delete/rename/mkdir functionality. They can be used as
     107primitives by other scripts. Most of these commands are fairly thin wrappers
     108around wapi calls.
     109
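As an illustration of the wapi layer underneath these commands (a sketch,
assuming a local node on the default port; see the webapi documentation for
the authoritative interface), an unlinked upload is a single HTTP PUT::

 % curl -T local.txt http://127.0.0.1:3456/uri
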
     110By default, all filesystem-manipulation commands look in ~/.tahoe/ to figure
     111out which Tahoe node they should use. When the CLI command uses wapi calls,
     112it will use ~/.tahoe/node.url for this purpose: a running Tahoe node that
     113provides a wapi port will write its URL into this file. If you want to use
     114a node on some other host, just create ~/.tahoe/ and copy that node's wapi
     115URL into this file, and the CLI commands will contact that node instead of a
     116local one.
     117
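For example (the remote hostname here is hypothetical)::

 % mkdir -p ~/.tahoe
 % echo http://tahoe.example.com:3456/ > ~/.tahoe/node.url
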
     118These commands also use a table of "aliases" to figure out which directory
     119they ought to use as a starting point. This is explained in more detail below.
     120
     121As of Tahoe v1.7, passing non-ASCII characters to the CLI should work,
     122except on Windows. The command-line arguments are assumed to use the
     123character encoding specified by the current locale.
     124
     125Starting Directories
     126--------------------
     127
     128As described in architecture.txt, the Tahoe distributed filesystem consists
     129of a collection of directories and files, each of which has a "read-cap" or a
     130"write-cap" (also known as a URI). Each directory is simply a table that maps
     131a name to a child file or directory, and this table is turned into a string
     132and stored in a mutable file. The whole set of directory and file "nodes" are
     133connected together into a directed graph.
     134
     135To use this collection of files and directories, you need to choose a
     136starting point: some specific directory that we will refer to as a
     137"starting directory".  For a given starting directory, the "``ls
     138[STARTING_DIR]:``" command would list the contents of this directory,
     139the "``ls [STARTING_DIR]:dir1``" command would look inside this directory
     140for a child named "dir1" and list its contents, "``ls
     141[STARTING_DIR]:dir1/subdir2``" would look two levels deep, etc.
     142
     143Note that there is no real global "root" directory, but instead each
     144starting directory provides a different, possibly overlapping
     145perspective on the graph of files and directories.
     146
     147Each tahoe node remembers a list of starting points, named "aliases",
     148in a file named ~/.tahoe/private/aliases . These aliases are short UTF-8
     149encoded strings that stand in for a directory read- or write- cap. If
     150you use the command line "``ls``" without any "[STARTING_DIR]:" argument,
     151then it will use the default alias, which is "tahoe"; therefore "``tahoe
     152ls``" has the same effect as "``tahoe ls tahoe:``".  The same goes for the
     153other commands which can reasonably use a default alias: get, put,
     154mkdir, mv, and rm.
     155
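So, once a "tahoe:" alias exists, these two commands are equivalent::

 tahoe ls
 tahoe ls tahoe:
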
     156For backwards compatibility with Tahoe-1.0, if the "tahoe": alias is not
     157found in ~/.tahoe/private/aliases, the CLI will use the contents of
     158~/.tahoe/private/root_dir.cap instead. Tahoe-1.0 had only a single starting
     159point, and stored it in this root_dir.cap file, so Tahoe-1.1 will use it if
     160necessary. However, once you've set a "tahoe:" alias with "``tahoe add-alias``",
     161that will override anything in the old root_dir.cap file.
     162
     163The Tahoe CLI commands use the same filename syntax as scp and rsync
     164-- an optional "alias:" prefix, followed by the pathname or filename.
     165Some commands (like "tahoe cp") use the lack of an alias to mean that
     166you want to refer to a local file, instead of something from the tahoe
     167virtual filesystem. [TODO] Another way to indicate this is to start
     168the pathname with a dot, slash, or tilde.
     169
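For example, using two commands from the examples below, the side without an
alias (or with a leading "./") is the local one::

 tahoe cp file.txt tahoe:uploaded.txt
 tahoe cp tahoe:uploaded.txt ./downloaded.txt
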
     170When you're dealing with a single starting directory, the "tahoe:" alias is
     171all you need. But when you want to refer to something that isn't yet
     172attached to the graph rooted at that starting directory, you need to
     173refer to it by its capability. The way to do that is either to use its
     174capability directly as an argument on the command line, or to add an
     175alias to it, with the "tahoe add-alias" command. Once you've added an
     176alias, you can use that alias as an argument to commands.
     177
     178The best way to get started with Tahoe is to create a node, start it, then
     179use the following command to create a new directory and set it as your
     180"tahoe:" alias::
     181
     182 tahoe create-alias tahoe
     183
     184After that you can use "``tahoe ls tahoe:``" and
     185"``tahoe cp local.txt tahoe:``", and both will refer to the directory that
     186you've just created.
     187
     188SECURITY NOTE: For users of shared systems
     189``````````````````````````````````````````
     190
     191Another way to achieve the same effect as the above "tahoe create-alias"
     192command is::
     193
     194 tahoe add-alias tahoe `tahoe mkdir`
     195
     196However, command-line arguments are visible to other users (through the
     197'ps' command, or the Windows Process Explorer tool), so if you are using a
     198tahoe node on a shared host, your login neighbors will be able to see (and
     199capture) any directory caps that you set up with the "``tahoe add-alias``"
     200command.
     201
     202The "``tahoe create-alias``" command avoids this problem by creating a new
     203directory and putting the cap into your aliases file for you. Alternatively,
     204you can edit the NODEDIR/private/aliases file directly, by adding a line like
     205this::
     206
     207 fun: URI:DIR2:ovjy4yhylqlfoqg2vcze36dhde:4d4f47qko2xm5g7osgo2yyidi5m4muyo2vjjy53q4vjju2u55mfa
     208
     209By entering the dircap through the editor, the command-line arguments are
     210bypassed, and other users will not be able to see them. Once you've added the
     211alias, no other secrets are passed through the command line, so this
     212vulnerability becomes less significant: they can still see your filenames and
     213other arguments you type there, but not the caps that Tahoe uses to permit
     214access to your files and directories.
     215
     216
     217Command Syntax Summary
     218----------------------
     219
     220tahoe add-alias alias cap
     221
     222tahoe create-alias alias
     223
     224tahoe list-aliases
     225
     226tahoe mkdir
     227
     228tahoe mkdir [alias:]path
     229
     230tahoe ls [alias:][path]
     231
     232tahoe webopen [alias:][path]
     233
     234tahoe put [--mutable] [localfrom:-]
     235
     236tahoe put [--mutable] [localfrom:-] [alias:]to
     237
     238tahoe put [--mutable] [localfrom:-] [alias:]subdir/to
     239
     240tahoe put [--mutable] [localfrom:-] dircap:to
     241
     242tahoe put [--mutable] [localfrom:-] dircap:./subdir/to
     243
     244tahoe put [localfrom:-] mutable-file-writecap
     245
     246tahoe get [alias:]from [localto:-]
     247
     248tahoe cp [-r] [alias:]frompath [alias:]topath
     249
     250tahoe rm [alias:]what
     251
     252tahoe mv [alias:]from [alias:]to
     253
     254tahoe ln [alias:]from [alias:]to
     255
     256tahoe backup localfrom [alias:]to
     257
     258Command Examples
     259----------------
     260
     261``tahoe mkdir``
     262
     263 This creates a new empty unlinked directory, and prints its write-cap to
     264 stdout. The new directory is not attached to anything else.
     265
     266``tahoe add-alias fun DIRCAP``
     267
     268 An example would be::
     269
     270  tahoe add-alias fun URI:DIR2:ovjy4yhylqlfoqg2vcze36dhde:4d4f47qko2xm5g7osgo2yyidi5m4muyo2vjjy53q4vjju2u55mfa
     271
     272 This creates an alias "fun:" and configures it to use the given directory
     273 cap. Once this is done, "tahoe ls fun:" will list the contents of this
     274 directory. Use "tahoe add-alias tahoe DIRCAP" to set the contents of the
     275 default "tahoe:" alias.
     276
     277``tahoe create-alias fun``
     278
     279 This combines "``tahoe mkdir``" and "``tahoe add-alias``" into a single step.
     280
     281``tahoe list-aliases``
     282
     283 This displays a table of all configured aliases.
     284
     285``tahoe mkdir subdir``
     286
     287``tahoe mkdir /subdir``
     288
     289 These both create a new empty directory and attach it to your root with the
     290 name "subdir".
     291
     292``tahoe ls``
     293
     294``tahoe ls /``
     295
     296``tahoe ls tahoe:``
     297
     298``tahoe ls tahoe:/``
     299
     300 All four list the root directory of your personal virtual filesystem.
     301
     302``tahoe ls subdir``
     303
     304 This lists a subdirectory of your filesystem.
     305
     306``tahoe webopen``
     307
     308``tahoe webopen tahoe:``
     309
     310``tahoe webopen tahoe:subdir/``
     311
     312``tahoe webopen subdir/``
     313
     314 This uses the python 'webbrowser' module to cause a local web browser to
     315 open to the web page for the given directory. This page offers interfaces to
     316 add, download, rename, and delete files in the directory. If not given an
     317 alias or path, opens "tahoe:", the root dir of the default alias.
     318
     319``tahoe put file.txt``
     320
     321``tahoe put ./file.txt``
     322
     323``tahoe put /tmp/file.txt``
     324
     325``tahoe put ~/file.txt``
     326
     327 These upload the local file into the grid, and print the new read-cap to
     328 stdout. The uploaded file is not attached to any directory. All one-argument
     329 forms of "``tahoe put``" perform an unlinked upload.
     330
     331``tahoe put -``
     332
     333``tahoe put``
     334
     335 These also perform an unlinked upload, but the data to be uploaded is taken
     336 from stdin.
     337
     338``tahoe put file.txt uploaded.txt``
     339
     340``tahoe put file.txt tahoe:uploaded.txt``
     341
     342 These upload the local file and add it to your root with the name
     343 "uploaded.txt"
     344
     345``tahoe put file.txt subdir/foo.txt``
     346
     347``tahoe put - subdir/foo.txt``
     348
     349``tahoe put file.txt tahoe:subdir/foo.txt``
     350
     351``tahoe put file.txt DIRCAP:./foo.txt``
     352
     353``tahoe put file.txt DIRCAP:./subdir/foo.txt``
     354
     355 These upload the named file and attach them to a subdirectory of the given
     356 root directory, under the name "foo.txt". Note that to use a directory
     357 write-cap instead of an alias, you must use ":./" as a separator, rather
     358 than ":", to help the CLI parser figure out where the dircap ends. When the
     359 source file is named "-", the contents are taken from stdin.
     360
     361``tahoe put file.txt --mutable``
     362
     363 Create a new mutable file, fill it with the contents of file.txt, and print
     364 the new write-cap to stdout.
     365
     366``tahoe put file.txt MUTABLE-FILE-WRITECAP``
     367
     368 Replace the contents of the given mutable file with the contents of file.txt
     369 and print the same write-cap to stdout.
     370
     371``tahoe cp file.txt tahoe:uploaded.txt``
     372
     373``tahoe cp file.txt tahoe:``
     374
     375``tahoe cp file.txt tahoe:/``
     376
     377``tahoe cp ./file.txt tahoe:``
     378
     379 These upload the local file and add it to your root with the name
     380 "uploaded.txt".
     381
     382``tahoe cp tahoe:uploaded.txt downloaded.txt``
     383
     384``tahoe cp tahoe:uploaded.txt ./downloaded.txt``
     385
     386``tahoe cp tahoe:uploaded.txt /tmp/downloaded.txt``
     387
     388``tahoe cp tahoe:uploaded.txt ~/downloaded.txt``
     389
     390 These download the named file from your tahoe root, and put the result on
     391 your local filesystem.
     392
     393``tahoe cp tahoe:uploaded.txt fun:stuff.txt``
     394
     395 This copies a file from your tahoe root to a different virtual directory,
     396 set up earlier with "tahoe add-alias fun DIRCAP".
     397
     398``tahoe rm uploaded.txt``
     399
     400``tahoe rm tahoe:uploaded.txt``
     401
     402 These delete a file from your tahoe root.
     403
     404``tahoe mv uploaded.txt renamed.txt``
     405
     406``tahoe mv tahoe:uploaded.txt tahoe:renamed.txt``
     407
     408 These rename a file within your tahoe root directory.
     409
     410``tahoe mv uploaded.txt fun:``
     411
     412``tahoe mv tahoe:uploaded.txt fun:``
     413
     414``tahoe mv tahoe:uploaded.txt fun:uploaded.txt``
     415
     416 These move a file from your tahoe root directory to the virtual directory
     417 set up earlier with "tahoe add-alias fun DIRCAP".
     418
     419``tahoe backup ~ work:backups``
     420
     421 This command performs a full versioned backup of every file and directory
     422 underneath your "~" home directory, placing an immutable timestamped
     423 snapshot in e.g. work:backups/Archives/2009-02-06_04:00:05Z/ (note that the
     424 timestamp is in UTC, hence the "Z" suffix), and a link to the latest
     425 snapshot in work:backups/Latest/ . This command uses a small SQLite database
     426 known as the "backupdb", stored in ~/.tahoe/private/backupdb.sqlite, to
     427 remember which local files have been backed up already, and will avoid
     428 uploading files that have already been backed up. It compares timestamps and
     429 filesizes to decide whether a file has changed. It also re-uses existing directories
     430 which have identical contents. This lets it run faster and reduces the
     431 number of directories created.
     432
     433 If you reconfigure your client node to switch to a different grid, you
     434 should delete the stale backupdb.sqlite file, to force "tahoe backup" to
     435 upload all files to the new grid.
     436
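A sketch of the resulting layout (the timestamp shown is illustrative)::

 % tahoe ls work:backups/Archives
 2009-02-06_04:00:05Z
 % tahoe ls work:backups/Latest
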
     437``tahoe backup --exclude=*~ ~ work:backups``
     438
     439 Same as above, but this time the backup process will ignore any
     440 filename that ends with '~'. '--exclude' accepts standard Unix
     441 shell-style wildcards; see
     442 http://docs.python.org/library/fnmatch.html for a more detailed
     443 reference. You may give multiple '--exclude' options. Note that
     444 each pattern is matched against every level of the directory tree,
     445 so it is not possible to specify absolute path exclusions.
     446
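Note that patterns should usually be quoted to protect them from expansion
by your local shell, and that several may be given at once (the '*.pyc'
pattern here is just an illustration)::

 tahoe backup --exclude='*~' --exclude='*.pyc' ~ work:backups
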
     447``tahoe backup --exclude-from=/path/to/filename ~ work:backups``
     448
     449 '--exclude-from' is similar to '--exclude', but reads exclusion
     450 patterns from '/path/to/filename', one per line.
     451
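For instance, a hypothetical exclusion file with three patterns::

 % cat /path/to/filename
 *~
 *.bak
 .DS_Store
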
     452``tahoe backup --exclude-vcs ~ work:backups``
     453
     454 This command will ignore any known file or directory that's used by
     455 version control systems to store metadata. The excluded names are:
     456
     457  * CVS
     458  * RCS
     459  * SCCS
     460  * .git
     461  * .gitignore
     462  * .cvsignore
     463  * .svn
     464  * .arch-ids
     465  * {arch}
     466  * =RELEASE-ID
     467  * =meta-update
     468  * =update
     469  * .bzr
     470  * .bzrignore
     471  * .bzrtags
     472  * .hg
     473  * .hgignore
     474  * _darcs
     475
     476Storage Grid Maintenance
     477========================
     478
     479``tahoe manifest tahoe:``
     480
     481``tahoe manifest --storage-index tahoe:``
     482
     483``tahoe manifest --verify-cap tahoe:``
     484
     485``tahoe manifest --repair-cap tahoe:``
     486
     487``tahoe manifest --raw tahoe:``
     488
     489 This performs a recursive walk of the given directory, visiting every file
     490 and directory that can be reached from that point. It then emits one line to
     491 stdout for each object it encounters.
     492
     493 The default behavior is to print the access cap string (like URI:CHK:.. or
     494 URI:DIR2:..), followed by a space, followed by the full path name.
     495
     496 If --storage-index is added, each line will instead contain the object's
     497 storage index. This (string) value is useful to determine which share files
     498 (on the server) are associated with this directory tree. The --verify-cap
     499 and --repair-cap options are similar, but emit a verify-cap and repair-cap,
     500 respectively. If --raw is provided instead, the output will be a
     501 JSON-encoded dictionary that includes keys for pathnames, storage index
     502 strings, and cap strings. The last line of the --raw output will be a JSON
     503 encoded deep-stats dictionary.
     504
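A sketch of the default output (caps abbreviated, and paths depending on
your tree)::

 % tahoe manifest tahoe:
 URI:DIR2:...
 URI:CHK:... subdir/file.txt
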
     505``tahoe stats tahoe:``
     506
     507 This performs a recursive walk of the given directory, visiting every file
     508 and directory that can be reached from that point. It gathers statistics on
     509 the sizes of the objects it encounters, and prints a summary to stdout.
     510
     511
     512Debugging
     513=========
     514
     515For a list of all debugging commands, use "tahoe debug".
     516
     517"``tahoe debug find-shares STORAGEINDEX NODEDIRS..``" will look through one or
     518more storage nodes for the share files that are providing storage for the
     519given storage index.
     520
     521"``tahoe debug catalog-shares NODEDIRS..``" will look through one or more
     522storage nodes and locate every single share they contain. It produces a report
     523on stdout with one line per share, describing what kind of share it is, the
     524storage index, the size of the file it is used for, etc. It may be useful to
     525concatenate these reports from all storage hosts and use them to look for
     526anomalies.
     527
     528"``tahoe debug dump-share SHAREFILE``" will take the name of a single share file
     529(as found by "tahoe debug find-shares") and print a summary of its contents to
     530stdout. This includes a list of leases, summaries of the hash tree, and
     531information from the UEB (URI Extension Block). For mutable file shares, it
     532will describe which version (seqnum and root-hash) is being stored in this
     533share.
     534
     535"``tahoe debug dump-cap CAP``" will take a URI (a file read-cap, or a directory
     536read- or write- cap) and unpack it into separate pieces. The most useful
     537aspect of this command is to reveal the storage index for any given URI. This
     538can be used to locate the share files that are holding the encoded+encrypted
     539data for this file.
     540
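A typical workflow (cap and storage index elided, and assuming ~/.tahoe is a
storage node) is to unpack a cap and then locate its shares::

 % tahoe debug dump-cap URI:CHK:...
 % tahoe debug find-shares STORAGEINDEX ~/.tahoe
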
     541"``tahoe debug repl``" will launch an interactive python interpreter in which
     542the Tahoe packages and modules are available on sys.path (e.g. by using 'import
     543allmydata'). This is most useful from a source tree: it simply sets the
     544PYTHONPATH correctly and runs the 'python' executable.
     545
     546"``tahoe debug corrupt-share SHAREFILE``" will flip a bit in the given
     547sharefile. This can be used to test the client-side verification/repair code.
     548Obviously, this command should not be used during normal operation.
  • deleted file docs/frontends/CLI.txt

    diff --git a/docs/frontends/CLI.txt b/docs/frontends/CLI.txt
    deleted file mode 100644
    index d613a38..0000000
  • new file docs/frontends/FTP-and-SFTP.rst

    diff --git a/docs/frontends/FTP-and-SFTP.rst b/docs/frontends/FTP-and-SFTP.rst
    new file mode 100644
    index 0000000..230dca3
     1=================================
     2Tahoe-LAFS FTP and SFTP Frontends
     3=================================
     4
     51.  `FTP/SFTP Background`_
     62.  `Tahoe-LAFS Support`_
     73.  `Creating an Account File`_
     84.  `Configuring FTP Access`_
     95.  `Configuring SFTP Access`_
     106.  `Dependencies`_
     117.  `Immutable and mutable files`_
     128.  `Known Issues`_
     13
     14
     15FTP/SFTP Background
     16===================
     17
     18FTP is the venerable internet file-transfer protocol, first developed in
     191971. The FTP server usually listens on port 21. A separate connection is
     20used for the actual data transfers, either in the same direction as the
     21initial client-to-server connection (for PORT mode), or in the reverse
     22direction (for PASV) mode. Connections are unencrypted, so passwords, file
     23names, and file contents are visible to eavesdroppers.
     24
     25SFTP is the modern replacement, developed as part of the SSH "secure shell"
     26protocol, and runs as a subchannel of the regular SSH connection. The SSH
     27server usually listens on port 22. All connections are encrypted.
     28
     29Both FTP and SFTP were developed assuming a UNIX-like server, with accounts
     30and passwords, octal file modes (user/group/other, read/write/execute), and
     31ctime/mtime timestamps.
     32
     33Tahoe-LAFS Support
     34==================
     35
     36All Tahoe-LAFS client nodes can run a frontend FTP server, allowing regular FTP
     37clients (like /usr/bin/ftp, ncftp, and countless others) to access the
     38virtual filesystem. They can also run an SFTP server, so SFTP clients (like
     39/usr/bin/sftp, the sshfs FUSE plugin, and others) can too. These frontends
     40sit at the same level as the webapi interface.
     41
     42Since Tahoe-LAFS does not use user accounts or passwords, the FTP/SFTP servers
     43must be configured with a way to first authenticate a user (confirm that a
     44prospective client has a legitimate claim to whatever authorities we might
     45grant a particular user), and second to decide what root directory cap should
     46be granted to the authenticated username. A username and password are used
     47for this purpose. (The SFTP protocol is also capable of using client
     48RSA or DSA public keys, but this is not currently implemented.)
     49
     50Tahoe-LAFS provides two mechanisms to perform this user-to-rootcap mapping. The
     51first is a simple flat file with one account per line. The second is an
     52HTTP-based login mechanism, backed by a simple PHP script and a database. The
     53latter form is used by allmydata.com to provide secure access to customer
     54rootcaps.
     55
     56Creating an Account File
     57========================
     58
     59To use the first form, create a file (probably in
     60BASEDIR/private/ftp.accounts) in which each non-comment/non-blank line is a
     61space-separated line of (USERNAME, PASSWORD, ROOTCAP), like so::
     62
     63 % cat BASEDIR/private/ftp.accounts
     64 # This is a password line, (username, password, rootcap)
     65 alice password URI:DIR2:ioej8xmzrwilg772gzj4fhdg7a:wtiizszzz2rgmczv4wl6bqvbv33ag4kvbr6prz3u6w3geixa6m6a
     66 bob sekrit URI:DIR2:6bdmeitystckbl9yqlw7g56f4e:serp5ioqxnh34mlbmzwvkp3odehsyrr7eytt5f64we3k9hhcrcja
     67
     68Future versions of Tahoe-LAFS may support using client public keys for SFTP.
     69The words "ssh-rsa" and "ssh-dsa" after the username are reserved to specify
     70the public key format, so users cannot have a password equal to either of
     71these strings.
     72
     73Now add an 'accounts.file' directive to your tahoe.cfg file, as described
     74in the next sections.
     75
     76Configuring FTP Access
     77======================
     78
     79To enable the FTP server with an accounts file, add the following lines to
     80the BASEDIR/tahoe.cfg file::
     81
     82 [ftpd]
     83 enabled = true
     84 port = tcp:8021:interface=127.0.0.1
     85 accounts.file = private/ftp.accounts
     86
     87The FTP server will listen on the given port number and on the loopback
     88interface only. The "accounts.file" pathname will be interpreted
     89relative to the node's BASEDIR.
     90
     91To enable the FTP server with an account server instead, provide the URL of
     92that server in an "accounts.url" directive::
     93
     94 [ftpd]
     95 enabled = true
     96 port = tcp:8021:interface=127.0.0.1
     97 accounts.url = https://example.com/login
     98
     99You can provide both accounts.file and accounts.url, although it probably
     100isn't very useful except for testing.
     101
     102FTP provides no security, and so your password or caps could be eavesdropped
     103if you connect to the FTP server remotely. The examples above include
     104":interface=127.0.0.1" in the "port" option, which causes the server to only
     105accept connections from localhost.
     106
     107Configuring SFTP Access
     108=======================
     109
     110The Tahoe-LAFS SFTP server requires a host keypair, just like the regular SSH
     111server. It is important to give each server a distinct keypair, to prevent
     112one server from masquerading as a different one. The first time a client
     113program talks to a given server, it will store the host key it receives, and
     114will complain if a subsequent connection uses a different key. This reduces
     115the opportunity for man-in-the-middle attacks to just the first connection.
     116
     117Exercise caution when connecting to the SFTP server remotely. The AES
     118implementation used by the SFTP code does not have defenses against timing
     119attacks. The code for encrypting the SFTP connection was not written by the
     120Tahoe-LAFS team, and we have not reviewed it as carefully as we have reviewed
     121the code for encrypting files and directories in Tahoe-LAFS itself. If you
     122can connect to the SFTP server (which is provided by the Tahoe-LAFS gateway)
     123only from a client on the same host, then you would be safe from any problem
     124with the SFTP connection security. The examples given below enforce this
     125policy by including ":interface=127.0.0.1" in the "port" option, which
     126causes the server to only accept connections from localhost.
     127
     128You will use directives in the tahoe.cfg file to tell the SFTP code where to
     129find these keys. To create a keypair, use the ``ssh-keygen`` tool (which comes with
     130the standard openssh client distribution)::
     131
     132 % cd BASEDIR
     133 % ssh-keygen -f private/ssh_host_rsa_key
     134
     135The server private key file must not have a passphrase.
     136
     137Then, to enable the SFTP server with an accounts file, add the following
     138lines to the BASEDIR/tahoe.cfg file::
     139
     140 [sftpd]
     141 enabled = true
     142 port = tcp:8022:interface=127.0.0.1
     143 host_pubkey_file = private/ssh_host_rsa_key.pub
     144 host_privkey_file = private/ssh_host_rsa_key
     145 accounts.file = private/ftp.accounts
     146
     147The SFTP server will listen on the given port number and on the loopback
     148interface only. The "accounts.file" pathname will be interpreted
     149relative to the node's BASEDIR.
     150
     151Or, to use an account server instead, do this::
     152
     153 [sftpd]
     154 enabled = true
     155 port = tcp:8022:interface=127.0.0.1
     156 host_pubkey_file = private/ssh_host_rsa_key.pub
     157 host_privkey_file = private/ssh_host_rsa_key
     158 accounts.url = https://example.com/login
     159
     160You can provide both accounts.file and accounts.url, although it probably
     161isn't very useful except for testing.
     162
     163For further information on SFTP compatibility and known issues with various
     164clients and with the sshfs filesystem, see
     165http://tahoe-lafs.org/trac/tahoe-lafs/wiki/SftpFrontend .
     166
     167Dependencies
     168============
     169
     170The Tahoe-LAFS SFTP server requires the Twisted "Conch" component (a "conch" is
     171a twisted shell, get it?). Many Linux distributions package the Conch code
     172separately: debian puts it in the "python-twisted-conch" package. Conch
     173requires the "pycrypto" package, which is a Python+C implementation of many
     174cryptographic functions (the debian package is named "python-crypto").
     175
     176Note that "pycrypto" is different from the "pycryptopp" package that Tahoe-LAFS
     177uses (which is a Python wrapper around the C++-based Crypto++ library, a
     178library that is frequently installed as /usr/lib/libcryptopp.a, to avoid
     179problems with non-alphanumerics in filenames).
     180
     181The FTP server requires code in Twisted that enables asynchronous closing of
     182file-upload operations. This code was landed to Twisted's SVN trunk in r28453
     183on 23-Feb-2010, slightly too late for the Twisted-10.0 release, but it should
     184be present in the next release after that. To use Tahoe-LAFS's FTP server with
     185Twisted-10.0 or earlier, you will need to apply the patch attached to
     186http://twistedmatrix.com/trac/ticket/3462 . The Tahoe-LAFS node will refuse to
     187start the FTP server unless it detects the necessary support code in Twisted.
     188This patch is not needed for SFTP.
     189
     190Immutable and Mutable Files
     191===========================
     192
     193All files created via SFTP (and FTP) are immutable files. However, files
     194can only be created in writeable directories, which allows the directory
     195entry to be relinked to a different file. Normally, when the path of an
     196immutable file is opened for writing by SFTP, the directory entry is
     197relinked to another file with the newly written contents when the file
     198handle is closed. The old file is still present on the grid, and any other
     199caps to it will remain valid. (See docs/garbage-collection.txt for how to
     200reclaim the space used by files that are no longer needed.)
     201
     202The 'no-write' metadata field of a directory entry can override this
     203behaviour. If the 'no-write' field holds a true value, then a permission
     204error will occur when trying to write to the file, even if it is in a
     205writeable directory. This does not prevent the directory entry from being
     206unlinked or replaced.
     207
     208When using sshfs, the 'no-write' field can be set by clearing the 'w'
     209bits in the Unix permissions, for example using the command
     210'chmod 444 path/to/file'. Note that this does not mean that arbitrary
     211combinations of Unix permissions are supported. If the 'w' bits are
     212cleared on a link to a mutable file or directory, that link will become
     213read-only.
     214
     215If SFTP is used to write to an existing mutable file, it will publish a
     216new version when the file handle is closed.
     217
     218Known Issues
     219============
     220
     221Mutable files are not supported by the FTP frontend (`ticket #680
     222<http://tahoe-lafs.org/trac/tahoe-lafs/ticket/680>`_). Currently, a directory
     223containing mutable files cannot even be listed over FTP.
     224
     225The FTP frontend sometimes fails to report errors, for example if an upload
     226fails because it does not meet the "servers of happiness" threshold (`ticket #1081
     227<http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1081>`_). Upload errors also may not
     228be reported when writing files using SFTP via sshfs (`ticket #1059
     229<http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1059>`_).
     230
     231Non-ASCII filenames are not supported by FTP (`ticket #682
     232<http://tahoe-lafs.org/trac/tahoe-lafs/ticket/682>`_). They can be used
     233with SFTP only if the client encodes filenames as UTF-8 (`ticket #1089
     234<http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1089>`_).
     235
     236The gateway node may leak memory when accessing many files via SFTP
     237(`ticket #1045 <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1045>`_).
     238
     239For other known issues in SFTP, see
     240<http://tahoe-lafs.org/trac/tahoe-lafs/wiki/SftpFrontend>.
  • deleted file docs/frontends/FTP-and-SFTP.txt

    diff --git a/docs/frontends/FTP-and-SFTP.txt b/docs/frontends/FTP-and-SFTP.txt
    deleted file mode 100644
    index 8facc09..0000000
    + -  
    1 = Tahoe-LAFS FTP and SFTP Frontends =
    2 
    3 1.  FTP/SFTP Background
    4 2.  Tahoe-LAFS Support
    5 3.  Creating an Account File
    6 4.  Configuring FTP Access
    7 5.  Configuring SFTP Access
    8 6.  Dependencies
    9 7.  Immutable and mutable files
    10 
    11 
    12 == FTP/SFTP Background ==
    13 
    14 FTP is the venerable internet file-transfer protocol, first developed in
    15 1971. The FTP server usually listens on port 21. A separate connection is
    16 used for the actual data transfers, either in the same direction as the
    17 initial client-to-server connection (for PORT mode), or in the reverse
    18 direction (for PASV) mode. Connections are unencrypted, so passwords, file
    19 names, and file contents are visible to eavesdroppers.
    20 
    21 SFTP is the modern replacement, developed as part of the SSH "secure shell"
    22 protocol, and runs as a subchannel of the regular SSH connection. The SSH
    23 server usually listens on port 22. All connections are encrypted.
    24 
    25 Both FTP and SFTP were developed assuming a UNIX-like server, with accounts
    26 and passwords, octal file modes (user/group/other, read/write/execute), and
    27 ctime/mtime timestamps.
    28 
    29 
    30 == Tahoe-LAFS Support ==
    31 
    32 All Tahoe-LAFS client nodes can run a frontend FTP server, allowing regular FTP
    33 clients (like /usr/bin/ftp, ncftp, and countless others) to access the
    34 virtual filesystem. They can also run an SFTP server, so SFTP clients (like
    35 /usr/bin/sftp, the sshfs FUSE plugin, and others) can too. These frontends
    36 sit at the same level as the webapi interface.
    37 
    38 Since Tahoe-LAFS does not use user accounts or passwords, the FTP/SFTP servers
    39 must be configured with a way to first authenticate a user (confirm that a
    40 prospective client has a legitimate claim to whatever authorities we might
    41 grant a particular user), and second to decide what root directory cap should
    42 be granted to the authenticated username. A username and password is used
    43 for this purpose. (The SFTP protocol is also capable of using client
    44 RSA or DSA public keys, but this is not currently implemented.)
    45 
    46 Tahoe-LAFS provides two mechanisms to perform this user-to-rootcap mapping. The
    47 first is a simple flat file with one account per line. The second is an
    48 HTTP-based login mechanism, backed by simple PHP script and a database. The
    49 latter form is used by allmydata.com to provide secure access to customer
    50 rootcaps.
    51 
    52 
    53 == Creating an Account File ==
    54 
    55 To use the first form, create a file (probably in
    56 BASEDIR/private/ftp.accounts) in which each non-comment/non-blank line is a
    57 space-separated line of (USERNAME, PASSWORD, ROOTCAP), like so:
    58 
    59  % cat BASEDIR/private/ftp.accounts
    60  # This is a password line, (username, password, rootcap)
    61  alice password URI:DIR2:ioej8xmzrwilg772gzj4fhdg7a:wtiizszzz2rgmczv4wl6bqvbv33ag4kvbr6prz3u6w3geixa6m6a
    62  bob sekrit URI:DIR2:6bdmeitystckbl9yqlw7g56f4e:serp5ioqxnh34mlbmzwvkp3odehsyrr7eytt5f64we3k9hhcrcja
    63 
    64 Future versions of Tahoe-LAFS may support using client public keys for SFTP.
    65 The words "ssh-rsa" and "ssh-dsa" after the username are reserved to specify
    66 the public key format, so users cannot have a password equal to either of
    67 these strings.
    68 
    69 Now add an 'accounts.file' directive to your tahoe.cfg file, as described
    70 in the next sections.
    71 
    72 
    73 == Configuring FTP Access ==
    74 
    75 To enable the FTP server with an accounts file, add the following lines to
    76 the BASEDIR/tahoe.cfg file:
    77 
    78  [ftpd]
    79  enabled = true
    80  port = tcp:8021:interface=127.0.0.1
    81  accounts.file = private/ftp.accounts
    82 
    83 The FTP server will listen on the given port number and on the loopback
    84 interface only. The "accounts.file" pathname will be interpreted
    85 relative to the node's BASEDIR.
    86 
    87 To enable the FTP server with an account server instead, provide the URL of
    88 that server in an "accounts.url" directive:
    89 
    90  [ftpd]
    91  enabled = true
    92  port = tcp:8021:interface=127.0.0.1
    93  accounts.url = https://example.com/login
    94 
    95 You can provide both accounts.file and accounts.url, although it probably
    96 isn't very useful except for testing.
    97 
    98 FTP provides no security, and so your password or caps could be eavesdropped
    99 if you connect to the FTP server remotely. The examples above include
    100 ":interface=127.0.0.1" in the "port" option, which causes the server to only
    101 accept connections from localhost.
    102 
    103 
    104 == Configuring SFTP Access ==
    105 
    106 The Tahoe-LAFS SFTP server requires a host keypair, just like the regular SSH
    107 server. It is important to give each server a distinct keypair, to prevent
    108 one server from masquerading as different one. The first time a client
    109 program talks to a given server, it will store the host key it receives, and
    110 will complain if a subsequent connection uses a different key. This reduces
    111 the opportunity for man-in-the-middle attacks to just the first connection.
    112 
    113 Exercise caution when connecting to the SFTP server remotely. The AES
    114 implementation used by the SFTP code does not have defenses against timing
    115 attacks. The code for encrypting the SFTP connection was not written by the
    116 Tahoe-LAFS team, and we have not reviewed it as carefully as we have reviewed
    117 the code for encrypting files and directories in Tahoe-LAFS itself. If you
    118 can connect to the SFTP server (which is provided by the Tahoe-LAFS gateway)
    119 only from a client on the same host, then you would be safe from any problem
    120 with the SFTP connection security. The examples given below enforce this
    121 policy by including ":interface=127.0.0.1" in the "port" option, which
    122 causes the server to only accept connections from localhost.
    123 
    124 You will use directives in the tahoe.cfg file to tell the SFTP code where to
    125 find these keys. To create one, use the ssh-keygen tool (which comes with the
    126 standard openssh client distribution):
    127 
    128 % cd BASEDIR
    129 % ssh-keygen -f private/ssh_host_rsa_key
    130 
    131 The server private key file must not have a passphrase.
    132 
    133 Then, to enable the SFTP server with an accounts file, add the following
    134 lines to the BASEDIR/tahoe.cfg file:
    135 
    136  [sftpd]
    137  enabled = true
    138  port = tcp:8022:interface=127.0.0.1
    139  host_pubkey_file = private/ssh_host_rsa_key.pub
    140  host_privkey_file = private/ssh_host_rsa_key
    141  accounts.file = private/ftp.accounts
    142 
    143 The SFTP server will listen on the given port number and on the loopback
    144 interface only. The "accounts.file" pathname will be interpreted
    145 relative to the node's BASEDIR.
    146 
    147 Or, to use an account server instead, do this:
    148 
    149  [sftpd]
    150  enabled = true
    151  port = tcp:8022:interface=127.0.0.1
    152  host_pubkey_file = private/ssh_host_rsa_key.pub
    153  host_privkey_file = private/ssh_host_rsa_key
    154  accounts.url = https://example.com/login
    155 
    156 You can provide both accounts.file and accounts.url, although it probably
    157 isn't very useful except for testing.
    158 
    159 For further information on SFTP compatibility and known issues with various
    160 clients and with the sshfs filesystem, see
    161 <http://tahoe-lafs.org/trac/tahoe-lafs/wiki/SftpFrontend>.
    162 
    163 
    164 == Dependencies ==
    165 
    166 The Tahoe-LAFS SFTP server requires the Twisted "Conch" component (a "conch" is a
    167 twisted shell, get it?). Many Linux distributions package the Conch code
    168 separately: debian puts it in the "python-twisted-conch" package. Conch
    169 requires the "pycrypto" package, which is a Python+C implementation of many
    170 cryptographic functions (the debian package is named "python-crypto").
    171 
    172 Note that "pycrypto" is different than the "pycryptopp" package that Tahoe-LAFS
    173 uses (which is a Python wrapper around the C++ -based Crypto++ library, a
    174 library that is frequently installed as /usr/lib/libcryptopp.a, to avoid
    175 problems with non-alphanumerics in filenames).
    176 
    177 The FTP server requires code in Twisted that enables asynchronous closing of
    178 file-upload operations. This code was landed to Twisted's SVN trunk in r28453
    179 on 23-Feb-2010, slightly too late for the Twisted-10.0 release, but it should
    180 be present in the next release after that. To use Tahoe-LAFS's FTP server with
    181 Twisted-10.0 or earlier, you will need to apply the patch attached to
    182 http://twistedmatrix.com/trac/ticket/3462 . The Tahoe-LAFS node will refuse to
    183 start the FTP server unless it detects the necessary support code in Twisted.
    184 This patch is not needed for SFTP.
    185 
    186 
    187 == Immutable and Mutable Files ==
    188 
    189 All files created via SFTP (and FTP) are immutable files. However, files
    190 can only be created in writeable directories, which allows the directory
    191 entry to be relinked to a different file. Normally, when the path of an
    192 immutable file is opened for writing by SFTP, the directory entry is
    193 relinked to another file with the newly written contents when the file
    194 handle is closed. The old file is still present on the grid, and any other
    195 caps to it will remain valid. (See docs/garbage-collection.txt for how to
    196 reclaim the space used by files that are no longer needed.)
    197 
    198 The 'no-write' metadata field of a directory entry can override this
    199 behaviour. If the 'no-write' field holds a true value, then a permission
    200 error will occur when trying to write to the file, even if it is in a
    201 writeable directory. This does not prevent the directory entry from being
    202 unlinked or replaced.
    203 
    204 When using sshfs, the 'no-write' field can be set by clearing the 'w'
    205 bits in the Unix permissions, for example using the command
    206 'chmod 444 path/to/file'. Note that this does not mean that arbitrary
    207 combinations of Unix permissions are supported. If the 'w' bits are
    208 cleared on a link to a mutable file or directory, that link will become
    209 read-only.
    210 
    211 If SFTP is used to write to an existing mutable file, it will publish a
    212 new version when the file handle is closed.
    213 
    214 
    215 == Known Issues ==
    216 
    217 Mutable files are not supported by the FTP frontend (ticket #680). Currently,
    218 a directory containing mutable files cannot even be listed over FTP.
    219 
    220 The FTP frontend sometimes fails to report errors, for example if an upload
    221 fails because it does meet the "servers of happiness" threshold (ticket #1081).
    222 Upload errors also may not be reported when writing files using SFTP via sshfs
    223 (ticket #1059).
    224 
    225 Non-ASCII filenames are not supported by FTP (ticket #682). They can be used
    226 with SFTP only if the client encodes filenames as UTF-8 (ticket #1089).
    227 
    228 The gateway node may incur a memory leak when accessing many files via SFTP
    229 (ticket #1045).
    230 
    231 For other known issues in SFTP, see
    232 <http://tahoe-lafs.org/trac/tahoe-lafs/wiki/SftpFrontend>.
  • new file docs/frontends/download-status.rst

    diff --git a/docs/frontends/download-status.rst b/docs/frontends/download-status.rst
    new file mode 100644
    index 0000000..315b6a3
    - +  
     1===============
     2Download status
     3===============
     4
     5
     6Introduction
     7============
     8
     9The WUI will display the "status" of uploads and downloads.
     10
     11The Welcome Page has a link entitled "Recent Uploads and Downloads"
      12which goes to this URL::
      13
      14 http://$GATEWAY/status
     15
     16Each entry in the list of recent operations has a "status" link which
     17will take you to a page describing that operation.
     18
      19For immutable downloads, the page has a lot of information, and this
      20document explains what it all means. It was written by Brian
     21Warner, who wrote the v1.8.0 downloader code and the code which
     22generates this status report about the v1.8.0 downloader's
     23behavior. Brian posted it to the trac:
     24http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1169#comment:1
     25
     26Then Zooko lightly edited it while copying it into the docs/
     27directory.
     28
     29What's involved in a download?
     30==============================
     31
     32Downloads are triggered by read() calls, each with a starting offset (defaults
     33to 0) and a length (defaults to the whole file). A regular webapi GET request
     34will result in a whole-file read() call.
     35
     36Each read() call turns into an ordered sequence of get_segment() calls. A
     37whole-file read will fetch all segments, in order, but partial reads or
     38multiple simultaneous reads will result in random-access of segments. Segment
     39reads always return ciphertext: the layer above that (in read()) is responsible
     40for decryption.
     41
     42Before we can satisfy any segment reads, we need to find some shares. ("DYHB"
     43is an abbreviation for "Do You Have Block", and is the message we send to
     44storage servers to ask them if they have any shares for us. The name is
     45historical, from Mojo Nation/Mnet/Mountain View, but nicely distinctive.
      46Tahoe-LAFS's actual message name is remote_get_buckets().) Responses come
     47back eventually, or don't.
     48
     49Once we get enough positive DYHB responses, we have enough shares to start
     50downloading. We send "block requests" for various pieces of the share.
     51Responses come back eventually, or don't.
     52
     53When we get enough block-request responses for a given segment, we can decode
     54the data and satisfy the segment read.
     55
     56When the segment read completes, some or all of the segment data is used to
     57satisfy the read() call (if the read call started or ended in the middle of a
      58segment, we'll only use part of the data; otherwise we'll use all of it).
     59
     60Data on the download-status page
     61================================
     62
     63DYHB Requests
     64-------------
     65
     66This shows every Do-You-Have-Block query sent to storage servers and their
     67results. Each line shows the following:
     68
     69* the serverid to which the request was sent
     70* the time at which the request was sent. Note that all timestamps are
     71  relative to the start of the first read() call and indicated with a "+" sign
     72* the time at which the response was received (if ever)
     73* the share numbers that the server has, if any
     74* the elapsed time taken by the request
     75
     76Also, each line is colored according to the serverid. This color is also used
     77in the "Requests" section below.
     78
     79Read Events
     80-----------
     81
     82This shows all the FileNode read() calls and their overall results. Each line
     83shows:
     84
     85* the range of the file that was requested (as [OFFSET:+LENGTH]). A whole-file
     86  GET will start at 0 and read the entire file.
     87* the time at which the read() was made
     88* the time at which the request finished, either because the last byte of data
     89  was returned to the read() caller, or because they cancelled the read by
     90  calling stopProducing (i.e. closing the HTTP connection)
     91* the number of bytes returned to the caller so far
     92* the time spent on the read, so far
     93* the total time spent in AES decryption
      94* total time spent paused by the client (pauseProducing), generally because the
     95  HTTP connection filled up, which most streaming media players will do to
     96  limit how much data they have to buffer
     97* effective speed of the read(), not including paused time
     98
     99Segment Events
     100--------------
     101
     102This shows each get_segment() call and its resolution. This table is not well
     103organized, and my post-1.8.0 work will clean it up a lot. In its present form,
     104it records "request" and "delivery" events separately, indicated by the "type"
     105column.
     106
     107Each request shows the segment number being requested and the time at which the
     108get_segment() call was made.
     109
     110Each delivery shows:
     111
     112* segment number
     113* range of file data (as [OFFSET:+SIZE]) delivered
     114* elapsed time spent doing ZFEC decoding
     115* overall elapsed time fetching the segment
     116* effective speed of the segment fetch
     117
     118Requests
     119--------
     120
     121This shows every block-request sent to the storage servers. Each line shows:
     122
     123* the server to which the request was sent
     124* which share number it is referencing
     125* the portion of the share data being requested (as [OFFSET:+SIZE])
     126* the time the request was sent
     127* the time the response was received (if ever)
     128* the amount of data that was received (which might be less than SIZE if we
     129  tried to read off the end of the share)
     130* the elapsed time for the request (RTT=Round-Trip-Time)
     131
     132Also note that each Request line is colored according to the serverid it was
     133sent to. And all timestamps are shown relative to the start of the first
     134read() call: for example, the first DYHB message was sent at +0.001393s, about
     1351.4 milliseconds after the read() call started everything off.
  • deleted file docs/frontends/download-status.txt

    diff --git a/docs/frontends/download-status.txt b/docs/frontends/download-status.txt
    deleted file mode 100644
    index 90aaabf..0000000
    + -  
    1 The WUI will display the "status" of uploads and downloads.
    2 
    3 The Welcome Page has a link entitled "Recent Uploads and Downloads"
    4 which goes to this URL:
    5 
    6 http://$GATEWAY/status
    7 
    8 Each entry in the list of recent operations has a "status" link which
    9 will take you to a page describing that operation.
    10 
    11 For immutable downloads, the page has a lot of information, and this
    12 document is to explain what it all means. It was written by Brian
    13 Warner, who wrote the v1.8.0 downloader code and the code which
    14 generates this status report about the v1.8.0 downloader's
    15 behavior. Brian posted it to the trac:
    16 http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1169#comment:1
    17 
    18 Then Zooko lightly edited it while copying it into the docs/
    19 directory.
    20 
    21 -------
    22 
    23 First, what's involved in a download?:
    24 
    25     downloads are triggered by read() calls, each with a starting offset (defaults to 0) and a length (defaults to the whole file). A regular webapi GET request will result in a whole-file read() call
    26     each read() call turns into an ordered sequence of get_segment() calls. A whole-file read will fetch all segments, in order, but partial reads or multiple simultaneous reads will result in random-access of segments. Segment reads always return ciphertext: the layer above that (in read()) is responsible for decryption.
    27     before we can satisfy any segment reads, we need to find some shares. ("DYHB" is an abbreviation for "Do You Have Block", and is the message we send to storage servers to ask them if they have any shares for us. The name is historical, from Mojo Nation/Mnet/Mountain View, but nicely distinctive. Tahoe-LAFS's actual message name is remote_get_buckets().). Responses come back eventually, or don't.
    28     Once we get enough positive DYHB responses, we have enough shares to start downloading. We send "block requests" for various pieces of the share. Responses come back eventually, or don't.
    29     When we get enough block-request responses for a given segment, we can decode the data and satisfy the segment read.
    30     When the segment read completes, some or all of the segment data is used to satisfy the read() call (if the read call started or ended in the middle of a segment, we'll only use part of the data, otherwise we'll use all of it).
    31 
    32 With that background, here is the data currently on the download-status page:
    33 
    34     "DYHB Requests": this shows every Do-You-Have-Block query sent to storage servers and their results. Each line shows the following:
    35         the serverid to which the request was sent
    36         the time at which the request was sent. Note that all timestamps are relative to the start of the first read() call and indicated with a "+" sign
    37         the time at which the response was received (if ever)
    38         the share numbers that the server has, if any
    39         the elapsed time taken by the request
    40         also, each line is colored according to the serverid. This color is also used in the "Requests" section below.
    41 
    42     "Read Events": this shows all the FileNode read() calls and their overall results. Each line shows:
    43         the range of the file that was requested (as [OFFSET:+LENGTH]). A whole-file GET will start at 0 and read the entire file.
    44         the time at which the read() was made
    45         the time at which the request finished, either because the last byte of data was returned to the read() caller, or because they cancelled the read by calling stopProducing (i.e. closing the HTTP connection)
    46         the number of bytes returned to the caller so far
    47         the time spent on the read, so far
    48         the total time spent in AES decryption
    49         total time spend paused by the client (pauseProducing), generally because the HTTP connection filled up, which most streaming media players will do to limit how much data they have to buffer
    50         effective speed of the read(), not including paused time
    51 
    52     "Segment Events": this shows each get_segment() call and its resolution. This table is not well organized, and my post-1.8.0 work will clean it up a lot. In its present form, it records "request" and "delivery" events separately, indicated by the "type" column.
    53         Each request shows the segment number being requested and the time at which the get_segment() call was made
    54         Each delivery shows:
    55             segment number
    56             range of file data (as [OFFSET:+SIZE]) delivered
    57             elapsed time spent doing ZFEC decoding
    58             overall elapsed time fetching the segment
    59             effective speed of the segment fetch
    60 
    61     "Requests": this shows every block-request sent to the storage servers. Each line shows:
    62         the server to which the request was sent
    63         which share number it is referencing
    64         the portion of the share data being requested (as [OFFSET:+SIZE])
    65         the time the request was sent
    66         the time the response was received (if ever)
    67         the amount of data that was received (which might be less than SIZE if we tried to read off the end of the share)
    68         the elapsed time for the request (RTT=Round-Trip-Time)
    69 
    70 Also note that each Request line is colored according to the serverid it was sent to. And all timestamps are shown relative to the start of the first read() call: for example the first DYHB message was sent at +0.001393s about 1.4 milliseconds after the read() call started everything off.
  • new file docs/frontends/webapi.rst

    diff --git a/docs/frontends/webapi.rst b/docs/frontends/webapi.rst
    new file mode 100644
    index 0000000..31924bc
    - +  
     1==========================
     2The Tahoe REST-ful Web API
     3==========================
     4
     51.  `Enabling the web-API port`_
     62.  `Basic Concepts: GET, PUT, DELETE, POST`_
     73.  `URLs`_
     8
     9        1. `Child Lookup`_
     10
     114.  `Slow Operations, Progress, and Cancelling`_
     125.  `Programmatic Operations`_
     13
     14    1. `Reading a file`_
     15    2. `Writing/Uploading a File`_
     16    3. `Creating a New Directory`_
     17    4. `Get Information About A File Or Directory (as JSON)`_
     18    5. `Attaching an existing File or Directory by its read- or write-cap`_
     19    6. `Adding multiple files or directories to a parent directory at once`_
     20    7. `Deleting a File or Directory`_
     21
     226.  `Browser Operations: Human-Oriented Interfaces`_
     23
     24    1.  `Viewing A Directory (as HTML)`_
     25    2.  `Viewing/Downloading a File`_
     26    3.  `Get Information About A File Or Directory (as HTML)`_
     27    4.  `Creating a Directory`_
     28    5.  `Uploading a File`_
     29    6.  `Attaching An Existing File Or Directory (by URI)`_
     30    7.  `Deleting A Child`_
     31    8.  `Renaming A Child`_
     32    9.  `Other Utilities`_
     33    10. `Debugging and Testing Features`_
     34
     357.  `Other Useful Pages`_
     368.  `Static Files in /public_html`_
     379.  `Safety and security issues -- names vs. URIs`_
     3810. `Concurrency Issues`_
     39
     40Enabling the web-API port
     41=========================
     42
     43Every Tahoe node is capable of running a built-in HTTP server. To enable
     44this, just write a port number into the "[node]web.port" line of your node's
     45tahoe.cfg file. For example, writing "web.port = 3456" into the "[node]"
     46section of $NODEDIR/tahoe.cfg will cause the node to run a webserver on port
     473456.
     48
     49This string is actually a Twisted "strports" specification, meaning you can
     50get more control over the interface to which the server binds by supplying
     51additional arguments. For more details, see the documentation on
     52`twisted.application.strports
     53<http://twistedmatrix.com/documents/current/api/twisted.application.strports.html>`_.
     54
     55Writing "tcp:3456:interface=127.0.0.1" into the web.port line does the same
     56but binds to the loopback interface, ensuring that only the programs on the
     57local host can connect. Using "ssl:3456:privateKey=mykey.pem:certKey=cert.pem"
     58runs an SSL server.
     59
     60This webport can be set when the node is created by passing a --webport
     61option to the 'tahoe create-node' command. By default, the node listens on
     62port 3456, on the loopback (127.0.0.1) interface.
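
For example, a minimal sketch of the relevant tahoe.cfg section, using the
loopback-only form described above (the port number is just the default)::

 [node]
 web.port = tcp:3456:interface=127.0.0.1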
     63
     64Basic Concepts: GET, PUT, DELETE, POST
     65======================================
     66
     67As described in `architecture.rst`_, each file and directory in a Tahoe virtual
     68filesystem is referenced by an identifier that combines the designation of
     69the object with the authority to do something with it (such as read or modify
     70the contents). This identifier is called a "read-cap" or "write-cap",
     71depending upon whether it enables read-only or read-write access. These
     72"caps" are also referred to as URIs.
     73
     74.. _architecture.rst: http://tahoe-lafs.org/source/tahoe-lafs/trunk/docs/architecture.rst
     75
     76The Tahoe web-based API is "REST-ful", meaning it implements the concepts of
     77"REpresentational State Transfer": the original scheme by which the World
     78Wide Web was intended to work. Each object (file or directory) is referenced
     79by a URL that includes the read- or write- cap. HTTP methods (GET, PUT, and
     80DELETE) are used to manipulate these objects. You can think of the URL as a
     81noun, and the method as a verb.
     82
     83In REST, the GET method is used to retrieve information about an object, or
     84to retrieve some representation of the object itself. When the object is a
     85file, the basic GET method will simply return the contents of that file.
     86Other variations (generally implemented by adding query parameters to the
     87URL) will return information about the object, such as metadata. GET
     88operations are required to have no side-effects.
     89
     90PUT is used to upload new objects into the filesystem, or to replace an
      91existing object. DELETE is used to delete objects from the filesystem. Both
     92PUT and DELETE are required to be idempotent: performing the same operation
     93multiple times must have the same side-effects as only performing it once.
     94
     95POST is used for more complicated actions that cannot be expressed as a GET,
     96PUT, or DELETE. POST operations can be thought of as a method call: sending
     97some message to the object referenced by the URL. In Tahoe, POST is also used
     98for operations that must be triggered by an HTML form (including upload and
     99delete), because otherwise a regular web browser has no way to accomplish
     100these tasks. In general, everything that can be done with a PUT or DELETE can
     101also be done with a POST.
     102
     103Tahoe's web API is designed for two different kinds of consumer. The first is
     104a program that needs to manipulate the virtual file system. Such programs are
     105expected to use the RESTful interface described above. The second is a human
     106using a standard web browser to work with the filesystem. This user is given
     107a series of HTML pages with links to download files, and forms that use POST
     108actions to upload, rename, and delete files.
     109
     110When an error occurs, the HTTP response code will be set to an appropriate
     111400-series code (like 404 Not Found for an unknown childname, or 400 Bad Request
     112when the parameters to a webapi operation are invalid), and the HTTP response
     113body will usually contain a few lines of explanation as to the cause of the
     114error and possible responses. Unusual exceptions may result in a 500 Internal
     115Server Error as a catch-all, with a default response body containing
     116a Nevow-generated HTML-ized representation of the Python exception stack trace
     117that caused the problem. CLI programs which want to copy the response body to
     118stderr should provide an "Accept: text/plain" header to their requests to get
     119a plain text stack trace instead. If the Accept header contains ``*/*``, or
     120``text/*``, or ``text/html`` (or if there is no Accept header), HTML tracebacks will
     121be generated.
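
As an illustrative sketch (not from the original docs), a Python client can
ask for a plain-text error body like this; the cap in the URL is a
hypothetical placeholder::

 import urllib.request, urllib.error

 # Hypothetical URL for a child that does not exist.
 url = "http://127.0.0.1:3456/uri/URI%3ADIR2%3Aexamplecap/no-such-child"
 req = urllib.request.Request(url, headers={"Accept": "text/plain"})
 try:
     urllib.request.urlopen(req)
 except urllib.error.HTTPError as e:
     print(e.code)             # e.g. 404 for an unknown childname
     print(e.read().decode())  # plain-text explanation, not an HTML traceback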
     122
     123URLs
     124====
     125
     126Tahoe uses a variety of read- and write- caps to identify files and
     127directories. The most common of these is the "immutable file read-cap", which
     128is used for most uploaded files. These read-caps look like the following::
     129
     130 URI:CHK:ime6pvkaxuetdfah2p2f35pe54:4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a:3:10:202
     131
     132The next most common is a "directory write-cap", which provides both read and
     133write access to a directory, and looks like this::
     134
     135 URI:DIR2:djrdkfawoqihigoett4g6auz6a:jx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq
     136
     137There are also "directory read-caps", which start with "URI:DIR2-RO:", and
     138give read-only access to a directory. Finally there are also mutable file
     139read- and write- caps, which start with "URI:SSK", and give access to mutable
     140files.
     141
     142(Later versions of Tahoe will make these strings shorter, and will remove the
     143unfortunate colons, which must be escaped when these caps are embedded in
     144URLs.)
     145
     146To refer to any Tahoe object through the web API, you simply need to combine
     147a prefix (which indicates the HTTP server to use) with the cap (which
     148indicates which object inside that server to access). Since the default Tahoe
     149webport is 3456, the most common prefix is one that will use a local node
     150listening on this port::
     151
     152 http://127.0.0.1:3456/uri/ + $CAP
     153
     154So, to access the directory named above (which happens to be the
     155publicly-writeable sample directory on the Tahoe test grid, described at
     156http://allmydata.org/trac/tahoe/wiki/TestGrid), the URL would be::
     157
     158 http://127.0.0.1:3456/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/
     159
     160(note that the colons in the directory-cap are url-encoded into "%3A"
     161sequences).
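
That encoding can be done with any standard URL library; for example, a
Python sketch using ``urllib.parse.quote`` (``safe=""`` forces the colons to
be escaped as "%3A")::

 from urllib.parse import quote

 dircap = ("URI:DIR2:djrdkfawoqihigoett4g6auz6a:"
           "jx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq")
 url = "http://127.0.0.1:3456/uri/" + quote(dircap, safe="") + "/"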
     162
     163Likewise, to access the file named above, use::
     164
     165 http://127.0.0.1:3456/uri/URI%3ACHK%3Aime6pvkaxuetdfah2p2f35pe54%3A4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a%3A3%3A10%3A202
     166
     167In the rest of this document, we'll use "$DIRCAP" as shorthand for a read-cap
     168or write-cap that refers to a directory, and "$FILECAP" to abbreviate a cap
     169that refers to a file (whether mutable or immutable). So those URLs above can
     170be abbreviated as::
     171
     172 http://127.0.0.1:3456/uri/$DIRCAP/
     173 http://127.0.0.1:3456/uri/$FILECAP
     174
     175The operation summaries below will abbreviate these further, by eliding the
     176server prefix. They will be displayed like this::
     177
     178 /uri/$DIRCAP/
     179 /uri/$FILECAP
     180
     181
     182Child Lookup
     183------------
     184
     185Tahoe directories contain named child entries, just like directories in a regular
     186local filesystem. These child entries, called "dirnodes", consist of a name,
     187metadata, a write slot, and a read slot. The write and read slots normally contain
     188a write-cap and read-cap referring to the same object, which can be either a file
     189or a subdirectory. The write slot may be empty (actually, both may be empty,
     190but that is unusual).
     191
     192If you have a Tahoe URL that refers to a directory, and want to reference a
     193named child inside it, just append the child name to the URL. For example, if
     194our sample directory contains a file named "welcome.txt", we can refer to
     195that file with::
     196
     197 http://127.0.0.1:3456/uri/$DIRCAP/welcome.txt
     198
     199(or http://127.0.0.1:3456/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/welcome.txt)
     200
     201Multiple levels of subdirectories can be handled this way::
     202
     203 http://127.0.0.1:3456/uri/$DIRCAP/tahoe-source/docs/webapi.txt
     204
     205In this document, when we need to refer to a URL that references a file using
     206this child-of-some-directory format, we'll use the following string::
     207
     208 /uri/$DIRCAP/[SUBDIRS../]FILENAME
     209
     210The "[SUBDIRS../]" part means that there are zero or more (optional)
     211subdirectory names in the middle of the URL. The "FILENAME" at the end means
     212that this whole URL refers to a file of some sort, rather than to a
     213directory.
     214
     215When we need to refer specifically to a directory in this way, we'll write::
     216
     217 /uri/$DIRCAP/[SUBDIRS../]SUBDIR
     218
     219
     220Note that all components of pathnames in URLs are required to be UTF-8
     221encoded, so "résumé.doc" (with an acute accent on both E's) would be accessed
     222with::
     223
     224 http://127.0.0.1:3456/uri/$DIRCAP/r%C3%A9sum%C3%A9.doc
     225
     226Also note that the filenames inside upload POST forms are interpreted using
     227whatever character set was provided in the conventional '_charset' field,
     228defaulting to UTF-8 if not otherwise specified. The JSON representation of each
     229directory contains native unicode strings. Tahoe directories are specified to
     230contain unicode filenames, and cannot contain binary strings that are not
     231representable as such.
     232
     233All Tahoe operations that refer to existing files or directories must include
     234a suitable read- or write- cap in the URL: the webapi server won't add one
     235for you. If you don't know the cap, you can't access the file. This allows
     236the security properties of Tahoe caps to be extended across the webapi
     237interface.
     238
     239Slow Operations, Progress, and Cancelling
     240=========================================
     241
     242Certain operations can be expected to take a long time. The "t=deep-check"
     243operation, described below, will recursively visit every file and directory reachable
     244from a given starting point, which can take minutes or even hours for
     245extremely large directory structures. A single long-running HTTP request is a
     246fragile thing: proxies, NAT boxes, browsers, and users may all grow impatient
     247with waiting and give up on the connection.
     248
     249For this reason, long-running operations have an "operation handle", which
     250can be used to poll for status/progress messages while the operation
     251proceeds. This handle can also be used to cancel the operation. These handles
     252are created by the client, and passed in as an "ophandle=" query argument
     253to the POST or PUT request which starts the operation. The following
     254operations can then be used to retrieve status:
     255
     256``GET /operations/$HANDLE?output=HTML   (with or without t=status)``
     257
     258``GET /operations/$HANDLE?output=JSON   (same)``
     259
     260 These two retrieve the current status of the given operation. Each operation
     261 presents a different sort of information, but in general the page retrieved
     262 will indicate:
     263
     264 * whether the operation is complete, or if it is still running
     265 * how much of the operation is complete, and how much is left, if possible
     266
     267 Note that the final status output can be quite large: a deep-manifest of a
     268 directory structure with 300k directories and 200k unique files is about
     269 275MB of JSON, and might take two minutes to generate. For this reason, the
     270 full status is not provided until the operation has completed.
     271
     272 The HTML form will include a meta-refresh tag, which will cause a regular
     273 web browser to reload the status page about 60 seconds later. This tag will
     274 be removed once the operation has completed.
     275
     276 There may be more status information available under
     277 /operations/$HANDLE/$ETC : i.e., the handle forms the root of a URL space.
     278
     279``POST /operations/$HANDLE?t=cancel``
     280
     281 This terminates the operation, and returns an HTML page explaining what was
     282 cancelled. If the operation handle has already expired (see below), this
     283 POST will return a 404, which indicates that the operation is no longer
     284 running (either it was completed or terminated). The response body will be
     285 the same as a GET /operations/$HANDLE on this operation handle, and the
     286 handle will be expired immediately afterwards.
     287
     288The operation handle will eventually expire, to avoid consuming an unbounded
     289amount of memory. The handle's time-to-live can be reset at any time, by
     290passing a retain-for= argument (with a count of seconds) to either the
     291initial POST that starts the operation, or the subsequent GET request which
     292asks about the operation. For example, if a 'GET
     293/operations/$HANDLE?output=JSON&retain-for=600' query is performed, the
     294handle will remain active for 600 seconds (10 minutes) after the GET was
     295received.
     296
     297In addition, if the GET includes a release-after-complete=True argument, and
     298the operation has completed, the operation handle will be released
     299immediately.
     300
     301If a retain-for= argument is not used, the default handle lifetimes are:
     302
     303 * handles will remain valid at least until their operation finishes
     304 * uncollected handles for finished operations (i.e. handles for
     305   operations that have finished but for which the GET page has not been
     306   accessed since completion) will remain valid for four days, or for
     307   the total time consumed by the operation, whichever is greater.
     308 * collected handles (i.e. the GET page has been retrieved at least once
     309   since the operation completed) will remain valid for one day.
     310
     311Many "slow" operations can begin to use unacceptable amounts of memory when
     312operating on large directory structures. The memory usage increases when the
     313ophandle is polled, as the results must be copied into a JSON string, sent
     314over the wire, then parsed by a client. So, as an alternative, many "slow"
     315operations have streaming equivalents. These equivalents do not use operation
     316handles. Instead, they emit line-oriented status results immediately. Client
     317code can cancel the operation by simply closing the HTTP connection.
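
As an illustrative sketch (not part of the original docs), a Python client
might start an operation with a self-chosen handle and poll it like this; the
handle string and the 60-second interval are arbitrary, and the "finished"
flag is the completion indicator carried by the JSON status::

 import json, time, urllib.request

 GATEWAY = "http://127.0.0.1:3456"
 OPHANDLE = "x8273hjr2"   # hypothetical client-chosen handle, passed as
                          # "ophandle=" when the operation was started

 while True:
     url = "%s/operations/%s?output=JSON" % (GATEWAY, OPHANDLE)
     with urllib.request.urlopen(url) as resp:
         status = json.loads(resp.read().decode("utf-8"))
     if status.get("finished"):
         break
     time.sleep(60)           # roughly the HTML page's meta-refresh interval
 print(status)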
     318
     319Programmatic Operations
     320=======================
     321
     322Now that we know how to build URLs that refer to files and directories in a
     323Tahoe virtual filesystem, what sorts of operations can we do with those URLs?
     324This section contains a catalog of GET, PUT, DELETE, and POST operations that
     325can be performed on these URLs. This set of operations is aimed at programs
     326that use HTTP to communicate with a Tahoe node. A later section describes
     327operations that are intended for web browsers.
     328
     329Reading A File
     330--------------
     331
     332``GET /uri/$FILECAP``
     333
     334``GET /uri/$DIRCAP/[SUBDIRS../]FILENAME``
     335
     336 This will retrieve the contents of the given file. The HTTP response body
     337 will contain the sequence of bytes that make up the file.
     338
     339 To view files in a web browser, you may want more control over the
     340 Content-Type and Content-Disposition headers. Please see the next section
     341 "Browser Operations", for details on how to modify these URLs for that
     342 purpose.
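
 A minimal Python sketch of a programmatic read, reusing the example
 immutable read-cap from the URLs section above::

  import urllib.request
  from urllib.parse import quote

  cap = ("URI:CHK:ime6pvkaxuetdfah2p2f35pe54:"
         "4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a:3:10:202")
  url = "http://127.0.0.1:3456/uri/" + quote(cap, safe="")
  data = urllib.request.urlopen(url).read()
  # 'data' holds exactly the bytes of the file.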
     343
     344Writing/Uploading A File
     345------------------------
     346
     347``PUT /uri/$FILECAP``
     348
     349``PUT /uri/$DIRCAP/[SUBDIRS../]FILENAME``
     350
     351 Upload a file, using the data from the HTTP request body, and add whatever
     352 child links and subdirectories are necessary to make the file available at
     353 the given location. Once this operation succeeds, a GET on the same URL will
     354 retrieve the same contents that were just uploaded. This will create any
     355 necessary intermediate subdirectories.
     356
     357 To use the /uri/$FILECAP form, $FILECAP must be a write-cap for a mutable file.
     358
     359 In the /uri/$DIRCAP/[SUBDIRS../]FILENAME form, if the target file is a
     360 writeable mutable file, that file's contents will be overwritten in-place. If
     361 it is a read-cap for a mutable file, an error will occur. If it is an
     362 immutable file, the old file will be discarded, and a new one will be put in
     363 its place.
     364
     365 When creating a new file, if "mutable=true" is in the query arguments, the
     366 operation will create a mutable file instead of an immutable one.
     367
     368 This returns the file-cap of the resulting file. If a new file was created
     369 by this method, the HTTP response code (as dictated by RFC 2616) will be set
     370 to 201 CREATED. If an existing file was replaced or modified, the response
     371 code will be 200 OK.
     372
     373 Note that the 'curl -T localfile http://127.0.0.1:3456/uri/$DIRCAP/foo.txt'
     374 command can be used to invoke this operation.
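
 A Python sketch of the same upload; the dircap is the sample directory
 write-cap from above, and "localfile" is a hypothetical local path::

  import urllib.request
  from urllib.parse import quote

  dircap = ("URI:DIR2:djrdkfawoqihigoett4g6auz6a:"
            "jx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq")
  url = "http://127.0.0.1:3456/uri/%s/foo.txt" % quote(dircap, safe="")
  with open("localfile", "rb") as f:
      req = urllib.request.Request(url, data=f.read(), method="PUT")
  print(urllib.request.urlopen(req).read().decode("ascii"))  # the file-cap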
     375
     376``PUT /uri``
     377
     378 This uploads a file, and produces a file-cap for the contents, but does not
     379 attach the file into the filesystem. No directories will be modified by
     380 this operation. The file-cap is returned as the body of the HTTP response.
     381
     382 If "mutable=true" is in the query arguments, the operation will create a
     383 mutable file, and return its write-cap in the HTTP response. The default is
     384 to create an immutable file, returning the read-cap as a response.
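
 For example, a sketch that uploads a small immutable file and prints the
 resulting read-cap (the contents here are just an illustration)::

  import urllib.request

  req = urllib.request.Request("http://127.0.0.1:3456/uri",
                               data=b"hello, grid\n", method="PUT")
  filecap = urllib.request.urlopen(req).read().decode("ascii").strip()
  print(filecap)   # the new file's read-cap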
     385
     386Creating A New Directory
     387------------------------
     388
     389``POST /uri?t=mkdir``
     390
     391``PUT /uri?t=mkdir``
     392
     393 Create a new empty directory and return its write-cap as the HTTP response
     394 body. This does not make the newly created directory visible from the
     395 filesystem. The "PUT" operation is provided for backwards compatibility:
     396 new code should use POST.
     397
     398``POST /uri?t=mkdir-with-children``
     399
     400 Create a new directory, populated with a set of child nodes, and return its
     401 write-cap as the HTTP response body. The new directory is not attached to
     402 any other directory: the returned write-cap is the only reference to it.
     403
     404 Initial children are provided as the body of the POST form (this is more
     405 efficient than doing separate mkdir and set_children operations). If the
     406 body is empty, the new directory will be empty. If not empty, the body will
     407 be interpreted as a UTF-8 JSON-encoded dictionary of children with which the
     408 new directory should be populated, using the same format as would be
     409 returned in the 'children' value of the t=json GET request, described below.
     410 Each dictionary key should be a child name, and each value should be a list
     411 of [TYPE, PROPDICT], where PROPDICT contains "rw_uri", "ro_uri", and
     412 "metadata" keys (all others are ignored). For example, the PUT request body
     413 could be::
     414
     415  {
     416    "Fran\u00e7ais": [ "filenode", {
     417        "ro_uri": "URI:CHK:...",
     418        "size": bytes,
     419        "metadata": {
     420          "ctime": 1202777696.7564139,
     421          "mtime": 1202777696.7564139,
     422          "tahoe": {
     423            "linkcrtime": 1202777696.7564139,
     424            "linkmotime": 1202777696.7564139
     425            } } } ],
     426    "subdir":  [ "dirnode", {
     427        "rw_uri": "URI:DIR2:...",
     428        "ro_uri": "URI:DIR2-RO:...",
     429        "metadata": {
     430          "ctime": 1202778102.7589991,
     431          "mtime": 1202778111.2160511,
     432          "tahoe": {
     433            "linkcrtime": 1202777696.7564139,
     434            "linkmotime": 1202777696.7564139
     435          } } } ]
     436  }
     437
     438 For forward-compatibility, a mutable directory can also contain caps in
     439 a format that is unknown to the webapi server. When such caps are retrieved
     440 from a mutable directory in a "ro_uri" field, they will be prefixed with
     441 the string "ro.", indicating that they must not be decoded without
     442 checking that they are read-only. The "ro." prefix must not be stripped
     443 off without performing this check. (Future versions of the webapi server
     444 will perform it where necessary.)
     445
     446 If both the "rw_uri" and "ro_uri" fields are present in a given PROPDICT,
     447 and the webapi server recognizes the rw_uri as a write cap, then it will
     448 reset the ro_uri to the corresponding read cap and discard the original
     449 contents of ro_uri (in order to ensure that the two caps correspond to the
     450 same object and that the ro_uri is in fact read-only). However this may not
     451 happen for caps in a format unknown to the webapi server. Therefore, when
     452 writing a directory the webapi client should ensure that the contents
     453 of "rw_uri" and "ro_uri" for a given PROPDICT are a consistent
     454 (write cap, read cap) pair if possible. If the webapi client only has
     455 one cap and does not know whether it is a write cap or read cap, then
     456 it is acceptable to set "rw_uri" to that cap and omit "ro_uri". The
     457 client must not put a write cap into a "ro_uri" field.
     458
     459 The metadata may have a "no-write" field. If this is set to true in the
     460 metadata of a link, it will not be possible to open that link for writing
     461 via the SFTP frontend; see `FTP-and-SFTP.rst`_ for details.
     462 Also, if the "no-write" field is set to true in the metadata of a link to
     463 a mutable child, it will cause the link to be diminished to read-only.
     464 
     465 .. _FTP-and-SFTP.rst: http://tahoe-lafs.org/source/tahoe-lafs/trunk/docs/frontends/FTP-and-SFTP.rst
     466
     467 Note that the webapi-using client application must not provide the
     468 "Content-Type: multipart/form-data" header that usually accompanies HTML
     469 form submissions, since the body is not formatted this way. Doing so will
     470 cause a server error as the lower-level code misparses the request body.
     471
     472 Child file names should each be expressed as a unicode string, then used as
     473 keys of the dictionary. The dictionary should then be converted into JSON,
     474 and the resulting string encoded into UTF-8. This UTF-8 bytestring should
     475 then be used as the POST body.
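
 As a concrete sketch of those steps, in Python; the child name and empty
 metadata are illustrative, and the ro_uri reuses the sample read-cap from
 earlier::

  import json, urllib.request

  children = {
      u"welcome.txt": ["filenode", {
          "ro_uri": ("URI:CHK:ime6pvkaxuetdfah2p2f35pe54:"
                     "4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a"
                     ":3:10:202"),
          "metadata": {},
      }],
  }
  body = json.dumps(children).encode("utf-8")
  # urllib sends its own default Content-Type; the important point is that
  # it is not "multipart/form-data" (see the warning above).
  req = urllib.request.Request(
      "http://127.0.0.1:3456/uri?t=mkdir-with-children",
      data=body, method="POST")
  print(urllib.request.urlopen(req).read().decode("ascii"))  # the write-cap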
     476
     477``POST /uri?t=mkdir-immutable``
     478
     479 Like t=mkdir-with-children above, but the new directory will be
     480 deep-immutable. This means that the directory itself is immutable, and that
     481 it can only contain objects that are treated as being deep-immutable, like
     482 immutable files, literal files, and deep-immutable directories.
     483
     484 For forward-compatibility, a deep-immutable directory can also contain caps
     485 in a format that is unknown to the webapi server. When such caps are retrieved
     486 from a deep-immutable directory in a "ro_uri" field, they will be prefixed
     487 with the string "imm.", indicating that they must not be decoded without
     488 checking that they are immutable. The "imm." prefix must not be stripped
     489 off without performing this check. (Future versions of the webapi server
     490 will perform it where necessary.)
     491 
     492 The cap for each child may be given either in the "rw_uri" or "ro_uri"
     493 field of the PROPDICT (not both). If a cap is given in the "rw_uri" field,
     494 then the webapi server will check that it is an immutable read-cap of a
     495 *known* format, and give an error if it is not. If a cap is given in the
     496 "ro_uri" field, then the webapi server will still check whether known
     497 caps are immutable, but for unknown caps it will simply assume that the
     498 cap can be stored, as described above. Note that an attacker would be
     499 able to store any cap in an immutable directory, so this check when
     500 creating the directory is only to help non-malicious clients to avoid
     501 accidentally giving away more authority than intended.
     502
     503 A non-empty request body is mandatory, since after the directory is created,
     504 it will not be possible to add more children to it.
     505
     506``POST /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir``
     507
     508``PUT /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir``
     509
     510 Create new directories as necessary to make sure that the named target
     511 ($DIRCAP/SUBDIRS../SUBDIR) is a directory. This will create additional
     512 intermediate mutable directories as necessary. If the named target directory
     513 already exists, this will make no changes to it.
     514
     515 If the final directory is created, it will be empty.
     516
     517 This operation will return an error if a blocking file is present at any of
     518 the parent names, preventing the server from creating the necessary parent
     519 directory; or if it would require changing an immutable directory.
     520
     521 The write-cap of the new directory will be returned as the HTTP response
     522 body.
     523
     524``POST /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir-with-children``
     525
     526 Like /uri?t=mkdir-with-children, but the final directory is created as a
     527 child of an existing mutable directory. This will create additional
     528 intermediate mutable directories as necessary. If the final directory is
     529 created, it will be populated with initial children from the POST request
     530 body, as described above.
     531 
     532 This operation will return an error if a blocking file is present at any of
     533 the parent names, preventing the server from creating the necessary parent
     534 directory; or if it would require changing an immutable directory; or if
     535 the immediate parent directory already has a child named SUBDIR.
     536
     537``POST /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir-immutable``
     538
     539 Like /uri?t=mkdir-immutable, but the final directory is created as a child
     540 of an existing mutable directory. The final directory will be deep-immutable,
     541 and will be populated with the children specified as a JSON dictionary in
     542 the POST request body.
     543
     544 In Tahoe 1.6 this operation creates intermediate mutable directories if
     545 necessary, but that behaviour should not be relied on; see ticket #920.
     546
     547 This operation will return an error if the parent directory is immutable,
     548 or already has a child named SUBDIR.
     549
     550``POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=NAME``
     551
     552 Create a new empty mutable directory and attach it to the given existing
     553 directory. This will create additional intermediate directories as necessary.
     554
     555 This operation will return an error if a blocking file is present at any of
     556 the parent names, preventing the server from creating the necessary parent
     557 directory, or if it would require changing any immutable directory.
     558
     559 The URL of this operation points to the parent of the bottommost new directory,
     560 whereas the /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir operation above has a URL
     561 that points directly to the bottommost new directory.
     562
     563``POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir-with-children&name=NAME``
     564
     565 Like /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=NAME, but the new directory will
     566 be populated with initial children via the POST request body. This command
     567 will create additional intermediate mutable directories as necessary.
     568 
     569 This operation will return an error if a blocking file is present at any of
     570 the parent names, preventing the server from creating the necessary parent
     571 directory; or if it would require changing an immutable directory; or if
     572 the immediate parent directory already has a child named NAME.
     573
     574 Note that the name= argument must be passed as a queryarg, because the POST
     575 request body is used for the initial children JSON.
     576
     577``POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir-immutable&name=NAME``
     578
     579 Like /uri/$DIRCAP/[SUBDIRS../]?t=mkdir-with-children&name=NAME, but the
     580 final directory will be deep-immutable. The children are specified as a
     581 JSON dictionary in the POST request body. Again, the name= argument must be
     582 passed as a queryarg.
     583
     584 In Tahoe 1.6 this operation creates intermediate mutable directories if
     585 necessary, but that behaviour should not be relied on; see ticket #920.
     586
     587 This operation will return an error if the parent directory is immutable,
     588 or already has a child named NAME.
     589
     590Get Information About A File Or Directory (as JSON)
     591---------------------------------------------------
     592
     593``GET /uri/$FILECAP?t=json``
     594
     595``GET /uri/$DIRCAP?t=json``
     596
     597``GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=json``
     598
     599``GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=json``
     600
     601 This returns a machine-parseable JSON-encoded description of the given
     602 object. The JSON always contains a list, and the first element of the list is
     603 always a flag that indicates whether the referenced object is a file or a
     604 directory. If it is a capability to a file, then the information includes
     605 file size and URI, like this::
     606
     607  GET /uri/$FILECAP?t=json :
     608
     609   [ "filenode", {
     610         "ro_uri": file_uri,
     611         "verify_uri": verify_uri,
     612         "size": bytes,
     613         "mutable": false
     614         } ]
     615
     616 If it is a capability to a directory followed by a path from that directory
     617 to a file, then the information also includes metadata from the link to the
     618 file in the parent directory, like this::
     619
     620  GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=json
     621
     622   [ "filenode", {
     623         "ro_uri": file_uri,
     624         "verify_uri": verify_uri,
     625         "size": bytes,
     626         "mutable": false,
     627         "metadata": {
     628           "ctime": 1202777696.7564139,
     629           "mtime": 1202777696.7564139,
     630           "tahoe": {
     631                 "linkcrtime": 1202777696.7564139,
     632                 "linkmotime": 1202777696.7564139
     633                 } } } ]
     634
     635 If it is a directory, then it includes information about the children of
     636 this directory, as a mapping from child name to a set of data about the
     637 child (the same data that would appear in a corresponding GET?t=json of the
     638 child itself). The child entries also include metadata about each child,
     639 including link-creation- and link-change- timestamps. The output looks like
     640 this::
     641
     642  GET /uri/$DIRCAP?t=json :
     643  GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=json :
     644
     645   [ "dirnode", {
     646         "rw_uri": read_write_uri,
     647         "ro_uri": read_only_uri,
     648         "verify_uri": verify_uri,
     649         "mutable": true,
     650         "children": {
     651           "foo.txt": [ "filenode", {
     652                   "ro_uri": uri,
     653                   "size": bytes,
     654                   "metadata": {
     655                         "ctime": 1202777696.7564139,
     656                         "mtime": 1202777696.7564139,
     657                         "tahoe": {
     658                           "linkcrtime": 1202777696.7564139,
     659                           "linkmotime": 1202777696.7564139
     660                           } } } ],
     661           "subdir":  [ "dirnode", {
     662                   "rw_uri": rwuri,
     663                   "ro_uri": rouri,
     664                   "metadata": {
     665                         "ctime": 1202778102.7589991,
     666                         "mtime": 1202778111.2160511,
     667                         "tahoe": {
     668                           "linkcrtime": 1202777696.7564139,
     669                           "linkmotime": 1202777696.7564139
     670                         } } } ]
     671         } } ]
     672
     673 In the above example, note how 'children' is a dictionary in which the keys
     674 are child names and the values depend upon whether the child is a file or a
     675 directory. The value is mostly the same as the JSON representation of the
     676 child object (except that directories do not recurse -- the "children"
     677 entry of the child is omitted, and the directory view includes the metadata
     678 that is stored on the directory edge).
     679
     680 The rw_uri field will be present in the information about a directory
     681 if and only if you have read-write access to that directory. The verify_uri
     682 field will be present if and only if the object has a verify-cap
     683 (non-distributed LIT files do not have verify-caps).
     684 
     685 If the cap is of an unknown format, then the file size and verify_uri will
     686 not be available::
     687
     688  GET /uri/$UNKNOWNCAP?t=json :
     689
     690   [ "unknown", {
     691         "ro_uri": unknown_read_uri
     692         } ]
     693
     694  GET /uri/$DIRCAP/[SUBDIRS../]UNKNOWNCHILDNAME?t=json :
     695
     696   [ "unknown", {
     697         "rw_uri": unknown_write_uri,
     698         "ro_uri": unknown_read_uri,
     699         "mutable": true,
     700         "metadata": {
     701           "ctime": 1202777696.7564139,
     702           "mtime": 1202777696.7564139,
     703           "tahoe": {
     704                 "linkcrtime": 1202777696.7564139,
     705                 "linkmotime": 1202777696.7564139
     706                 } } } ]
     707
     708 As in the case of file nodes, the metadata will only be present when the
     709 capability is to a directory followed by a path. The "mutable" field is also
     710 not always present; when it is absent, the mutability of the object is not
     711 known.
     712
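 As a rough client-side illustration, the following Python sketch (node URL
 and cap are placeholders) fetches the JSON description of an object and
 dispatches on the leading type flag::

  import json
  import urllib.request

  NODE = "http://127.0.0.1:3456"   # hypothetical local node
  cap = "URI:DIR2:..."             # placeholder cap string

  with urllib.request.urlopen(NODE + "/uri/" + cap + "?t=json") as resp:
      nodetype, info = json.loads(resp.read().decode("utf-8"))

  if nodetype == "dirnode":
      # "children" maps child name -> (type, data); children do not recurse
      for name, (childtype, data) in info.get("children", {}).items():
          print(name, childtype, data.get("ro_uri"))
  elif nodetype == "filenode":
      print("file, size:", info.get("size"))
  else:
      print("unknown cap format")   # "mutable" and size may be absent
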
     713About the metadata
     714``````````````````
     715
     716The value of the 'tahoe':'linkmotime' key is updated whenever a link to a
     717child is set. The value of the 'tahoe':'linkcrtime' key is updated whenever
     718a link to a child is created -- i.e. when there was not previously a link
     719under that name.
     720
     721Note however, that if the edge in the Tahoe filesystem points to a mutable
     722file and the contents of that mutable file is changed, then the
     723'tahoe':'linkmotime' value on that edge will *not* be updated, since the
     724edge itself wasn't updated -- only the mutable file was.
     725
     726The timestamps are represented as a number of seconds since the UNIX epoch
     727(1970-01-01 00:00:00 UTC), with leap seconds not being counted in the long
     728term.
     729
     730In Tahoe earlier than v1.4.0, 'mtime' and 'ctime' keys were populated
     731instead of the 'tahoe':'linkmotime' and 'tahoe':'linkcrtime' keys. Starting
     732in Tahoe v1.4.0, the 'linkmotime'/'linkcrtime' keys in the 'tahoe' sub-dict
     733are populated. However, prior to Tahoe v1.7beta, a bug caused the 'tahoe'
     734sub-dict to be deleted by webapi requests in which new metadata is
     735specified, and not to be added to existing child links that lack it.
     736
     737From Tahoe v1.7.0 onward, the 'mtime' and 'ctime' fields are no longer
     738populated or updated (see ticket #924), except by "tahoe backup" as
     739explained below. For backward compatibility, when an existing link is
     740updated and 'tahoe':'linkcrtime' is not present in the previous metadata
     741but 'ctime' is, the old value of 'ctime' is used as the new value of
     742'tahoe':'linkcrtime'.
     743
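The following Python function is a sketch of this backward-compatibility
rule (illustrative only, not Tahoe's actual implementation)::

  def new_linkcrtime(old_metadata, now):
      # Keep an existing 'tahoe':'linkcrtime' if there is one.
      tahoe = old_metadata.get("tahoe", {})
      if "linkcrtime" in tahoe:
          return tahoe["linkcrtime"]
      # Otherwise migrate a legacy top-level 'ctime', if present.
      if "ctime" in old_metadata:
          return old_metadata["ctime"]
      # No previous link under this name: record the current time.
      return now
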
     744The reason we added the new fields in Tahoe v1.4.0 is that there is a
     745"set_children" API (described below) which you can use to overwrite the
     746values of the 'mtime'/'ctime' pair, and this API is used by the
     747"tahoe backup" command (in Tahoe v1.3.0 and later) to set the 'mtime' and
     748'ctime' values when backing up files from a local filesystem into the
     749Tahoe filesystem. As of Tahoe v1.4.0, the set_children API cannot be used
     750to set anything under the 'tahoe' key of the metadata dict -- if you
     751include 'tahoe' keys in your 'metadata' arguments then it will silently
     752ignore those keys.
     753
     754Therefore, if the 'tahoe' sub-dict is present, you can rely on the
     755'linkcrtime' and 'linkmotime' values therein to have the semantics described
     756above. (This is assuming that only official Tahoe clients have been used to
     757write those links, and that their system clocks were set to what you expected
     758-- there is nothing preventing someone from editing their Tahoe client or
     759writing their own Tahoe client which would overwrite those values however
     760they like, and there is nothing to constrain their system clock from taking
     761any value.)
     762
     763When an edge is created or updated by "tahoe backup", the 'mtime' and
     764 'ctime' keys on that edge are set as follows (a short sketch follows the list):
     765
     766* 'mtime' is set to the timestamp read from the local filesystem for the
     767  "mtime" of the local file in question, which means the last time the
     768  contents of that file were changed.
     769
     770* On Windows, 'ctime' is set to the creation timestamp for the file
     771  read from the local filesystem. On other platforms, 'ctime' is set to
     772  the UNIX "ctime" of the local file, which means the last time that
     773  either the contents or the metadata of the local file was changed.
     774
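A minimal sketch of reading these two values from the local filesystem with
Python's standard library (Tahoe's own code may differ in detail)::

  import os

  st = os.stat("some-local-file")   # hypothetical path
  mtime = st.st_mtime               # last change to the file's contents
  # On Windows, st_ctime is the creation time; on other platforms it is
  # the UNIX "ctime", i.e. the last contents-or-metadata change.
  ctime = st.st_ctime
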
     775There are several ways that the 'ctime' field could be confusing:
     776
     7771. You might be confused about whether it reflects the time of the creation
     778   of a link in the Tahoe filesystem (by a version of Tahoe < v1.7.0) or a
     779   timestamp copied in by "tahoe backup" from a local filesystem.
     780
     7812. You might be confused about whether it is a copy of the file creation
     782   time (if "tahoe backup" was run on a Windows system) or of the last
     783   contents-or-metadata change (if "tahoe backup" was run on a different
     784   operating system).
     785
     7863. You might be confused by the fact that changing the contents of a
     787   mutable file in Tahoe doesn't have any effect on any links pointing at
     788   that file in any directories, although "tahoe backup" sets the link
     789   'ctime'/'mtime' to reflect timestamps about the local file corresponding
     790   to the Tahoe file to which the link points.
     791
     7924. Also, quite apart from Tahoe, you might be confused about the meaning
     793   of the "ctime" in UNIX local filesystems, which people sometimes think
     794   means file creation time, but which actually means, in UNIX local
     795   filesystems, the most recent time that the file contents or the file
     796   metadata (such as owner, permission bits, extended attributes, etc.)
     797   has changed. Note that although "ctime" does not mean file creation time
     798   in UNIX, links created by a version of Tahoe prior to v1.7.0, and never
     799   written by "tahoe backup", will have 'ctime' set to the link creation
     800   time.
     801
     802
     803Attaching an existing File or Directory by its read- or write-cap
     804-----------------------------------------------------------------
     805
     806``PUT /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=uri``
     807
     808 This attaches a child object (either a file or directory) to a specified
     809 location in the virtual filesystem. The child object is referenced by its
     810 read- or write- cap, as provided in the HTTP request body. This will create
     811 intermediate directories as necessary.
     812
     813 This is similar to a UNIX hardlink: by referencing a previously-uploaded file
     814 (or previously-created directory) instead of uploading/creating a new one,
     815 you can create two references to the same object.
     816
     817 The read- or write- cap of the child is provided in the body of the HTTP
     818 request, and this same cap is returned in the response body.
     819
     820 The default behavior is to overwrite any existing object at the same
     821 location. To prevent this (and make the operation return an error instead
     822 of overwriting), add a "replace=false" argument, as "?t=uri&replace=false".
     823 With replace=false, this operation will return an HTTP 409 "Conflict" error
     824 if there is already an object at the given location, rather than
     825 overwriting the existing object. To allow the operation to overwrite a
     826 file, but return an error when trying to overwrite a directory, use
     827 "replace=only-files" (this behavior is closer to the traditional UNIX "mv"
     828 command). Note that "true", "t", and "1" are all synonyms for "True";
     829 "false", "f", and "0" are synonyms for "False"; and the parameter is
     830 case-insensitive.
     831 
     832 Note that this operation does not take its child cap in the form of
     833 separate "rw_uri" and "ro_uri" fields. Therefore, it cannot accept a
     834 child cap in a format unknown to the webapi server, unless its URI
     835 starts with "ro." or "imm.". This restriction is necessary because the
     836 server is not able to attenuate an unknown write cap to a read cap.
     837 Unknown URIs starting with "ro." or "imm.", on the other hand, are
     838 assumed to represent read caps. The client should not prefix a write
     839 cap with "ro." or "imm." and pass it to this operation, since that
     840 would result in granting the cap's write authority to holders of the
     841 directory read cap.
     842
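 For example, this Python sketch (node URL and caps are placeholders)
 attaches an existing file cap under a new name, refusing to overwrite any
 existing child::

  import urllib.request

  NODE = "http://127.0.0.1:3456"    # hypothetical local node
  dircap = "URI:DIR2:..."           # placeholder directory write-cap
  childcap = b"URI:CHK:..."         # placeholder file read-cap

  req = urllib.request.Request(
      NODE + "/uri/" + dircap + "/newname?t=uri&replace=false",
      data=childcap, method="PUT")
  with urllib.request.urlopen(req) as resp:
      print(resp.read().decode("utf-8"))   # echoes the attached cap
  # urllib raises an HTTPError for the 409 "Conflict" response that is
  # returned when a child called "newname" already exists.
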
     843Adding multiple files or directories to a parent directory at once
     844------------------------------------------------------------------
     845
     846``POST /uri/$DIRCAP/[SUBDIRS..]?t=set_children``
     847
     848``POST /uri/$DIRCAP/[SUBDIRS..]?t=set-children``    (Tahoe >= v1.6)
     849
     850 This command adds multiple children to a directory in a single operation.
     851 It reads the request body and interprets it as a JSON-encoded description
     852 of the child names and read/write-caps that should be added.
     853
     854 The body should be a JSON-encoded dictionary, in the same format as the
     855 "children" value returned by the "GET /uri/$DIRCAP?t=json" operation
     856 described above. In this format, each key is a child name, and the
     857 corresponding value is a tuple of (type, childinfo). "type" is ignored, and
     858 "childinfo" is a dictionary that contains "rw_uri", "ro_uri", and
     859 "metadata" keys. You can take the output of "GET /uri/$DIRCAP1?t=json" and
     860 use it as the input to "POST /uri/$DIRCAP2?t=set_children" to make DIR2
     861 look very much like DIR1 (except for any existing children of DIR2 that
     862 were not overwritten, and any existing "tahoe" metadata keys as described
     863 below).
     864
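 For instance, a Python sketch of that copy (node URL and caps are
 placeholders)::

  import json
  import urllib.request

  NODE = "http://127.0.0.1:3456"    # hypothetical local node
  DIRCAP1 = "URI:DIR2:..."          # placeholder source directory cap
  DIRCAP2 = "URI:DIR2:..."          # placeholder target directory write-cap

  # Read DIR1's children...
  with urllib.request.urlopen(NODE + "/uri/" + DIRCAP1 + "?t=json") as resp:
      children = json.loads(resp.read().decode("utf-8"))[1]["children"]

  # ...and add them all to DIR2 in a single operation.
  req = urllib.request.Request(
      NODE + "/uri/" + DIRCAP2 + "?t=set_children",
      data=json.dumps(children).encode("utf-8"), method="POST")
  urllib.request.urlopen(req).close()
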
     865 When the set_children request contains a child name that already exists in
     866 the target directory, this command defaults to overwriting that child with
     867 the new value (both child cap and metadata, but if the JSON data does not
     868 contain a "metadata" key, the old child's metadata is preserved). The
     869 command takes a boolean "overwrite=" query argument to control this
     870 behavior. If you use "?t=set_children&overwrite=false", then an attempt to
     871 replace an existing child will instead cause an error.
     872
     873 Any "tahoe" key in the new child's "metadata" value is ignored. Any
     874 existing "tahoe" metadata is preserved. The metadata["tahoe"] value is
     875 reserved for metadata generated by the tahoe node itself. The only two keys
     876 currently placed here are "linkcrtime" and "linkmotime". For details, see
     877 the section above entitled "Get Information About A File Or Directory (as
     878 JSON)", in the "About the metadata" subsection.
     879 
     880 Note that this command was introduced with the name "set_children", which
     881 uses an underscore rather than a hyphen as other multi-word command names
     882 do. The variant with a hyphen is now accepted, but clients that desire
     883 backward compatibility should continue to use "set_children".
     884
     885
     886Deleting a File or Directory
     887----------------------------
     888
     889``DELETE /uri/$DIRCAP/[SUBDIRS../]CHILDNAME``
     890
     891 This removes the given name from its parent directory. CHILDNAME is the
     892 name to be removed, and $DIRCAP/SUBDIRS.. indicates the directory that will
     893 be modified.
     894
     895 Note that this does not actually delete the file or directory that the name
     896 points to from the tahoe grid -- it only removes the named reference from
     897 this directory. If there are other names in this directory or in other
     898 directories that point to the resource, then it will remain accessible
     899 through those paths. Even if all names pointing to this object are removed
     900 from their parent directories, then someone with possession of its read-cap
     901 can continue to access the object through that cap.
     902
     903 The object will only become completely unreachable once 1: there are no
     904 reachable directories that reference it, and 2: nobody is holding a read-
     905 or write- cap to the object. (This behavior is very similar to the way
     906 hardlinks and anonymous files work in traditional UNIX filesystems).
     907
     908 This operation will not modify more than a single directory. Intermediate
     909 directories which were implicitly created by PUT or POST methods will *not*
     910 be automatically removed by DELETE.
     911
     912 This method returns the file- or directory- cap of the object that was just
     913 removed.
     914
     915Browser Operations: Human-oriented interfaces
     916=============================================
     917
     918This section describes the HTTP operations that provide support for humans
     919running a web browser. Most of these operations use HTML forms that use POST
     920to drive the Tahoe node. This section is intended for HTML authors who want
     921to write web pages that contain forms and buttons which manipulate the Tahoe
     922filesystem.
     923
     924Note that for all POST operations, the arguments listed can be provided
     925either as URL query arguments or as form body fields. URL query arguments are
     926separated from the main URL by "?", and from each other by "&". For example,
     927"POST /uri/$DIRCAP?t=upload&mutable=true". Form body fields are usually
     928specified by using <input type="hidden"> elements. For clarity, the
     929descriptions below display the most significant arguments as URL query args.
     930
     931Viewing A Directory (as HTML)
     932-----------------------------
     933
     934``GET /uri/$DIRCAP/[SUBDIRS../]``
     935
     936 This returns an HTML page, intended to be displayed to a human by a web
     937 browser, which contains HREF links to all files and directories reachable
     938 from this directory. These HREF links do not have a t= argument, meaning
     939 that a human who follows them will get pages also meant for a human. It also
     940 contains forms to upload new files, and to delete files and directories.
     941 Those forms use POST methods to do their job.
     942
     943Viewing/Downloading a File
     944--------------------------
     945
     946``GET /uri/$FILECAP``
     947
     948``GET /uri/$DIRCAP/[SUBDIRS../]FILENAME``
     949
     950 This will retrieve the contents of the given file. The HTTP response body
     951 will contain the sequence of bytes that make up the file.
     952
     953 If you want the HTTP response to include a useful Content-Type header,
     954 either use the second form (which starts with a $DIRCAP), or add a
     955 "filename=foo" query argument, like "GET /uri/$FILECAP?filename=foo.jpg".
     956 The bare "GET /uri/$FILECAP" does not give the Tahoe node enough information
     957 to determine a Content-Type (since Tahoe immutable files are merely
     958 sequences of bytes, not typed+named file objects).
     959
     960 If the URL has both filename= and "save=true" in the query arguments, then
     961 the server will add a "Content-Disposition: attachment" header, along with a
     962 filename= parameter. When a user clicks on such a link, most browsers will
     963 offer to let the user save the file instead of displaying it inline (indeed,
     964 most browsers will refuse to display it inline). "true", "t", "1", and other
     965 case-insensitive equivalents are all treated the same.
     966
     967 Character-set handling in URLs and HTTP headers is a dubious art [1]_. For
     968 maximum compatibility, Tahoe simply copies the bytes from the filename=
     969 argument into the Content-Disposition header's filename= parameter, without
     970 trying to interpret them in any particular way.
     971
     972
     973``GET /named/$FILECAP/FILENAME``
     974
     975 This is an alternate download form which makes it easier to get the correct
     976 filename. The Tahoe server will provide the contents of the given file, with
     977 a Content-Type header derived from the given filename. This form is used to
     978 get browsers to use the "Save Link As" feature correctly, and also helps
     979 command-line tools like "wget" and "curl" use the right filename. Note that
     980 this form can *only* be used with file caps; it is an error to use a
     981 directory cap after the /named/ prefix.
     982
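 For example, a suitable /named/ download URL, with the filename
 URL-quoted, could be built in Python like this (placeholder cap)::

  import urllib.parse

  filecap = "URI:CHK:..."   # placeholder file cap
  url = ("http://127.0.0.1:3456/named/" + filecap + "/"
         + urllib.parse.quote("My Document.pdf"))
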
     983Get Information About A File Or Directory (as HTML)
     984---------------------------------------------------
     985
     986``GET /uri/$FILECAP?t=info``
     987
     988``GET /uri/$DIRCAP/?t=info``
     989
     990``GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR/?t=info``
     991
     992``GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=info``
     993
     994 This returns a human-oriented HTML page with more detail about the selected
     995 file or directory object. This page contains the following items:
     996
     997 * object size
     998 * storage index
     999 * JSON representation
     1000 * raw contents (text/plain)
     1001 * access caps (URIs): verify-cap, read-cap, write-cap (for mutable objects)
     1002 * check/verify/repair form
     1003 * deep-check/deep-size/deep-stats/manifest (for directories)
     1004 * replace-contents form (for mutable files)
     1005
     1006Creating a Directory
     1007--------------------
     1008
     1009``POST /uri?t=mkdir``
     1010
     1011 This creates a new empty directory, but does not attach it to the virtual
     1012 filesystem.
     1013
     1014 If a "redirect_to_result=true" argument is provided, then the HTTP response
     1015 will cause the web browser to be redirected to a /uri/$DIRCAP page that
     1016 gives access to the newly-created directory. If you bookmark this page,
     1017 you'll be able to get back to the directory again in the future. This is the
     1018 recommended way to start working with a Tahoe server: create a new unlinked
     1019 directory (using redirect_to_result=true), then bookmark the resulting
     1020 /uri/$DIRCAP page. There is a "create directory" button on the Welcome page
     1021 to invoke this action.
     1022
     1023 If "redirect_to_result=true" is not provided (or is given a value of
     1024 "false"), then the HTTP response body will simply be the write-cap of the
     1025 new directory.
     1026
     1027``POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=CHILDNAME``
     1028
     1029 This creates a new empty directory as a child of the designated SUBDIR. This
     1030 will create additional intermediate directories as necessary.
     1031
     1032 If a "when_done=URL" argument is provided, the HTTP response will cause the
     1033 web browser to redirect to the given URL. This provides a convenient way to
     1034 return the browser to the directory that was just modified. Without a
     1035 when_done= argument, the HTTP response will simply contain the write-cap of
     1036 the directory that was just created.
     1037
     1038
     1039Uploading a File
     1040----------------
     1041
     1042``POST /uri?t=upload``
     1043
     1044 This uploads a file, and produces a file-cap for the contents, but does not
     1045 attach the file into the filesystem. No directories will be modified by
     1046 this operation.
     1047
     1048 The file must be provided as the "file" field of an HTML encoded form body,
     1049 produced in response to an HTML form like this::
     1050 
     1051  <form action="/uri" method="POST" enctype="multipart/form-data">
     1052   <input type="hidden" name="t" value="upload" />
     1053   <input type="file" name="file" />
     1054   <input type="submit" value="Upload Unlinked" />
     1055  </form>
     1056
     1057 If a "when_done=URL" argument is provided, the response body will cause the
     1058 browser to redirect to the given URL. If the when_done= URL has the string
     1059 "%(uri)s" in it, that string will be replaced by a URL-escaped form of the
     1060 newly created file-cap. (Note that without this substitution, there is no
     1061 way to access the file that was just uploaded).
     1062
     1063 The default (in the absence of when_done=) is to return an HTML page that
     1064 describes the results of the upload. This page will contain information
     1065 about which storage servers were used for the upload, how long each
     1066 operation took, etc.
     1067
     1068 If a "mutable=true" argument is provided, the operation will create a
     1069 mutable file, and the response body will contain the write-cap instead of
     1070 the upload results page. The default is to create an immutable file,
     1071 returning the upload results page as a response.
     1072
     1073
     1074``POST /uri/$DIRCAP/[SUBDIRS../]?t=upload``
     1075
     1076 This uploads a file, and attaches it as a new child of the given directory,
     1077 which must be mutable. The file must be provided as the "file" field of an
     1078 HTML-encoded form body, produced in response to an HTML form like this::
     1079 
     1080  <form action="." method="POST" enctype="multipart/form-data">
     1081   <input type="hidden" name="t" value="upload" />
     1082   <input type="file" name="file" />
     1083   <input type="submit" value="Upload" />
     1084  </form>
     1085
     1086 A "name=" argument can be provided to specify the new child's name,
     1087 otherwise it will be taken from the "filename" field of the upload form
     1088 (most web browsers will copy the last component of the original file's
     1089 pathname into this field). To avoid confusion, name= is not allowed to
     1090 contain a slash.
     1091
     1092 If there is already a child with that name, and it is a mutable file, then
     1093 its contents are replaced with the data being uploaded. If it is not a
     1094 mutable file, the default behavior is to remove the existing child before
     1095 creating a new one. To prevent this (and make the operation return an error
     1096 instead of overwriting the old child), add a "replace=false" argument, as
     1097 "?t=upload&replace=false". With replace=false, this operation will return an
     1098 HTTP 409 "Conflict" error if there is already an object at the given
     1099 location, rather than overwriting the existing object. Note that "true",
     1100 "t", and "1" are all synonyms for "True", and "false", "f", and "0" are
     1101 synonyms for "False". The parameter is case-insensitive.
     1102
     1103 This will create additional intermediate directories as necessary, although
     1104 since it is expected to be triggered by a form that was retrieved by "GET
     1105 /uri/$DIRCAP/[SUBDIRS../]", it is likely that the parent directory will
     1106 already exist.
     1107
     1108 If a "mutable=true" argument is provided, any new file that is created will
     1109 be a mutable file instead of an immutable one. <input type="checkbox"
     1110 name="mutable" /> will give the user a way to set this option.
     1111
     1112 If a "when_done=URL" argument is provided, the HTTP response will cause the
     1113 web browser to redirect to the given URL. This provides a convenient way to
     1114 return the browser to the directory that was just modified. Without a
     1115 when_done= argument, the HTTP response will simply contain the file-cap of
     1116 the file that was just uploaded (a write-cap for mutable files, or a
     1117 read-cap for immutable files).
     1118
     1119``POST /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=upload``
     1120
     1121 This also uploads a file and attaches it as a new child of the given
     1122 directory, which must be mutable. It is a slight variant of the previous
     1123 operation, as the URL refers to the target file rather than the parent
     1124 directory. It is otherwise identical: this accepts mutable= and when_done=
     1125 arguments too.
     1126
     1127``POST /uri/$FILECAP?t=upload``
     1128
     1129 This modifies the contents of an existing mutable file in-place. An error is
     1130 signalled if $FILECAP does not refer to a mutable file. It behaves just like
     1131 the "PUT /uri/$FILECAP" form, but uses a POST for the benefit of HTML forms
     1132 in a web browser.
     1133
     1134Attaching An Existing File Or Directory (by URI)
     1135------------------------------------------------
     1136
     1137``POST /uri/$DIRCAP/[SUBDIRS../]?t=uri&name=CHILDNAME&uri=CHILDCAP``
     1138
     1139 This attaches a given read- or write- cap "CHILDCAP" to the designated
     1140 directory, with a specified child name. This behaves much like the PUT t=uri
     1141 operation, and is a lot like a UNIX hardlink. It is subject to the same
     1142 restrictions as that operation on the use of cap formats unknown to the
     1143 webapi server.
     1144
     1145 This will create additional intermediate directories as necessary, although
     1146 since it is expected to be triggered by a form that was retrieved by "GET
     1147 /uri/$DIRCAP/[SUBDIRS../]", it is likely that the parent directory will
     1148 already exist.
     1149
     1150 This accepts the same replace= argument as POST t=upload.
     1151
     1152Deleting A Child
     1153----------------
     1154
     1155``POST /uri/$DIRCAP/[SUBDIRS../]?t=delete&name=CHILDNAME``
     1156
     1157 This instructs the node to remove a child object (file or subdirectory) from
     1158 the given directory, which must be mutable. Note that the entire subtree is
     1159 unlinked from the parent. Unlike deleting a subdirectory in a UNIX local
     1160 filesystem, the subtree need not be empty; if it isn't, then other references
     1161 into the subtree will see that the child subdirectories are not modified by
     1162 this operation. Only the link from the given directory to its child is severed.
     1163
     1164Renaming A Child
     1165----------------
     1166
     1167``POST /uri/$DIRCAP/[SUBDIRS../]?t=rename&from_name=OLD&to_name=NEW``
     1168
     1169 This instructs the node to rename a child of the given directory, which must
     1170 be mutable. This has a similar effect to removing the child, then adding the
     1171 same child-cap under the new name, except that it preserves metadata. This
     1172 operation cannot move the child to a different directory.
     1173
     1174 This operation will replace any existing child of the new name, making it
     1175 behave like the UNIX "``mv -f``" command.
     1176
     1177Other Utilities
     1178---------------
     1179
     1180``GET /uri?uri=$CAP``
     1181
     1182  This causes a redirect to /uri/$CAP, and retains any additional query
     1183  arguments (like filename= or save=). This is for the convenience of web
     1184  forms which allow the user to paste in a read- or write- cap (obtained
     1185  through some out-of-band channel, like IM or email).
     1186
     1187  Note that this form merely redirects to the specific file or directory
     1188  indicated by the $CAP: unlike the GET /uri/$DIRCAP form, you cannot
     1189  traverse to children by appending additional path segments to the URL.
     1190
     1191``GET /uri/$DIRCAP/[SUBDIRS../]?t=rename-form&name=$CHILDNAME``
     1192
     1193  This provides a useful facility to browser-based user interfaces. It
     1194  returns a page containing a form targeting the "POST $DIRCAP t=rename"
     1195  functionality described above, with the provided $CHILDNAME present in the
     1196  'from_name' field of that form. I.e. this presents a form offering to
     1197  rename $CHILDNAME, requesting the new name, and submitting POST rename.
     1198
     1199``GET /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=uri``
     1200
     1201 This returns the file- or directory- cap for the specified object.
     1202
     1203``GET /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=readonly-uri``
     1204
     1205 This returns a read-only file- or directory- cap for the specified object.
     1206 If the object is an immutable file, this will return the same value as
     1207 t=uri.
     1208
     1209Debugging and Testing Features
     1210------------------------------
     1211
     1212These URLs are less likely to be helpful to the casual Tahoe user, and are
     1213mainly intended for developers.
     1214
     1215``POST $URL?t=check``
     1216
     1217 This triggers the FileChecker to determine the current "health" of the
     1218 given file or directory, by counting how many shares are available. The
     1219 page that is returned will display the results. This can be used as a "show
     1220 me detailed information about this file" page.
     1221
     1222 If a verify=true argument is provided, the node will perform a more
     1223 intensive check, downloading and verifying every single bit of every share.
     1224
     1225 If an add-lease=true argument is provided, the node will also add (or
     1226 renew) a lease to every share it encounters. Each lease will keep the share
     1227 alive for a certain period of time (one month by default). Once the last
     1228 lease expires or is explicitly cancelled, the storage server is allowed to
     1229 delete the share.
     1230
     1231 If an output=JSON argument is provided, the response will be
     1232 machine-readable JSON instead of human-oriented HTML. The data is a
     1233 dictionary with the following keys::
     1234
     1235  storage-index: a base32-encoded string with the object's storage index,
     1236      or an empty string for LIT files
     1237  summary: a string, with a one-line summary of the stats of the file
     1238  results: a dictionary that describes the state of the file. For LIT
     1239      files, this dictionary has only the 'healthy' key, which will
     1240      always be True. For distributed files, this dictionary has the
     1241      following keys:
     1242    count-shares-good: the number of good shares that were found
     1243    count-shares-needed: 'k', the number of shares required for recovery
     1244    count-shares-expected: 'N', the number of total shares generated
     1245    count-good-share-hosts: this was intended to be the number of distinct
     1246        storage servers with good shares. It is currently
     1247        (as of Tahoe-LAFS v1.8.0) computed incorrectly;
     1248        see ticket #1115.
     1249    count-wrong-shares: for mutable files, the number of shares for
     1250        versions other than the 'best' one (highest
     1251        sequence number, highest roothash). These are
     1252        either old ...
     1253    count-recoverable-versions: for mutable files, the number of
     1254        recoverable versions of the file. For
     1255        a healthy file, this will equal 1.
     1256    count-unrecoverable-versions: for mutable files, the number of
     1257        unrecoverable versions of the file.
     1258        For a healthy file, this will be 0.
     1259    count-corrupt-shares: the number of shares with integrity failures
     1260    list-corrupt-shares: a list of "share locators", one for each share
     1261        that was found to be corrupt. Each share locator
     1262        is a list of (serverid, storage_index, sharenum).
     1263    needs-rebalancing: (bool) True if there are multiple shares on a single
     1264        storage server, indicating a reduction in reliability
     1265        that could be resolved by moving shares to new
     1266        servers.
     1267    servers-responding: list of base32-encoded storage server identifiers,
     1268        one for each server which responded to the share
     1269        query.
     1270    healthy: (bool) True if the file is completely healthy, False otherwise.
     1271        Healthy files have at least N good shares. Overlapping shares
     1272        do not currently cause a file to be marked unhealthy. If there
     1273        are at least N good shares, then corrupt shares do not cause the
     1274        file to be marked unhealthy, although the corrupt shares will be
     1275        listed in the results (list-corrupt-shares) and should be manually
     1276        removed to avoid wasting time in subsequent downloads (as the
     1277        downloader rediscovers the corruption and uses alternate shares).
     1278        Future compatibility: the meaning of this field may change to
     1279        reflect whether the servers-of-happiness criterion is met
     1280        (see ticket #614).
     1281    sharemap: dict mapping share identifier to list of serverids
     1282        (base32-encoded strings). This indicates which servers are
     1283        holding which shares. For immutable files, the shareid is
     1284        an integer (the share number, from 0 to N-1). For
     1285        mutable files, it is a string of the form
     1286        'seq%d-%s-sh%d', containing the sequence number, the
     1287        roothash, and the share number.
     1288
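 A Python sketch of invoking the checker and reading the machine-readable
 results (node URL and cap are placeholders)::

  import json
  import urllib.request

  NODE = "http://127.0.0.1:3456"   # hypothetical local node
  cap = "URI:CHK:..."              # placeholder cap

  req = urllib.request.Request(
      NODE + "/uri/" + cap + "?t=check&output=JSON", method="POST")
  with urllib.request.urlopen(req) as resp:
      check = json.loads(resp.read().decode("utf-8"))

  print(check["summary"])
  if not check["results"]["healthy"]:
      # LIT files only have the 'healthy' key, hence .get() here
      print("good shares:", check["results"].get("count-shares-good"))
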
     1289``POST $URL?t=start-deep-check``    (must add &ophandle=XYZ)
     1290
     1291 This initiates a recursive walk of all files and directories reachable from
     1292 the target, performing a check on each one just like t=check. The result
     1293 page will contain a summary of the results, including details on any
     1294 file/directory that was not fully healthy.
     1295
     1296 t=start-deep-check can only be invoked on a directory. An error (400
     1297 BAD_REQUEST) will be signalled if it is invoked on a file. The recursive
     1298 walker will deal with loops safely.
     1299
     1300 This accepts the same verify= and add-lease= arguments as t=check.
     1301
     1302 Since this operation can take a long time (perhaps a second per object),
     1303 the ophandle= argument is required (see "Slow Operations, Progress, and
     1304 Cancelling" above). The response to this POST will be a redirect to the
     1305 corresponding /operations/$HANDLE page (with output=HTML or output=JSON to
     1306 match the output= argument given to the POST). The deep-check operation
     1307 will continue to run in the background, and the /operations page should be
     1308 used to find out when the operation is done.
     1309
     1310 Detailed check results for non-healthy files and directories will be
     1311 available under /operations/$HANDLE/$STORAGEINDEX, and the HTML status will
     1312 contain links to these detailed results.
     1313
     1314 The HTML /operations/$HANDLE page for incomplete operations will contain a
     1315 meta-refresh tag, set to 60 seconds, so that a browser which uses
     1316 deep-check will automatically poll until the operation has completed.
     1317
     1318 The JSON page (/operations/$HANDLE?output=JSON) will contain a
     1319 machine-readable JSON dictionary with the following keys::
     1320
     1321  finished: a boolean, True if the operation is complete, else False. Some
     1322      of the remaining keys may not be present until the operation
     1323      is complete.
     1324  root-storage-index: a base32-encoded string with the storage index of the
     1325      starting point of the deep-check operation
     1326  count-objects-checked: count of how many objects were checked. Note that
     1327      non-distributed objects (i.e. small immutable LIT
     1328      files) are not checked, since for these objects,
     1329      the data is contained entirely in the URI.
     1330  count-objects-healthy: how many of those objects were completely healthy
     1331  count-objects-unhealthy: how many were damaged in some way
     1332  count-corrupt-shares: how many shares were found to have corruption,
     1333      summed over all objects examined
     1334  list-corrupt-shares: a list of "share identifiers", one for each share
     1335      that was found to be corrupt. Each share identifier
     1336      is a list of (serverid, storage_index, sharenum).
     1337  list-unhealthy-files: a list of (pathname, check-results) tuples, for
     1338      each file that was not fully healthy. 'pathname' is
     1339      a list of strings (which can be joined by "/"
     1340      characters to turn it into a single string),
     1341      relative to the directory on which deep-check was
     1342      invoked. The 'check-results' field is the same as
     1343      that returned by t=check&output=JSON, described
     1344      above.
     1345  stats: a dictionary with the same keys as the t=start-deep-stats command
     1346      (described below)
     1347
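 For example, a Python sketch that starts a deep-check and polls the
 operation handle until it completes (node URL, cap, and handle are
 placeholders)::

  import json
  import time
  import urllib.request

  NODE = "http://127.0.0.1:3456"   # hypothetical local node
  dircap = "URI:DIR2:..."          # placeholder directory cap

  req = urllib.request.Request(
      NODE + "/uri/" + dircap + "?t=start-deep-check&ophandle=xyz123",
      method="POST")
  urllib.request.urlopen(req).close()   # redirected to /operations/xyz123

  while True:
      with urllib.request.urlopen(
              NODE + "/operations/xyz123?output=JSON") as resp:
          status = json.loads(resp.read().decode("utf-8"))
      if status["finished"]:
          break
      time.sleep(10)
  print(status["count-objects-unhealthy"], "unhealthy objects")
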
     1348``POST $URL?t=stream-deep-check``
     1349
     1350 This initiates a recursive walk of all files and directories reachable from
     1351 the target, performing a check on each one just like t=check. For each
     1352 unique object (duplicates are skipped), a single line of JSON is emitted to
     1353 the HTTP response channel (or an error indication, see below). When the walk
     1354 is complete, a final line of JSON is emitted which contains the accumulated
     1355 file-size/count "deep-stats" data.
     1356
     1357 This command takes the same arguments as t=start-deep-check.
     1358
     1359 A CLI tool can split the response stream on newlines into "response units",
     1360 and parse each response unit as JSON. Each such parsed unit will be a
     1361 dictionary, and will contain at least the "type" key: a string, one of
     1362 "file", "directory", or "stats".
     1363
     1364 For all units that have a type of "file" or "directory", the dictionary will
     1365 contain the following keys::
     1366
     1367  "path": a list of strings, with the path that is traversed to reach the
     1368          object
     1369  "cap": a write-cap URI for the file or directory, if available, else a
     1370         read-cap URI
     1371  "verifycap": a verify-cap URI for the file or directory
     1372  "repaircap": an URI for the weakest cap that can still be used to repair
     1373               the object
     1374  "storage-index": a base32 storage index for the object
     1375  "check-results": a copy of the dictionary which would be returned by
     1376                   t=check&output=json, with three top-level keys:
     1377                   "storage-index", "summary", and "results", and a variety
     1378                   of counts and sharemaps in the "results" value.
     1379
     1380 Note that non-distributed files (i.e. LIT files) will have values of None
     1381 for verifycap, repaircap, and storage-index, since these files can neither
     1382 be verified nor repaired, and are not stored on the storage servers.
     1383 Likewise the check-results dictionary will be limited: an empty string for
     1384 storage-index, and a results dictionary with only the "healthy" key.
     1385
     1386 The last unit in the stream will have a type of "stats", and will contain
     1387 the keys described in the "start-deep-stats" operation, below.
     1388
     1389 If any errors occur during the traversal (specifically if a directory is
     1390 unrecoverable, such that further traversal is not possible), an error
     1391 indication is written to the response body, instead of the usual line of
     1392 JSON. This error indication line will begin with the string "ERROR:" (in all
     1393 caps), and contain a summary of the error on the rest of the line. The
     1394 remaining lines of the response body will be a Python exception. The client
     1395 application should look for the ERROR: and stop processing JSON as soon as
     1396 it is seen. Note that neither a file being unrecoverable nor a directory
     1397 merely being unhealthy will cause traversal to stop. The line just before
     1398 the ERROR: will describe the directory that was untraversable, since the
     1399 unit is emitted to the HTTP response body before the child is traversed.
     1400
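 A client-side Python sketch of consuming this stream, including the error
 indication (node URL and cap are placeholders)::

  import json
  import urllib.request

  NODE = "http://127.0.0.1:3456"   # hypothetical local node
  dircap = "URI:DIR2:..."          # placeholder directory cap

  req = urllib.request.Request(
      NODE + "/uri/" + dircap + "?t=stream-deep-check", method="POST")
  with urllib.request.urlopen(req) as resp:
      for raw in resp:                        # one response unit per line
          line = raw.decode("utf-8").rstrip("\n")
          if line.startswith("ERROR:"):
              print("traversal stopped:", line)
              break                           # the rest is an exception dump
          unit = json.loads(line)
          if unit["type"] in ("file", "directory"):
              print("/".join(unit["path"]),
                    unit["check-results"]["summary"])
          else:                               # the final "stats" unit
              print("deep-stats:", unit)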
     1401
     1402``POST $URL?t=check&repair=true``
     1403
     1404 This performs a health check of the given file or directory, and if the
     1405 checker determines that the object is not healthy (some shares are missing
     1406 or corrupted), it will perform a "repair". During repair, any missing
     1407 shares will be regenerated and uploaded to new servers.
     1408
     1409 This accepts the same verify=true and add-lease= arguments as t=check. When
     1410 an output=JSON argument is provided, the machine-readable JSON response
     1411 will contain the following keys::
     1412
     1413  storage-index: a base32-encoded string with the object's storage index,
     1414      or an empty string for LIT files
     1415  repair-attempted: (bool) True if repair was attempted
     1416  repair-successful: (bool) True if repair was attempted and the file was
     1417      fully healthy afterwards. False if no repair was
     1418      attempted, or if a repair attempt failed.
     1419  pre-repair-results: a dictionary that describes the state of the file
     1420      before any repair was performed. This contains exactly
     1421      the same keys as the 'results' value of the t=check
     1422      response, described above.
     1423  post-repair-results: a dictionary that describes the state of the file
     1424      after any repair was performed. If no repair was
     1425      performed, post-repair-results and pre-repair-results
     1426      will be the same. This contains exactly the same keys
     1427      as the 'results' value of the t=check response,
     1428      described above.
     1429
     1430``POST $URL?t=start-deep-check&repair=true``    (must add &ophandle=XYZ)
     1431
     1432 This triggers a recursive walk of all files and directories, performing a
     1433 t=check&repair=true on each one.
     1434
     1435 Like t=start-deep-check without the repair= argument, this can only be
     1436 invoked on a directory. An error (400 BAD_REQUEST) will be signalled if it
     1437 is invoked on a file. The recursive walker will deal with loops safely.
     1438
     1439 This accepts the same verify= and add-lease= arguments as
     1440 t=start-deep-check. It uses the same ophandle= mechanism as
     1441 start-deep-check. When an output=JSON argument is provided, the response
     1442 will contain the following keys::
     1443
     1444  finished: (bool) True if the operation has completed, else False
     1445  root-storage-index: a base32-encoded string with the storage index of the
     1446      starting point of the deep-check operation
     1447  count-objects-checked: count of how many objects were checked
     1448
     1449  count-objects-healthy-pre-repair: how many of those objects were
     1450      completely healthy, before any repair
     1451  count-objects-unhealthy-pre-repair: how many were damaged in some way
     1452  count-objects-healthy-post-repair: how many of those objects were
     1453      completely healthy, after any repair
     1454  count-objects-unhealthy-post-repair: how many were damaged in some way
     1455
     1456  count-repairs-attempted: repairs were attempted on this many objects
     1457  count-repairs-successful: how many repairs resulted in healthy objects
     1458  count-repairs-unsuccessful: how many repairs did not result in
     1459      completely healthy objects
     1460  count-corrupt-shares-pre-repair: how many shares were found to have
     1461      corruption, summed over all objects examined,
     1462      before any repair
     1463  count-corrupt-shares-post-repair: how many shares were found to have
     1464      corruption, summed over all objects examined,
     1465      after any repair
     1466  list-corrupt-shares: a list of "share identifiers", one for each share
     1467      that was found to be corrupt (before any repair).
     1468      Each share identifier is a list of (serverid,
     1469      storage_index, sharenum).
     1470  list-remaining-corrupt-shares: like list-corrupt-shares, but mutable
     1471      shares that were successfully repaired are not
     1472      included. These are shares that need manual
     1473      processing. Since immutable shares cannot be
     1474      modified by clients, all corruption in
     1475      immutable shares will be listed here.
     1476  list-unhealthy-files: a list of (pathname, check-results) tuples, for
     1477      each file that was not fully healthy. 'pathname' is
     1478      relative to the directory on which deep-check was
     1479      invoked. The 'check-results' field is the same as
     1480      that returned by t=check&repair=true&output=JSON,
     1481      described above.
     1482  stats: a dictionary with the same keys as the t=start-deep-stats command
     1483      (described below)
     1484
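 As an illustration, a client could poll the operation handle until it
 reports finished, then summarize the repair counts. This is a minimal
 Python 3 sketch, not part of Tahoe; the node URL and the ophandle value
 "xyz" are assumptions::

  import json, time, urllib.request

  NODE = "http://127.0.0.1:3456"   # assumed local webapi port
  URL = NODE + "/operations/xyz?t=status&output=JSON"

  while True:
      with urllib.request.urlopen(URL) as resp:
          results = json.load(resp)
      if results["finished"]:
          break
      time.sleep(60)   # about the meta-refresh interval of the HTML form

  print("objects checked:", results["count-objects-checked"])
  print("repairs attempted:", results["count-repairs-attempted"])
  print("still corrupt:", len(results["list-remaining-corrupt-shares"]))
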
     1485``POST $URL?t=stream-deep-check&repair=true``
     1486
     1487 This triggers a recursive walk of all files and directories, performing a
     1488 t=check&repair=true on each one. For each unique object (duplicates are
     1489 skipped), a single line of JSON is emitted to the HTTP response channel (or
     1490 an error indication). When the walk is complete, a final line of JSON is
     1491 emitted which contains the accumulated file-size/count "deep-stats" data.
     1492
     1493 This emits the same data as t=stream-deep-check (without the repair=true),
     1494 except that the "check-results" field is replaced with a
     1495 "check-and-repair-results" field, which contains the keys returned by
     1496 t=check&repair=true&output=json (i.e. repair-attempted, repair-successful,
     1497 pre-repair-results, and post-repair-results). The output does not contain
     1498 the summary dictionary that is provided by t=start-deep-check&repair=true
     1499 (the one with count-objects-checked and list-unhealthy-files), since the
     1500 receiving client is expected to calculate those values itself from the
     1501 stream of per-object check-and-repair-results.
     1502
     1503 Note that the "ERROR:" indication will only be emitted if traversal stops,
     1504 which will only occur if an unrecoverable directory is encountered. If a
     1505 file or directory repair fails, the traversal will continue, and the repair
     1506 failure will be indicated in the JSON data (in the "repair-successful" key).
     1507
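 For example, a client could consume this stream and report each object
 whose repair failed. A minimal Python 3 sketch, assuming a placeholder
 directory cap and the per-unit keys described under t=stream-manifest
 below::

  import json, urllib.request

  DIRCAP = "URI%3ADIR2%3A..."   # placeholder: a URL-encoded directory cap
  url = "http://127.0.0.1:3456/uri/" + DIRCAP + "?t=stream-deep-check&repair=true"
  req = urllib.request.Request(url, data=b"", method="POST")
  with urllib.request.urlopen(req) as resp:
      for line in resp:
          line = line.decode("utf-8").strip()
          if not line:
              continue
          if line.startswith("ERROR:"):
              break   # traversal stopped; stop parsing JSON
          unit = json.loads(line)
          if unit["type"] in ("file", "directory"):
              crr = unit["check-and-repair-results"]
              if crr["repair-attempted"] and not crr["repair-successful"]:
                  print("repair failed:", "/".join(unit["path"]))
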
     1508``POST $DIRURL?t=start-manifest``    (must add &ophandle=XYZ)
     1509
     1510 This operation generates a "manifest" of the given directory tree, mostly
     1511 for debugging. This is a table of (path, filecap/dircap), for every object
     1512 reachable from the starting directory. The path will be slash-joined, and
     1513 the filecap/dircap will contain a link to the object in question. This page
     1514 gives immediate access to every object in the virtual filesystem subtree.
     1515
     1516 This operation uses the same ophandle= mechanism as deep-check. The
     1517 corresponding /operations/$HANDLE page has three different forms. The
     1518 default is output=HTML.
     1519
     1520 If output=text is added to the query args, the results will be a text/plain
     1521 list. The first line is special: it is either "finished: yes" or "finished:
     1522 no"; if the operation is not finished, you must periodically reload the
     1523 page until it completes. The rest of the results are a plaintext list, with
     1524 one file/dir per line, slash-separated, with the filecap/dircap separated
     1525 by a space.
     1526
     1527 If output=JSON is added to the query args, then the results will be a
     1528 JSON-formatted dictionary with six keys. Note that because large directory
     1529 structures can result in very large JSON results, the full results will not
     1530 be available until the operation is complete (i.e. until output["finished"]
     1531 is True)::
     1532
     1533  finished (bool): if False then you must reload the page until True
     1534  origin_si (base32 str): the storage index of the starting point
     1535  manifest: list of (path, cap) tuples, where path is a list of strings.
     1536  verifycaps: list of (printable) verify cap strings
     1537  storage-index: list of (base32) storage index strings
     1538  stats: a dictionary with the same keys as the t=start-deep-stats command
     1539         (described below)
     1540
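 For example, once the operation has finished, the manifest entries could be
 printed with a few lines of Python 3 (a sketch; the ophandle value "xyz" is
 an assumption)::

  import json, urllib.request

  url = "http://127.0.0.1:3456/operations/xyz?output=JSON"
  with urllib.request.urlopen(url) as resp:
      res = json.load(resp)
  if res["finished"]:
      for path, cap in res["manifest"]:
          print("/".join(path), cap)
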
     1541``POST $DIRURL?t=start-deep-size``   (must add &ophandle=XYZ)
     1542
     1543 This operation generates a number (in bytes): the sum of the filesizes
     1544 of all directories and immutable files reachable from the given
     1545 directory. This is a rough lower bound of the total space consumed by this
     1546 subtree. It does not include space consumed by mutable files, nor does it
     1547 take expansion or encoding overhead into account. Later versions of the
     1548 code may improve this estimate upwards.
     1549
     1550 The /operations/$HANDLE status output consists of two lines of text::
     1551
     1552  finished: yes
     1553  size: 1234
     1554
     1555``POST $DIRURL?t=start-deep-stats``    (must add &ophandle=XYZ)
     1556
     1557 This operation performs a recursive walk of all files and directories
     1558 reachable from the given directory, and generates a collection of
     1559 statistics about those objects.
     1560
     1561 The result (obtained from the /operations/$OPHANDLE page) is a
     1562 JSON-serialized dictionary with the following keys (note that some of these
     1563 keys may be missing until 'finished' is True)::
     1564
     1565  finished: (bool) True if the operation has finished, else False
     1566  count-immutable-files: count of how many CHK files are in the set
     1567  count-mutable-files: same, for mutable files (does not include directories)
     1568  count-literal-files: same, for LIT files (data contained inside the URI)
     1569  count-files: sum of the above three
     1570  count-directories: count of directories
     1571  count-unknown: count of unrecognized objects (perhaps from the future)
     1572  size-immutable-files: total bytes for all CHK files in the set, =deep-size
     1573  size-mutable-files (TODO): same, for current version of all mutable files
     1574  size-literal-files: same, for LIT files
     1575  size-directories: size of directories (includes size-literal-files)
     1576  size-files-histogram: list of (minsize, maxsize, count) buckets,
     1577                        with a histogram of filesizes, 5dB/bucket,
     1578                        for both literal and immutable files
     1579  largest-directory: number of children in the largest directory
     1580  largest-immutable-file: number of bytes in the largest CHK file
     1581
     1582 size-mutable-files is not implemented, because it would require extra
     1583 queries to each mutable file to get its size. This may be implemented in
     1584 the future.
     1585
     1586 Assuming no sharing, the basic space consumed by a single root directory is
     1587 the sum of size-immutable-files, size-mutable-files, and size-directories.
     1588 The actual disk space used by the shares is larger, because of the
     1589 following sources of overhead::
     1590
     1591  integrity data
     1592  expansion due to erasure coding
     1593  share management data (leases)
     1594  backend (ext3) minimum block size
     1595
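 As a sketch of that arithmetic (``stats`` stands for the dictionary
 returned by the finished operation)::

  def space_lower_bound(stats):
      # a rough lower bound on space consumed, per the note above
      return (stats["size-immutable-files"]
              + stats.get("size-mutable-files", 0)   # TODO key; may be absent
              + stats["size-directories"])
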
     1596``POST $URL?t=stream-manifest``
     1597
     1598 This operation performs a recursive walk of all files and directories
     1599 reachable from the given starting point. For each such unique object
     1600 (duplicates are skipped), a single line of JSON is emitted to the HTTP
     1601 response channel (or an error indication, see below). When the walk is
     1602 complete, a final line of JSON is emitted which contains the accumulated
     1603 file-size/count "deep-stats" data.
     1604
     1605 A CLI tool can split the response stream on newlines into "response units",
     1606 and parse each response unit as JSON. Each such parsed unit will be a
     1607 dictionary, and will contain at least the "type" key: a string, one of
     1608 "file", "directory", or "stats".
     1609
     1610 For all units that have a type of "file" or "directory", the dictionary will
     1611 contain the following keys::
     1612
     1613  "path": a list of strings, with the path that is traversed to reach the
     1614          object
     1615  "cap": a write-cap URI for the file or directory, if available, else a
     1616         read-cap URI
     1617  "verifycap": a verify-cap URI for the file or directory
     1618  "repaircap": a URI for the weakest cap that can still be used to repair
     1619               the object
     1620  "storage-index": a base32 storage index for the object
     1621
     1622 Note that non-distributed files (i.e. LIT files) will have values of None
     1623 for verifycap, repaircap, and storage-index, since these files can neither
     1624 be verified nor repaired, and are not stored on the storage servers.
     1625
     1626 The last unit in the stream will have a type of "stats", and will contain
     1627 the keys described in the "start-deep-stats" operation, below.
     1628
     1629 If any errors occur during the traversal (specifically if a directory is
     1630 unrecoverable, such that further traversal is not possible), an error
     1631 indication is written to the response body, instead of the usual line of
     1632 JSON. This error indication line will begin with the string "ERROR:" (in all
     1633 caps), and contain a summary of the error on the rest of the line. The
     1634 remaining lines of the response body will be a Python exception. The client
     1635 application should look for the ERROR: and stop processing JSON as soon as
     1636 it is seen. The line just before the ERROR: will describe the directory that
     1637 was untraversable, since the manifest entry is emitted to the HTTP response
     1638 body before the child is traversed.
     1639
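 A minimal Python 3 client following this pattern might look like the
 following sketch (the directory cap is a placeholder)::

  import json, urllib.request

  DIRCAP = "URI%3ADIR2%3A..."   # placeholder: a URL-encoded directory cap
  url = "http://127.0.0.1:3456/uri/" + DIRCAP + "?t=stream-manifest"
  req = urllib.request.Request(url, data=b"", method="POST")
  with urllib.request.urlopen(req) as resp:
      for line in resp:
          line = line.decode("utf-8").strip()
          if not line:
              continue
          if line.startswith("ERROR:"):
              print(line)   # summary of the error; the previous line named
              break         # the untraversable directory
          unit = json.loads(line)
          if unit["type"] == "stats":
              break         # accumulated deep-stats; handle as needed
          print(unit["type"], "/".join(unit["path"]), unit["cap"])
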
     1640Other Useful Pages
     1641==================
     1642
     1643The portion of the web namespace that begins with "/uri" (and "/named") is
     1644dedicated to giving users (both humans and programs) access to the Tahoe
     1645virtual filesystem. The rest of the namespace provides status information
     1646about the state of the Tahoe node.
     1647
     1648``GET /``   (the root page)
     1649
     1650This is the "Welcome Page", and contains a few distinct sections::
     1651
     1652 Node information: library versions, local nodeid, services being provided.
     1653
     1654 Filesystem Access Forms: create a new directory, view a file/directory by
     1655                          URI, upload a file (unlinked), download a file by
     1656                          URI.
     1657
     1658 Grid Status: introducer information, helper information, connected storage
     1659              servers.
     1660
     1661``GET /status/``
     1662
     1663 This page lists all active uploads and downloads, and contains a short list
     1664 of recent upload/download operations. Each operation has a link to a page
     1665 that describes file sizes, servers that were involved, and the time consumed
     1666 in each phase of the operation.
     1667
     1668 A GET of /status/?t=json will contain a machine-readable subset of the same
     1669 data. It returns a JSON-encoded dictionary. The only key defined at this
     1670 time is "active", with a value that is a list of operation dictionaries, one
     1671 for each active operation. Once an operation is completed, it will no longer
     1672 appear in data["active"].
     1673
     1674 Each op-dict contains a "type" key, one of "upload", "download",
     1675 "mapupdate", "publish", or "retrieve" (the first two are for immutable
     1676 files, while the latter three are for mutable files and directories).
     1677
     1678 The "upload" op-dict will contain the following keys::
     1679
     1680  type (string): "upload"
     1681  storage-index-string (string): a base32-encoded storage index
     1682  total-size (int): total size of the file
     1683  status (string): current status of the operation
     1684  progress-hash (float): 1.0 when the file has been hashed
     1685  progress-ciphertext (float): 1.0 when the file has been encrypted
     1686  progress-encode-push (float): 1.0 when the file has been encoded and
     1687                                pushed to the storage servers. For helper
     1688                                uploads, the ciphertext value climbs to 1.0
     1689                                first, then encoding starts. For unassisted
     1690                                uploads, ciphertext and encode-push progress
     1691                                will climb at the same pace.
     1692
     1693 The "download" op-dict will contain the following keys::
     1694
     1695  type (string): "download"
     1696  storage-index-string (string): a base32-encoded storage index
     1697  total-size (int): total size of the file
     1698  status (string): current status of the operation
     1699  progress (float): 1.0 when the file has been fully downloaded
     1700
     1701 Front-ends which want to report progress information are advised to simply
     1702 average together all the progress-* indicators. A slightly more accurate
     1703 value can be found by ignoring the progress-hash value (since the current
     1704 implementation hashes synchronously, so clients will probably never see
     1705 progress-hash!=1.0).
     1706
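 For example, an overall percentage for each active upload could be computed
 like this (a minimal Python 3 sketch, assuming the default node URL)::

  import json, urllib.request

  with urllib.request.urlopen("http://127.0.0.1:3456/status/?t=json") as resp:
      status = json.load(resp)
  for op in status["active"]:
      if op["type"] == "upload":
          # ignore progress-hash, as suggested above
          pct = 50 * (op["progress-ciphertext"] + op["progress-encode-push"])
          print(op["storage-index-string"], "%d%%" % pct)
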
     1707``GET /provisioning/``
     1708
     1709 This page provides a basic tool to predict the likely storage and bandwidth
     1710 requirements of a large Tahoe grid. It provides forms to input things like
     1711 total number of users, number of files per user, average file size, number
     1712 of servers, expansion ratio, hard drive failure rate, etc. It then provides
     1713 numbers like how many disks per server will be needed, how many read
     1714 operations per second should be expected, and the likely MTBF for files in
     1715 the grid. This information is very preliminary, and the model upon which it
     1716 is based still needs a lot of work.
     1717
     1718``GET /helper_status/``
     1719
     1720 If the node is running a helper (i.e. if [helper]enabled is set to True in
     1721 tahoe.cfg), then this page will provide a list of all the helper operations
     1722 currently in progress. If "?t=json" is added to the URL, it will return a
     1723 JSON-formatted list of helper statistics, which can then be used to produce
     1724 graphs to indicate how busy the helper is.
     1725
     1726``GET /statistics/``
     1727
     1728 This page provides "node statistics", which are collected from a variety of
     1729 sources::
     1730
     1731   load_monitor: every second, the node schedules a timer for one second in
     1732                 the future, then measures how late the subsequent callback
     1733                 is. The "load_average" is this tardiness, measured in
     1734                 seconds, averaged over the last minute. It is an indication
     1735                 of a busy node, one which is doing more work than can be
     1736                 completed in a timely fashion. The "max_load" value is the
     1737                 highest value that has been seen in the last 60 seconds.
     1738
     1739   cpu_monitor: every minute, the node uses time.clock() to measure how much
     1740                CPU time it has used, and it uses this value to produce
     1741                1min/5min/15min moving averages. These values range from 0%
     1742                (0.0) to 100% (1.0), and indicate what fraction of the CPU
     1743                has been used by the Tahoe node. Not all operating systems
     1744                provide meaningful data to time.clock(): they may report 100%
     1745                CPU usage at all times.
     1746
     1747   uploader: this counts how many immutable files (and bytes) have been
     1748             uploaded since the node was started
     1749
     1750   downloader: this counts how many immutable files have been downloaded
     1751               since the node was started
     1752
     1753   publishes: this counts how many mutable files (including directories) have
     1754              been modified since the node was started
     1755
     1756   retrieves: this counts how many mutable files (including directories) have
     1757              been read since the node was started
     1758
     1759 There are other statistics that are tracked by the node. The "raw stats"
     1760 section shows a formatted dump of all of them.
     1761
     1762 By adding "?t=json" to the URL, the node will return a JSON-formatted
     1763 dictionary of stats values, which can be used by other tools to produce
     1764 graphs of node behavior. The misc/munin/ directory in the source
     1765 distribution provides some tools to produce these graphs.
     1766
     1767``GET /``   (introducer status)
     1768
     1769 For Introducer nodes, the welcome page displays information about both
     1770 clients and servers which are connected to the introducer. Servers make
     1771 "service announcements", and these are listed in a table. Clients will
     1772 subscribe to hear about service announcements, and these subscriptions are
     1773 listed in a separate table. Both tables contain information about what
     1774 version of Tahoe is being run by the remote node, their advertised and
     1775 outbound IP addresses, their nodeid and nickname, and how long they have
     1776 been available.
     1777
     1778 By adding "?t=json" to the URL, the node will return a JSON-formatted
     1779 dictionary of stats values, which can be used to produce graphs of connected
     1780 clients over time. This dictionary has the following keys::
     1781
     1782  ["subscription_summary"] : a dictionary mapping service name (like
     1783                             "storage") to an integer with the number of
     1784                             clients that have subscribed to hear about that
     1785                             service
     1786  ["announcement_summary"] : a dictionary mapping service name to an integer
     1787                             with the number of servers which are announcing
     1788                             that service
     1789  ["announcement_distinct_hosts"] : a dictionary mapping service name to an
     1790                                    integer which represents the number of
     1791                                    distinct hosts that are providing that
     1792                                    service. If two servers have announced
     1793                                    FURLs which use the same hostnames (but
     1794                                    different ports and tubids), they are
     1795                                    considered to be on the same host.
     1796
     1797
     1798Static Files in /public_html
     1799============================
     1800
     1801The webapi server will take any request for a URL that starts with /static
     1802and serve it from a configurable directory which defaults to
     1803$BASEDIR/public_html . This is configured by setting the "[node]web.static"
     1804value in $BASEDIR/tahoe.cfg . If this is left at the default value of
     1805"public_html", then http://localhost:3456/static/subdir/foo.html will be
     1806served with the contents of the file $BASEDIR/public_html/subdir/foo.html .
     1807
     1808 This can be useful to serve a JavaScript application which provides a
     1809prettier front-end to the rest of the Tahoe webapi.
     1810
     1811
     1812Safety and security issues -- names vs. URIs
     1813============================================
     1814
     1815Summary: use explicit file- and dir- caps whenever possible, to reduce the
     1816potential for surprises when the filesystem structure is changed.
     1817
     1818Tahoe provides a mutable filesystem, but the ways that the filesystem can
     1819 change are limited. The only thing that can change is the mapping from
     1820 child names to child objects that each directory contains: a new child name
     1821 can be added pointing to an object, an existing child name can be removed, or
     1822 an existing child name can be changed to point to a different object.
     1823
     1824Obviously if you query Tahoe for information about the filesystem and then act
     1825to change the filesystem (such as by getting a listing of the contents of a
     1826directory and then adding a file to the directory), then the filesystem might
     1827have been changed after you queried it and before you acted upon it.  However,
     1828if you use the URI instead of the pathname of an object when you act upon the
     1829 object, then the only change that can happen is that, if the object is a
     1830 directory, the set of child names it has might be different. If, on the other hand,
     1831you act upon the object using its pathname, then a different object might be in
     1832that place, which can result in more kinds of surprises.
     1833
     1834For example, suppose you are writing code which recursively downloads the
     1835contents of a directory. The first thing your code does is fetch the listing
     1836of the contents of the directory. For each child that it fetched, if that
     1837child is a file then it downloads the file, and if that child is a directory
     1838then it recurses into that directory. Now, if the download and the recurse
     1839actions are performed using the child's name, then the results might be
     1840wrong, because for example a child name that pointed to a sub-directory when
     1841you listed the directory might have been changed to point to a file (in which
     1842case your attempt to recurse into it would result in an error and the file
     1843would be skipped), or a child name that pointed to a file when you listed the
     1844directory might now point to a sub-directory (in which case your attempt to
     1845download the child would result in a file containing HTML text describing the
     1846sub-directory!).
     1847
     1848 If your recursive algorithm uses the URI of the child instead of the name of
     1849the child, then those kinds of mistakes just can't happen. Note that both the
     1850child's name and the child's URI are included in the results of listing the
     1851parent directory, so it isn't any harder to use the URI for this purpose.
     1852
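The following is a short Python 3 sketch of such URI-based recursion,
assuming the t=json directory-listing format described elsewhere in this
document (each cap is URL-encoded before use)::

 import json
 from urllib.parse import quote
 from urllib.request import urlopen

 NODE = "http://127.0.0.1:3456"

 def walk(dircap, prefix=""):
     # list the directory once, then act on each child via its cap, not its name
     with urlopen(NODE + "/uri/" + quote(dircap, safe="") + "?t=json") as resp:
         nodetype, info = json.load(resp)
     for name, (childtype, prop) in info["children"].items():
         cap = prop.get("ro_uri") or prop.get("rw_uri")
         if childtype == "dirnode":
             walk(cap, prefix + name + "/")
         else:
             with urlopen(NODE + "/uri/" + quote(cap, safe="")) as f:
                 print("downloaded", prefix + name, len(f.read()), "bytes")
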
     1853The read and write caps in a given directory node are separate URIs, and
     1854can't be assumed to point to the same object even if they were retrieved in
     1855the same operation (although the webapi server attempts to ensure this
     1856in most cases). If you need to rely on that property, you should explicitly
     1857verify it. More generally, you should not make assumptions about the
     1858internal consistency of the contents of mutable directories. As a result
     1859of the signatures on mutable object versions, it is guaranteed that a given
     1860version was written in a single update, but -- as in the case of a file --
     1861the contents may have been chosen by a malicious writer in a way that is
     1862designed to confuse applications that rely on their consistency.
     1863
     1864In general, use names if you want "whatever object (whether file or
     1865directory) is found by following this name (or sequence of names) when my
     1866request reaches the server". Use URIs if you want "this particular object".
     1867
     1868Concurrency Issues
     1869==================
     1870
     1871Tahoe uses both mutable and immutable files. Mutable files can be created
     1872explicitly by doing an upload with ?mutable=true added, or implicitly by
     1873creating a new directory (since a directory is just a special way to
     1874interpret a given mutable file).
     1875
     1876Mutable files suffer from the same consistency-vs-availability tradeoff that
     1877all distributed data storage systems face. It is not possible to
     1878simultaneously achieve perfect consistency and perfect availability in the
     1879face of network partitions (servers being unreachable or faulty).
     1880
     1881Tahoe tries to achieve a reasonable compromise, but there is a basic rule in
     1882place, known as the Prime Coordination Directive: "Don't Do That". What this
     1883means is that if write-access to a mutable file is available to several
     1884parties, then those parties are responsible for coordinating their activities
     1885to avoid multiple simultaneous updates. This could be achieved by having
     1886these parties talk to each other and using some sort of locking mechanism, or
     1887by serializing all changes through a single writer.
     1888
     1889The consequences of performing uncoordinated writes can vary. Some of the
     1890 writers may lose their changes, as somebody else wins the race. In
     1891many cases the file will be left in an "unhealthy" state, meaning that there
     1892are not as many redundant shares as we would like (reducing the reliability
     1893of the file against server failures). In the worst case, the file can be left
     1894in such an unhealthy state that no version is recoverable, even the old ones.
     1895It is this small possibility of data loss that prompts us to issue the Prime
     1896Coordination Directive.
     1897
     1898Tahoe nodes implement internal serialization to make sure that a single Tahoe
     1899node cannot conflict with itself. For example, it is safe to issue two
     1900directory modification requests to a single tahoe node's webapi server at the
     1901same time, because the Tahoe node will internally delay one of them until
     1902after the other has finished being applied. (This feature was introduced in
     1903 Tahoe-1.1; back with Tahoe-1.0 the web client was responsible for serializing
     1904 web requests itself).
     1905
     1906For more details, please see the "Consistency vs Availability" and "The Prime
     1907Coordination Directive" sections of mutable.txt, in the same directory as
     1908this file.
     1909
     1910
     1911.. [1] URLs and HTTP and UTF-8, Oh My
     1912
     1913 HTTP does not provide a mechanism to specify the character set used to
     1914 encode non-ascii names in URLs (rfc2396#2.1). We prefer the convention that
     1915 the filename= argument shall be a URL-encoded UTF-8 encoded unicode object.
     1916 For example, suppose we want to provoke the server into using a filename of
     1917 "f i a n c e-acute e" (i.e. F I A N C U+00E9 E). The UTF-8 encoding of this
     1918 is 0x66 0x69 0x61 0x6e 0x63 0xc3 0xa9 0x65 (or "fianc\xC3\xA9e", as python's
     1919 repr() function would show). To encode this into a URL, the non-printable
     1920 characters must be escaped with the urlencode '%XX' mechanism, giving us
     1921 "fianc%C3%A9e". Thus, the first line of the HTTP request will be "GET
     1922 /uri/CAP...?save=true&filename=fianc%C3%A9e HTTP/1.1". Not all browsers
     1923 provide this: IE7 uses the Latin-1 encoding, which is fianc%E9e.
     1924
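 This encoding is easy to reproduce; for example, in Python 3::

  from urllib.parse import quote
  print(quote("fianc\u00e9e".encode("utf-8")))   # prints fianc%C3%A9e
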
     1925 The response header will need to indicate a non-ASCII filename. The actual
     1926 mechanism to do this is not clear. For ASCII filenames, the response header
     1927 would look like::
     1928
     1929  Content-Disposition: attachment; filename="english.txt"
     1930
     1931 If Tahoe were to enforce the utf-8 convention, it would need to decode the
     1932 URL argument into a unicode string, and then encode it back into a sequence
     1933 of bytes when creating the response header. One possibility would be to use
     1934 unencoded utf-8. Developers suggest that IE7 might accept this::
     1935
     1936  #1: Content-Disposition: attachment; filename="fianc\xC3\xA9e"
     1937    (note, the last four bytes of that line, not including the newline, are
     1938    0xC3 0xA9 0x65 0x22)
     1939
     1940 RFC2231#4 (dated 1997): suggests that the following might work, and some
     1941 developers (http://markmail.org/message/dsjyokgl7hv64ig3) have reported that
     1942 it is supported by firefox (but not IE7)::
     1943
     1944  #2: Content-Disposition: attachment; filename*=utf-8''fianc%C3%A9e
     1945
     1946 My reading of RFC2616#19.5.1 (which defines Content-Disposition) says that
     1947 the filename= parameter is defined to be wrapped in quotes (presumably to
     1948 allow spaces without breaking the parsing of subsequent parameters), which
     1949 would give us::
     1950
     1951  #3: Content-Disposition: attachment; filename*=utf-8''"fianc%C3%A9e"
     1952
     1953 However this is contrary to the examples in the email thread listed above.
     1954
     1955 Developers report that IE7 (when it is configured for UTF-8 URL encoding,
     1956 which is not the default in Asian countries), will accept::
     1957
     1958  #4: Content-Disposition: attachment; filename=fianc%C3%A9e
     1959
     1960 However, for maximum compatibility, Tahoe simply copies bytes from the URL
     1961 into the response header, rather than enforcing the utf-8 convention. This
     1962 means it does not try to decode the filename from the URL argument, nor does
     1963 it encode the filename into the response header.
  • deleted file docs/frontends/webapi.txt

    diff --git a/docs/frontends/webapi.txt b/docs/frontends/webapi.txt
    deleted file mode 100644
    index bf23daf..0000000
    + -  
    1 
    2 = The Tahoe REST-ful Web API =
    3 
    4 1. Enabling the web-API port
    5 2. Basic Concepts: GET, PUT, DELETE, POST
    6 3. URLs, Machine-Oriented Interfaces
    7 4. Browser Operations: Human-Oriented Interfaces
    8 5. Welcome / Debug / Status pages
    9 6. Static Files in /public_html
    10 7. Safety and security issues -- names vs. URIs
    11 8. Concurrency Issues
    12 
    13 
    14 == Enabling the web-API port ==
    15 
    16 Every Tahoe node is capable of running a built-in HTTP server. To enable
    17 this, just write a port number into the "[node]web.port" line of your node's
    18 tahoe.cfg file. For example, writing "web.port = 3456" into the "[node]"
    19 section of $NODEDIR/tahoe.cfg will cause the node to run a webserver on port
    20 3456.
    21 
    22 This string is actually a Twisted "strports" specification, meaning you can
    23 get more control over the interface to which the server binds by supplying
    24 additional arguments. For more details, see the documentation on
    25 twisted.application.strports:
    26 http://twistedmatrix.com/documents/current/api/twisted.application.strports.html
    27 
    28 Writing "tcp:3456:interface=127.0.0.1" into the web.port line does the same
    29 but binds to the loopback interface, ensuring that only the programs on the
    30 local host can connect. Using
    31 "ssl:3456:privateKey=mykey.pem:certKey=cert.pem" runs an SSL server.
    32 
    33 This webport can be set when the node is created by passing a --webport
    34 option to the 'tahoe create-node' command. By default, the node listens on
    35 port 3456, on the loopback (127.0.0.1) interface.
    36 
    37 == Basic Concepts ==
    38 
    39 As described in architecture.txt, each file and directory in a Tahoe virtual
    40 filesystem is referenced by an identifier that combines the designation of
    41 the object with the authority to do something with it (such as read or modify
    42 the contents). This identifier is called a "read-cap" or "write-cap",
    43 depending upon whether it enables read-only or read-write access. These
    44 "caps" are also referred to as URIs.
    45 
    46 The Tahoe web-based API is "REST-ful", meaning it implements the concepts of
    47 "REpresentational State Transfer": the original scheme by which the World
    48 Wide Web was intended to work. Each object (file or directory) is referenced
    49 by a URL that includes the read- or write- cap. HTTP methods (GET, PUT, and
    50 DELETE) are used to manipulate these objects. You can think of the URL as a
    51 noun, and the method as a verb.
    52 
    53 In REST, the GET method is used to retrieve information about an object, or
    54 to retrieve some representation of the object itself. When the object is a
    55 file, the basic GET method will simply return the contents of that file.
    56 Other variations (generally implemented by adding query parameters to the
    57 URL) will return information about the object, such as metadata. GET
    58 operations are required to have no side-effects.
    59 
    60 PUT is used to upload new objects into the filesystem, or to replace an
    61 existing object. DELETE is used to delete objects from the filesystem. Both
    62 PUT and DELETE are required to be idempotent: performing the same operation
    63 multiple times must have the same side-effects as only performing it once.
    64 
    65 POST is used for more complicated actions that cannot be expressed as a GET,
    66 PUT, or DELETE. POST operations can be thought of as a method call: sending
    67 some message to the object referenced by the URL. In Tahoe, POST is also used
    68 for operations that must be triggered by an HTML form (including upload and
    69 delete), because otherwise a regular web browser has no way to accomplish
    70 these tasks. In general, everything that can be done with a PUT or DELETE can
    71 also be done with a POST.
    72 
    73 Tahoe's web API is designed for two different kinds of consumer. The first is
    74 a program that needs to manipulate the virtual file system. Such programs are
    75 expected to use the RESTful interface described above. The second is a human
    76 using a standard web browser to work with the filesystem. This user is given
    77 a series of HTML pages with links to download files, and forms that use POST
    78 actions to upload, rename, and delete files.
    79 
    80 When an error occurs, the HTTP response code will be set to an appropriate
    81 400-series code (like 404 Not Found for an unknown childname, or 400 Bad Request
    82 when the parameters to a webapi operation are invalid), and the HTTP response
    83 body will usually contain a few lines of explanation as to the cause of the
    84 error and possible responses. Unusual exceptions may result in a
    85 500 Internal Server Error as a catch-all, with a default response body containing
    86 a Nevow-generated HTML-ized representation of the Python exception stack trace
    87 that caused the problem. CLI programs which want to copy the response body to
    88 stderr should provide an "Accept: text/plain" header to their requests to get
    89 a plain text stack trace instead. If the Accept header contains */*, or
    90 text/*, or text/html (or if there is no Accept header), HTML tracebacks will
    91 be generated.
    92 
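 For example, a Python 3 client could ask for plain-text tracebacks like
 this (a sketch; the cap in the URL is a placeholder):

  import sys, urllib.request, urllib.error

  url = "http://127.0.0.1:3456/uri/URI%3ACHK%3A..."   # placeholder cap URL
  req = urllib.request.Request(url, headers={"Accept": "text/plain"})
  try:
      body = urllib.request.urlopen(req).read()
  except urllib.error.HTTPError as e:
      # the response body is a plain-text explanation or stack trace,
      # thanks to the Accept header
      sys.stderr.write(e.read().decode("utf-8"))
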
    93 == URLs ==
    94 
    95 Tahoe uses a variety of read- and write- caps to identify files and
    96 directories. The most common of these is the "immutable file read-cap", which
    97 is used for most uploaded files. These read-caps look like the following:
    98 
    99  URI:CHK:ime6pvkaxuetdfah2p2f35pe54:4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a:3:10:202
    100 
    101 The next most common is a "directory write-cap", which provides both read and
    102 write access to a directory, and looks like this:
    103 
    104  URI:DIR2:djrdkfawoqihigoett4g6auz6a:jx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq
    105 
    106 There are also "directory read-caps", which start with "URI:DIR2-RO:", and
    107 give read-only access to a directory. Finally there are also mutable file
    108 read- and write- caps, which start with "URI:SSK", and give access to mutable
    109 files.
    110 
    111 (Later versions of Tahoe will make these strings shorter, and will remove the
    112 unfortunate colons, which must be escaped when these caps are embedded in
    113 URLs.)
    114 
    115 To refer to any Tahoe object through the web API, you simply need to combine
    116 a prefix (which indicates the HTTP server to use) with the cap (which
    117 indicates which object inside that server to access). Since the default Tahoe
    118 webport is 3456, the most common prefix is one that will use a local node
    119 listening on this port:
    120 
    121  http://127.0.0.1:3456/uri/ + $CAP
    122 
    123 So, to access the directory named above (which happens to be the
    124 publicly-writeable sample directory on the Tahoe test grid, described at
    125 http://allmydata.org/trac/tahoe/wiki/TestGrid), the URL would be:
    126 
    127  http://127.0.0.1:3456/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/
    128 
    129 (note that the colons in the directory-cap are url-encoded into "%3A"
    130 sequences).
    131 
    132 Likewise, to access the file named above, use:
    133 
    134  http://127.0.0.1:3456/uri/URI%3ACHK%3Aime6pvkaxuetdfah2p2f35pe54%3A4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a%3A3%3A10%3A202
    135 
    136 In the rest of this document, we'll use "$DIRCAP" as shorthand for a read-cap
    137 or write-cap that refers to a directory, and "$FILECAP" to abbreviate a cap
    138 that refers to a file (whether mutable or immutable). So those URLs above can
    139 be abbreviated as:
    140 
    141  http://127.0.0.1:3456/uri/$DIRCAP/
    142  http://127.0.0.1:3456/uri/$FILECAP
    143 
    144 The operation summaries below will abbreviate these further, by eliding the
    145 server prefix. They will be displayed like this:
    146 
    147  /uri/$DIRCAP/
    148  /uri/$FILECAP
    149 
    150 
    151 === Child Lookup ===
    152 
    153 Tahoe directories contain named child entries, just like directories in a regular
    154 local filesystem. These child entries, called "dirnodes", consist of a name,
    155 metadata, a write slot, and a read slot. The write and read slots normally contain
    156 a write-cap and read-cap referring to the same object, which can be either a file
    157 or a subdirectory. The write slot may be empty (actually, both may be empty,
    158 but that is unusual).
    159 
    160 If you have a Tahoe URL that refers to a directory, and want to reference a
    161 named child inside it, just append the child name to the URL. For example, if
    162 our sample directory contains a file named "welcome.txt", we can refer to
    163 that file with:
    164 
    165  http://127.0.0.1:3456/uri/$DIRCAP/welcome.txt
    166 
    167 (or http://127.0.0.1:3456/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/welcome.txt)
    168 
    169 Multiple levels of subdirectories can be handled this way:
    170 
    171  http://127.0.0.1:3456/uri/$DIRCAP/tahoe-source/docs/webapi.txt
    172 
    173 In this document, when we need to refer to a URL that references a file using
    174 this child-of-some-directory format, we'll use the following string:
    175 
    176  /uri/$DIRCAP/[SUBDIRS../]FILENAME
    177 
    178 The "[SUBDIRS../]" part means that there are zero or more (optional)
    179 subdirectory names in the middle of the URL. The "FILENAME" at the end means
    180 that this whole URL refers to a file of some sort, rather than to a
    181 directory.
    182 
    183 When we need to refer specifically to a directory in this way, we'll write:
    184 
    185  /uri/$DIRCAP/[SUBDIRS../]SUBDIR
    186 
    187 
    188 Note that all components of pathnames in URLs are required to be UTF-8
    189 encoded, so "resume.doc" (with an acute accent on both E's) would be accessed
    190 with:
    191 
    192  http://127.0.0.1:3456/uri/$DIRCAP/r%C3%A9sum%C3%A9.doc
    193 
    194 Also note that the filenames inside upload POST forms are interpreted using
    195 whatever character set was provided in the conventional '_charset' field, which
    196 defaults to UTF-8 if not otherwise specified. The JSON representation of each
    197 directory contains native unicode strings. Tahoe directories are specified to
    198 contain unicode filenames, and cannot contain binary strings that are not
    199 representable as such.
    200 
    201 All Tahoe operations that refer to existing files or directories must include
    202 a suitable read- or write- cap in the URL: the webapi server won't add one
    203 for you. If you don't know the cap, you can't access the file. This allows
    204 the security properties of Tahoe caps to be extended across the webapi
    205 interface.
    206 
    207 == Slow Operations, Progress, and Cancelling ==
    208 
    209 Certain operations can be expected to take a long time. The "t=deep-check",
    210 described below, will recursively visit every file and directory reachable
    211 from a given starting point, which can take minutes or even hours for
    212 extremely large directory structures. A single long-running HTTP request is a
    213 fragile thing: proxies, NAT boxes, browsers, and users may all grow impatient
    214 with waiting and give up on the connection.
    215 
    216 For this reason, long-running operations have an "operation handle", which
    217 can be used to poll for status/progress messages while the operation
    218 proceeds. This handle can also be used to cancel the operation. These handles
    219 are created by the client, and passed in as an "ophandle=" query argument
    220 to the POST or PUT request which starts the operation. The following
    221 operations can then be used to retrieve status:
    222 
    223 GET /operations/$HANDLE?output=HTML   (with or without t=status)
    224 GET /operations/$HANDLE?output=JSON   (same)
    225 
    226  These two retrieve the current status of the given operation. Each operation
    227  presents a different sort of information, but in general the page retrieved
    228  will indicate:
    229 
    230   * whether the operation is complete, or if it is still running
    231   * how much of the operation is complete, and how much is left, if possible
    232 
    233  Note that the final status output can be quite large: a deep-manifest of a
    234  directory structure with 300k directories and 200k unique files is about
    235  275MB of JSON, and might take two minutes to generate. For this reason, the
    236  full status is not provided until the operation has completed.
    237 
    238  The HTML form will include a meta-refresh tag, which will cause a regular
    239  web browser to reload the status page about 60 seconds later. This tag will
    240  be removed once the operation has completed.
    241 
    242  There may be more status information available under
    243  /operations/$HANDLE/$ETC : i.e., the handle forms the root of a URL space.
    244 
    245 POST /operations/$HANDLE?t=cancel
    246 
    247  This terminates the operation, and returns an HTML page explaining what was
    248  cancelled. If the operation handle has already expired (see below), this
    249  POST will return a 404, which indicates that the operation is no longer
    250  running (either it was completed or terminated). The response body will be
    251  the same as a GET /operations/$HANDLE on this operation handle, and the
    252  handle will be expired immediately afterwards.
    253 
    254 The operation handle will eventually expire, to avoid consuming an unbounded
    255 amount of memory. The handle's time-to-live can be reset at any time, by
    256 passing a retain-for= argument (with a count of seconds) to either the
    257 initial POST that starts the operation, or the subsequent GET request which
    258 asks about the operation. For example, if a 'GET
    259 /operations/$HANDLE?output=JSON&retain-for=600' query is performed, the
    260 handle will remain active for 600 seconds (10 minutes) after the GET was
    261 received.
    262 
    263 In addition, if the GET includes a release-after-complete=True argument, and
    264 the operation has completed, the operation handle will be released
    265 immediately.
    266 
    267 If a retain-for= argument is not used, the default handle lifetimes are:
    268 
    269  * handles will remain valid at least until their operation finishes
    270  * uncollected handles for finished operations (i.e. handles for
    271    operations that have finished but for which the GET page has not been
    272    accessed since completion) will remain valid for four days, or for
    273    the total time consumed by the operation, whichever is greater.
    274  * collected handles (i.e. the GET page has been retrieved at least once
    275    since the operation completed) will remain valid for one day.
    276 
    277 Many "slow" operations can begin to use unacceptable amounts of memory when
    278 operating on large directory structures. The memory usage increases when the
    279 ophandle is polled, as the results must be copied into a JSON string, sent
    280 over the wire, then parsed by a client. So, as an alternative, many "slow"
    281 operations have streaming equivalents. These equivalents do not use operation
    282 handles. Instead, they emit line-oriented status results immediately. Client
    283 code can cancel the operation by simply closing the HTTP connection.
    284 
    285 == Programmatic Operations ==
    286 
    287 Now that we know how to build URLs that refer to files and directories in a
    288 Tahoe virtual filesystem, what sorts of operations can we do with those URLs?
    289 This section contains a catalog of GET, PUT, DELETE, and POST operations that
    290 can be performed on these URLs. This set of operations are aimed at programs
    291 that use HTTP to communicate with a Tahoe node. A later section describes
    292 operations that are intended for web browsers.
    293 
    294 === Reading A File ===
    295 
    296 GET /uri/$FILECAP
    297 GET /uri/$DIRCAP/[SUBDIRS../]FILENAME
    298 
    299  This will retrieve the contents of the given file. The HTTP response body
    300  will contain the sequence of bytes that make up the file.
    301 
    302  To view files in a web browser, you may want more control over the
    303  Content-Type and Content-Disposition headers. Please see the next section
    304  "Browser Operations", for details on how to modify these URLs for that
    305  purpose.
    306 
    307 === Writing/Uploading A File ===
    308 
    309 PUT /uri/$FILECAP
    310 PUT /uri/$DIRCAP/[SUBDIRS../]FILENAME
    311 
    312  Upload a file, using the data from the HTTP request body, and add whatever
    313  child links and subdirectories are necessary to make the file available at
    314  the given location. Once this operation succeeds, a GET on the same URL will
    315  retrieve the same contents that were just uploaded. This will create any
    316  necessary intermediate subdirectories.
    317 
    318  To use the /uri/$FILECAP form, $FILECAP must be a write-cap for a mutable file.
    319 
    320  In the /uri/$DIRCAP/[SUBDIRS../]FILENAME form, if the target file is a
    321  writeable mutable file, that file's contents will be overwritten in-place. If
    322  it is a read-cap for a mutable file, an error will occur. If it is an
    323  immutable file, the old file will be discarded, and a new one will be put in
    324  its place.
    325 
    326  When creating a new file, if "mutable=true" is in the query arguments, the
    327  operation will create a mutable file instead of an immutable one.
    328 
    329  This returns the file-cap of the resulting file. If a new file was created
    330  by this method, the HTTP response code (as dictated by rfc2616) will be set
    331  to 201 CREATED. If an existing file was replaced or modified, the response
    332  code will be 200 OK.
    333 
    334  Note that the 'curl -T localfile http://127.0.0.1:3456/uri/$DIRCAP/foo.txt'
    335  command can be used to invoke this operation.
    336 
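 The same operation from Python 3 would look like this sketch (the
 directory cap is a placeholder):

  import urllib.request

  DIRCAP = "URI%3ADIR2%3A..."   # placeholder: URL-encoded directory write-cap
  url = "http://127.0.0.1:3456/uri/" + DIRCAP + "/foo.txt"
  req = urllib.request.Request(url, data=b"file contents", method="PUT")
  with urllib.request.urlopen(req) as resp:
      # e.g. 201 and the new file-cap
      print(resp.getcode(), resp.read().decode("ascii"))
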
    337 PUT /uri
    338 
    339  This uploads a file, and produces a file-cap for the contents, but does not
    340  attach the file into the filesystem. No directories will be modified by
    341  this operation. The file-cap is returned as the body of the HTTP response.
    342 
    343  If "mutable=true" is in the query arguments, the operation will create a
    344 mutable file, and return its write-cap in the HTTP response. The default is
    345  to create an immutable file, returning the read-cap as a response.
    346 
    347 === Creating A New Directory ===
    348 
    349 POST /uri?t=mkdir
    350 PUT /uri?t=mkdir
    351 
    352  Create a new empty directory and return its write-cap as the HTTP response
    353  body. This does not make the newly created directory visible from the
    354  filesystem. The "PUT" operation is provided for backwards compatibility:
    355  new code should use POST.
    356 
    357 POST /uri?t=mkdir-with-children
    358 
    359  Create a new directory, populated with a set of child nodes, and return its
    360  write-cap as the HTTP response body. The new directory is not attached to
    361  any other directory: the returned write-cap is the only reference to it.
    362 
    363  Initial children are provided as the body of the POST form (this is more
    364  efficient than doing separate mkdir and set_children operations). If the
    365  body is empty, the new directory will be empty. If not empty, the body will
    366  be interpreted as a UTF-8 JSON-encoded dictionary of children with which the
    367  new directory should be populated, using the same format as would be
    368  returned in the 'children' value of the t=json GET request, described below.
    369  Each dictionary key should be a child name, and each value should be a list
    370  of [TYPE, PROPDICT], where PROPDICT contains "rw_uri", "ro_uri", and
    371 "metadata" keys (all others are ignored). For example, the POST request body
    372  could be:
    373 
    374   {
    375     "Fran\u00e7ais": [ "filenode", {
    376         "ro_uri": "URI:CHK:...",
    377         "size": bytes,
    378         "metadata": {
    379           "ctime": 1202777696.7564139,
    380           "mtime": 1202777696.7564139,
    381           "tahoe": {
    382             "linkcrtime": 1202777696.7564139,
    383             "linkmotime": 1202777696.7564139
    384             } } } ],
    385     "subdir":  [ "dirnode", {
    386         "rw_uri": "URI:DIR2:...",
    387         "ro_uri": "URI:DIR2-RO:...",
    388         "metadata": {
    389           "ctime": 1202778102.7589991,
    390           "mtime": 1202778111.2160511,
    391           "tahoe": {
    392             "linkcrtime": 1202777696.7564139,
    393             "linkmotime": 1202777696.7564139
    394           } } } ]
    395   }
    396 
    397  For forward-compatibility, a mutable directory can also contain caps in
    398  a format that is unknown to the webapi server. When such caps are retrieved
    399  from a mutable directory in a "ro_uri" field, they will be prefixed with
    400  the string "ro.", indicating that they must not be decoded without
    401  checking that they are read-only. The "ro." prefix must not be stripped
    402  off without performing this check. (Future versions of the webapi server
    403  will perform it where necessary.)
    404 
    405  If both the "rw_uri" and "ro_uri" fields are present in a given PROPDICT,
    406  and the webapi server recognizes the rw_uri as a write cap, then it will
    407  reset the ro_uri to the corresponding read cap and discard the original
    408  contents of ro_uri (in order to ensure that the two caps correspond to the
    409  same object and that the ro_uri is in fact read-only). However this may not
    410  happen for caps in a format unknown to the webapi server. Therefore, when
    411  writing a directory the webapi client should ensure that the contents
    412  of "rw_uri" and "ro_uri" for a given PROPDICT are a consistent
    413  (write cap, read cap) pair if possible. If the webapi client only has
    414  one cap and does not know whether it is a write cap or read cap, then
    415  it is acceptable to set "rw_uri" to that cap and omit "ro_uri". The
    416  client must not put a write cap into a "ro_uri" field.
    417 
    418  The metadata may have a "no-write" field. If this is set to true in the
    419  metadata of a link, it will not be possible to open that link for writing
    420  via the SFTP frontend; see docs/frontends/FTP-and-SFTP.txt for details.
    421  Also, if the "no-write" field is set to true in the metadata of a link to
    422  a mutable child, it will cause the link to be diminished to read-only.
    423 
    424  Note that the webapi-using client application must not provide the
    425  "Content-Type: multipart/form-data" header that usually accompanies HTML
    426  form submissions, since the body is not formatted this way. Doing so will
    427  cause a server error as the lower-level code misparses the request body.
    428 
    429  Child file names should each be expressed as a unicode string, then used as
    430  keys of the dictionary. The dictionary should then be converted into JSON,
    431  and the resulting string encoded into UTF-8. This UTF-8 bytestring should
    432  then be used as the POST body.
    433 
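 A Python 3 sketch of building and sending such a request body (the child
 cap is a placeholder):

  import json, urllib.request

  children = {
      "example.txt": ["filenode", {"ro_uri": "URI:CHK:...",   # placeholder
                                   "metadata": {}}],
  }
  body = json.dumps(children).encode("utf-8")
  req = urllib.request.Request("http://127.0.0.1:3456/uri?t=mkdir-with-children",
                               data=body, method="POST")
  # note: do not set a multipart/form-data Content-Type, per the warning above
  with urllib.request.urlopen(req) as resp:
      print(resp.read().decode("ascii"))   # write-cap of the new directory
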
    434 POST /uri?t=mkdir-immutable
    435 
    436  Like t=mkdir-with-children above, but the new directory will be
    437  deep-immutable. This means that the directory itself is immutable, and that
    438  it can only contain objects that are treated as being deep-immutable, like
    439  immutable files, literal files, and deep-immutable directories.
    440 
    441  For forward-compatibility, a deep-immutable directory can also contain caps
    442  in a format that is unknown to the webapi server. When such caps are retrieved
    443  from a deep-immutable directory in a "ro_uri" field, they will be prefixed
    444  with the string "imm.", indicating that they must not be decoded without
    445  checking that they are immutable. The "imm." prefix must not be stripped
    446  off without performing this check. (Future versions of the webapi server
    447  will perform it where necessary.)
    448  
    449  The cap for each child may be given either in the "rw_uri" or "ro_uri"
    450  field of the PROPDICT (not both). If a cap is given in the "rw_uri" field,
    451  then the webapi server will check that it is an immutable read-cap of a
    452  *known* format, and give an error if it is not. If a cap is given in the
    453  "ro_uri" field, then the webapi server will still check whether known
    454  caps are immutable, but for unknown caps it will simply assume that the
    455  cap can be stored, as described above. Note that an attacker would be
    456  able to store any cap in an immutable directory, so this check when
    457  creating the directory is only to help non-malicious clients to avoid
    458  accidentally giving away more authority than intended.
    459 
    460  A non-empty request body is mandatory, since after the directory is created,
    461  it will not be possible to add more children to it.
    462 
    463 POST /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir
    464 PUT /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir
    465 
    466  Create new directories as necessary to make sure that the named target
    467  ($DIRCAP/SUBDIRS../SUBDIR) is a directory. This will create additional
    468  intermediate mutable directories as necessary. If the named target directory
    469  already exists, this will make no changes to it.
    470 
    471  If the final directory is created, it will be empty.
    472 
    473  This operation will return an error if a blocking file is present at any of
    474  the parent names, preventing the server from creating the necessary parent
    475  directory; or if it would require changing an immutable directory.
    476 
    477  The write-cap of the new directory will be returned as the HTTP response
    478  body.
    479 
    480 POST /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir-with-children
    481 
    482  Like /uri?t=mkdir-with-children, but the final directory is created as a
    483  child of an existing mutable directory. This will create additional
    484  intermediate mutable directories as necessary. If the final directory is
    485  created, it will be populated with initial children from the POST request
    486  body, as described above.
    487  
    488  This operation will return an error if a blocking file is present at any of
    489  the parent names, preventing the server from creating the necessary parent
    490  directory; or if it would require changing an immutable directory; or if
    491  the immediate parent directory already has a child named SUBDIR.
    492 
    493 POST /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir-immutable
    494 
    495  Like /uri?t=mkdir-immutable, but the final directory is created as a child
    496  of an existing mutable directory. The final directory will be deep-immutable,
    497  and will be populated with the children specified as a JSON dictionary in
    498  the POST request body.
    499 
    500  In Tahoe 1.6 this operation creates intermediate mutable directories if
    501  necessary, but that behaviour should not be relied on; see ticket #920.
    502 
    503  This operation will return an error if the parent directory is immutable,
    504  or already has a child named SUBDIR.
    505 
    506 POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=NAME
    507 
    508  Create a new empty mutable directory and attach it to the given existing
    509  directory. This will create additional intermediate directories as necessary.
    510 
    511  This operation will return an error if a blocking file is present at any of
    512  the parent names, preventing the server from creating the necessary parent
    513  directory, or if it would require changing any immutable directory.
    514 
    515  The URL of this operation points to the parent of the bottommost new directory,
    516  whereas the /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir operation above has a URL
    517  that points directly to the bottommost new directory.
    518 
    519 POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir-with-children&name=NAME
    520 
    521  Like /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=NAME, but the new directory will
    522  be populated with initial children via the POST request body. This command
    523  will create additional intermediate mutable directories as necessary.
    524  
    525  This operation will return an error if a blocking file is present at any of
    526  the parent names, preventing the server from creating the necessary parent
    527  directory; or if it would require changing an immutable directory; or if
    528  the immediate parent directory already has a child named NAME.
    529 
    530  Note that the name= argument must be passed as a queryarg, because the POST
    531  request body is used for the initial children JSON.
    532 
    533 POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir-immutable&name=NAME
    534 
    535  Like /uri/$DIRCAP/[SUBDIRS../]?t=mkdir-with-children&name=NAME, but the
    536  final directory will be deep-immutable. The children are specified as a
    537  JSON dictionary in the POST request body. Again, the name= argument must be
    538  passed as a queryarg.
    539 
    540  In Tahoe 1.6 this operation creates intermediate mutable directories if
    541  necessary, but that behaviour should not be relied on; see ticket #920.
    542 
    543  This operation will return an error if the parent directory is immutable,
    544  or already has a child named NAME.
    545 
    546 === Get Information About A File Or Directory (as JSON) ===
    547 
    548 GET /uri/$FILECAP?t=json
    549 GET /uri/$DIRCAP?t=json
    550 GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=json
    551 GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=json
    552 
    553   This returns a machine-parseable JSON-encoded description of the given
    554   object. The JSON always contains a list, and the first element of the list is
    555   always a flag that indicates whether the referenced object is a file or a
    556   directory. If it is a capability to a file, then the information includes
    557   file size and URI, like this:
    558 
    559    GET /uri/$FILECAP?t=json :
    560 
    561     [ "filenode", {
    562       "ro_uri": file_uri,
    563       "verify_uri": verify_uri,
    564       "size": bytes,
    565       "mutable": false
    566       } ]
    567 
    568   If it is a capability to a directory followed by a path from that directory
    569   to a file, then the information also includes metadata from the link to the
    570   file in the parent directory, like this:
    571 
    572    GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=json :
    573 
    574     [ "filenode", {
    575       "ro_uri": file_uri,
    576       "verify_uri": verify_uri,
    577       "size": bytes,
    578       "mutable": false,
    579       "metadata": {
    580         "ctime": 1202777696.7564139,
    581         "mtime": 1202777696.7564139,
    582         "tahoe": {
    583           "linkcrtime": 1202777696.7564139,
    584           "linkmotime": 1202777696.7564139
    585           } } } ]
    586 
    587   If it is a directory, then it includes information about the children of
    588   this directory, as a mapping from child name to a set of data about the
    589   child (the same data that would appear in a corresponding GET?t=json of the
    590   child itself). The child entries also include metadata about each child,
    591   including link-creation- and link-change- timestamps. The output looks like
    592   this:
    593 
    594    GET /uri/$DIRCAP?t=json :
    595    GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=json :
    596 
    597     [ "dirnode", {
    598       "rw_uri": read_write_uri,
    599       "ro_uri": read_only_uri,
    600       "verify_uri": verify_uri,
    601       "mutable": true,
    602       "children": {
    603         "foo.txt": [ "filenode", {
    604             "ro_uri": uri,
    605             "size": bytes,
    606             "metadata": {
    607               "ctime": 1202777696.7564139,
    608               "mtime": 1202777696.7564139,
    609               "tahoe": {
    610                 "linkcrtime": 1202777696.7564139,
    611                 "linkmotime": 1202777696.7564139
    612                 } } } ],
    613         "subdir":  [ "dirnode", {
    614             "rw_uri": rwuri,
    615             "ro_uri": rouri,
    616             "metadata": {
    617               "ctime": 1202778102.7589991,
    618               "mtime": 1202778111.2160511,
    619               "tahoe": {
    620                 "linkcrtime": 1202777696.7564139,
    621                 "linkmotime": 1202777696.7564139
    622               } } } ]
    623       } } ]
    624 
    625   In the above example, note how 'children' is a dictionary in which the keys
    626   are child names and the values depend upon whether the child is a file or a
    627   directory. The value is mostly the same as the JSON representation of the
    628   child object (except that directories do not recurse -- the "children"
    629   entry of the child is omitted, and the directory view includes the metadata
    630   that is stored on the directory edge).
    631 
    632   The rw_uri field will be present in the information about a directory
    633   if and only if you have read-write access to that directory. The verify_uri
    634   field will be present if and only if the object has a verify-cap
    635   (non-distributed LIT files do not have verify-caps).
    636  
    637   If the cap is of an unknown format, then the file size and verify_uri will
    638   not be available:
    639 
    640    GET /uri/$UNKNOWNCAP?t=json :
    641 
    642     [ "unknown", {
    643       "ro_uri": unknown_read_uri
    644       } ]
    645 
    646    GET /uri/$DIRCAP/[SUBDIRS../]UNKNOWNCHILDNAME?t=json :
    647 
    648     [ "unknown", {
    649       "rw_uri": unknown_write_uri,
    650       "ro_uri": unknown_read_uri,
    651       "mutable": true,
    652       "metadata": {
    653         "ctime": 1202777696.7564139,
    654         "mtime": 1202777696.7564139,
    655         "tahoe": {
    656           "linkcrtime": 1202777696.7564139,
    657           "linkmotime": 1202777696.7564139
    658           } } } ]
    659 
    660   As in the case of file nodes, the metadata will only be present when the
    661   capability is to a directory followed by a path. The "mutable" field is also
    662   not always present; when it is absent, the mutability of the object is not
    663   known.
    664 
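          A minimal sketch of consuming this representation in Python 3
          (assuming a local node on the default 127.0.0.1:3456 webport; the
          dircap is a placeholder):

            import json
            import urllib.request

            node = "http://127.0.0.1:3456"
            dircap = "URI:DIR2:..."  # placeholder directory cap

            with urllib.request.urlopen("%s/uri/%s?t=json" % (node, dircap)) as f:
                nodetype, info = json.loads(f.read())

            if nodetype == "dirnode":
                for name, (childtype, childinfo) in sorted(info["children"].items()):
                    print(name, childtype, childinfo.get("ro_uri"))
            elif nodetype == "filenode":
                print("file, %d bytes" % info["size"])
            else:  # "unknown": a cap format not recognized by this webapi server
                print("unknown cap:", info.get("ro_uri"))
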
    665 ==== About the metadata ====
    666 
    667   The value of the 'tahoe':'linkmotime' key is updated whenever a link to a
    668   child is set. The value of the 'tahoe':'linkcrtime' key is updated whenever
    669   a link to a child is created -- i.e. when there was not previously a link
    670   under that name.
    671 
    672   Note however, that if the edge in the Tahoe filesystem points to a mutable
    673   file and the contents of that mutable file is changed, then the
    674   'tahoe':'linkmotime' value on that edge will *not* be updated, since the
    675   edge itself wasn't updated -- only the mutable file was.
    676 
    677   The timestamps are represented as a number of seconds since the UNIX epoch
    678   (1970-01-01 00:00:00 UTC), with leap seconds not being counted in the long
    679   term.
    680 
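          For example, the timestamp 1202777696.7564139 used in the examples
          above can be rendered as a human-readable UTC time in Python 3:

            from datetime import datetime, timezone
            print(datetime.fromtimestamp(1202777696.7564139, tz=timezone.utc))
            # -> 2008-02-12 00:54:56.756414+00:00
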
    681   In Tahoe earlier than v1.4.0, 'mtime' and 'ctime' keys were populated
    682   instead of the 'tahoe':'linkmotime' and 'tahoe':'linkcrtime' keys. Starting
    683   in Tahoe v1.4.0, the 'linkmotime'/'linkcrtime' keys in the 'tahoe' sub-dict
    684   are populated. However, prior to Tahoe v1.7beta, a bug caused the 'tahoe'
    685   sub-dict to be deleted by webapi requests in which new metadata is
    686   specified, and not to be added to existing child links that lack it.
    687 
    688   From Tahoe v1.7.0 onward, the 'mtime' and 'ctime' fields are no longer
    689   populated or updated (see ticket #924), except by "tahoe backup" as
    690   explained below. For backward compatibility, when an existing link is
    691   updated and 'tahoe':'linkcrtime' is not present in the previous metadata
    692   but 'ctime' is, the old value of 'ctime' is used as the new value of
    693   'tahoe':'linkcrtime'.
    694 
    695   The reason we added the new fields in Tahoe v1.4.0 is that there is a
    696   "set_children" API (described below) which you can use to overwrite the
    697   values of the 'mtime'/'ctime' pair, and this API is used by the
    698   "tahoe backup" command (in Tahoe v1.3.0 and later) to set the 'mtime' and
    699   'ctime' values when backing up files from a local filesystem into the
    700   Tahoe filesystem. As of Tahoe v1.4.0, the set_children API cannot be used
    701   to set anything under the 'tahoe' key of the metadata dict -- if you
    702   include 'tahoe' keys in your 'metadata' arguments then it will silently
    703   ignore those keys.
    704 
    705   Therefore, if the 'tahoe' sub-dict is present, you can rely on the
    706   'linkcrtime' and 'linkmotime' values therein to have the semantics described
    707   above. (This is assuming that only official Tahoe clients have been used to
    708   write those links, and that their system clocks were set to what you expected
    709   -- there is nothing preventing someone from editing their Tahoe client or
    710   writing their own Tahoe client which would overwrite those values however
    711   they like, and there is nothing to constrain their system clock from taking
    712   any value.)
    713 
    714   When an edge is created or updated by "tahoe backup", the 'mtime' and
    715   'ctime' keys on that edge are set as follows:
    716 
    717     * 'mtime' is set to the timestamp read from the local filesystem for the
    718       "mtime" of the local file in question, which means the last time the
    719       contents of that file were changed.
    720 
    721     * On Windows, 'ctime' is set to the creation timestamp for the file
    722       read from the local filesystem. On other platforms, 'ctime' is set to
    723       the UNIX "ctime" of the local file, which means the last time that
    724       either the contents or the metadata of the local file was changed.
    725 
    726   There are several ways that the 'ctime' field could be confusing:
    727 
    728   1. You might be confused about whether it reflects the time of the creation
    729      of a link in the Tahoe filesystem (by a version of Tahoe < v1.7.0) or a
    730      timestamp copied in by "tahoe backup" from a local filesystem.
    731 
    732   2. You might be confused about whether it is a copy of the file creation
    733      time (if "tahoe backup" was run on a Windows system) or of the last
    734      contents-or-metadata change (if "tahoe backup" was run on a different
    735      operating system).
    736 
    737   3. You might be confused by the fact that changing the contents of a
    738      mutable file in Tahoe doesn't have any effect on any links pointing at
    739      that file in any directories, although "tahoe backup" sets the link
    740      'ctime'/'mtime' to reflect timestamps about the local file corresponding
    741      to the Tahoe file to which the link points.
    742 
    743   4. Also, quite apart from Tahoe, you might be confused about the meaning
    744      of the "ctime" in UNIX local filesystems, which people sometimes think
    745      means file creation time, but which actually means, in UNIX local
    746      filesystems, the most recent time that the file contents or the file
    747      metadata (such as owner, permission bits, extended attributes, etc.)
    748      has changed. Note that although "ctime" does not mean file creation time
    749      in UNIX, links created by a version of Tahoe prior to v1.7.0, and never
    750      written by "tahoe backup", will have 'ctime' set to the link creation
    751      time.
    752 
    753 
    754 === Attaching an existing File or Directory by its read- or write- cap ===
    755 
    756 PUT /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=uri
    757 
    758   This attaches a child object (either a file or directory) to a specified
    759   location in the virtual filesystem. The child object is referenced by its
    760   read- or write- cap, as provided in the HTTP request body. This will create
    761   intermediate directories as necessary.
    762 
    763   This is similar to a UNIX hardlink: by referencing a previously-uploaded file
    764   (or previously-created directory) instead of uploading/creating a new one,
    765   you can create two references to the same object.
    766 
    767   The read- or write- cap of the child is provided in the body of the HTTP
    768   request, and this same cap is returned in the response body.
    769 
    770   The default behavior is to overwrite any existing object at the same
    771   location. To prevent this (and make the operation return an error instead
    772   of overwriting), add a "replace=false" argument, as "?t=uri&replace=false".
    773   With replace=false, this operation will return an HTTP 409 "Conflict" error
    774   if there is already an object at the given location, rather than
    775   overwriting the existing object. To allow the operation to overwrite a
    776   file, but return an error when trying to overwrite a directory, use
    777   "replace=only-files" (this behavior is closer to the traditional UNIX "mv"
    778   command). Note that "true", "t", and "1" are all synonyms for "True", and
    779   "false", "f", and "0" are synonyms for "False", and the parameter is
    780   case-insensitive.
    781  
    782   Note that this operation does not take its child cap in the form of
    783   separate "rw_uri" and "ro_uri" fields. Therefore, it cannot accept a
    784   child cap in a format unknown to the webapi server, unless its URI
    785   starts with "ro." or "imm.". This restriction is necessary because the
    786   server is not able to attenuate an unknown write cap to a read cap.
    787   Unknown URIs starting with "ro." or "imm.", on the other hand, are
    788   assumed to represent read caps. The client should not prefix a write
    789   cap with "ro." or "imm." and pass it to this operation, since that
    790   would result in granting the cap's write authority to holders of the
    791   directory read cap.
    792 
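          A minimal sketch of this operation in Python 3 (assuming a local
          node on the default 127.0.0.1:3456 webport; both caps and the child
          path are placeholders):

            import urllib.error
            import urllib.request

            node = "http://127.0.0.1:3456"
            dircap = "URI:DIR2:..."   # placeholder: parent directory write-cap
            childcap = "URI:CHK:..."  # placeholder: cap of the object to attach

            # "docs" is created on the way if necessary; replace=false makes an
            # existing "readme.txt" an error instead of overwriting it.
            url = "%s/uri/%s/docs/readme.txt?t=uri&replace=false" % (node, dircap)
            req = urllib.request.Request(url, data=childcap.encode("ascii"),
                                         method="PUT")
            try:
                print(urllib.request.urlopen(req).read())  # the same cap, echoed
            except urllib.error.HTTPError as e:
                if e.code == 409:  # something already lives at that name
                    print("already exists")
                else:
                    raise
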
    793 === Adding multiple files or directories to a parent directory at once ===
    794 
    795 POST /uri/$DIRCAP/[SUBDIRS..]?t=set_children
    796 POST /uri/$DIRCAP/[SUBDIRS..]?t=set-children    (Tahoe >= v1.6)
    797 
    798   This command adds multiple children to a directory in a single operation.
    799   It reads the request body and interprets it as a JSON-encoded description
    800   of the child names and read/write-caps that should be added.
    801 
    802   The body should be a JSON-encoded dictionary, in the same format as the
    803   "children" value returned by the "GET /uri/$DIRCAP?t=json" operation
    804   described above. In this format, each key is a child name, and the
    805   corresponding value is a tuple of (type, childinfo). "type" is ignored, and
    806   "childinfo" is a dictionary that contains "rw_uri", "ro_uri", and
    807   "metadata" keys. You can take the output of "GET /uri/$DIRCAP1?t=json" and
    808   use it as the input to "POST /uri/$DIRCAP2?t=set_children" to make DIR2
    809   look very much like DIR1 (except for any existing children of DIR2 that
    810   were not overwritten, and any existing "tahoe" metadata keys as described
    811   below).
    812 
    813   When the set_children request contains a child name that already exists in
    814   the target directory, this command defaults to overwriting that child with
    815   the new value (both child cap and metadata, but if the JSON data does not
    816   contain a "metadata" key, the old child's metadata is preserved). The
    817   command takes a boolean "overwrite=" query argument to control this
    818   behavior. If you use "?t=set_children&overwrite=false", then an attempt to
    819   replace an existing child will instead cause an error.
    820 
    821   Any "tahoe" key in the new child's "metadata" value is ignored. Any
    822   existing "tahoe" metadata is preserved. The metadata["tahoe"] value is
    823   reserved for metadata generated by the tahoe node itself. The only two keys
    824   currently placed here are "linkcrtime" and "linkmotime". For details, see
    825   the section above entitled "Get Information About A File Or Directory (as
    826   JSON)", in the "About the metadata" subsection.
    827  
    828   Note that this command was introduced with the name "set_children", which
    829   uses an underscore rather than a hyphen as other multi-word command names
    830   do. The variant with a hyphen is now accepted, but clients that desire
    831   backward compatibility should continue to use "set_children".
    832 
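          For example, the DIR1-to-DIR2 copy described above can be done with
          a minimal Python 3 sketch like this (assuming a local node on the
          default 127.0.0.1:3456 webport; both dircaps are placeholders):

            import json
            import urllib.request

            node = "http://127.0.0.1:3456"
            dircap1 = "URI:DIR2:..."  # placeholder: source directory cap
            dircap2 = "URI:DIR2:..."  # placeholder: target directory write-cap

            # Fetch DIR1's children, then replay them into DIR2. Any "tahoe"
            # metadata keys in the input are ignored by the server.
            with urllib.request.urlopen("%s/uri/%s?t=json" % (node, dircap1)) as f:
                children = json.loads(f.read())[1]["children"]

            req = urllib.request.Request("%s/uri/%s?t=set_children" % (node, dircap2),
                                         data=json.dumps(children).encode("utf-8"))
            urllib.request.urlopen(req)  # providing a body makes this a POST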
    833 
    834 === Deleting a File or Directory ===
    835 
    836 DELETE /uri/$DIRCAP/[SUBDIRS../]CHILDNAME
    837 
    838   This removes the given name from its parent directory. CHILDNAME is the
    839   name to be removed, and $DIRCAP/SUBDIRS.. indicates the directory that will
    840   be modified.
    841 
    842   Note that this does not actually delete the file or directory that the name
    843   points to from the tahoe grid -- it only removes the named reference from
    844   this directory. If there are other names in this directory or in other
    845   directories that point to the resource, then it will remain accessible
    846   through those paths. Even if all names pointing to this object are removed
    847   from their parent directories, then someone with possession of its read-cap
    848   can continue to access the object through that cap.
    849 
    850   The object will only become completely unreachable once 1: there are no
    851   reachable directories that reference it, and 2: nobody is holding a read-
    852   or write- cap to the object. (This behavior is very similar to the way
    853   hardlinks and anonymous files work in traditional UNIX filesystems).
    854 
    855   This operation will not modify more than a single directory. Intermediate
    856   directories which were implicitly created by PUT or POST methods will *not*
    857   be automatically removed by DELETE.
    858 
    859   This method returns the file- or directory- cap of the object that was just
    860   removed.
    861 
    862 == Browser Operations ==
    863 
    864 This section describes the HTTP operations that provide support for humans
    865 running a web browser. Most of these operations use HTML forms that use POST
    866 to drive the Tahoe node. This section is intended for HTML authors who want
    867 to write web pages that contain forms and buttons which manipulate the Tahoe
    868 filesystem.
    869 
    870 Note that for all POST operations, the arguments listed can be provided
    871 either as URL query arguments or as form body fields. URL query arguments are
    872 separated from the main URL by "?", and from each other by "&". For example,
    873 "POST /uri/$DIRCAP?t=upload&mutable=true". Form body fields are usually
    874 specified by using <input type="hidden"> elements. For clarity, the
    875 descriptions below display the most significant arguments as URL query args.
    876 
    877 === Viewing A Directory (as HTML) ===
    878 
    879 GET /uri/$DIRCAP/[SUBDIRS../]
    880 
    881  This returns an HTML page, intended to be displayed to a human by a web
    882  browser, which contains HREF links to all files and directories reachable
    883  from this directory. These HREF links do not have a t= argument, meaning
    884  that a human who follows them will get pages also meant for a human. It also
    885  contains forms to upload new files, and to delete files and directories.
    886  Those forms use POST methods to do their job.
    887 
    888 === Viewing/Downloading a File ===
    889 
    890 GET /uri/$FILECAP
    891 GET /uri/$DIRCAP/[SUBDIRS../]FILENAME
    892 
    893  This will retrieve the contents of the given file. The HTTP response body
    894  will contain the sequence of bytes that make up the file.
    895 
    896  If you want the HTTP response to include a useful Content-Type header,
    897  either use the second form (which starts with a $DIRCAP), or add a
    898  "filename=foo" query argument, like "GET /uri/$FILECAP?filename=foo.jpg".
    899  The bare "GET /uri/$FILECAP" does not give the Tahoe node enough information
    900  to determine a Content-Type (since Tahoe immutable files are merely
    901  sequences of bytes, not typed+named file objects).
    902 
    903  If the URL has both filename= and "save=true" in the query arguments, then
    904  the server will add a "Content-Disposition: attachment" header, along with a
    905  filename= parameter. When a user clicks on such a link, most browsers will
    906  offer to let the user save the file instead of displaying it inline (indeed,
    907  most browsers will refuse to display it inline). "true", "t", "1", and other
    908  case-insensitive equivalents are all treated the same.
    909 
    910  Character-set handling in URLs and HTTP headers is a dubious art[1]. For
    911  maximum compatibility, Tahoe simply copies the bytes from the filename=
    912  argument into the Content-Disposition header's filename= parameter, without
    913  trying to interpret them in any particular way.
    914 
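         A minimal download sketch in Python 3 (assuming a local node on the
         default 127.0.0.1:3456 webport; the filecap and the filename are
         placeholders):

           import urllib.request

           node = "http://127.0.0.1:3456"
           filecap = "URI:CHK:..."  # placeholder

           # filename= lets the node pick a useful Content-Type; save=true
           # additionally requests "Content-Disposition: attachment".
           url = "%s/uri/%s?filename=photo.jpg&save=true" % (node, filecap)
           with urllib.request.urlopen(url) as f:
               print(f.headers.get("Content-Type"))
               open("photo.jpg", "wb").write(f.read())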
    915 
    916 GET /named/$FILECAP/FILENAME
    917 
    918  This is an alternate download form which makes it easier to get the correct
    919  filename. The Tahoe server will provide the contents of the given file, with
    920  a Content-Type header derived from the given filename. This form is used to
    921  get browsers to use the "Save Link As" feature correctly, and also helps
    922  command-line tools like "wget" and "curl" use the right filename. Note that
    923  this form can *only* be used with file caps; it is an error to use a
    924  directory cap after the /named/ prefix.
    925 
    926 === Get Information About A File Or Directory (as HTML) ===
    927 
    928 GET /uri/$FILECAP?t=info
    929 GET /uri/$DIRCAP/?t=info
    930 GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR/?t=info
    931 GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=info
    932 
    933   This returns a human-oriented HTML page with more detail about the selected
    934   file or directory object. This page contains the following items:
    935 
    936    object size
    937    storage index
    938    JSON representation
    939    raw contents (text/plain)
    940    access caps (URIs): verify-cap, read-cap, write-cap (for mutable objects)
    941    check/verify/repair form
    942    deep-check/deep-size/deep-stats/manifest (for directories)
    943    replace-contents form (for mutable files)
    944 
    945 === Creating a Directory ===
    946 
    947 POST /uri?t=mkdir
    948 
    949  This creates a new empty directory, but does not attach it to the virtual
    950  filesystem.
    951 
    952  If a "redirect_to_result=true" argument is provided, then the HTTP response
    953  will cause the web browser to be redirected to a /uri/$DIRCAP page that
    954  gives access to the newly-created directory. If you bookmark this page,
    955  you'll be able to get back to the directory again in the future. This is the
    956  recommended way to start working with a Tahoe server: create a new unlinked
    957  directory (using redirect_to_result=true), then bookmark the resulting
    958  /uri/$DIRCAP page. There is a "create directory" button on the Welcome page
    959  to invoke this action.
    960 
    961  If "redirect_to_result=true" is not provided (or is given a value of
    962  "false"), then the HTTP response body will simply be the write-cap of the
    963  new directory.
    964 
    965 POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=CHILDNAME
    966 
    967  This creates a new empty directory as a child of the designated parent
    968  directory. It will create additional intermediate directories as necessary.
    969 
    970  If a "when_done=URL" argument is provided, the HTTP response will cause the
    971  web browser to redirect to the given URL. This provides a convenient way to
    972  return the browser to the directory that was just modified. Without a
    973  when_done= argument, the HTTP response will simply contain the write-cap of
    974  the directory that was just created.
    975 
    976 
    977 === Uploading a File ===
    978 
    979 POST /uri?t=upload
    980 
    981  This uploads a file, and produces a file-cap for the contents, but does not
    982  attach the file into the filesystem. No directories will be modified by
    983  this operation.
    984 
    985  The file must be provided as the "file" field of an HTML encoded form body,
    986  produced in response to an HTML form like this:
    987   <form action="/uri" method="POST" enctype="multipart/form-data">
    988    <input type="hidden" name="t" value="upload" />
    989    <input type="file" name="file" />
    990    <input type="submit" value="Upload Unlinked" />
    991   </form>
    992 
    993  If a "when_done=URL" argument is provided, the response body will cause the
    994  browser to redirect to the given URL. If the when_done= URL has the string
    995  "%(uri)s" in it, that string will be replaced by a URL-escaped form of the
    996  newly created file-cap. (Note that without this substitution, there is no
    997  way to access the file that was just uploaded).
    998 
    999  The default (in the absence of when_done=) is to return an HTML page that
    1000  describes the results of the upload. This page will contain information
    1001  about which storage servers were used for the upload, how long each
    1002  operation took, etc.
    1003 
    1004  If a "mutable=true" argument is provided, the operation will create a
    1005  mutable file, and the response body will contain the write-cap instead of
    1006  the upload results page. The default is to create an immutable file,
    1007  returning the upload results page as a response.
    1008 
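          The same form can be driven programmatically. Here is a minimal
          Python 3 sketch that hand-builds the multipart body (assuming a
          local node on the default 127.0.0.1:3456 webport; the filename and
          contents are placeholders):

            import urllib.request
            import uuid

            node = "http://127.0.0.1:3456"
            boundary = uuid.uuid4().hex

            # A multipart/form-data body with a "t" field and a "file" field.
            body = (
                "--%(b)s\r\n"
                'Content-Disposition: form-data; name="t"\r\n\r\n'
                "upload\r\n"
                "--%(b)s\r\n"
                'Content-Disposition: form-data; name="file"; filename="hello.txt"\r\n'
                "Content-Type: application/octet-stream\r\n\r\n"
                "hello, world\n\r\n"
                "--%(b)s--\r\n"
            ) % {"b": boundary}

            req = urllib.request.Request(node + "/uri", data=body.encode("utf-8"))
            req.add_header("Content-Type",
                           "multipart/form-data; boundary=" + boundary)
            html = urllib.request.urlopen(req).read()  # HTML upload-results page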
    1009 
    1010 POST /uri/$DIRCAP/[SUBDIRS../]?t=upload
    1011 
    1012  This uploads a file, and attaches it as a new child of the given directory,
    1013  which must be mutable. The file must be provided as the "file" field of an
    1014  HTML-encoded form body, produced in response to an HTML form like this:
    1015   <form action="." method="POST" enctype="multipart/form-data">
    1016    <input type="hidden" name="t" value="upload" />
    1017    <input type="file" name="file" />
    1018    <input type="submit" value="Upload" />
    1019   </form>
    1020 
    1021  A "name=" argument can be provided to specify the new child's name,
    1022  otherwise it will be taken from the "filename" field of the upload form
    1023  (most web browsers will copy the last component of the original file's
    1024  pathname into this field). To avoid confusion, name= is not allowed to
    1025  contain a slash.
    1026 
    1027  If there is already a child with that name, and it is a mutable file, then
    1028  its contents are replaced with the data being uploaded. If it is not a
    1029  mutable file, the default behavior is to remove the existing child before
    1030  creating a new one. To prevent this (and make the operation return an error
    1031  instead of overwriting the old child), add a "replace=false" argument, as
    1032  "?t=upload&replace=false". With replace=false, this operation will return an
    1033  HTTP 409 "Conflict" error if there is already an object at the given
    1034  location, rather than overwriting the existing object. Note that "true",
    1035  "t", and "1" are all synonyms for "True", and "false", "f", and "0" are
    1036  synonyms for "False", and the parameter is case-insensitive.
    1037 
    1038  This will create additional intermediate directories as necessary, although
    1039  since it is expected to be triggered by a form that was retrieved by "GET
    1040  /uri/$DIRCAP/[SUBDIRS../]", it is likely that the parent directory will
    1041  already exist.
    1042 
    1043  If a "mutable=true" argument is provided, any new file that is created will
    1044  be a mutable file instead of an immutable one. <input type="checkbox"
    1045  name="mutable" /> will give the user a way to set this option.
    1046 
    1047  If a "when_done=URL" argument is provided, the HTTP response will cause the
    1048  web browser to redirect to the given URL. This provides a convenient way to
    1049  return the browser to the directory that was just modified. Without a
    1050  when_done= argument, the HTTP response will simply contain the file-cap of
    1051  the file that was just uploaded (a write-cap for mutable files, or a
    1052  read-cap for immutable files).
    1053 
    1054 POST /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=upload
    1055 
    1056  This also uploads a file and attaches it as a new child of the given
    1057  directory, which must be mutable. It is a slight variant of the previous
    1058  operation, as the URL refers to the target file rather than the parent
    1059  directory. It is otherwise identical: this accepts mutable= and when_done=
    1060  arguments too.
    1061 
    1062 POST /uri/$FILECAP?t=upload
    1063 
    1064  This modifies the contents of an existing mutable file in-place. An error is
    1065  signalled if $FILECAP does not refer to a mutable file. It behaves just like
    1066  the "PUT /uri/$FILECAP" form, but uses a POST for the benefit of HTML forms
    1067  in a web browser.
    1068 
    1069 === Attaching An Existing File Or Directory (by URI) ===
    1070 
    1071 POST /uri/$DIRCAP/[SUBDIRS../]?t=uri&name=CHILDNAME&uri=CHILDCAP
    1072 
    1073  This attaches a given read- or write- cap "CHILDCAP" to the designated
    1074  directory, with a specified child name. This behaves much like the PUT t=uri
    1075  operation, and is a lot like a UNIX hardlink. It is subject to the same
    1076  restrictions as that operation on the use of cap formats unknown to the
    1077  webapi server.
    1078 
    1079  This will create additional intermediate directories as necessary, although
    1080  since it is expected to be triggered by a form that was retrieved by "GET
    1081  /uri/$DIRCAP/[SUBDIRS../]", it is likely that the parent directory will
    1082  already exist.
    1083 
    1084  This accepts the same replace= argument as POST t=upload.
    1085 
    1086 === Deleting A Child ===
    1087 
    1088 POST /uri/$DIRCAP/[SUBDIRS../]?t=delete&name=CHILDNAME
    1089 
    1090  This instructs the node to remove a child object (file or subdirectory) from
    1091  the given directory, which must be mutable. Note that the entire subtree is
    1092  unlinked from the parent. Unlike deleting a subdirectory in a UNIX local
    1093  filesystem, the subtree need not be empty; if it isn't, then other references
    1094  into the subtree will see that the child subdirectories are not modified by
    1095  this operation. Only the link from the given directory to its child is severed.
    1096 
    1097 === Renaming A Child ===
    1098 
    1099 POST /uri/$DIRCAP/[SUBDIRS../]?t=rename&from_name=OLD&to_name=NEW
    1100 
    1101  This instructs the node to rename a child of the given directory, which must
    1102  be mutable. This has a similar effect to removing the child, then adding the
    1103  same child-cap under the new name, except that it preserves metadata. This
    1104  operation cannot move the child to a different directory.
    1105 
    1106  This operation will replace any existing child of the new name, making it
    1107  behave like the UNIX "mv -f" command.
    1108 
    1109 === Other Utilities ===
    1110 
    1111 GET /uri?uri=$CAP
    1112 
    1113   This causes a redirect to /uri/$CAP, and retains any additional query
    1114   arguments (like filename= or save=). This is for the convenience of web
    1115   forms which allow the user to paste in a read- or write- cap (obtained
    1116   through some out-of-band channel, like IM or email).
    1117 
    1118   Note that this form merely redirects to the specific file or directory
    1119   indicated by the $CAP: unlike the GET /uri/$DIRCAP form, you cannot
    1120   traverse to children by appending additional path segments to the URL.
    1121 
    1122 GET /uri/$DIRCAP/[SUBDIRS../]?t=rename-form&name=$CHILDNAME
    1123 
    1124   This provides a useful facility to browser-based user interfaces. It
    1125   returns a page containing a form targeting the "POST $DIRCAP t=rename"
    1126   functionality described above, with the provided $CHILDNAME present in the
    1127   'from_name' field of that form. I.e. this presents a form offering to
    1128   rename $CHILDNAME, requesting the new name, and submitting POST rename.
    1129 
    1130 GET /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=uri
    1131 
    1132  This returns the file- or directory- cap for the specified object.
    1133 
    1134 GET /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=readonly-uri
    1135 
    1136  This returns a read-only file- or directory- cap for the specified object.
    1137  If the object is an immutable file, this will return the same value as
    1138  t=uri.
    1139 
    1140 === Debugging and Testing Features ===
    1141 
    1142 These URLs are less likely to be helpful to the casual Tahoe user, and are
    1143 mainly intended for developers.
    1144 
    1145 POST $URL?t=check
    1146 
    1147   This triggers the FileChecker to determine the current "health" of the
    1148   given file or directory, by counting how many shares are available. The
    1149   page that is returned will display the results. This can be used as a "show
    1150   me detailed information about this file" page.
    1151 
    1152   If a verify=true argument is provided, the node will perform a more
    1153   intensive check, downloading and verifying every single bit of every share.
    1154 
    1155   If an add-lease=true argument is provided, the node will also add (or
    1156   renew) a lease to every share it encounters. Each lease will keep the share
    1157   alive for a certain period of time (one month by default). Once the last
    1158   lease expires or is explicitly cancelled, the storage server is allowed to
    1159   delete the share.
    1160 
    1161   If an output=JSON argument is provided, the response will be
    1162   machine-readable JSON instead of human-oriented HTML. The data is a
    1163   dictionary with the following keys:
    1164 
    1165    storage-index: a base32-encoded string with the object's storage index,
    1166                   or an empty string for LIT files
    1167    summary: a string, with a one-line summary of the stats of the file
    1168    results: a dictionary that describes the state of the file. For LIT files,
    1169             this dictionary has only the 'healthy' key, which will always be
    1170             True. For distributed files, this dictionary has the following
    1171             keys:
    1172      count-shares-good: the number of good shares that were found
    1173      count-shares-needed: 'k', the number of shares required for recovery
    1174      count-shares-expected: 'N', the number of total shares generated
    1175      count-good-share-hosts: this was intended to be the number of distinct
    1176                              storage servers with good shares. It is currently
    1177                              (as of Tahoe-LAFS v1.8.0) computed incorrectly;
    1178                              see ticket #1115.
    1179      count-wrong-shares: for mutable files, the number of shares for
    1180                          versions other than the 'best' one (highest
    1181                          sequence number, highest roothash). These are
    1182                          either old ...
    1183      count-recoverable-versions: for mutable files, the number of
    1184                                  recoverable versions of the file. For
    1185                                  a healthy file, this will equal 1.
    1186      count-unrecoverable-versions: for mutable files, the number of
    1187                                    unrecoverable versions of the file.
    1188                                    For a healthy file, this will be 0.
    1189      count-corrupt-shares: the number of shares with integrity failures
    1190      list-corrupt-shares: a list of "share locators", one for each share
    1191                           that was found to be corrupt. Each share locator
    1192                           is a list of (serverid, storage_index, sharenum).
    1193      needs-rebalancing: (bool) True if there are multiple shares on a single
    1194                         storage server, indicating a reduction in reliability
    1195                         that could be resolved by moving shares to new
    1196                         servers.
    1197      servers-responding: list of base32-encoded storage server identifiers,
    1198                          one for each server which responded to the share
    1199                          query.
    1200      healthy: (bool) True if the file is completely healthy, False otherwise.
    1201               Healthy files have at least N good shares. Overlapping shares
    1202               do not currently cause a file to be marked unhealthy. If there
    1203               are at least N good shares, then corrupt shares do not cause the
    1204               file to be marked unhealthy, although the corrupt shares will be
    1205               listed in the results (list-corrupt-shares) and should be manually
    1206               removed to avoid wasting time in subsequent downloads (as the
    1207               downloader rediscovers the corruption and uses alternate shares).
    1208               Future compatibility: the meaning of this field may change to
    1209               reflect whether the servers-of-happiness criterion is met
    1210               (see ticket #614).
    1211      sharemap: dict mapping share identifier to list of serverids
    1212                (base32-encoded strings). This indicates which servers are
    1213                holding which shares. For immutable files, the shareid is
    1214                an integer (the share number, from 0 to N-1). For
    1215                immutable files, it is a string of the form
    1216                mutable files, it is a string of the form
    1217                roothash, and the share number.
    1218 
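           A minimal sketch of running a check and reading the JSON results
           in Python 3 (assuming a local node on the default 127.0.0.1:3456
           webport; the filecap is a placeholder):

             import json
             import urllib.request

             node = "http://127.0.0.1:3456"
             filecap = "URI:CHK:..."  # placeholder

             req = urllib.request.Request(
                 "%s/uri/%s?t=check&output=JSON" % (node, filecap),
                 data=b"")  # an empty body makes this a POST
             check = json.loads(urllib.request.urlopen(req).read())
             print(check["summary"])
             r = check["results"]
             if not r["healthy"]:
                 print("only %d of %d shares found (need %d)" %
                       (r["count-shares-good"], r["count-shares-expected"],
                        r["count-shares-needed"]))
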
    1219 POST $URL?t=start-deep-check    (must add &ophandle=XYZ)
    1220 
    1221   This initiates a recursive walk of all files and directories reachable from
    1222   the target, performing a check on each one just like t=check. The result
    1223   page will contain a summary of the results, including details on any
    1224   file/directory that was not fully healthy.
    1225 
    1226   t=start-deep-check can only be invoked on a directory. An error (400
    1227   BAD_REQUEST) will be signalled if it is invoked on a file. The recursive
    1228   walker will deal with loops safely.
    1229 
    1230   This accepts the same verify= and add-lease= arguments as t=check.
    1231 
    1232   Since this operation can take a long time (perhaps a second per object),
    1233   the ophandle= argument is required (see "Slow Operations, Progress, and
    1234   Cancelling" above). The response to this POST will be a redirect to the
    1235   corresponding /operations/$HANDLE page (with output=HTML or output=JSON to
    1236   match the output= argument given to the POST). The deep-check operation
    1237   will continue to run in the background, and the /operations page should be
    1238   used to find out when the operation is done.
    1239 
    1240   Detailed check results for non-healthy files and directories will be
    1241   available under /operations/$HANDLE/$STORAGEINDEX, and the HTML status will
    1242   contain links to these detailed results.
    1243 
    1244   The HTML /operations/$HANDLE page for incomplete operations will contain a
    1245   meta-refresh tag, set to 60 seconds, so that a browser which uses
    1246   deep-check will automatically poll until the operation has completed.
    1247 
    1248   The JSON page (/operations/$HANDLE?output=JSON) will contain a
    1249   machine-readable JSON dictionary with the following keys:
    1250 
    1251    finished: a boolean, True if the operation is complete, else False. Some
    1252              of the remaining keys may not be present until the operation
    1253              is complete.
    1254    root-storage-index: a base32-encoded string with the storage index of the
    1255                        starting point of the deep-check operation
    1256    count-objects-checked: count of how many objects were checked. Note that
    1257                           non-distributed objects (i.e. small immutable LIT
    1258                           files) are not checked, since for these objects,
    1259                           the data is contained entirely in the URI.
    1260    count-objects-healthy: how many of those objects were completely healthy
    1261    count-objects-unhealthy: how many were damaged in some way
    1262    count-corrupt-shares: how many shares were found to have corruption,
    1263                          summed over all objects examined
    1264    list-corrupt-shares: a list of "share identifiers", one for each share
    1265                         that was found to be corrupt. Each share identifier
    1266                         is a list of (serverid, storage_index, sharenum).
    1267    list-unhealthy-files: a list of (pathname, check-results) tuples, for
    1268                          each file that was not fully healthy. 'pathname' is
    1269                          a list of strings (which can be joined by "/"
    1270                          characters to turn it into a single string),
    1271                          relative to the directory on which deep-check was
    1272                          invoked. The 'check-results' field is the same as
    1273                          that returned by t=check&output=JSON, described
    1274                          above.
    1275    stats: a dictionary with the same keys as the t=start-deep-stats command
    1276           (described below)
    1277 
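           A minimal sketch of the start-then-poll pattern in Python 3
           (assuming a local node on the default 127.0.0.1:3456 webport; the
           dircap and the ophandle value are placeholders):

             import json
             import time
             import urllib.request

             node = "http://127.0.0.1:3456"
             dircap = "URI:DIR2:..."  # placeholder
             handle = "6d7a5f"        # any client-chosen ophandle string

             req = urllib.request.Request(
                 "%s/uri/%s?t=start-deep-check&ophandle=%s" % (node, dircap, handle),
                 data=b"")
             urllib.request.urlopen(req)  # redirected to /operations/$HANDLE

             while True:
                 with urllib.request.urlopen(
                         "%s/operations/%s?output=JSON" % (node, handle)) as f:
                     status = json.loads(f.read())
                 if status["finished"]:
                     break
                 time.sleep(10)
             print("%d objects checked, %d unhealthy" %
                   (status["count-objects-checked"],
                    status["count-objects-unhealthy"]))
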
    1278 POST $URL?t=stream-deep-check
    1279 
    1280  This initiates a recursive walk of all files and directories reachable from
    1281  the target, performing a check on each one just like t=check. For each
    1282  unique object (duplicates are skipped), a single line of JSON is emitted to
    1283  the HTTP response channel (or an error indication, see below). When the walk
    1284  is complete, a final line of JSON is emitted which contains the accumulated
    1285  file-size/count "deep-stats" data.
    1286 
    1287  This command takes the same arguments as t=start-deep-check.
    1288 
    1289  A CLI tool can split the response stream on newlines into "response units",
    1290  and parse each response unit as JSON. Each such parsed unit will be a
    1291  dictionary, and will contain at least the "type" key: a string, one of
    1292  "file", "directory", or "stats".
    1293 
    1294  For all units that have a type of "file" or "directory", the dictionary will
    1295  contain the following keys:
    1296 
    1297   "path": a list of strings, with the path that is traversed to reach the
    1298           object
    1299   "cap": a write-cap URI for the file or directory, if available, else a
    1300          read-cap URI
    1301   "verifycap": a verify-cap URI for the file or directory
    1302   "repaircap": a URI for the weakest cap that can still be used to repair
    1303                the object
    1304   "storage-index": a base32 storage index for the object
    1305   "check-results": a copy of the dictionary which would be returned by
    1306                    t=check&output=json, with three top-level keys:
    1307                    "storage-index", "summary", and "results", and a variety
    1308                    of counts and sharemaps in the "results" value.
    1309 
    1310  Note that non-distributed files (i.e. LIT files) will have values of None
    1311  for verifycap, repaircap, and storage-index, since these files can neither
    1312  be verified nor repaired, and are not stored on the storage servers.
    1313  Likewise the check-results dictionary will be limited: an empty string for
    1314  storage-index, and a results dictionary with only the "healthy" key.
    1315 
    1316  The last unit in the stream will have a type of "stats", and will contain
    1317  the keys described in the "start-deep-stats" operation, below.
    1318 
    1319  If any errors occur during the traversal (specifically if a directory is
    1320  unrecoverable, such that further traversal is not possible), an error
    1321  indication is written to the response body, instead of the usual line of
    1322  JSON. This error indication line will begin with the string "ERROR:" (in all
    1323  caps), and contain a summary of the error on the rest of the line. The
    1324  remaining lines of the response body will be a Python exception. The client
    1325  application should look for the ERROR: and stop processing JSON as soon as
    1326  it is seen. Note that neither a file being unrecoverable nor a directory
    1327  merely being unhealthy will cause traversal to stop. The line just before
    1328  the ERROR: will describe the directory that was untraversable, since the
    1329  unit is emitted to the HTTP response body before the child is traversed.
    1330 
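          A minimal stream-consumer sketch in Python 3 (assuming a local node
          on the default 127.0.0.1:3456 webport; the dircap is a placeholder):

            import json
            import urllib.request

            node = "http://127.0.0.1:3456"
            dircap = "URI:DIR2:..."  # placeholder

            req = urllib.request.Request(
                "%s/uri/%s?t=stream-deep-check" % (node, dircap), data=b"")
            with urllib.request.urlopen(req) as f:
                for line in f:
                    line = line.decode("utf-8").rstrip("\n")
                    if line.startswith("ERROR:"):
                        raise RuntimeError(line)  # stop parsing, as described above
                    unit = json.loads(line)
                    if unit["type"] in ("file", "directory"):
                        print("/".join(unit["path"]),
                              unit["check-results"].get("summary", ""))
                    else:  # the final "stats" unit
                        print(unit)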
    1331 
    1332 POST $URL?t=check&repair=true
    1333 
    1334   This performs a health check of the given file or directory, and if the
    1335   checker determines that the object is not healthy (some shares are missing
    1336   or corrupted), it will perform a "repair". During repair, any missing
    1337   shares will be regenerated and uploaded to new servers.
    1338 
    1339   This accepts the same verify=true and add-lease= arguments as t=check. When
    1340   an output=JSON argument is provided, the machine-readable JSON response
    1341   will contain the following keys:
    1342 
    1343    storage-index: a base32-encoded string with the object's storage index,
    1344                   or an empty string for LIT files
    1345    repair-attempted: (bool) True if repair was attempted
    1346    repair-successful: (bool) True if repair was attempted and the file was
    1347                       fully healthy afterwards. False if no repair was
    1348                       attempted, or if a repair attempt failed.
    1349    pre-repair-results: a dictionary that describes the state of the file
    1350                        before any repair was performed. This contains exactly
    1351                        the same keys as the 'results' value of the t=check
    1352                        response, described above.
    1353    post-repair-results: a dictionary that describes the state of the file
    1354                         after any repair was performed. If no repair was
    1355                         performed, post-repair-results and pre-repair-results
    1356                         will be the same. This contains exactly the same keys
    1357                         as the 'results' value of the t=check response,
    1358                         described above.
    1359 
    1360 POST $URL?t=start-deep-check&repair=true    (must add &ophandle=XYZ)
    1361 
    1362   This triggers a recursive walk of all files and directories, performing a
    1363   t=check&repair=true on each one.
    1364 
    1365   Like t=start-deep-check without the repair= argument, this can only be
    1366   invoked on a directory. An error (400 BAD_REQUEST) will be signalled if it
    1367   is invoked on a file. The recursive walker will deal with loops safely.
    1368 
    1369   This accepts the same verify= and add-lease= arguments as
    1370   t=start-deep-check. It uses the same ophandle= mechanism as
    1371   start-deep-check. When an output=JSON argument is provided, the response
    1372   will contain the following keys:
    1373 
    1374    finished: (bool) True if the operation has completed, else False
    1375    root-storage-index: a base32-encoded string with the storage index of the
    1376                        starting point of the deep-check operation
    1377    count-objects-checked: count of how many objects were checked
    1378 
    1379    count-objects-healthy-pre-repair: how many of those objects were completely
    1380                                      healthy, before any repair
    1381    count-objects-unhealthy-pre-repair: how many were damaged in some way
    1382    count-objects-healthy-post-repair: how many of those objects were completely
    1383                                        healthy, after any repair
    1384    count-objects-unhealthy-post-repair: how many were damaged in some way
    1385 
    1386    count-repairs-attempted: repairs were attempted on this many objects.
    1387    count-repairs-successful: how many repairs resulted in healthy objects
    1388    count-repairs-unsuccessful: how many repairs did not result in
    1389                                completely healthy objects
    1390    count-corrupt-shares-pre-repair: how many shares were found to have
    1391                                     corruption, summed over all objects
    1392                                     examined, before any repair
    1393    count-corrupt-shares-post-repair: how many shares were found to have
    1394                                      corruption, summed over all objects
    1395                                      examined, after any repair
    1396    list-corrupt-shares: a list of "share identifiers", one for each share
    1397                         that was found to be corrupt (before any repair).
    1398                         Each share identifier is a list of (serverid,
    1399                         storage_index, sharenum).
    1400    list-remaining-corrupt-shares: like list-corrupt-shares, but mutable shares
    1401                                   that were successfully repaired are not
    1402                                   included. These are shares that need
    1403                                   manual processing. Since immutable shares
    1404                                   cannot be modified by clients, all corruption
    1405                                   in immutable shares will be listed here.
    1406    list-unhealthy-files: a list of (pathname, check-results) tuples, for
    1407                          each file that was not fully healthy. 'pathname' is
    1408                          relative to the directory on which deep-check was
    1409                          invoked. The 'check-results' field is the same as
    1410                          that returned by t=check&repair=true&output=JSON,
    1411                          described above.
    1412    stats: a dictionary with the same keys as the t=start-deep-stats command
    1413           (described below)
    1414 
    1415 POST $URL?t=stream-deep-check&repair=true
    1416 
    1417  This triggers a recursive walk of all files and directories, performing a
    1418  t=check&repair=true on each one. For each unique object (duplicates are
    1419  skipped), a single line of JSON is emitted to the HTTP response channel (or
    1420  an error indication). When the walk is complete, a final line of JSON is
    1421  emitted which contains the accumulated file-size/count "deep-stats" data.
    1422 
    1423  This emits the same data as t=stream-deep-check (without the repair=true),
    1424  except that the "check-results" field is replaced with a
    1425  "check-and-repair-results" field, which contains the keys returned by
    1426  t=check&repair=true&output=json (i.e. repair-attempted, repair-successful,
    1427  pre-repair-results, and post-repair-results). The output does not contain
    1428  the summary dictionary that is provided by t=start-deep-check&repair=true
    1429  (the one with count-objects-checked and list-unhealthy-files), since the
    1430  receiving client is expected to calculate those values itself from the
    1431  stream of per-object check-and-repair-results.
    1432 
    1433  Note that the "ERROR:" indication will only be emitted if traversal stops,
    1434  which will only occur if an unrecoverable directory is encountered. If a
    1435  file or directory repair fails, the traversal will continue, and the repair
    1436  failure will be indicated in the JSON data (in the "repair-successful" key).
    1437 
    1438 POST $DIRURL?t=start-manifest    (must add &ophandle=XYZ)
    1439 
    1440   This operation generates a "manifest" of the given directory tree, mostly
    1441   for debugging. This is a table of (path, filecap/dircap), for every object
    1442   reachable from the starting directory. The path will be slash-joined, and
    1443   the filecap/dircap will contain a link to the object in question. This page
    1444   gives immediate access to every object in the virtual filesystem subtree.
    1445 
    1446   This operation uses the same ophandle= mechanism as deep-check. The
    1447   corresponding /operations/$HANDLE page has three different forms. The
    1448   default is output=HTML.
    1449 
    1450   If output=text is added to the query args, the results will be a text/plain
    1451   list. The first line is special: it is either "finished: yes" or "finished:
    1452   no"; if the operation is not finished, you must periodically reload the
    1453   page until it completes. The rest of the results are a plaintext list, with
    1454   one file/dir per line, slash-separated, with the filecap/dircap separated
    1455   by a space.
    1456 
    1457   If output=JSON is added to the queryargs, then the results will be a
    1458   JSON-formatted dictionary with six keys. Note that because large directory
    1459   structures can result in very large JSON results, the full results will not
    1460   be available until the operation is complete (i.e. until output["finished"]
    1461   is True):
    1462 
    1463    finished (bool): if False then you must reload the page until True
    1464    origin_si (base32 str): the storage index of the starting point
    1465    manifest: list of (path, cap) tuples, where path is a list of strings.
    1466    verifycaps: list of (printable) verify cap strings
    1467    storage-index: list of (base32) storage index strings
    1468    stats: a dictionary with the same keys as the t=start-deep-stats command
    1469           (described below)
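
  For example, a Python client might poll the JSON form until the operation
  completes (a minimal sketch: the port and the handle value "xyz" are
  illustrative assumptions, not part of the webapi):

   import json
   import time
   import urllib.request

   def wait_for_manifest(handle, base="http://127.0.0.1:3456", poll=2.0):
       # GET /operations/$HANDLE?output=JSON until output["finished"] is True
       url = "%s/operations/%s?output=JSON" % (base, handle)
       while True:
           results = json.load(urllib.request.urlopen(url))
           if results["finished"]:
               return results
           time.sleep(poll)

   results = wait_for_manifest("xyz")       # the value passed as ophandle=
   for path, cap in results["manifest"]:    # path is a list of name strings
       print("/".join(path), cap)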
    1470 
    1471 POST $DIRURL?t=start-deep-size    (must add &ophandle=XYZ)
    1472 
    1473   This operation generates a number (in bytes) equal to the sum of the
    1474   filesizes of all directories and immutable files reachable from the given
    1475   directory. This is a rough lower bound of the total space consumed by this
    1476   subtree. It does not include space consumed by mutable files, nor does it
    1477   take expansion or encoding overhead into account. Later versions of the
    1478   code may improve this estimate upwards.
    1479 
    1480   The /operations/$HANDLE status output consists of two lines of text:
    1481 
    1482    finished: yes
    1483    size: 1234
    1484 
    1485 POST $DIRURL?t=start-deep-stats    (must add &ophandle=XYZ)
    1486 
    1487   This operation performs a recursive walk of all files and directories
    1488   reachable from the given directory, and generates a collection of
    1489   statistics about those objects.
    1490 
    1491   The result (obtained from the /operations/$OPHANDLE page) is a
    1492   JSON-serialized dictionary with the following keys (note that some of these
    1493   keys may be missing until 'finished' is True):
    1494 
    1495    finished: (bool) True if the operation has finished, else False
    1496    count-immutable-files: count of how many CHK files are in the set
    1497    count-mutable-files: same, for mutable files (does not include directories)
    1498    count-literal-files: same, for LIT files (data contained inside the URI)
    1499    count-files: sum of the above three
    1500    count-directories: count of directories
    1501    count-unknown: count of unrecognized objects (perhaps from the future)
    1502    size-immutable-files: total bytes for all CHK files in the set, =deep-size
    1503    size-mutable-files (TODO): same, for current version of all mutable files
    1504    size-literal-files: same, for LIT files
    1505    size-directories: size of directories (includes size-literal-files)
    1506    size-files-histogram: list of (minsize, maxsize, count) buckets,
    1507                          with a histogram of filesizes, 5dB/bucket,
    1508                          for both literal and immutable files
    1509    largest-directory: number of children in the largest directory
    1510    largest-immutable-file: number of bytes in the largest CHK file
    1511 
    1512   size-mutable-files is not implemented, because it would require extra
    1513   queries to each mutable file to get their size. This may be implemented in
    1514   the future.
    1515 
    1516   Assuming no sharing, the basic space consumed by a single root directory is
    1517   the sum of size-immutable-files, size-mutable-files, and size-directories.
    1518   The actual disk space used by the shares is larger, because of the
    1519   following sources of overhead:
    1520 
    1521    integrity data
    1522    expansion due to erasure coding
    1523    share management data (leases)
    1524    backend (ext3) minimum block size
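
  As a sketch of this arithmetic (assuming "stats" is the parsed JSON
  dictionary produced by this operation):

   def basic_space(stats):
       # a rough lower bound: excludes the integrity, expansion, lease, and
       # block-size overheads listed above
       return (stats["size-immutable-files"]
               + stats.get("size-mutable-files", 0)   # TODO key; may be absent
               + stats["size-directories"])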
    1525 
    1526 POST $URL?t=stream-manifest
    1527 
    1528  This operation performs a recursive walk of all files and directories
    1529  reachable from the given starting point. For each such unique object
    1530  (duplicates are skipped), a single line of JSON is emitted to the HTTP
    1531  response channel (or an error indication, see below). When the walk is
    1532  complete, a final line of JSON is emitted which contains the accumulated
    1533  file-size/count "deep-stats" data.
    1534 
    1535  A CLI tool can split the response stream on newlines into "response units",
    1536  and parse each response unit as JSON. Each such parsed unit will be a
    1537  dictionary, and will contain at least the "type" key: a string, one of
    1538  "file", "directory", or "stats".
    1539 
    1540  For all units that have a type of "file" or "directory", the dictionary will
    1541  contain the following keys:
    1542 
    1543   "path": a list of strings, with the path that is traversed to reach the
    1544           object
    1545   "cap": a write-cap URI for the file or directory, if available, else a
    1546          read-cap URI
    1547   "verifycap": a verify-cap URI for the file or directory
    1548   "repaircap": an URI for the weakest cap that can still be used to repair
    1549                the object
    1550   "storage-index": a base32 storage index for the object
    1551 
    1552  Note that non-distributed files (i.e. LIT files) will have values of None
    1553  for verifycap, repaircap, and storage-index, since these files can neither
    1554  be verified nor repaired, and are not stored on the storage servers.
    1555 
    1556  The last unit in the stream will have a type of "stats", and will contain
    1557  the keys described in the "start-deep-stats" operation, below.
    1558 
    1559  If any errors occur during the traversal (specifically if a directory is
    1560  unrecoverable, such that further traversal is not possible), an error
    1561  indication is written to the response body, instead of the usual line of
    1562  JSON. This error indication line will begin with the string "ERROR:" (in all
    1563  caps), and contain a summary of the error on the rest of the line. The
    1564  remaining lines of the response body will be a python exception. The client
    1565  application should look for the ERROR: and stop processing JSON as soon as
    1566  it is seen. The line just before the ERROR: will describe the directory that
    1567  was untraversable, since the manifest entry is emitted to the HTTP response
    1568  body before the child is traversed.
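
 A minimal sketch of such a consumer (assumptions: a node on the default
 local port, and "dircap" is a directory cap you already hold):

  import json
  import urllib.parse
  import urllib.request

  def stream_manifest(dircap, base="http://127.0.0.1:3456"):
      url = "%s/uri/%s?t=stream-manifest" % (base, urllib.parse.quote(dircap))
      req = urllib.request.Request(url, data=b"")      # POST with empty body
      for raw in urllib.request.urlopen(req):
          line = raw.decode("utf-8").rstrip("\n")
          if line.startswith("ERROR:"):                # stop parsing JSON here
              raise RuntimeError(line)
          yield json.loads(line)   # a "file", "directory", or final "stats" unit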
    1569 
    1570 == Other Useful Pages ==
    1571 
    1572 The portion of the web namespace that begins with "/uri" (and "/named") is
    1573 dedicated to giving users (both humans and programs) access to the Tahoe
    1574 virtual filesystem. The rest of the namespace provides status information
    1575 about the state of the Tahoe node.
    1576 
    1577 GET /   (the root page)
    1578 
    1579 This is the "Welcome Page", and contains a few distinct sections:
    1580 
    1581  Node information: library versions, local nodeid, services being provided.
    1582 
    1583  Filesystem Access Forms: create a new directory, view a file/directory by
    1584                           URI, upload a file (unlinked), download a file by
    1585                           URI.
    1586 
    1587  Grid Status: introducer information, helper information, connected storage
    1588               servers.
    1589 
    1590 GET /status/
    1591 
    1592  This page lists all active uploads and downloads, and contains a short list
    1593  of recent upload/download operations. Each operation has a link to a page
    1594  that describes file sizes, servers that were involved, and the time consumed
    1595  in each phase of the operation.
    1596 
    1597  A GET of /status/?t=json will return a machine-readable subset of the same
    1598  data. It returns a JSON-encoded dictionary. The only key defined at this
    1599  time is "active", with a value that is a list of operation dictionaries, one
    1600  for each active operation. Once an operation is completed, it will no longer
    1601  appear in data["active"] .
    1602 
    1603  Each op-dict contains a "type" key, one of "upload", "download",
    1604  "mapupdate", "publish", or "retrieve" (the first two are for immutable
    1605  files, while the latter three are for mutable files and directories).
    1606 
    1607  The "upload" op-dict will contain the following keys:
    1608 
    1609    type (string): "upload"
    1610    storage-index-string (string): a base32-encoded storage index
    1611    total-size (int): total size of the file
    1612    status (string): current status of the operation
    1613    progress-hash (float): 1.0 when the file has been hashed
    1614    progress-ciphertext (float): 1.0 when the file has been encrypted.
    1615    progress-encode-push (float): 1.0 when the file has been encoded and
    1616                                  pushed to the storage servers. For helper
    1617                                  uploads, the ciphertext value climbs to 1.0
    1618                                  first, then encoding starts. For unassisted
    1619                                  uploads, ciphertext and encode-push progress
    1620                                  will climb at the same pace.
    1621 
    1622  The "download" op-dict will contain the following keys:
    1623 
    1624    type (string): "download"
    1625    storage-index-string (string): a base32-encoded storage index
    1626    total-size (int): total size of the file
    1627    status (string): current status of the operation
    1628    progress (float): 1.0 when the file has been fully downloaded
    1629 
    1630  Front-ends which want to report progress information are advised to simply
    1631  average together all the progress-* indicators. A slightly more accurate
    1632  value can be found by ignoring the progress-hash value (since the current
    1633  implementation hashes synchronously, so clients will probably never see
    1634  progress-hash!=1.0).
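
  A sketch of that calculation (the port is an assumption, and op-dicts for
  mutable-file operations may carry different keys):

   import json
   import urllib.request

   url = "http://127.0.0.1:3456/status/?t=json"
   status = json.load(urllib.request.urlopen(url))
   for op in status["active"]:
       keys = [k for k in op if k.startswith("progress")]
       if keys:
           progress = sum(op[k] for k in keys) / len(keys)
           print(op["type"], op["storage-index-string"],
                 "%.0f%%" % (100 * progress))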
    1635 
    1636 GET /provisioning/
    1637 
    1638  This page provides a basic tool to predict the likely storage and bandwidth
    1639  requirements of a large Tahoe grid. It provides forms to input things like
    1640  total number of users, number of files per user, average file size, number
    1641  of servers, expansion ratio, hard drive failure rate, etc. It then provides
    1642  numbers like how many disks per server will be needed, how many read
    1643  operations per second should be expected, and the likely MTBF for files in
    1644  the grid. This information is very preliminary, and the model upon which it
    1645  is based still needs a lot of work.
    1646 
    1647 GET /helper_status/
    1648 
    1649  If the node is running a helper (i.e. if [helper]enabled is set to True in
    1650  tahoe.cfg), then this page will provide a list of all the helper operations
    1651  currently in progress. If "?t=json" is added to the URL, it will return a
    1652  JSON-formatted list of helper statistics, which can then be used to produce
    1653  graphs to indicate how busy the helper is.
    1654 
    1655 GET /statistics/
    1656 
    1657  This page provides "node statistics", which are collected from a variety of
    1658  sources.
    1659 
    1660    load_monitor: every second, the node schedules a timer for one second in
    1661                  the future, then measures how late the subsequent callback
    1662                  is. The "load_average" is this tardiness, measured in
    1663                  seconds, averaged over the last minute. It is an indication
    1664                  of a busy node, one which is doing more work than can be
    1665                  completed in a timely fashion. The "max_load" value is the
    1666                  highest value that has been seen in the last 60 seconds.
    1667 
    1668    cpu_monitor: every minute, the node uses time.clock() to measure how much
    1669                 CPU time it has used, and it uses this value to produce
    1670                 1min/5min/15min moving averages. These values range from 0%
    1671                 (0.0) to 100% (1.0), and indicate what fraction of the CPU
    1672                 has been used by the Tahoe node. Not all operating systems
    1673                 provide meaningful data to time.clock(): they may report 100%
    1674                 CPU usage at all times.
    1675 
    1676    uploader: this counts how many immutable files (and bytes) have been
    1677              uploaded since the node was started
    1678 
    1679    downloader: this counts how many immutable files have been downloaded
    1680                since the node was started
    1681 
    1682    publishes: this counts how many mutable files (including directories) have
    1683               been modified since the node was started
    1684 
    1685    retrieves: this counts how many mutable files (including directories) have
    1686               been read since the node was started
    1687 
    1688  There are other statistics that are tracked by the node. The "raw stats"
    1689  section shows a formatted dump of all of them.
    1690 
    1691  By adding "?t=json" to the URL, the node will return a JSON-formatted
    1692  dictionary of stats values, which can be used by other tools to produce
    1693  graphs of node behavior. The misc/munin/ directory in the source
    1694  distribution provides some tools to produce these graphs.
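
  For example, a monitoring script could fetch the same data directly (a
  sketch; the port is an assumption, and the available keys vary between
  node versions):

   import json
   import urllib.request

   url = "http://127.0.0.1:3456/statistics/?t=json"
   stats = json.load(urllib.request.urlopen(url))
   for name in sorted(stats):
       print(name, stats[name])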
    1695 
    1696 GET /   (introducer status)
    1697 
    1698  For Introducer nodes, the welcome page displays information about both
    1699  clients and servers which are connected to the introducer. Servers make
    1700  "service announcements", and these are listed in a table. Clients will
    1701  subscribe to hear about service announcements, and these subscriptions are
    1702  listed in a separate table. Both tables contain information about what
    1703  version of Tahoe is being run by the remote node, their advertised and
    1704  outbound IP addresses, their nodeid and nickname, and how long they have
    1705  been available.
    1706 
    1707  By adding "?t=json" to the URL, the node will return a JSON-formatted
    1708  dictionary of stats values, which can be used to produce graphs of connected
    1709  clients over time. This dictionary has the following keys:
    1710 
    1711   ["subscription_summary"] : a dictionary mapping service name (like
    1712                              "storage") to an integer with the number of
    1713                              clients that have subscribed to hear about that
    1714                              service
    1715   ["announcement_summary"] : a dictionary mapping service name to an integer
    1716                              with the number of servers which are announcing
    1717                              that service
    1718   ["announcement_distinct_hosts"] : a dictionary mapping service name to an
    1719                                     integer which represents the number of
    1720                                     distinct hosts that are providing that
    1721                                     service. If two servers have announced
    1722                                     FURLs which use the same hostnames (but
    1723                                     different ports and tubids), they are
    1724                                     considered to be on the same host.
    1725 
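  For example, a sketch that summarizes the "storage" service (the port is
  an assumption):

   import json
   import urllib.request

   info = json.load(urllib.request.urlopen("http://127.0.0.1:3456/?t=json"))
   print("storage subscribers:", info["subscription_summary"].get("storage", 0))
   print("storage announcers:", info["announcement_summary"].get("storage", 0))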
    1726 
    1727 == Static Files in /public_html ==
    1728 
    1729 The webapi server will take any request for a URL that starts with /static
    1730 and serve it from a configurable directory which defaults to
    1731 $BASEDIR/public_html . This is configured by setting the "[node]web.static"
    1732 value in $BASEDIR/tahoe.cfg . If this is left at the default value of
    1733 "public_html", then http://localhost:3456/static/subdir/foo.html will be
    1734 served with the contents of the file $BASEDIR/public_html/subdir/foo.html .
    1735 
    1736 This can be useful to serve a javascript application which provides a
    1737 prettier front-end to the rest of the Tahoe webapi.
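
For example, the relevant tahoe.cfg section, with the default value written
out explicitly:

 [node]
 web.static = public_html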
    1738 
    1739 
    1740 == Safety and security issues -- names vs. URIs ==
    1741 
    1742 Summary: use explicit file- and dir- caps whenever possible, to reduce the
    1743 potential for surprises when the filesystem structure is changed.
    1744 
    1745 Tahoe provides a mutable filesystem, but the ways that the filesystem can
    1746 change are limited. The only thing that can change is the mapping from
    1747 child names to child objects that each directory contains: a new child name
    1748 can be added, an existing child name can be removed, or an existing child
    1749 name can be changed to point to a different object.
    1750 
    1751 Obviously if you query Tahoe for information about the filesystem and then act
    1752 to change the filesystem (such as by getting a listing of the contents of a
    1753 directory and then adding a file to the directory), then the filesystem might
    1754 have been changed after you queried it and before you acted upon it.  However,
    1755 if you use the URI instead of the pathname of an object when you act upon the
    1756 object, then the only change that can happen is that, if the object is a
    1757 directory, its set of child names might be different. If, on the other hand,
    1758 you act upon the object using its pathname, then a different object might be in
    1759 that place, which can result in more kinds of surprises.
    1760 
    1761 For example, suppose you are writing code which recursively downloads the
    1762 contents of a directory. The first thing your code does is fetch the listing
    1763 of the contents of the directory. For each child that it fetched, if that
    1764 child is a file then it downloads the file, and if that child is a directory
    1765 then it recurses into that directory. Now, if the download and the recurse
    1766 actions are performed using the child's name, then the results might be
    1767 wrong, because for example a child name that pointed to a sub-directory when
    1768 you listed the directory might have been changed to point to a file (in which
    1769 case your attempt to recurse into it would result in an error and the file
    1770 would be skipped), or a child name that pointed to a file when you listed the
    1771 directory might now point to a sub-directory (in which case your attempt to
    1772 download the child would result in a file containing HTML text describing the
    1773 sub-directory!).
    1774 
    1775 If your recursive algorithm uses the URI of the child instead of the name of
    1776 the child, then those kinds of mistakes just can't happen. Note that both the
    1777 child's name and the child's URI are included in the results of listing the
    1778 parent directory, so it isn't any harder to use the URI for this purpose.
    1779 
    1780 The read and write caps in a given directory node are separate URIs, and
    1781 can't be assumed to point to the same object even if they were retrieved in
    1782 the same operation (although the webapi server attempts to ensure this
    1783 in most cases). If you need to rely on that property, you should explicitly
    1784 verify it. More generally, you should not make assumptions about the
    1785 internal consistency of the contents of mutable directories. As a result
    1786 of the signatures on mutable object versions, it is guaranteed that a given
    1787 version was written in a single update, but -- as in the case of a file --
    1788 the contents may have been chosen by a malicious writer in a way that is
    1789 designed to confuse applications that rely on their consistency.
    1790 
    1791 In general, use names if you want "whatever object (whether file or
    1792 directory) is found by following this name (or sequence of names) when my
    1793 request reaches the server". Use URIs if you want "this particular object".
    1794 
    1795 == Concurrency Issues ==
    1796 
    1797 Tahoe uses both mutable and immutable files. Mutable files can be created
    1798 explicitly by doing an upload with ?mutable=true added, or implicitly by
    1799 creating a new directory (since a directory is just a special way to
    1800 interpret a given mutable file).
    1801 
    1802 Mutable files suffer from the same consistency-vs-availability tradeoff that
    1803 all distributed data storage systems face. It is not possible to
    1804 simultaneously achieve perfect consistency and perfect availability in the
    1805 face of network partitions (servers being unreachable or faulty).
    1806 
    1807 Tahoe tries to achieve a reasonable compromise, but there is a basic rule in
    1808 place, known as the Prime Coordination Directive: "Don't Do That". What this
    1809 means is that if write-access to a mutable file is available to several
    1810 parties, then those parties are responsible for coordinating their activities
    1811 to avoid multiple simultaneous updates. This could be achieved by having
    1812 these parties talk to each other and using some sort of locking mechanism, or
    1813 by serializing all changes through a single writer.
    1814 
    1815 The consequences of performing uncoordinated writes can vary. Some of the
    1816 writers may lose their changes, as somebody else wins the race. In
    1817 many cases the file will be left in an "unhealthy" state, meaning that there
    1818 are not as many redundant shares as we would like (reducing the reliability
    1819 of the file against server failures). In the worst case, the file can be left
    1820 in such an unhealthy state that no version is recoverable, even the old ones.
    1821 It is this small possibility of data loss that prompts us to issue the Prime
    1822 Coordination Directive.
    1823 
    1824 Tahoe nodes implement internal serialization to make sure that a single Tahoe
    1825 node cannot conflict with itself. For example, it is safe to issue two
    1826 directory modification requests to a single tahoe node's webapi server at the
    1827 same time, because the Tahoe node will internally delay one of them until
    1828 after the other has finished being applied. (This feature was introduced in
    1829 Tahoe-1.1; back in Tahoe-1.0 the web client was responsible for serializing
    1830 web requests itself).
    1831 
    1832 For more details, please see the "Consistency vs Availability" and "The Prime
    1833 Coordination Directive" sections of mutable.txt, in the same directory as
    1834 this file.
    1835 
    1836 
    1837 [1]: URLs and HTTP and UTF-8, Oh My
    1838 
    1839  HTTP does not provide a mechanism to specify the character set used to
    1840  encode non-ascii names in URLs (rfc2396#2.1). We prefer the convention that
    1841  the filename= argument shall be a URL-encoded UTF-8 encoded unicode object.
    1842  For example, suppose we want to provoke the server into using a filename of
    1843  "f i a n c e-acute e" (i.e. F I A N C U+00E9 E). The UTF-8 encoding of this
    1844  is 0x66 0x69 0x61 0x6e 0x63 0xc3 0xa9 0x65 (or "fianc\xC3\xA9e", as python's
    1845  repr() function would show). To encode this into a URL, the non-printable
    1846  characters must be escaped with the urlencode '%XX' mechanism, giving us
    1847  "fianc%C3%A9e". Thus, the first line of the HTTP request will be "GET
    1848  /uri/CAP...?save=true&filename=fianc%C3%A9e HTTP/1.1". Not all browsers
    1849  provide this: IE7 uses the Latin-1 encoding, which is fianc%E9e.
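
 For example, the escaping described above matches what Python's standard
 library produces (a sketch):

  >>> import urllib.parse
  >>> urllib.parse.quote("fianc\u00e9e")
  'fianc%C3%A9e'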
    1850 
    1851  The response header will need to indicate a non-ASCII filename. The actual
    1852  mechanism to do this is not clear. For ASCII filenames, the response header
    1853  would look like:
    1854 
    1855   Content-Disposition: attachment; filename="english.txt"
    1856 
    1857  If Tahoe were to enforce the utf-8 convention, it would need to decode the
    1858  URL argument into a unicode string, and then encode it back into a sequence
    1859  of bytes when creating the response header. One possibility would be to use
    1860  unencoded utf-8. Developers suggest that IE7 might accept this:
    1861 
    1862   #1: Content-Disposition: attachment; filename="fianc\xC3\xA9e"
    1863     (note, the last four bytes of that line, not including the newline, are
    1864     0xC3 0xA9 0x65 0x22)
    1865 
    1866  RFC2231#4 (dated 1997): suggests that the following might work, and some
    1867  developers (http://markmail.org/message/dsjyokgl7hv64ig3) have reported that
    1868  it is supported by Firefox (but not IE7):
    1869 
    1870   #2: Content-Disposition: attachment; filename*=utf-8''fianc%C3%A9e
    1871 
    1872  My reading of RFC2616#19.5.1 (which defines Content-Disposition) says that
    1873  the filename= parameter is defined to be wrapped in quotes (presumably to
    1874  allow spaces without breaking the parsing of subsequent parameters), which
    1875  would give us:
    1876 
    1877   #3: Content-Disposition: attachment; filename*=utf-8''"fianc%C3%A9e"
    1878 
    1879  However this is contrary to the examples in the email thread listed above.
    1880 
    1881  Developers report that IE7 (when it is configured for UTF-8 URL encoding,
    1882  which is not the default in Asian countries), will accept:
    1883 
    1884   #4: Content-Disposition: attachment; filename=fianc%C3%A9e
    1885 
    1886  However, for maximum compatibility, Tahoe simply copies bytes from the URL
    1887  into the response header, rather than enforcing the utf-8 convention. This
    1888  means it does not try to decode the filename from the URL argument, nor does
    1889  it encode the filename into the response header.
  • new file docs/specifications/URI-extension.rst

    diff --git a/docs/specifications/URI-extension.rst b/docs/specifications/URI-extension.rst
    new file mode 100644
    index 0000000..6d40652
    - +  
     1===================
     2URI Extension Block
     3===================
     4
     5This block is a serialized dictionary with string keys and string values
     6(some of which represent numbers, some of which are SHA-256 hashes). All
     7buckets hold an identical copy. The hash of the serialized data is kept in
     8the URI.
     9
     10The download process must obtain a valid copy of this data before any
     11decoding can take place. The download process must also obtain other data
     12before incremental validation can be performed. Full-file validation (for
     13clients who do not wish to do incremental validation) can be performed solely
     14with the data from this block.
     15
     16At the moment, this data block contains the following keys (and an estimate
     17on their sizes)::
     18
     19 size                5
     20 segment_size        7
     21 num_segments        2
     22 needed_shares       2
     23 total_shares        3
     24
     25 codec_name          3
     26 codec_params        5+1+2+1+3=12
     27 tail_codec_params   12
     28
     29 share_root_hash     32 (binary) or 52 (base32-encoded) each
     30 plaintext_hash
     31 plaintext_root_hash
     32 crypttext_hash
     33 crypttext_root_hash
     34
     35Some pieces are needed elsewhere (size should be visible without pulling the
     36block, the Tahoe3 algorithm needs total_shares to find the right peers, all
     37peer selection algorithms need needed_shares to ask a minimal set of peers).
     38Some pieces are arguably redundant but are convenient to have present
     39(test_encode.py makes use of num_segments).
     40
     41The rule for this data block is that it should be a constant size for all
     42files, regardless of file size. Therefore hash trees (which have a size that
     43depends linearly upon the number of segments) are stored elsewhere in the
     44bucket, with only the hash tree root stored in this data block.
     45
     46This block will be serialized as follows::
     47
     48 assert that all keys match ^[a-zA-Z_\-]+$
     49 sort all the keys lexicographically
     50 for k in keys:
     51  write("%s:" % k)
     52  write(netstring(data[k]))
     53
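A sketch of this rule in Python (illustrative only, not the project's actual
implementation)::

 import re

 def netstring(data):
     # standard "<decimal length>:<bytes>," framing
     return b"%d:%s," % (len(data), data)

 def serialize_uri_extension(d):
     # d maps string keys to byte-string values
     assert all(re.match(r"^[a-zA-Z_\-]+$", k) for k in d)
     out = b""
     for k in sorted(d):                  # lexicographic key order
         out += k.encode("ascii") + b":"  # write("%s:" % k)
         out += netstring(d[k])
     return out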
     54
     55Serialized size::
     56
     57 dense binary (but decimal) packing: 160+46=206
     58 including 'key:' (185) and netstring (6*3+7*4=46) on values: 231
     59 including 'key:%d\n' (185+13=198) and printable values (46+5*52=306)=504
     60
     61We'll go with the 231-sized block, and provide a tool to dump it as text if
     62we really want one.
  • deleted file docs/specifications/URI-extension.txt

    diff --git a/docs/specifications/URI-extension.txt b/docs/specifications/URI-extension.txt
    deleted file mode 100644
    index 8ec383e..0000000
    + -  
    1 
    2 "URI Extension Block"
    3 
    4 This block is a serialized dictionary with string keys and string values
    5 (some of which represent numbers, some of which are SHA-256 hashes). All
    6 buckets hold an identical copy. The hash of the serialized data is kept in
    7 the URI.
    8 
    9 The download process must obtain a valid copy of this data before any
    10 decoding can take place. The download process must also obtain other data
    11 before incremental validation can be performed. Full-file validation (for
    12 clients who do not wish to do incremental validation) can be performed solely
    13 with the data from this block.
    14 
    15 At the moment, this data block contains the following keys (and an estimate
    16 on their sizes):
    17 
    18  size                5
    19  segment_size        7
    20  num_segments        2
    21  needed_shares       2
    22  total_shares        3
    23 
    24  codec_name          3
    25  codec_params        5+1+2+1+3=12
    26  tail_codec_params   12
    27 
    28  share_root_hash     32 (binary) or 52 (base32-encoded) each
    29  plaintext_hash
    30  plaintext_root_hash
    31  crypttext_hash
    32  crypttext_root_hash
    33 
    34 Some pieces are needed elsewhere (size should be visible without pulling the
    35 block, the Tahoe3 algorithm needs total_shares to find the right peers, all
    36 peer selection algorithms need needed_shares to ask a minimal set of peers).
    37 Some pieces are arguably redundant but are convenient to have present
    38 (test_encode.py makes use of num_segments).
    39 
    40 The rule for this data block is that it should be a constant size for all
    41 files, regardless of file size. Therefore hash trees (which have a size that
    42 depends linearly upon the number of segments) are stored elsewhere in the
    43 bucket, with only the hash tree root stored in this data block.
    44 
    45 This block will be serialized as follows:
    46 
    47  assert that all keys match ^[a-zA-z_\-]+$
    48  sort all the keys lexicographically
    49  for k in keys:
    50   write("%s:" % k)
    51   write(netstring(data[k]))
    52 
    53 
    54 Serialized size:
    55 
    56  dense binary (but decimal) packing: 160+46=206
    57  including 'key:' (185) and netstring (6*3+7*4=46) on values: 231
    58  including 'key:%d\n' (185+13=198) and printable values (46+5*52=306)=504
    59 
    60 We'll go with the 231-sized block, and provide a tool to dump it as text if
    61 we really want one.
  • new file docs/specifications/dirnodes.rst

    diff --git a/docs/specifications/dirnodes.rst b/docs/specifications/dirnodes.rst
    new file mode 100644
    index 0000000..129e499
    - +  
     1==========================
     2Tahoe-LAFS Directory Nodes
     3==========================
     4
     5As explained in the architecture docs, Tahoe-LAFS can be roughly viewed as
     6a collection of three layers. The lowest layer is the key-value store: it
     7provides operations that accept files and upload them to the grid, creating
     8a URI in the process which securely references the file's contents.
     9The middle layer is the filesystem, creating a structure of directories and
     10filenames resembling the traditional unix/windows filesystems. The top layer
     11is the application layer, which uses the lower layers to provide useful
     12services to users, like a backup application, or a way to share files with
     13friends.
     14
     15This document examines the middle layer, the "filesystem".
     16
     171.  `Key-value Store Primitives`_
     182.  `Filesystem goals`_
     193.  `Dirnode goals`_
     204.  `Dirnode secret values`_
     215.  `Dirnode storage format`_
     226.  `Dirnode sizes, mutable-file initial read sizes`_
     237.  `Design Goals, redux`_
     24
     25    1. `Confidentiality leaks in the storage servers`_
     26    2. `Integrity failures in the storage servers`_
     27    3. `Improving the efficiency of dirnodes`_
     28    4. `Dirnode expiration and leases`_
     29
     308.  `Starting Points: root dirnodes`_
     319.  `Mounting and Sharing Directories`_
     3210. `Revocation`_
     33
     34Key-value Store Primitives
     35==========================
     36
     37In the lowest layer (key-value store), there are two operations that reference
     38immutable data (which we refer to as "CHK URIs" or "CHK read-capabilities" or
     39"CHK read-caps"). One puts data into the grid (but only if it doesn't exist
     40already), the other retrieves it::
     41
     42 chk_uri = put(data)
     43 data = get(chk_uri)
     44
     45We also have three operations which reference mutable data (which we refer to
     46as "mutable slots", or "mutable write-caps and read-caps", or sometimes "SSK
     47slots"). One creates a slot with some initial contents, a second replaces the
     48contents of a pre-existing slot, and the third retrieves the contents::
     49
     50 mutable_uri = create(initial_data)
     51 replace(mutable_uri, new_data)
     52 data = get(mutable_uri)
     53
     54Filesystem Goals
     55================
     56
     57The main goal for the middle (filesystem) layer is to give users a way to
     58organize the data that they have uploaded into the grid. The traditional way
     59to do this in computer filesystems is to put this data into files, give those
     60files names, and collect these names into directories.
     61
     62Each directory is a set of name-entry pairs, each of which maps a "child name"
     63to a directory entry pointing to an object of some kind. Those child objects
     64might be files, or they might be other directories. Each directory entry also
     65contains metadata.
     66
     67The directory structure is therefore a directed graph of nodes, in which each
     68node might be a directory node or a file node. All file nodes are terminal
     69nodes.
     70
     71Dirnode Goals
     72=============
     73
     74What properties might be desirable for these directory nodes? In no
     75particular order:
     76
     771. functional. Code which does not work doesn't count.
     782. easy to document, explain, and understand
     793. confidential: it should not be possible for others to see the contents of
     80   a directory
     814. integrity: it should not be possible for others to modify the contents
     82   of a directory
     835. available: directories should survive host failure, just like files do
     846. efficient: in storage, communication bandwidth, number of round-trips
     857. easy to delegate individual directories in a flexible way
     868. updateness: everybody looking at a directory should see the same contents
     879. monotonicity: everybody looking at a directory should see the same
     88   sequence of updates
     89
     90Some of these goals are mutually exclusive. For example, availability and
     91consistency are opposing, so it is not possible to achieve #5 and #8 at the
     92same time. Moreover, it takes a more complex architecture to get close to the
     93available-and-consistent ideal, so #2/#6 is in opposition to #5/#8.
     94
     95Tahoe-LAFS v0.7.0 introduced distributed mutable files, which use public-key
     96cryptography for integrity, and erasure coding for availability. These
     97achieve roughly the same properties as immutable CHK files, but their
     98contents can be replaced without changing their identity. Dirnodes are then
     99just a special way of interpreting the contents of a specific mutable file.
     100Earlier releases used a "vdrive server": this server was abolished in the
     101v0.7.0 release.
     102
     103For details of how mutable files work, please see "mutable.txt" in this
     104directory.
     105
     106For releases since v0.7.0, we achieve most of our desired properties. The
     107integrity and availability of dirnodes is equivalent to that of regular
     108(immutable) files, with the exception that there are more simultaneous-update
     109failure modes for mutable slots. Delegation is quite strong: you can give
     110read-write or read-only access to any subtree, and the data format used for
     111dirnodes is such that read-only access is transitive: i.e. if you grant Bob
     112read-only access to a parent directory, then Bob will get read-only access
     113(and *not* read-write access) to its children.
     114
     115Relative to the previous "vdrive-server" based scheme, the current
     116distributed dirnode approach gives better availability, but cannot guarantee
     117updateness quite as well, and requires far more network traffic for each
     118retrieval and update. Mutable files are somewhat less available than
     119immutable files, simply because of the increased number of combinations
     120(shares of an immutable file are either present or not, whereas there are
     121multiple versions of each mutable file, and you might have some shares of
     122version 1 and other shares of version 2). In extreme cases of simultaneous
     123update, mutable files might suffer from non-monotonicity.
     124
     125
     126Dirnode secret values
     127=====================
     128
     129As mentioned before, dirnodes are simply a special way to interpret the
     130contents of a mutable file, so the secret keys and capability strings
     131described in "mutable.txt" are all the same. Each dirnode contains an RSA
     132public/private keypair, and the holder of the "write capability" will be able
     133to retrieve the private key (as well as the AES encryption key used for the
     134data itself). The holder of the "read capability" will be able to obtain the
     135public key and the AES data key, but not the RSA private key needed to modify
     136the data.
     137
     138The "write capability" for a dirnode grants read-write access to its
      139contents. This is expressed in concrete form as the "dirnode write cap": a
     140printable string which contains the necessary secrets to grant this access.
     141Likewise, the "read capability" grants read-only access to a dirnode, and can
     142be represented by a "dirnode read cap" string.
     143
     144For example,
      145URI:DIR2:swdi8ge1s7qko45d3ckkyw1aac:ar8r5j99a4mezdojejmsfp4fj1zeky9gjigyrid4urxdimego68o
     146is a write-capability URI, while
     147URI:DIR2-RO:buxjqykt637u61nnmjg7s8zkny:ar8r5j99a4mezdojejmsfp4fj1zeky9gjigyrid4urxdimego68o
     148is a read-capability URI, both for the same dirnode.
     149
     150
     151Dirnode storage format
     152======================
     153
     154Each dirnode is stored in a single mutable file, distributed in the Tahoe-LAFS
     155grid. The contents of this file are a serialized list of netstrings, one per
     156child. Each child is a list of four netstrings: (name, rocap, rwcap,
     157metadata). (Remember that the contents of the mutable file are encrypted by
     158the read-cap, so this section describes the plaintext contents of the mutable
     159file, *after* it has been decrypted by the read-cap.)
     160
      161The name is simply a UTF-8-encoded child name. The 'rocap' is a read-only
     162capability URI to that child, either an immutable (CHK) file, a mutable file,
     163or a directory. It is also possible to store 'unknown' URIs that are not
     164recognized by the current version of Tahoe-LAFS. The 'rwcap' is a read-write
     165capability URI for that child, encrypted with the dirnode's write-cap: this
     166enables the "transitive readonlyness" property, described further below. The
      167'metadata' is a JSON-encoded dictionary of key/value metadata pairs. Some
     168metadata keys are pre-defined, the rest are left up to the application.
     169
     170Each rwcap is stored as IV + ciphertext + MAC. The IV is a 16-byte random
     171value. The ciphertext is obtained by using AES in CTR mode on the rwcap URI
     172string, using a key that is formed from a tagged hash of the IV and the
     173dirnode's writekey. The MAC is written only for compatibility with older
     174Tahoe-LAFS versions and is no longer verified.
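
As an illustration only (the tag string, key derivation, and library below
are placeholders, not Tahoe-LAFS's actual code)::

 import hashlib
 import os
 from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

 def encrypt_rwcap(rwcap, writekey):
     # rwcap and writekey are byte strings
     iv = os.urandom(16)
     # placeholder for the tagged hash of (IV, writekey) described above
     key = hashlib.sha256(b"rwcap-tag:" + iv + writekey).digest()[:16]
     enc = Cipher(algorithms.AES(key), modes.CTR(b"\x00" * 16)).encryptor()
     ciphertext = enc.update(rwcap) + enc.finalize()
     mac = b"\x00" * 32  # placeholder: written for compatibility, no longer verified
     return iv + ciphertext + mac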
     175
     176If Bob has read-only access to the 'bar' directory, and he adds it as a child
     177to the 'foo' directory, then he will put the read-only cap for 'bar' in both
     178the rwcap and rocap slots (encrypting the rwcap contents as described above).
     179If he has full read-write access to 'bar', then he will put the read-write
     180cap in the 'rwcap' slot, and the read-only cap in the 'rocap' slot. Since
     181other users who have read-only access to 'foo' will be unable to decrypt its
     182rwcap slot, this limits those users to read-only access to 'bar' as well,
     183thus providing the transitive readonlyness that we desire.
     184
     185Dirnode sizes, mutable-file initial read sizes
     186==============================================
     187
     188How big are dirnodes? When reading dirnode data out of mutable files, how
     189large should our initial read be? If we guess exactly, we can read a dirnode
     190in a single round-trip, and update one in two RTT. If we guess too high,
     191we'll waste some amount of bandwidth. If we guess low, we need to make a
     192second pass to get the data (or the encrypted privkey, for writes), which
     193will cost us at least another RTT.
     194
     195Assuming child names are between 10 and 99 characters long, how long are the
     196various pieces of a dirnode?
     197
     198::
     199
     200 netstring(name) ~= 4+len(name)
     201 chk-cap = 97 (for 4-char filesizes)
     202 dir-rw-cap = 88
     203 dir-ro-cap = 91
     204 netstring(cap) = 4+len(cap)
     205 encrypted(cap) = 16+cap+32
     206 JSON({}) = 2
     207 JSON({ctime=float,mtime=float,'tahoe':{linkcrtime=float,linkmotime=float}}): 137
     208 netstring(metadata) = 4+137 = 141
     209
     210so a CHK entry is::
     211
     212 5+ 4+len(name) + 4+97 + 5+16+97+32 + 4+137
     213
     214And a 15-byte filename gives a 416-byte entry. When the entry points at a
     215subdirectory instead of a file, the entry is a little bit smaller. So an
     216empty directory uses 0 bytes, a directory with one child uses about 416
     217bytes, a directory with two children uses about 832, etc.
     218
      219When the dirnode data is encoded using our default 3-of-10, that means we
     220get 139ish bytes of data in each share per child.
     221
     222The pubkey, signature, and hashes form the first 935ish bytes of the
     223container, then comes our data, then about 1216 bytes of encprivkey. So if we
     224read the first::
     225
      226 1kB: we get 65 bytes of dirnode data: only empty directories
     227 2kB: 1065bytes: about 8
     228 3kB: 2065bytes: about 15 entries, or 6 entries plus the encprivkey
     229 4kB: 3065bytes: about 22 entries, or about 13 plus the encprivkey
     230
     231So we've written the code to do an initial read of 4kB from each share when
     232we read the mutable file, which should give good performance (one RTT) for
     233small directories.
     234
     235
     236Design Goals, redux
     237===================
     238
     239How well does this design meet the goals?
     240
     2411. functional: YES: the code works and has extensive unit tests
     2422. documentable: YES: this document is the existence proof
     2433. confidential: YES: see below
      2444. integrity: MOSTLY: a coalition of storage servers can roll back
      245   individual mutable files, but a single server cannot. No server
      246   can substitute fake data as genuine.
     2475. availability: YES: as long as 'k' storage servers are present and have
     248   the same version of the mutable file, the dirnode will
     249   be available.
     2506. efficient: MOSTLY:
     251     network: single dirnode lookup is very efficient, since clients can
     252       fetch specific keys rather than being required to get or set
     253       the entire dirnode each time. Traversing many directories
     254       takes a lot of roundtrips, and these can't be collapsed with
     255       promise-pipelining because the intermediate values must only
     256       be visible to the client. Modifying many dirnodes at once
     257       (e.g. importing a large pre-existing directory tree) is pretty
     258       slow, since each graph edge must be created independently.
     259     storage: each child has a separate IV, which makes them larger than
     260       if all children were aggregated into a single encrypted string
     2617. delegation: VERY: each dirnode is a completely independent object,
     262   to which clients can be granted separate read-write or
     263   read-only access
     2648. updateness: VERY: with only a single point of access, and no caching,
     265   each client operation starts by fetching the current
     266   value, so there are no opportunities for staleness
     2679. monotonicity: VERY: the single point of access also protects against
     268   retrograde motion
     269     
     270
     271
     272Confidentiality leaks in the storage servers
     273--------------------------------------------
     274
      275Dirnodes (and the mutable files upon which they are based) are very private
     276against other clients: traffic between the client and the storage servers is
     277protected by the Foolscap SSL connection, so they can observe very little.
     278Storage index values are hashes of secrets and thus unguessable, and they are
     279not made public, so other clients cannot snoop through encrypted dirnodes
     280that they have not been told about.
     281
     282Storage servers can observe access patterns and see ciphertext, but they
     283cannot see the plaintext (of child names, metadata, or URIs). If an attacker
     284operates a significant number of storage servers, they can infer the shape of
     285the directory structure by assuming that directories are usually accessed
     286from root to leaf in rapid succession. Since filenames are usually much
     287shorter than read-caps and write-caps, the attacker can use the length of the
     288ciphertext to guess the number of children of each node, and might be able to
     289guess the length of the child names (or at least their sum). From this, the
     290attacker may be able to build up a graph with the same shape as the plaintext
     291filesystem, but with unlabeled edges and unknown file contents.
     292
     293
     294Integrity failures in the storage servers
     295-----------------------------------------
     296
     297The mutable file's integrity mechanism (RSA signature on the hash of the file
     298contents) prevents the storage server from modifying the dirnode's contents
     299without detection. Therefore the storage servers can make the dirnode
     300unavailable, but not corrupt it.
     301
     302A sufficient number of colluding storage servers can perform a rollback
     303attack: replace all shares of the whole mutable file with an earlier version.
     304To prevent this, when retrieving the contents of a mutable file, the
     305client queries more servers than necessary and uses the highest available
      306version number. This ensures that one or two misbehaving storage servers
     307cannot cause this rollback on their own.
     308
     309
     310Improving the efficiency of dirnodes
     311------------------------------------
     312
      313The current mutable-file-based dirnode scheme suffers from certain
     314inefficiencies. A very large directory (with thousands or millions of
     315children) will take a significant time to extract any single entry, because
     316the whole file must be downloaded first, then parsed and searched to find the
     317desired child entry. Likewise, modifying a single child will require the
     318whole file to be re-uploaded.
     319
     320The current design assumes (and in some cases, requires) that dirnodes remain
     321small. The mutable files on which dirnodes are based are currently using
     322"SDMF" ("Small Distributed Mutable File") design rules, which state that the
     323size of the data shall remain below one megabyte. More advanced forms of
     324mutable files (MDMF and LDMF) are in the design phase to allow efficient
     325manipulation of larger mutable files. This would reduce the work needed to
     326modify a single entry in a large directory.
     327
     328Judicious caching may help improve the reading-large-directory case. Some
     329form of mutable index at the beginning of the dirnode might help as well. The
     330MDMF design rules allow for efficient random-access reads from the middle of
     331the file, which would give the index something useful to point at.
     332
     333The current SDMF design generates a new RSA public/private keypair for each
     334directory. This takes considerable time and CPU effort, generally one or two
     335seconds per directory. We have designed (but not yet built) a DSA-based
     336mutable file scheme which will use shared parameters to reduce the
     337directory-creation effort to a bare minimum (picking a random number instead
     338of generating two random primes).
     339
     340When a backup program is run for the first time, it needs to copy a large
     341amount of data from a pre-existing filesystem into reliable storage. This
     342means that a large and complex directory structure needs to be duplicated in
     343the dirnode layer. With the one-object-per-dirnode approach described here,
     344this requires as many operations as there are edges in the imported
     345filesystem graph.
     346
     347Another approach would be to aggregate multiple directories into a single
     348storage object. This object would contain a serialized graph rather than a
     349single name-to-child dictionary. Most directory operations would fetch the
      350whole block of data (and presumably cache it for a while to avoid lots of
     351re-fetches), and modification operations would need to replace the whole
     352thing at once. This "realm" approach would have the added benefit of
     353combining more data into a single encrypted bundle (perhaps hiding the shape
     354of the graph from a determined attacker), and would reduce round-trips when
     355performing deep directory traversals (assuming the realm was already cached).
     356It would also prevent fine-grained rollback attacks from working: a coalition
     357of storage servers could change the entire realm to look like an earlier
     358state, but it could not independently roll back individual directories.
     359
     360The drawbacks of this aggregation would be that small accesses (adding a
     361single child, looking up a single child) would require pulling or pushing a
     362lot of unrelated data, increasing network overhead (and necessitating
     363test-and-set semantics for the modification side, which increases the chances
     364that a user operation will fail, making it more challenging to provide
     365promises of atomicity to the user).
     366
     367It would also make it much more difficult to enable the delegation
     368("sharing") of specific directories. Since each aggregate "realm" provides
     369all-or-nothing access control, the act of delegating any directory from the
     370middle of the realm would require the realm first be split into the upper
     371piece that isn't being shared and the lower piece that is. This splitting
     372would have to be done in response to what is essentially a read operation,
     373which is not traditionally supposed to be a high-effort action. On the other
     374hand, it may be possible to aggregate the ciphertext, but use distinct
     375encryption keys for each component directory, to get the benefits of both
     376schemes at once.
     377
     378
     379Dirnode expiration and leases
     380-----------------------------
     381
     382Dirnodes are created any time a client wishes to add a new directory. How
     383long do they live? What's to keep them from sticking around forever, taking
     384up space that nobody can reach any longer?
     385
     386Mutable files are created with limited-time "leases", which keep the shares
     387alive until the last lease has expired or been cancelled. Clients which know
     388and care about specific dirnodes can ask to keep them alive for a while, by
     389renewing a lease on them (with a typical period of one month). Clients are
     390expected to assist in the deletion of dirnodes by canceling their leases as
     391soon as they are done with them. This means that when a client deletes a
     392directory, it should also cancel its lease on that directory. When the lease
     393count on a given share goes to zero, the storage server can delete the
     394related storage. Multiple clients may all have leases on the same dirnode:
     395the server may delete the shares only after all of the leases have gone away.
     396
     397We expect that clients will periodically create a "manifest": a list of
     398so-called "refresh capabilities" for all of the dirnodes and files that they
     399can reach. They will give this manifest to the "repairer", which is a service
     400that keeps files (and dirnodes) alive on behalf of clients who cannot take on
     401this responsibility for themselves. These refresh capabilities include the
     402storage index, but do *not* include the readkeys or writekeys, so the
     403repairer does not get to read the files or directories that it is helping to
     404keep alive.
     405
     406After each change to the user's vdrive, the client creates a manifest and
     407looks for differences from their previous version. Anything which was removed
     408prompts the client to send out lease-cancellation messages, allowing the data
     409to be deleted.
     410
     411
     412Starting Points: root dirnodes
     413==============================
     414
     415Any client can record the URI of a directory node in some external form (say,
     416in a local file) and use it as the starting point of later traversal. Each
     417Tahoe-LAFS user is expected to create a new (unattached) dirnode when they first
     418start using the grid, and record its URI for later use.
     419
     420Mounting and Sharing Directories
     421================================
     422
     423The biggest benefit of this dirnode approach is that sharing individual
     424directories is almost trivial. Alice creates a subdirectory that she wants to
     425use to share files with Bob. This subdirectory is attached to Alice's
     426filesystem at "~alice/share-with-bob". She asks her filesystem for the
     427read-write directory URI for that new directory, and emails it to Bob. When
     428Bob receives the URI, he asks his own local vdrive to attach the given URI,
     429perhaps at a place named "~bob/shared-with-alice". Every time either party
     430writes a file into this directory, the other will be able to read it. If
     431Alice prefers, she can give a read-only URI to Bob instead, and then Bob will
     432be able to read files but not change the contents of the directory. Neither
     433Alice nor Bob will get access to any files above the mounted directory: there
     434are no 'parent directory' pointers. If Alice creates a nested set of
     435directories, "~alice/share-with-bob/subdir2", and gives a read-only URI to
     436share-with-bob to Bob, then Bob will be unable to write to either
     437share-with-bob/ or subdir2/.
     438
     439A suitable UI needs to be created to allow users to easily perform this
      440sharing action: dragging a folder from their vdrive to an IM or email user icon,
     441for example. The UI will need to give the sending user an opportunity to
     442indicate whether they want to grant read-write or read-only access to the
     443recipient. The recipient then needs an interface to drag the new folder into
     444their vdrive and give it a home.
     445
     446Revocation
     447==========
     448
     449When Alice decides that she no longer wants Bob to be able to access the
     450shared directory, what should she do? Suppose she's shared this folder with
     451both Bob and Carol, and now she wants Carol to retain access to it but Bob to
     452be shut out. Ideally Carol should not have to do anything: her access should
     453continue unabated.
     454
     455The current plan is to have her client create a deep copy of the folder in
question, delegate access to the new folder to the remaining members of the
group (Carol), and ask the lucky survivors to replace their old reference with
     458the new one. Bob may still have access to the old folder, but he is now the
     459only one who cares: everyone else has moved on, and he will no longer be able
     460to see their new changes. In a strict sense, this is the strongest form of
     461revocation that can be accomplished: there is no point trying to force Bob to
     462forget about the files that he read a moment before being kicked out. In
     463addition it must be noted that anyone who can access the directory can proxy
     464for Bob, reading files to him and accepting changes whenever he wants.
Preventing delegation between communicating parties is just as pointless as
asking Bob to forget previously accessed files. However, there may be value
in configuring the UI to ask Carol not to share files with Bob, or in
removing all files from Bob's view at the same time his access is revoked.
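
A toy model of this revoke-by-copying plan, with plain dicts standing in for
dirnodes and ``copy.deepcopy`` standing in for recursively re-creating the
directory tree::

 import copy

 def revoke(old_folder, surviving_views):
     """Create a deep copy of the folder and repoint the survivors'
     references at it; anyone not updated (Bob) keeps only the old copy."""
     new_folder = copy.deepcopy(old_folder)
     for view in surviving_views:
         for name, ref in list(view.items()):
             if ref is old_folder:
                 view[name] = new_folder
     return new_folder

 shared = {"notes.txt": "URI:CHK:..."}
 carol = {"shared": shared}
 bob = {"shared": shared}
 new = revoke(shared, [carol])
 assert carol["shared"] is new    # Carol sees all future changes
 assert bob["shared"] is shared   # Bob is stuck with the old folder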
     469
  • deleted file docs/specifications/dirnodes.txt

    diff --git a/docs/specifications/dirnodes.txt b/docs/specifications/dirnodes.txt
    deleted file mode 100644
    index fad7641..0000000
    + -  
  • new file docs/specifications/file-encoding.rst

    diff --git a/docs/specifications/file-encoding.rst b/docs/specifications/file-encoding.rst
    new file mode 100644
    index 0000000..1f2ee74
    - +  
     1=============
     2File Encoding
     3=============
     4
     5When the client wishes to upload an immutable file, the first step is to
     6decide upon an encryption key. There are two methods: convergent or random.
     7The goal of the convergent-key method is to make sure that multiple uploads
     8of the same file will result in only one copy on the grid, whereas the
     9random-key method does not provide this "convergence" feature.
     10
     11The convergent-key method computes the SHA-256d hash of a single-purpose tag,
     12the encoding parameters, a "convergence secret", and the contents of the
     13file. It uses a portion of the resulting hash as the AES encryption key.
There are security concerns with this convergence approach (the
"partial-information guessing attack"; please see ticket #365 for some
references), so Tahoe uses a separate (randomly-generated) "convergence
secret" for each node, stored in NODEDIR/private/convergence. The encoding
     18parameters (k, N, and the segment size) are included in the hash to make sure
     19that two different encodings of the same file will get different keys. This
     20method requires an extra IO pass over the file, to compute this key, and
     21encryption cannot be started until the pass is complete. This means that the
     22convergent-key method will require at least two total passes over the file.
     23
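A sketch of the convergent-key derivation, using the SHA-256d and netstring
conventions from the Hashes section below; the tag string here is
illustrative, not Tahoe's actual tag::

 from hashlib import sha256

 def sha256d(data: bytes) -> bytes:
     return sha256(sha256(data).digest()).digest()

 def netstring(s: bytes) -> bytes:
     return b"%d:%s," % (len(s), s)

 def convergent_key(encoding_params: bytes, convergence_secret: bytes,
                    file_contents: bytes) -> bytes:
     tag = b"convergent_encryption_key_v1"  # illustrative tag
     h = sha256d(netstring(tag) + netstring(encoding_params) +
                 netstring(convergence_secret) + file_contents)
     return h[:16]  # a portion of the hash becomes the 16-byte AES key
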
The random-key method simply chooses a random encryption key. Convergence is
disabled, but this method does not require a separate IO pass, so upload
     26can be done with a single pass. This mode makes it easier to perform
     27streaming upload.
     28
     29Regardless of which method is used to generate the key, the plaintext file is
     30encrypted (using AES in CTR mode) to produce a ciphertext. This ciphertext is
     31then erasure-coded and uploaded to the servers. Two hashes of the ciphertext
     32are generated as the encryption proceeds: a flat hash of the whole
     33ciphertext, and a Merkle tree. These are used to verify the correctness of
     34the erasure decoding step, and can be used by a "verifier" process to make
     35sure the file is intact without requiring the decryption key.
     36
     37The encryption key is hashed (with SHA-256d and a single-purpose tag) to
     38produce the "Storage Index". This Storage Index (or SI) is used to identify
     39the shares produced by the method described below. The grid can be thought of
     40as a large table that maps Storage Index to a ciphertext. Since the
     41ciphertext is stored as erasure-coded shares, it can also be thought of as a
     42table that maps SI to shares.
     43
     44Anybody who knows a Storage Index can retrieve the associated ciphertext:
     45ciphertexts are not secret.
     46
     47.. image:: file-encoding1.svg
     48
     49The ciphertext file is then broken up into segments. The last segment is
     50likely to be shorter than the rest. Each segment is erasure-coded into a
     51number of "blocks". This takes place one segment at a time. (In fact,
encryption and erasure-coding take place at the same time, once per plaintext
segment.) Larger segment sizes result in less overhead overall, but increase
     54both the memory footprint and the "alacrity" (the number of bytes we have to
     55receive before we can deliver validated plaintext to the user). The current
     56default segment size is 128KiB.
     57
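Tahoe's erasure coding is Reed-Solomon (via the zfec library), but the idea
can be illustrated with a toy 2-of-3 scheme: split the segment into two data
blocks and add an XOR parity block, so that any two of the three blocks
recover the segment::

 def encode_segment(segment: bytes):
     half = (len(segment) + 1) // 2
     b1 = segment[:half]
     b2 = segment[half:].ljust(half, b"\0")  # pad the possibly-shorter tail
     parity = bytes(x ^ y for x, y in zip(b1, b2))
     return [b1, b2, parity]  # one block per shareholder

 b1, b2, parity = encode_segment(b"some segment of ciphertext")
 # if the server holding b2 is lost, b1 and parity suffice:
 assert bytes(x ^ y for x, y in zip(b1, parity)) == b2
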
     58One block from each segment is sent to each shareholder (aka leaseholder,
     59aka landlord, aka storage node, aka peer). The "share" held by each remote
     60shareholder is nominally just a collection of these blocks. The file will
     61be recoverable when a certain number of shares have been retrieved.
     62
     63.. image:: file-encoding2.svg
     64
     65The blocks are hashed as they are generated and transmitted. These
     66block hashes are put into a Merkle hash tree. When the last share has been
created, the Merkle tree is completed and delivered to the peer. Later, when
we retrieve these blocks, the peer will send many of the Merkle hash tree
     69nodes ahead of time, so we can validate each block independently.
     70
The root of this block hash tree is called the "block root hash" and is
used in the next step.
     73
     74.. image:: file-encoding3.svg
     75
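A minimal Merkle-tree sketch. (Real Tahoe trees also use tagged hashes, to
keep leaf hashes distinct from internal-node hashes, and pad the leaf count
to a power of two; both details are omitted here.)::

 from hashlib import sha256

 def sha256d(data: bytes) -> bytes:
     return sha256(sha256(data).digest()).digest()

 def merkle_root(blocks):
     layer = [sha256d(b) for b in blocks]
     while len(layer) > 1:
         if len(layer) % 2:
             layer.append(layer[-1])  # simplification: duplicate odd node
         layer = [sha256d(left + right)
                  for left, right in zip(layer[0::2], layer[1::2])]
     return layer[0]

 block_root_hash = merkle_root([b"block0", b"block1", b"block2"])
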
     76There is a higher-level Merkle tree called the "share hash tree". Its leaves
     77are the block root hashes from each share. The root of this tree is called
     78the "share root hash" and is included in the "URI Extension Block", aka UEB.
     79The ciphertext hash and Merkle tree are also put here, along with the
     80original file size, and the encoding parameters. The UEB contains all the
     81non-secret values that could be put in the URI, but would have made the URI
     82too big. So instead, the UEB is stored with the share, and the hash of the
     83UEB is put in the URI.
     84
     85The URI then contains the secret encryption key and the UEB hash. It also
     86contains the basic encoding parameters (k and N) and the file size, to make
     87download more efficient (by knowing the number of required shares ahead of
     88time, sufficient download queries can be generated in parallel).
     89
     90The URI (also known as the immutable-file read-cap, since possessing it
     91grants the holder the capability to read the file's plaintext) is then
     92represented as a (relatively) short printable string like so::
     93
     94 URI:CHK:auxet66ynq55naiy2ay7cgrshm:6rudoctmbxsmbg7gwtjlimd6umtwrrsxkjzthuldsmo4nnfoc6fa:3:10:1000000
     95
     96.. image:: file-encoding4.svg
     97
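The fields of the CHK read-cap shown above can be pulled apart like this;
the key is the only secret, while the UEB hash and the trailing integers
are public::

 uri = ("URI:CHK:auxet66ynq55naiy2ay7cgrshm:"
        "6rudoctmbxsmbg7gwtjlimd6umtwrrsxkjzthuldsmo4nnfoc6fa:3:10:1000000")
 _, _, key, ueb_hash, k, N, size = uri.split(":")
 # key: the secret AES encryption key (base32)
 # ueb_hash: the hash of the URI Extension Block (base32)
 assert (int(k), int(N), int(size)) == (3, 10, 1000000)
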
     98During download, when a peer begins to transmit a share, it first transmits
     99all of the parts of the share hash tree that are necessary to validate its
     100block root hash. Then it transmits the portions of the block hash tree
     101that are necessary to validate the first block. Then it transmits the
     102first block. It then continues this loop: transmitting any portions of the
     103block hash tree to validate block#N, then sending block#N.
     104
     105.. image:: file-encoding5.svg
     106
     107So the "share" that is sent to the remote peer actually consists of three
     108pieces, sent in a specific order as they become available, and retrieved
     109during download in a different order according to when they are needed.
     110
     111The first piece is the blocks themselves, one per segment. The last
     112block will likely be shorter than the rest, because the last segment is
     113probably shorter than the rest. The second piece is the block hash tree,
consisting of a total of two SHA-256d hashes per block. The third piece is a
     115hash chain from the share hash tree, consisting of log2(numshares) hashes.
     116
     117During upload, all blocks are sent first, followed by the block hash
     118tree, followed by the share hash chain. During download, the share hash chain
     119is delivered first, followed by the block root hash. The client then uses
     120the hash chain to validate the block root hash. Then the peer delivers
     121enough of the block hash tree to validate the first block, followed by
     122the first block itself. The block hash chain is used to validate the
     123block, then it is passed (along with the first block from several other
     124peers) into decoding, to produce the first segment of crypttext, which is
     125then decrypted to produce the first segment of plaintext, which is finally
     126delivered to the user.
     127
     128.. image:: file-encoding6.svg
     129
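In outline, the per-share download loop looks like the following sketch; the
``fetch_*`` and ``validate_*`` helpers are hypothetical stand-ins for the
peer protocol and the hash checks described above::

 def download_share(peer, num_segments):
     # 1. enough of the share hash tree to validate this share's root
     share_chain = peer.fetch_share_hash_chain()
     block_root = peer.fetch_block_root_hash()
     validate_share_hash_chain(share_chain, block_root)
     # 2. then, per block: just enough block-hash-tree nodes, then the block
     for n in range(num_segments):
         nodes = peer.fetch_block_hash_nodes(n)
         block = peer.fetch_block(n)
         validate_block(n, block, nodes, block_root)
         yield block  # block #n goes to erasure decoding, then decryption
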
     130Hashes
     131======
     132
     133All hashes use SHA-256d, as defined in Practical Cryptography (by Ferguson
     134and Schneier). All hashes use a single-purpose tag, e.g. the hash that
     135converts an encryption key into a storage index is defined as follows::
     136
     137 SI = SHA256d(netstring("allmydata_immutable_key_to_storage_index_v1") + key)
     138
     139When two separate values need to be combined together in a hash, we wrap each
     140in a netstring.
     141
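Expressed in code, these conventions look like the following sketch, which
also implements the storage-index formula given above::

 from hashlib import sha256

 def netstring(s: bytes) -> bytes:
     return b"%d:%s," % (len(s), s)

 def sha256d(data: bytes) -> bytes:
     return sha256(sha256(data).digest()).digest()

 def tagged_hash(tag: bytes, value: bytes) -> bytes:
     return sha256d(netstring(tag) + value)

 def storage_index(key: bytes) -> bytes:
     return tagged_hash(b"allmydata_immutable_key_to_storage_index_v1", key)

 # when two separate values must be combined, wrap each in a netstring:
 def tagged_pair_hash(tag: bytes, v1: bytes, v2: bytes) -> bytes:
     return sha256d(netstring(tag) + netstring(v1) + netstring(v2))
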
     142Using SHA-256d (instead of plain SHA-256) guards against length-extension
     143attacks. Using the tag protects our Merkle trees against attacks in which the
     144hash of a leaf is confused with a hash of two children (allowing an attacker
     145to generate corrupted data that nevertheless appears to be valid), and is
simply good "cryptographic hygiene". The `"Chosen Protocol Attack" by Kelsey,
     147Schneier, and Wagner <http://www.schneier.com/paper-chosen-protocol.html>`_ is
     148relevant. Putting the tag in a netstring guards against attacks that seek to
     149confuse the end of the tag with the beginning of the subsequent value.
     150
  • deleted file docs/specifications/file-encoding.txt

    diff --git a/docs/specifications/file-encoding.txt b/docs/specifications/file-encoding.txt
    deleted file mode 100644
    index 23862ea..0000000
    + -  
  • new file docs/specifications/mutable.rst

    diff --git a/docs/specifications/mutable.rst b/docs/specifications/mutable.rst
    new file mode 100644
    index 0000000..0d7e71e
    - +  
     1=============
     2Mutable Files
     3=============
     4
     5This describes the "RSA-based mutable files" which were shipped in Tahoe v0.8.0.
     6
     71.  `Consistency vs. Availability`_
     82.  `The Prime Coordination Directive: "Don't Do That"`_
     93.  `Small Distributed Mutable Files`_
     10
     11    1. `SDMF slots overview`_
     12    2. `Server Storage Protocol`_
     13    3. `Code Details`_
     14    4. `SMDF Slot Format`_
     15    5. `Recovery`_
     16
     174.  `Medium Distributed Mutable Files`_
     185.  `Large Distributed Mutable Files`_
     196.  `TODO`_
     20
     21Mutable File Slots are places with a stable identifier that can hold data
     22that changes over time. In contrast to CHK slots, for which the
     23URI/identifier is derived from the contents themselves, the Mutable File Slot
     24URI remains fixed for the life of the slot, regardless of what data is placed
     25inside it.
     26
     27Each mutable slot is referenced by two different URIs. The "read-write" URI
     28grants read-write access to its holder, allowing them to put whatever
     29contents they like into the slot. The "read-only" URI is less powerful, only
     30granting read access, and not enabling modification of the data. The
     31read-write URI can be turned into the read-only URI, but not the other way
     32around.
     33
     34The data in these slots is distributed over a number of servers, using the
     35same erasure coding that CHK files use, with 3-of-10 being a typical choice
     36of encoding parameters. The data is encrypted and signed in such a way that
     37only the holders of the read-write URI will be able to set the contents of
     38the slot, and only the holders of the read-only URI will be able to read
     39those contents. Holders of either URI will be able to validate the contents
     40as being written by someone with the read-write URI. The servers who hold the
     41shares cannot read or modify them: the worst they can do is deny service (by
     42deleting or corrupting the shares), or attempt a rollback attack (which can
     43only succeed with the cooperation of at least k servers).
     44
     45Consistency vs. Availability
     46============================
     47
     48There is an age-old battle between consistency and availability. Epic papers
     49have been written, elaborate proofs have been established, and generations of
theorists have learned that you cannot simultaneously achieve guaranteed
consistency with guaranteed availability. In addition, the closer you get to
a guarantee on either axis, the more the cost and complexity of the design
go up.
     53
Tahoe's design goals are, first, to largely favor design simplicity, and
then to slightly favor read availability over the other criteria.
     56
     57As we develop more sophisticated mutable slots, the API may expose multiple
read versions to the application layer. The Tahoe philosophy is to defer most
     59consistency recovery logic to the higher layers. Some applications have
     60effective ways to merge multiple versions, so inconsistency is not
     61necessarily a problem (i.e. directory nodes can usually merge multiple "add
     62child" operations).
     63
     64The Prime Coordination Directive: "Don't Do That"
     65=================================================
     66
     67The current rule for applications which run on top of Tahoe is "do not
perform simultaneous uncoordinated writes". That means you need some
mechanism outside of Tahoe to make sure that two parties are not trying to
modify the same mutable slot at the same time. For example:
     71
     72* don't give the read-write URI to anyone else. Dirnodes in a private
     73  directory generally satisfy this case, as long as you don't use two
     74  clients on the same account at the same time
     75* if you give a read-write URI to someone else, stop using it yourself. An
     76  inbox would be a good example of this.
     77* if you give a read-write URI to someone else, call them on the phone
     78  before you write into it
     79* build an automated mechanism to have your agents coordinate writes.
     80  For example, we expect a future release to include a FURL for a
     81  "coordination server" in the dirnodes. The rule can be that you must
     82  contact the coordination server and obtain a lock/lease on the file
     83  before you're allowed to modify it.
     84
     85If you do not follow this rule, Bad Things will happen. The worst-case Bad
     86Thing is that the entire file will be lost. A less-bad Bad Thing is that one
     87or more of the simultaneous writers will lose their changes. An observer of
     88the file may not see monotonically-increasing changes to the file, i.e. they
     89may see version 1, then version 2, then 3, then 2 again.
     90
     91Tahoe takes some amount of care to reduce the badness of these Bad Things.
     92One way you can help nudge it from the "lose your file" case into the "lose
     93some changes" case is to reduce the number of competing versions: multiple
     94versions of the file that different parties are trying to establish as the
     95one true current contents. Each simultaneous writer counts as a "competing
     96version", as does the previous version of the file. If the count "S" of these
competing versions is larger than N/k, then the file runs the risk of being
lost completely (with the default 3-of-10 encoding, N/k is about 3.3, so four
or more competing versions put the file at risk). [TODO] If at least one of
the writers remains running after the collision is detected, it will attempt
to recover, but if S>(N/k) and all
     100writers crash after writing a few shares, the file will be lost.
     101
     102Note that Tahoe uses serialization internally to make sure that a single
     103Tahoe node will not perform simultaneous modifications to a mutable file. It
     104accomplishes this by using a weakref cache of the MutableFileNode (so that
     105there will never be two distinct MutableFileNodes for the same file), and by
     106forcing all mutable file operations to obtain a per-node lock before they
     107run. The Prime Coordination Directive therefore applies to inter-node
     108conflicts, not intra-node ones.
     109
     110
     111Small Distributed Mutable Files
     112===============================
     113
SDMF slots are suitable for small (<1MB) files that are edited by rewriting
     115the entire file. The three operations are:
     116
     117 * allocate (with initial contents)
     118 * set (with new contents)
     119 * get (old contents)
     120
     121The first use of SDMF slots will be to hold directories (dirnodes), which map
     122encrypted child names to rw-URI/ro-URI pairs.
     123
     124SDMF slots overview
     125-------------------
     126
     127Each SDMF slot is created with a public/private key pair. The public key is
     128known as the "verification key", while the private key is called the
     129"signature key". The private key is hashed and truncated to 16 bytes to form
     130the "write key" (an AES symmetric key). The write key is then hashed and
     131truncated to form the "read key". The read key is hashed and truncated to
     132form the 16-byte "storage index" (a unique string used as an index to locate
     133stored data).
     134
     135The public key is hashed by itself to form the "verification key hash".
     136
     137The write key is hashed a different way to form the "write enabler master".
     138For each storage server on which a share is kept, the write enabler master is
     139concatenated with the server's nodeid and hashed, and the result is called
     140the "write enabler" for that particular server. Note that multiple shares of
     141the same slot stored on the same server will all get the same write enabler,
     142i.e. the write enabler is associated with the "bucket", rather than the
     143individual shares.
     144
     145The private key is encrypted (using AES in counter mode) by the write key,
and the resulting crypttext is stored on the servers, so it will be
     147retrievable by anyone who knows the write key. The write key is not used to
     148encrypt anything else, and the private key never changes, so we do not need
     149an IV for this purpose.
     150
     151The actual data is encrypted (using AES in counter mode) with a key derived
by concatenating the readkey with the IV, then hashing the result and
     153truncating to 16 bytes. The IV is randomly generated each time the slot is
     154updated, and stored next to the encrypted data.
     155
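This derivation chain runs one way only: each value is a truncated hash of
its predecessor, which is why the read-only and verify-only capabilities can
be computed from the read-write capability but not vice versa. A sketch,
with illustrative tag strings (Tahoe's real tags differ) and a uniform
16-byte truncation for simplicity::

 from hashlib import sha256

 def truncated_tagged_hash(tag: bytes, *values: bytes) -> bytes:
     h = sha256(tag)
     for v in values:
         h.update(v)
     return h.digest()[:16]

 def derive_sdmf_secrets(private_key_bytes: bytes):
     write_key = truncated_tagged_hash(b"writekey_v1", private_key_bytes)
     read_key = truncated_tagged_hash(b"readkey_v1", write_key)
     storage_index = truncated_tagged_hash(b"storage_index_v1", read_key)
     return write_key, read_key, storage_index

 def write_enabler(write_key: bytes, server_nodeid: bytes) -> bytes:
     master = truncated_tagged_hash(b"write_enabler_master_v1", write_key)
     return truncated_tagged_hash(b"write_enabler_v1", master, server_nodeid)

 def data_key(read_key: bytes, iv: bytes) -> bytes:
     # per-update key for the file contents: hash of readkey + IV
     return truncated_tagged_hash(b"data_key_v1", read_key, iv)
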
     156The read-write URI consists of the write key and the verification key hash.
     157The read-only URI contains the read key and the verification key hash. The
     158verify-only URI contains the storage index and the verification key hash.
     159
     160::
     161
     162 URI:SSK-RW:b2a(writekey):b2a(verification_key_hash)
     163 URI:SSK-RO:b2a(readkey):b2a(verification_key_hash)
     164 URI:SSK-Verify:b2a(storage_index):b2a(verification_key_hash)
     165
     166Note that this allows the read-only and verify-only URIs to be derived from
     167the read-write URI without actually retrieving the public keys. Also note
     168that it means the read-write agent must validate both the private key and the
     169public key when they are first fetched. All users validate the public key in
     170exactly the same way.
     171
     172The SDMF slot is allocated by sending a request to the storage server with a
     173desired size, the storage index, and the write enabler for that server's
     174nodeid. If granted, the write enabler is stashed inside the slot's backing
     175store file. All further write requests must be accompanied by the write
     176enabler or they will not be honored. The storage server does not share the
     177write enabler with anyone else.
     178
     179The SDMF slot structure will be described in more detail below. The important
     180pieces are:
     181
     182* a sequence number
     183* a root hash "R"
     184* the encoding parameters (including k, N, file size, segment size)
     185* a signed copy of [seqnum,R,encoding_params], using the signature key
     186* the verification key (not encrypted)
     187* the share hash chain (part of a Merkle tree over the share hashes)
     188* the block hash tree (Merkle tree over blocks of share data)
     189* the share data itself (erasure-coding of read-key-encrypted file data)
     190* the signature key, encrypted with the write key
     191
     192The access pattern for read is:
     193
     194* hash read-key to get storage index
     195* use storage index to locate 'k' shares with identical 'R' values
     196
     197  * either get one share, read 'k' from it, then read k-1 shares
     198  * or read, say, 5 shares, discover k, either get more or be finished
     199  * or copy k into the URIs
     200
     201* read verification key
     202* hash verification key, compare against verification key hash
     203* read seqnum, R, encoding parameters, signature
     204* verify signature against verification key
     205* read share data, compute block-hash Merkle tree and root "r"
     206* read share hash chain (leading from "r" to "R")
     207* validate share hash chain up to the root "R"
     208* submit share data to erasure decoding
     209* decrypt decoded data with read-key
     210* submit plaintext to application
     211
     212The access pattern for write is:
     213
     214* hash write-key to get read-key, hash read-key to get storage index
     215* use the storage index to locate at least one share
     216* read verification key and encrypted signature key
     217* decrypt signature key using write-key
     218* hash signature key, compare against write-key
     219* hash verification key, compare against verification key hash
     220* encrypt plaintext from application with read-key
     221
     222  * application can encrypt some data with the write-key to make it only
     223    available to writers (use this for transitive read-onlyness of dirnodes)
     224
     225* erasure-code crypttext to form shares
     226* split shares into blocks
     227* compute Merkle tree of blocks, giving root "r" for each share
     228* compute Merkle tree of shares, find root "R" for the file as a whole
     229* create share data structures, one per server:
     230
     231  * use a seqnum which is one higher than the old version's
     232  * share hash chain has log(N) hashes, different for each server
     233  * signed data is the same for each server
     234
     235* now we have N shares and need homes for them
     236* walk through peers
     237
     238  * if share is not already present, allocate-and-set
     239  * otherwise, try to modify existing share:
     240  * send testv_and_writev operation to each one
     241  * testv says to accept share if their(seqnum+R) <= our(seqnum+R)
     242  * count how many servers wind up with which versions (histogram over R)
     243  * keep going until N servers have the same version, or we run out of servers
     244
     245    * if any servers wound up with a different version, report error to
     246      application
     247    * if we ran out of servers, initiate recovery process (described below)
     248
     249Server Storage Protocol
     250-----------------------
     251
     252The storage servers will provide a mutable slot container which is oblivious
     253to the details of the data being contained inside it. Each storage index
     254refers to a "bucket", and each bucket has one or more shares inside it. (In a
     255well-provisioned network, each bucket will have only one share). The bucket
     256is stored as a directory, using the base32-encoded storage index as the
     257directory name. Each share is stored in a single file, using the share number
     258as the filename.
     259
     260The container holds space for a container magic number (for versioning), the
     261write enabler, the nodeid which accepted the write enabler (used for share
     262migration, described below), a small number of lease structures, the embedded
     263data itself, and expansion space for additional lease structures::
     264
     265 #   offset    size    name
     266 1   0         32      magic verstr "tahoe mutable container v1" plus binary
     267 2   32        20      write enabler's nodeid
     268 3   52        32      write enabler
     269 4   84        8       data size (actual share data present) (a)
     270 5   92        8       offset of (8) count of extra leases (after data)
     271 6   100       368     four leases, 92 bytes each
     272                        0    4   ownerid (0 means "no lease here")
     273                        4    4   expiration timestamp
     274                        8   32   renewal token
     275                        40  32   cancel token
     276                        72  20   nodeid which accepted the tokens
     277 7   468       (a)     data
     278 8   ??        4       count of extra leases
     279 9   ??        n*92    extra leases
     280
     281The "extra leases" field must be copied and rewritten each time the size of
     282the enclosed data changes. The hope is that most buckets will have four or
     283fewer leases and this extra copying will not usually be necessary.
     284
     285The (4) "data size" field contains the actual number of bytes of data present
     286in field (7), such that a client request to read beyond 468+(a) will result
     287in an error. This allows the client to (one day) read relative to the end of
     288the file. The container size (that is, (8)-(7)) might be larger, especially
     289if extra size was pre-allocated in anticipation of filling the container with
     290a lot of data.
     291
     292The offset in (5) points at the *count* of extra leases, at (8). The actual
     293leases (at (9)) begin 4 bytes later. If the container size changes, both (8)
     294and (9) must be relocated by copying.
     295
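For concreteness, the fixed-size portion of this layout can be expressed with
Python struct formats. This is a sketch matching the offsets in the table
above, not the actual implementation::

 import struct

 # fields (1)-(5): magic, write enabler's nodeid, write enabler,
 # data size, offset of the extra-lease count -- 32+20+32+8+8 = 100 bytes
 CONTAINER_HEADER = ">32s20s32sQQ"
 assert struct.calcsize(CONTAINER_HEADER) == 100

 # one lease: ownerid, expiration, renew token, cancel token, nodeid
 LEASE = ">LL32s32s20s"
 assert struct.calcsize(LEASE) == 92

 DATA_OFFSET = 100 + 4 * struct.calcsize(LEASE)   # == 468, field (7)
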
     296The server will honor any write commands that provide the write enabler and do
     297not exceed the server-wide storage size limitations. Read and write commands
     298MUST be restricted to the 'data' portion of the container: the implementation
     299of those commands MUST perform correct bounds-checking to make sure other
     300portions of the container are inaccessible to the clients.
     301
     302The two methods provided by the storage server on these "MutableSlot" share
     303objects are:
     304
     305* readv(ListOf(offset=int, length=int))
     306
     307  * returns a list of bytestrings, of the various requested lengths
     308  * offset < 0 is interpreted relative to the end of the data
     309  * spans which hit the end of the data will return truncated data
     310
     311* testv_and_writev(write_enabler, test_vector, write_vector)
     312
     313  * this is a test-and-set operation which performs the given tests and only
     314    applies the desired writes if all tests succeed. This is used to detect
     315    simultaneous writers, and to reduce the chance that an update will lose
     316    data recently written by some other party (written after the last time
     317    this slot was read). A client-side sketch follows this list.
     318  * test_vector=ListOf(TupleOf(offset, length, opcode, specimen))
     319  * the opcode is a string, from the set [gt, ge, eq, le, lt, ne]
     320  * each element of the test vector is read from the slot's data and
     321    compared against the specimen using the desired (in)equality. If all
     322    tests evaluate True, the write is performed
     323  * write_vector=ListOf(TupleOf(offset, newdata))
     324
     325    * offset < 0 is not yet defined; it probably means "relative to the
     326      end of the data", which probably means append, but we haven't nailed
     327      it down quite yet
     328    * write vectors are executed in order, which specifies the results of
     329      overlapping writes
     330
     331  * return value:
     332
     333    * error: OutOfSpace
     334    * error: something else (io error, out of memory, whatever)
     335    * (True, old_test_data): the write was accepted (test_vector passed)
     336    * (False, old_test_data): the write was rejected (test_vector failed)
     337
     338      * both 'accepted' and 'rejected' return the old data that was used
     339        for the test_vector comparison. This can be used by the client
     340        to detect write collisions, including collisions for which the
     341        desired behavior was to overwrite the old version.
     342
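A sketch of how a client might construct such a call, using the SDMF share
layout described below (version byte at offset 0, seqnum at offset 1, "R" at
offset 9); the slot reference and share contents are placeholders::

 import struct

 new_seqnum = 42          # assumed: one higher than the version we read
 our_R = b"\x00" * 32     # assumed: root hash of the version being written
 new_share = b"<serialized share>"

 # accept the write only if the stored (seqnum, R) is <= ours; since the
 # seqnum is big-endian, comparing the packed bytes compares (seqnum, R)
 testv = [(1, 40, b"le", struct.pack(">Q", new_seqnum) + our_R)]
 writev = [(0, new_share)]
 # accepted, old_data = slot.testv_and_writev(write_enabler, testv, writev)
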
     343In addition, the storage server provides several methods to access these
     344share objects:
     345
     346* allocate_mutable_slot(storage_index, sharenums=SetOf(int))
     347
     348  * returns DictOf(int, MutableSlot)
     349
     350* get_mutable_slot(storage_index)
     351
     352  * returns DictOf(int, MutableSlot)
     353  * or raises KeyError
     354
     355We intend to add an interface which allows small slots to allocate-and-write
     356in a single call, as well as do update or read in a single call. The goal is
     357to allow a reasonably-sized dirnode to be created (or updated, or read) in
     358just one round trip (to all N shareholders in parallel).
     359
     360migrating shares
     361````````````````
     362
     363If a share must be migrated from one server to another, two values become
     364invalid: the write enabler (since it was computed for the old server), and
     365the lease renew/cancel tokens.
     366
     367Suppose that a slot was first created on nodeA, and was thus initialized with
     368WE(nodeA) (= H(WEM+nodeA)). Later, for provisioning reasons, the share is
     369moved from nodeA to nodeB.
     370
     371Readers may still be able to find the share in its new home, depending upon
     372how many servers are present in the grid, where the new nodeid lands in the
     373permuted index for this particular storage index, and how many servers the
     374reading client is willing to contact.
     375
     376When a client attempts to write to this migrated share, it will get a "bad
     377write enabler" error, since the WE it computes for nodeB will not match the
     378WE(nodeA) that was embedded in the share. When this occurs, the "bad write
     379enabler" message must include the old nodeid (e.g. nodeA) that was in the
     380share.
     381
     382The client then computes H(nodeB+H(WEM+nodeA)), which is the same as
     383H(nodeB+WE(nodeA)). The client sends this along with the new WE(nodeB), which
     384is H(WEM+nodeB). Note that the client only sends WE(nodeB) to nodeB, never to
     385anyone else. Also note that the client does not send a value to nodeB that
     386would allow the node to impersonate the client to a third node: everything
     387sent to nodeB will include something specific to nodeB in it.
     388
     389The server locally computes H(nodeB+WE(nodeA)), using its own node id and the
     390old write enabler from the share. It compares this against the value supplied
     391by the client. If they match, this serves as proof that the client was able
     392to compute the old write enabler. The server then accepts the client's new
     393WE(nodeB) and writes it into the container.
     394
     395This WE-fixup process requires an extra round trip, and requires the error
     396message to include the old nodeid, but does not require any public key
     397operations on either client or server.
     398
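A sketch of the computation on both sides, with H() standing in for the real
tagged hash and placeholder identifiers::

 from hashlib import sha256

 def H(data):
     # illustrative hash; the real code uses a tagged convention
     return sha256(data).digest()

 WEM = b"<write enabler master>"   # known only to holders of the write key
 nodeA = b"<nodeA id>"
 nodeB = b"<nodeB id>"

 WE_A = H(WEM + nodeA)             # the value stored in the migrated share
 WE_B = H(WEM + nodeB)             # the replacement write enabler

 # client -> nodeB: (proof, WE_B); the proof shows knowledge of the old WE
 proof = H(nodeB + WE_A)

 # nodeB recomputes the same value from its own nodeid and the stored WE_A
 assert proof == H(nodeB + WE_A)   # if they match, nodeB installs WE_B
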
     399Migrating the leases will require a similar protocol. This protocol will be
     400defined concretely at a later date.
     401
     402Code Details
     403------------
     404
     405The MutableFileNode class is used to manipulate mutable files (as opposed to
     406ImmutableFileNodes). These are initially generated with
     407client.create_mutable_file(), and later recreated from URIs with
     408client.create_node_from_uri(). Instances of this class will contain a URI and
     409a reference to the client (for peer selection and connection).
     410
     411NOTE: this section is out of date. Please see src/allmydata/interfaces.py
     412(the section on IMutableFilesystemNode) for more accurate information.
     413
     414The methods of MutableFileNode are:
     415
     416* download_to_data() -> [deferred] newdata, NotEnoughSharesError
     417
     418  * if there are multiple retrievable versions in the grid, this call
     419    returns the first version it can reconstruct, and silently ignores
     420    the others. In the future, a more advanced API will signal and
     421    provide access to the multiple heads.
     422
     423* update(newdata) -> OK, UncoordinatedWriteError, NotEnoughSharesError
     424* overwrite(newdata) -> OK, UncoordinatedWriteError, NotEnoughSharesError
     425
     426download_to_data() causes a new retrieval to occur, pulling the current
     427contents from the grid and returning them to the caller. At the same time,
     428this call caches information about the current version of the file. This
     429information will be used in a subsequent call to update(), and if another
     430change has occurred between the two, this information will be out of date,
     431triggering the UncoordinatedWriteError.
     432
     433update() is therefore intended to be used just after a download_to_data(), in
     434the following pattern::
     435
     436 d = mfn.download_to_data()
     437 d.addCallback(apply_delta)
     438 d.addCallback(mfn.update)
     439
     440If the update() call raises UCW, then the application can simply return an
     441error to the user ("you violated the Prime Coordination Directive"), and they
     442can try again later. Alternatively, the application can attempt to retry on
     443its own. To accomplish this, the app needs to pause, download the new
     444(post-collision and post-recovery) form of the file, reapply its delta,
     445then submit the update request again. A randomized pause is necessary to
     446reduce the chances of colliding a second time with another client that is
     447doing exactly the same thing::
     448
     449 d = mfn.download_to_data()
     450 d.addCallback(apply_delta)
     451 d.addCallback(mfn.update)
     452 def _retry(f):
     453   f.trap(UncoordinatedWriteError)
     454   d1 = pause(random.uniform(5, 20))
     455   d1.addCallback(lambda res: mfn.download_to_data())
     456   d1.addCallback(apply_delta)
     457   d1.addCallback(mfn.update)
     458   return d1
     459 d.addErrback(_retry)
     460
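The pause() helper used above is assumed; with Twisted it could be written as
something like::

 from twisted.internet import reactor, task

 def pause(seconds):
     # return a Deferred that fires (with None) after 'seconds'
     return task.deferLater(reactor, seconds, lambda: None)
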
     461Enthusiastic applications can retry multiple times, using a randomized
     462exponential backoff between each. A particularly enthusiastic application can
     463retry forever, but such apps are encouraged to provide the user with a means
     464of giving up after a while.
     465
     466A UCW error does not mean that the update was not applied, so it is a good idea
     467to skip the retry-update step if the delta was already applied::
     468
     469 d = mfn.download_to_data()
     470 d.addCallback(apply_delta)
     471 d.addCallback(mfn.update)
     472 def _retry(f):
     473   f.trap(UncoordinatedWriteError)
     474   d1 = pause(random.uniform(5, 20))
     475   d1.addCallback(lambda res: mfn.download_to_data())
     476   def _maybe_apply_delta(contents):
     477     new_contents = apply_delta(contents)
     478     if new_contents != contents:
     479       return mfn.update(new_contents)
     480   d1.addCallback(_maybe_apply_delta)
     481   return d1
     482 d.addErrback(_retry)
     483
     484update() is the right interface to use for delta-application situations, like
     485directory nodes (in which apply_delta might be adding or removing child
     486entries from a serialized table).
     487
     488Note that any uncoordinated write has the potential to lose data. We must do
     489more analysis to be sure, but it appears that two clients who write to the
     490same mutable file at the same time (even if both eventually retry) will, with
     491high probability, result in one client observing UCW and the other silently
     492losing their changes. It is also possible for both clients to observe UCW.
     493The moral of the story is that the Prime Coordination Directive is there for
     494a reason, and that recovery/UCW/retry is not a substitute for write
     495coordination.
     496
     497overwrite() tells the client to ignore this cached version information, and
     498to unconditionally replace the mutable file's contents with the new data.
     499This should not be used in delta application, but rather in situations where
     500you want to replace the file's contents with completely unrelated ones. When
     501raw files are uploaded into a mutable slot through the tahoe webapi (using
     502POST and the ?mutable=true argument), they are put in place with overwrite().
     503
     504The peer-selection and data-structure manipulation (and signing/verification)
     505steps will be implemented in a separate class in allmydata/mutable.py.
     506
     507SDMF Slot Format
     508----------------
     509
     510This SDMF data lives inside a server-side MutableSlot container. The server
     511is oblivious to this format.
     512
     513This data is tightly packed. In particular, the share data is defined to run
     514all the way to the beginning of the encrypted private key (the encprivkey
     515offset is used both to terminate the share data and to begin the encprivkey).
     516
     517::
     518
     519  #    offset   size    name
     520  1    0        1       version byte, \x00 for this format
     521  2    1        8       sequence number. 2^64-1 must be handled specially, TBD
     522  3    9        32      "R" (root of share hash Merkle tree)
     523  4    41       16      IV (share data is AES(H(readkey+IV)) )
     524  5    57       18      encoding parameters:
     525        57       1        k
     526        58       1        N
     527        59       8        segment size
     528        67       8        data length (of original plaintext)
     529  6    75       32      offset table:
     530        75       4        (8) signature
     531        79       4        (9) share hash chain
     532        83       4        (10) block hash tree
     533        87       4        (11) share data
     534        91       8        (12) encrypted private key
     535        99       8        (13) EOF
     536  7    107      436ish  verification key (2048 RSA key)
     537  8    543ish   256ish  signature=RSAenc(sigkey, H(version+seqnum+R+IV+encparm))
     538  9    799ish   (a)     share hash chain, encoded as:
     539                         "".join([pack(">H32s", shnum, hash)
     540                                  for (shnum,hash) in needed_hashes])
     541 10    (927ish) (b)     block hash tree, encoded as:
     542                         "".join([pack(">32s",hash) for hash in block_hash_tree])
     543 11    (935ish) LEN     share data (no gap between this and encprivkey)
     544 12    ??       1216ish encrypted private key= AESenc(write-key, RSA-key)
     545 13    ??       --      EOF
     546
     547 (a) The share hash chain contains ceil(log2(N)) hashes, each 32 bytes long.
     548    This is the set of hashes necessary to validate this share's leaf in the
     549    share Merkle tree. For N=10, this is 4 hashes, i.e. 128 bytes.
     550 (b) The block hash tree contains ceil(length/segsize) hashes, each 32 bytes
     551    long. This is the set of hashes necessary to validate any given block of
     552    share data up to the per-share root "r". Each "r" is a leaf of the share
     553    hash tree (with root "R"), from which a minimal subset of hashes is put in
     554    the share hash chain in (8).
     555
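The fixed-size prefix and the offset table can be described with Python
struct formats; this is a sketch matching the table above, not the actual
implementation::

 import struct

 # fields (1)-(5): version, seqnum, R, IV, k, N, segment size, data length
 PREFIX = ">B Q 32s 16s BB QQ"
 assert struct.calcsize(PREFIX) == 75

 # field (6): four 4-byte offsets, then two 8-byte offsets
 OFFSETS = ">LLLLQQ"
 assert struct.calcsize(OFFSETS) == 32

 def parse_prefix(share):
     (version, seqnum, R, IV,
      k, N, segsize, datalen) = struct.unpack(PREFIX, share[:75])
     return version, seqnum, R, IV, k, N, segsize, datalen
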
     556Recovery
     557--------
     558
     559The first line of defense against damage caused by colliding writes is the
     560Prime Coordination Directive: "Don't Do That".
     561
     562The second line of defense is to keep "S" (the number of competing versions)
     563lower than N/k. If this holds true, at least one competing version will have
     564k shares and thus be recoverable. Note that server unavailability counts
     565against us here: the old version stored on the unavailable server must be
     566included in the value of S.
     567
     568The third line of defense is our use of testv_and_writev() (described below),
     569which increases the convergence of simultaneous writes: one of the writers
     570will be favored (the one with the highest "R"), and that version is more
     571likely to be accepted than the others. This defense is least effective in the
     572pathological situation where S simultaneous writers are active, the one with
     573the lowest "R" writes to N-k+1 of the shares and then dies, then the one with
     574the next-lowest "R" writes to N-2k+1 of the shares and dies, etc, until the
     575one with the highest "R" writes to k-1 shares and dies. Any other sequencing
     576will allow the highest "R" to write to at least k shares and establish a new
     577revision.
     578
     579The fourth line of defense is the fact that each client keeps writing until
     580at least one version has N shares. This uses additional servers, if
     581necessary, to make sure that either the client's version or some
     582newer/overriding version is highly available.
     583
     584The fifth line of defense is the recovery algorithm, which seeks to make sure
     585that at least *one* version is highly available, even if that version is
     586somebody else's.
     587
     588The write-shares-to-peers algorithm is as follows:
     589
     590* permute peers according to storage index
     591* walk through peers, trying to assign one share per peer
     592* for each peer:
     593
     594  * send testv_and_writev, using "old(seqnum+R) <= our(seqnum+R)" as the test
     595
     596    * this means that we will overwrite any old versions, and we will
     597      overwrite simultaneous writers of the same version if our R is higher.
     598      We will not overwrite writers using a higher seqnum.
     599
     600  * record the version that each share winds up with. If the write was
     601    accepted, this is our own version. If it was rejected, read the
     602    old_test_data to find out what version was retained.
     603  * if old_test_data indicates the seqnum was equal or greater than our
     604    own, mark the "Simultaneous Writes Detected" flag, which will eventually
     605    result in an error being reported to the writer (in their close() call).
     606  * build a histogram of "R" values (a sketch follows this list)
     607  * repeat until the histogram indicates that some version (possibly ours)
     608    has N shares. Use new servers if necessary.
     609  * If we run out of servers:
     610
     611    * if there are at least shares-of-happiness of any one version, we're
     612      happy, so return. (the close() might still get an error)
     613    * not happy, need to reinforce something, goto RECOVERY
     614
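The histogram step might look like the following sketch, with placeholder
responses standing in for the gathered testv_and_writev results::

 from collections import Counter

 N = 10
 # assumed: (server, R) pairs recovered from the write responses
 results = [("s1", b"R1"), ("s2", b"R1"), ("s3", b"R2")]

 placements = Counter(R for _server, R in results)
 fully_placed = any(count >= N for count in placements.values())
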
     615Recovery:
     616
     617* read all shares, count the versions, identify the recoverable ones,
     618  discard the unrecoverable ones.
     619* sort versions (sketched below): locate max(seqnums), put all versions
     620  with that seqnum in the list, sort by number of outstanding shares. Then
     621  put our own version. (TODO: put versions with seqnum <max but >us ahead of us?).
     622* for each version:
     623
     624  * attempt to recover that version
     625  * if not possible, remove it from the list, go to next one
     626  * if recovered, start at beginning of peer list, push that version,
     627    continue until N shares are placed
     628  * if pushing our own version, bump up the seqnum to one higher than
     629    the max seqnum we saw
     630  * if we run out of servers:
     631
     632    * schedule retry and exponential backoff to repeat RECOVERY
     633
     634  * admit defeat after some period? Presumably the client will be shut down
     635    eventually, maybe keep trying (once per hour?) until then.
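
The sorting step can be sketched as follows; the version tuples and the
descending order by share count are assumptions for illustration::

 # each version is (seqnum, R, set_of_share_numbers); values are made up
 versions = [(7, b"Ra", {0, 1, 2}), (7, b"Rb", {3}), (6, b"Rc", {4, 5})]
 our_version = (6, b"Rc", {4, 5})

 max_seqnum = max(seqnum for (seqnum, _R, _shares) in versions)
 candidates = sorted((v for v in versions if v[0] == max_seqnum),
                     key=lambda v: len(v[2]), reverse=True)
 candidates.append(our_version)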
     636
     637
     638Medium Distributed Mutable Files
     639================================
     640
     641These are just like the SDMF case, but:
     642
     643* we actually take advantage of the Merkle hash tree over the blocks, by
     644  reading a single segment of data at a time (and its necessary hashes), to
     645  reduce the read-time alacrity
     646* we allow arbitrary writes to the file (i.e. seek() is provided, and
     647  O_TRUNC is no longer required)
     648* we write more code on the client side (in the MutableFileNode class), to
     649  first read each segment that a write must modify. This looks exactly like
     650  the way a normal filesystem uses a block device, or how a CPU must perform
     651  a cache-line fill before modifying a single word.
     652* we might implement some sort of copy-based atomic update server call,
     653  to allow multiple writev() calls to appear atomic to any readers.
     654
     655MDMF slots provide fairly efficient in-place edits of very large files (a few
     656GB). Appending data is also fairly efficient, although each time a power of 2
     657boundary is crossed, the entire file must effectively be re-uploaded (because
     658the size of the block hash tree changes), so if the filesize is known in
     659advance, that space ought to be pre-allocated (by leaving extra space between
     660the block hash tree and the actual data).
     661
     662MDMF1 uses the Merkle tree to enable low-alacrity random-access reads. MDMF2
     663adds cache-line reads to allow random-access writes.
     664
     665Large Distributed Mutable Files
     666===============================
     667
     668LDMF slots use a fundamentally different way to store the file, inspired by
     669Mercurial's "revlog" format. They enable very efficient insert/remove/replace
     670editing of arbitrary spans. Multiple versions of the file can be retained, in
     671a revision graph that can have multiple heads. Each revision can be
     672referenced by a cryptographic identifier. There are two forms of the URI, one
     673that means "most recent version", and a longer one that points to a specific
     674revision.
     675
     676Metadata can be attached to the revisions, like timestamps, to enable rolling
     677back an entire tree to a specific point in history.
     678
     679LDMF1 provides deltas but tries to avoid dealing with multiple heads. LDMF2
     680provides explicit support for revision identifiers and branching.
     681
     682TODO
     683====
     684
     685improve allocate-and-write or get-writer-buckets API to allow one-call (or
     686maybe two-call) updates. The challenge is in figuring out which shares are on
     687which machines. First cut will have lots of round trips.
     688
     689(eventually) define behavior when seqnum wraps. At the very least make sure
     690it can't cause a security problem. "the slot is worn out" is acceptable.
     691
     692(eventually) define share-migration lease update protocol. Including the
     693nodeid who accepted the lease is useful, we can use the same protocol as we
     694do for updating the write enabler. However we need to know which lease to
     695update.. maybe send back a list of all old nodeids that we find, then try all
     696of them when we accept the update?
     697
     698We now do this in a specially-formatted IndexError exception::

     699 "UNABLE to renew non-existent lease. I have leases accepted by " +
     700 "nodeids: '12345','abcde','44221' ."
     701
     702confirm that a repairer can regenerate shares without the private key. Hmm,
     703without the write-enabler they won't be able to write those shares to the
     704servers, although they could add immutable new shares to new servers.
  • deleted file docs/specifications/mutable.txt

    diff --git a/docs/specifications/mutable.txt b/docs/specifications/mutable.txt
    deleted file mode 100644
    index 40a5374..0000000
    + -  
    1 
    2 This describes the "RSA-based mutable files" which were shipped in Tahoe v0.8.0.
    3 
    4 = Mutable Files =
    5 
    6 Mutable File Slots are places with a stable identifier that can hold data
    7 that changes over time. In contrast to CHK slots, for which the
    8 URI/identifier is derived from the contents themselves, the Mutable File Slot
    9 URI remains fixed for the life of the slot, regardless of what data is placed
    10 inside it.
    11 
    12 Each mutable slot is referenced by two different URIs. The "read-write" URI
    13 grants read-write access to its holder, allowing them to put whatever
    14 contents they like into the slot. The "read-only" URI is less powerful, only
    15 granting read access, and not enabling modification of the data. The
    16 read-write URI can be turned into the read-only URI, but not the other way
    17 around.
    18 
    19 The data in these slots is distributed over a number of servers, using the
    20 same erasure coding that CHK files use, with 3-of-10 being a typical choice
    21 of encoding parameters. The data is encrypted and signed in such a way that
    22 only the holders of the read-write URI will be able to set the contents of
    23 the slot, and only the holders of the read-only URI will be able to read
    24 those contents. Holders of either URI will be able to validate the contents
    25 as being written by someone with the read-write URI. The servers who hold the
    26 shares cannot read or modify them: the worst they can do is deny service (by
    27 deleting or corrupting the shares), or attempt a rollback attack (which can
    28 only succeed with the cooperation of at least k servers).
    29 
    30 == Consistency vs Availability ==
    31 
    32 There is an age-old battle between consistency and availability. Epic papers
    33 have been written, elaborate proofs have been established, and generations of
    34 theorists have learned that you cannot simultaneously achieve guaranteed
    35 consistency with guaranteed reliability. In addition, the closer to 0 you get
    36 on either axis, the cost and complexity of the design goes up.
    37 
    38 Tahoe's design goals are to largely favor design simplicity, then slightly
    39 favor read availability, over the other criteria.
    40 
    41 As we develop more sophisticated mutable slots, the API may expose multiple
    42 read versions to the application layer. The tahoe philosophy is to defer most
    43 consistency recovery logic to the higher layers. Some applications have
    44 effective ways to merge multiple versions, so inconsistency is not
    45 necessarily a problem (i.e. directory nodes can usually merge multiple "add
    46 child" operations).
    47 
    48 == The Prime Coordination Directive: "Don't Do That" ==
    49 
    50 The current rule for applications which run on top of Tahoe is "do not
    51 perform simultaneous uncoordinated writes". That means you need non-tahoe
    52 means to make sure that two parties are not trying to modify the same mutable
    53 slot at the same time. For example:
    54 
    55  * don't give the read-write URI to anyone else. Dirnodes in a private
    56    directory generally satisfy this case, as long as you don't use two
    57    clients on the same account at the same time
    58  * if you give a read-write URI to someone else, stop using it yourself. An
    59    inbox would be a good example of this.
    60  * if you give a read-write URI to someone else, call them on the phone
    61    before you write into it
    62  * build an automated mechanism to have your agents coordinate writes.
    63    For example, we expect a future release to include a FURL for a
    64    "coordination server" in the dirnodes. The rule can be that you must
    65    contact the coordination server and obtain a lock/lease on the file
    66    before you're allowed to modify it.
    67 
    68 If you do not follow this rule, Bad Things will happen. The worst-case Bad
    69 Thing is that the entire file will be lost. A less-bad Bad Thing is that one
    70 or more of the simultaneous writers will lose their changes. An observer of
    71 the file may not see monotonically-increasing changes to the file, i.e. they
    72 may see version 1, then version 2, then 3, then 2 again.
    73 
    74 Tahoe takes some amount of care to reduce the badness of these Bad Things.
    75 One way you can help nudge it from the "lose your file" case into the "lose
    76 some changes" case is to reduce the number of competing versions: multiple
    77 versions of the file that different parties are trying to establish as the
    78 one true current contents. Each simultaneous writer counts as a "competing
    79 version", as does the previous version of the file. If the count "S" of these
    80 competing versions is larger than N/k, then the file runs the risk of being
    81 lost completely. [TODO] If at least one of the writers remains running after
    82 the collision is detected, it will attempt to recover, but if S>(N/k) and all
    83 writers crash after writing a few shares, the file will be lost.
    84 
    85 Note that Tahoe uses serialization internally to make sure that a single
    86 Tahoe node will not perform simultaneous modifications to a mutable file. It
    87 accomplishes this by using a weakref cache of the MutableFileNode (so that
    88 there will never be two distinct MutableFileNodes for the same file), and by
    89 forcing all mutable file operations to obtain a per-node lock before they
    90 run. The Prime Coordination Directive therefore applies to inter-node
    91 conflicts, not intra-node ones.
    92 
    93 
    94 == Small Distributed Mutable Files ==
    95 
    96 SDMF slots are suitable for small (<1MB) files that are editing by rewriting
    97 the entire file. The three operations are:
    98 
    99  * allocate (with initial contents)
    100  * set (with new contents)
    101  * get (old contents)
    102 
    103 The first use of SDMF slots will be to hold directories (dirnodes), which map
    104 encrypted child names to rw-URI/ro-URI pairs.
    105 
    106 === SDMF slots overview ===
    107 
    108 Each SDMF slot is created with a public/private key pair. The public key is
    109 known as the "verification key", while the private key is called the
    110 "signature key". The private key is hashed and truncated to 16 bytes to form
    111 the "write key" (an AES symmetric key). The write key is then hashed and
    112 truncated to form the "read key". The read key is hashed and truncated to
    113 form the 16-byte "storage index" (a unique string used as an index to locate
    114 stored data).
    115 
    116 The public key is hashed by itself to form the "verification key hash".
    117 
    118 The write key is hashed a different way to form the "write enabler master".
    119 For each storage server on which a share is kept, the write enabler master is
    120 concatenated with the server's nodeid and hashed, and the result is called
    121 the "write enabler" for that particular server. Note that multiple shares of
    122 the same slot stored on the same server will all get the same write enabler,
    123 i.e. the write enabler is associated with the "bucket", rather than the
    124 individual shares.
    125 
    126 The private key is encrypted (using AES in counter mode) by the write key,
    127 and the resulting crypttext is stored on the servers. so it will be
    128 retrievable by anyone who knows the write key. The write key is not used to
    129 encrypt anything else, and the private key never changes, so we do not need
    130 an IV for this purpose.
    131 
    132 The actual data is encrypted (using AES in counter mode) with a key derived
    133 by concatenating the readkey with the IV, the hashing the results and
    134 truncating to 16 bytes. The IV is randomly generated each time the slot is
    135 updated, and stored next to the encrypted data.
    136 
    137 The read-write URI consists of the write key and the verification key hash.
    138 The read-only URI contains the read key and the verification key hash. The
    139 verify-only URI contains the storage index and the verification key hash.
    140 
    141  URI:SSK-RW:b2a(writekey):b2a(verification_key_hash)
    142  URI:SSK-RO:b2a(readkey):b2a(verification_key_hash)
    143  URI:SSK-Verify:b2a(storage_index):b2a(verification_key_hash)
    144 
    145 Note that this allows the read-only and verify-only URIs to be derived from
    146 the read-write URI without actually retrieving the public keys. Also note
    147 that it means the read-write agent must validate both the private key and the
    148 public key when they are first fetched. All users validate the public key in
    149 exactly the same way.
    150 
    151 The SDMF slot is allocated by sending a request to the storage server with a
    152 desired size, the storage index, and the write enabler for that server's
    153 nodeid. If granted, the write enabler is stashed inside the slot's backing
    154 store file. All further write requests must be accompanied by the write
    155 enabler or they will not be honored. The storage server does not share the
    156 write enabler with anyone else.
    157 
    158 The SDMF slot structure will be described in more detail below. The important
    159 pieces are:
    160 
    161   * a sequence number
    162   * a root hash "R"
    163   * the encoding parameters (including k, N, file size, segment size)
    164   * a signed copy of [seqnum,R,encoding_params], using the signature key
    165   * the verification key (not encrypted)
    166   * the share hash chain (part of a Merkle tree over the share hashes)
    167   * the block hash tree (Merkle tree over blocks of share data)
    168   * the share data itself (erasure-coding of read-key-encrypted file data)
    169   * the signature key, encrypted with the write key
    170 
    171 The access pattern for read is:
    172  * hash read-key to get storage index
    173  * use storage index to locate 'k' shares with identical 'R' values
    174    * either get one share, read 'k' from it, then read k-1 shares
    175    * or read, say, 5 shares, discover k, either get more or be finished
    176    * or copy k into the URIs
    177  * read verification key
    178  * hash verification key, compare against verification key hash
    179  * read seqnum, R, encoding parameters, signature
    180  * verify signature against verification key
    181  * read share data, compute block-hash Merkle tree and root "r"
    182  * read share hash chain (leading from "r" to "R")
    183  * validate share hash chain up to the root "R"
    184  * submit share data to erasure decoding
    185  * decrypt decoded data with read-key
    186  * submit plaintext to application
    187 
    188 The access pattern for write is:
    189  * hash write-key to get read-key, hash read-key to get storage index
    190  * use the storage index to locate at least one share
    191  * read verification key and encrypted signature key
    192  * decrypt signature key using write-key
    193  * hash signature key, compare against write-key
    194  * hash verification key, compare against verification key hash
    195  * encrypt plaintext from application with read-key
    196    * application can encrypt some data with the write-key to make it only
    197      available to writers (use this for transitive read-onlyness of dirnodes)
    198  * erasure-code crypttext to form shares
    199  * split shares into blocks
    200  * compute Merkle tree of blocks, giving root "r" for each share
    201  * compute Merkle tree of shares, find root "R" for the file as a whole
    202  * create share data structures, one per server:
    203    * use seqnum which is one higher than the old version
    204    * share hash chain has log(N) hashes, different for each server
    205    * signed data is the same for each server
    206  * now we have N shares and need homes for them
    207  * walk through peers
    208    * if share is not already present, allocate-and-set
    209    * otherwise, try to modify existing share:
    210    * send testv_and_writev operation to each one
    211    * testv says to accept share if their(seqnum+R) <= our(seqnum+R)
    212    * count how many servers wind up with which versions (histogram over R)
    213    * keep going until N servers have the same version, or we run out of servers
    214      * if any servers wound up with a different version, report error to
    215        application
    216      * if we ran out of servers, initiate recovery process (described below)
    217 
    218 === Server Storage Protocol ===
    219 
    220 The storage servers will provide a mutable slot container which is oblivious
    221 to the details of the data being contained inside it. Each storage index
    222 refers to a "bucket", and each bucket has one or more shares inside it. (In a
    223 well-provisioned network, each bucket will have only one share). The bucket
    224 is stored as a directory, using the base32-encoded storage index as the
    225 directory name. Each share is stored in a single file, using the share number
    226 as the filename.
    227 
    228 The container holds space for a container magic number (for versioning), the
    229 write enabler, the nodeid which accepted the write enabler (used for share
    230 migration, described below), a small number of lease structures, the embedded
    231 data itself, and expansion space for additional lease structures.
    232 
    233  #   offset    size    name
    234  1   0         32      magic verstr "tahoe mutable container v1" plus binary
    235  2   32        20      write enabler's nodeid
    236  3   52        32      write enabler
    237  4   84        8       data size (actual share data present) (a)
    238  5   92        8       offset of (8) count of extra leases (after data)
    239  6   100       368     four leases, 92 bytes each
    240                         0    4   ownerid (0 means "no lease here")
    241                         4    4   expiration timestamp
    242                         8   32   renewal token
    243                         40  32   cancel token
    244                         72  20   nodeid which accepted the tokens
    245  7   468       (a)     data
    246  8   ??        4       count of extra leases
    247  9   ??        n*92    extra leases
    248 
    249 The "extra leases" field must be copied and rewritten each time the size of
    250 the enclosed data changes. The hope is that most buckets will have four or
    251 fewer leases and this extra copying will not usually be necessary.
    252 
    253 The (4) "data size" field contains the actual number of bytes of data present
    254 in field (7), such that a client request to read beyond 504+(a) will result
    255 in an error. This allows the client to (one day) read relative to the end of
    256 the file. The container size (that is, (8)-(7)) might be larger, especially
    257 if extra size was pre-allocated in anticipation of filling the container with
    258 a lot of data.
    259 
    260 The offset in (5) points at the *count* of extra leases, at (8). The actual
    261 leases (at (9)) begin 4 bytes later. If the container size changes, both (8)
    262 and (9) must be relocated by copying.
    263 
    264 The server will honor any write commands that provide the write token and do
    265 not exceed the server-wide storage size limitations. Read and write commands
    266 MUST be restricted to the 'data' portion of the container: the implementation
    267 of those commands MUST perform correct bounds-checking to make sure other
    268 portions of the container are inaccessible to the clients.
    269 
    270 The two methods provided by the storage server on these "MutableSlot" share
    271 objects are:
    272 
    273  * readv(ListOf(offset=int, length=int))
    274    * returns a list of bytestrings, of the various requested lengths
    275    * offset < 0 is interpreted relative to the end of the data
    276    * spans which hit the end of the data will return truncated data
    277 
    278  * testv_and_writev(write_enabler, test_vector, write_vector)
    279    * this is a test-and-set operation which performs the given tests and only
    280      applies the desired writes if all tests succeed. This is used to detect
    281      simultaneous writers, and to reduce the chance that an update will lose
    282      data recently written by some other party (written after the last time
    283      this slot was read).
    284    * test_vector=ListOf(TupleOf(offset, length, opcode, specimen))
    285    * the opcode is a string, from the set [gt, ge, eq, le, lt, ne]
    286    * each element of the test vector is read from the slot's data and
    287      compared against the specimen using the desired (in)equality. If all
    288      tests evaluate True, the write is performed
    289    * write_vector=ListOf(TupleOf(offset, newdata))
    290      * offset < 0 is not yet defined, it probably means relative to the
    291        end of the data, which probably means append, but we haven't nailed
    292        it down quite yet
    293      * write vectors are executed in order, which specifies the results of
    294        overlapping writes
    295    * return value:
    296      * error: OutOfSpace
    297      * error: something else (io error, out of memory, whatever)
    298      * (True, old_test_data): the write was accepted (test_vector passed)
    299      * (False, old_test_data): the write was rejected (test_vector failed)
    300        * both 'accepted' and 'rejected' return the old data that was used
    301          for the test_vector comparison. This can be used by the client
    302          to detect write collisions, including collisions for which the
    303          desired behavior was to overwrite the old version.
    304 
    305 In addition, the storage server provides several methods to access these
    306 share objects:
    307 
    308  * allocate_mutable_slot(storage_index, sharenums=SetOf(int))
    309    * returns DictOf(int, MutableSlot)
    310  * get_mutable_slot(storage_index)
    311    * returns DictOf(int, MutableSlot)
    312    * or raises KeyError
    313 
    314 We intend to add an interface which allows small slots to allocate-and-write
    315 in a single call, as well as do update or read in a single call. The goal is
    316 to allow a reasonably-sized dirnode to be created (or updated, or read) in
    317 just one round trip (to all N shareholders in parallel).
    318 
    319 ==== migrating shares ====
    320 
    321 If a share must be migrated from one server to another, two values become
    322 invalid: the write enabler (since it was computed for the old server), and
    323 the lease renew/cancel tokens.
    324 
    325 Suppose that a slot was first created on nodeA, and was thus initialized with
    326 WE(nodeA) (= H(WEM+nodeA)). Later, for provisioning reasons, the share is
    327 moved from nodeA to nodeB.
    328 
    329 Readers may still be able to find the share in its new home, depending upon
    330 how many servers are present in the grid, where the new nodeid lands in the
    331 permuted index for this particular storage index, and how many servers the
    332 reading client is willing to contact.
    333 
    334 When a client attempts to write to this migrated share, it will get a "bad
    335 write enabler" error, since the WE it computes for nodeB will not match the
    336 WE(nodeA) that was embedded in the share. When this occurs, the "bad write
    337 enabler" message must include the old nodeid (e.g. nodeA) that was in the
    338 share.
    339 
    340 The client then computes H(nodeB+H(WEM+nodeA)), which is the same as
    341 H(nodeB+WE(nodeA)). The client sends this along with the new WE(nodeB), which
    342 is H(WEM+nodeB). Note that the client only sends WE(nodeB) to nodeB, never to
    343 anyone else. Also note that the client does not send a value to nodeB that
    344 would allow the node to impersonate the client to a third node: everything
    345 sent to nodeB will include something specific to nodeB in it.
    346 
    347 The server locally computes H(nodeB+WE(nodeA)), using its own node id and the
    348 old write enabler from the share. It compares this against the value supplied
    349 by the client. If they match, this serves as proof that the client was able
    350 to compute the old write enabler. The server then accepts the client's new
    351 WE(nodeB) and writes it into the container.
    352 
    353 This WE-fixup process requires an extra round trip, and requires the error
    354 message to include the old nodeid, but does not require any public key
    355 operations on either client or server.
    356 
    357 Migrating the leases will require a similar protocol. This protocol will be
    358 defined concretely at a later date.
    359 
    360 === Code Details ===
    361 
    362 The MutableFileNode class is used to manipulate mutable files (as opposed to
    363 ImmutableFileNodes). These are initially generated with
    364 client.create_mutable_file(), and later recreated from URIs with
    365 client.create_node_from_uri(). Instances of this class will contain a URI and
    366 a reference to the client (for peer selection and connection).
    367 
    368 NOTE: this section is out of date. Please see src/allmydata/interfaces.py
    369 (the section on IMutableFilesystemNode) for more accurate information.
    370 
    371 The methods of MutableFileNode are:
    372 
    373  * download_to_data() -> [deferred] newdata, NotEnoughSharesError
    374    * if there are multiple retrieveable versions in the grid, get() returns
    375      the first version it can reconstruct, and silently ignores the others.
    376      In the future, a more advanced API will signal and provide access to
    377      the multiple heads.
    378  * update(newdata) -> OK, UncoordinatedWriteError, NotEnoughSharesError
    379  * overwrite(newdata) -> OK, UncoordinatedWriteError, NotEnoughSharesError
    380 
    381 download_to_data() causes a new retrieval to occur, pulling the current
    382 contents from the grid and returning them to the caller. At the same time,
    383 this call caches information about the current version of the file. This
    384 information will be used in a subsequent call to update(), and if another
    385 change has occured between the two, this information will be out of date,
    386 triggering the UncoordinatedWriteError.
    387 
    388 update() is therefore intended to be used just after a download_to_data(), in
    389 the following pattern:
    390 
    391  d = mfn.download_to_data()
    392  d.addCallback(apply_delta)
    393  d.addCallback(mfn.update)
    394 
    395 If the update() call raises UCW, then the application can simply return an
    396 error to the user ("you violated the Prime Coordination Directive"), and they
    397 can try again later. Alternatively, the application can attempt to retry on
    398 its own. To accomplish this, the app needs to pause, download the new
    399 (post-collision and post-recovery) form of the file, reapply their delta,
    400 then submit the update request again. A randomized pause is necessary to
    401 reduce the chances of colliding a second time with another client that is
    402 doing exactly the same thing:
    403 
    404  d = mfn.download_to_data()
    405  d.addCallback(apply_delta)
    406  d.addCallback(mfn.update)
    407  def _retry(f):
    408    f.trap(UncoordinatedWriteError)
    409    d1 = pause(random.uniform(5, 20))
    410    d1.addCallback(lambda res: mfn.download_to_data())
    411    d1.addCallback(apply_delta)
    412    d1.addCallback(mfn.update)
    413    return d1
    414  d.addErrback(_retry)
    415 
    416 Enthusiastic applications can retry multiple times, using a randomized
    417 exponential backoff between each. A particularly enthusiastic application can
    418 retry forever, but such apps are encouraged to provide a means to the user of
    419 giving up after a while.
    420 
    421 UCW does not mean that the update was not applied, so it is also a good idea
    422 to skip the retry-update step if the delta was already applied:
    423 
    424  d = mfn.download_to_data()
    425  d.addCallback(apply_delta)
    426  d.addCallback(mfn.update)
    427  def _retry(f):
    428    f.trap(UncoordinatedWriteError)
    429    d1 = pause(random.uniform(5, 20))
    430    d1.addCallback(lambda res: mfn.download_to_data())
    431    def _maybe_apply_delta(contents):
    432      new_contents = apply_delta(contents)
    433      if new_contents != contents:
    434        return mfn.update(new_contents)
    435    d1.addCallback(_maybe_apply_delta)
    436    return d1
    437  d.addErrback(_retry)
    438 
    439 update() is the right interface to use for delta-application situations, like
    440 directory nodes (in which apply_delta might be adding or removing child
    441 entries from a serialized table).
    442 
    443 Note that any uncoordinated write has the potential to lose data. We must do
    444 more analysis to be sure, but it appears that two clients who write to the
    445 same mutable file at the same time (even if both eventually retry) will, with
    446 high probability, result in one client observing UCW and the other silently
    447 losing their changes. It is also possible for both clients to observe UCW.
    448 The moral of the story is that the Prime Coordination Directive is there for
    449 a reason, and that recovery/UCW/retry is not a subsitute for write
    450 coordination.
    451 
    452 overwrite() tells the client to ignore this cached version information, and
    453 to unconditionally replace the mutable file's contents with the new data.
    454 This should not be used in delta application, but rather in situations where
    455 you want to replace the file's contents with completely unrelated ones. When
    456 raw files are uploaded into a mutable slot through the tahoe webapi (using
    457 POST and the ?mutable=true argument), they are put in place with overwrite().
    458 
    459 
    460 
    461 The peer-selection and data-structure manipulation (and signing/verification)
    462 steps will be implemented in a separate class in allmydata/mutable.py .
    463 
    464 === SMDF Slot Format ===
    465 
    466 This SMDF data lives inside a server-side MutableSlot container. The server
    467 is oblivious to this format.
    468 
    469 This data is tightly packed. In particular, the share data is defined to run
    470 all the way to the beginning of the encrypted private key (the encprivkey
    471 offset is used both to terminate the share data and to begin the encprivkey).
    472 
    473  #    offset   size    name
    474  1    0        1       version byte, \x00 for this format
    475  2    1        8       sequence number. 2^64-1 must be handled specially, TBD
    476  3    9        32      "R" (root of share hash Merkle tree)
    477  4    41       16      IV (share data is AES(H(readkey+IV)) )
    478  5    57       18      encoding parameters:
    479        57       1        k
    480        58       1        N
    481        59       8        segment size
    482        67       8        data length (of original plaintext)
    483  6    75       32      offset table:
    484        75       4        (8) signature
    485        79       4        (9) share hash chain
    486        83       4        (10) block hash tree
    487        87       4        (11) share data
    488        91       8        (12) encrypted private key
    489        99       8        (13) EOF
    490  7    107      436ish  verification key (2048 RSA key)
    491  8    543ish   256ish  signature=RSAenc(sigkey, H(version+seqnum+r+IV+encparm))
    492  9    799ish   (a)     share hash chain, encoded as:
    493                         "".join([pack(">H32s", shnum, hash)
    494                                  for (shnum,hash) in needed_hashes])
    495 10    (927ish) (b)     block hash tree, encoded as:
    496                         "".join([pack(">32s",hash) for hash in block_hash_tree])
    497 11    (935ish) LEN     share data (no gap between this and encprivkey)
    498 12    ??       1216ish encrypted private key= AESenc(write-key, RSA-key)
    499 13    ??       --      EOF
    500 
    501 (a) The share hash chain contains ceil(log2(N)) hashes, each 32 bytes long.
    502     This is the set of hashes necessary to validate this share's leaf in the
    503     share Merkle tree. For N=10, this is 4 hashes, i.e. 128 bytes.
    504 (b) The block hash tree contains ceil(length/segsize) hashes, each 32 bytes
    505     long. This is the set of hashes necessary to validate any given block of
    506     share data up to the per-share root "r". Each "r" is a leaf of the share
    507     hash tree (with root "R"), from which a minimal subset of hashes is put in
    508     the share hash chain in (8).
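
As an illustration only (the authoritative parser lives in
allmydata/mutable.py), the fixed-size portion of this header can be unpacked
with Python's struct module. The format string below is an assumption
transcribed from the offset table above:

  import struct

  # 1-byte version, 8-byte seqnum, 32-byte root hash "R", 16-byte IV,
  # k, N, 8-byte segment size, 8-byte data length, then four 4-byte
  # offsets and two 8-byte offsets: 107 bytes in total.
  HEADER = ">B Q 32s 16s B B Q Q L L L L Q Q"

  def parse_smdf_header(data):
      (version, seqnum, root_hash, IV, k, N, segsize, datalen,
       o_signature, o_share_hash_chain, o_block_hash_tree, o_share_data,
       o_encprivkey, o_eof) = struct.unpack(HEADER,
                                            data[:struct.calcsize(HEADER)])
      assert version == 0
      return (seqnum, root_hash, IV, k, N, segsize, datalen,
              (o_signature, o_share_hash_chain, o_block_hash_tree,
               o_share_data, o_encprivkey, o_eof))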
    509 
    510 === Recovery ===
    511 
    512 The first line of defense against damage caused by colliding writes is the
    513 Prime Coordination Directive: "Don't Do That".
    514 
    515 The second line of defense is to keep "S" (the number of competing versions)
    516 lower than N/k: then, by the pigeonhole principle, at least one version holds
    517 k or more shares and is recoverable (with N=10, k=3, S=3, some version gets
    518 at least 4 shares). Note that server unavailability counts against us here:
    519 the old version stored on the unavailable server must be included in S.
    520 
    521 The third line of defense is our use of testv_and_writev() (described below),
    522 which increases the convergence of simultaneous writes: one of the writers
    523 will be favored (the one with the highest "R"), and that version is more
    524 likely to be accepted than the others. This defense is least effective in the
    525 pathological situation where S simultaneous writers are active, the one with
    526 the lowest "R" writes to N-k+1 of the shares and then dies, then the one with
    527 the next-lowest "R" writes to N-2k+1 of the shares and dies, etc, until the
    528 one with the highest "R" writes to k-1 shares and dies. Any other sequencing
    529 will allow the highest "R" to write to at least k shares and establish a new
    530 revision.
    531 
    532 The fourth line of defense is the fact that each client keeps writing until
    533 at least one version has N shares. This uses additional servers, if
    534 necessary, to make sure that either the client's version or some
    535 newer/overriding version is highly available.
    536 
    537 The fifth line of defense is the recovery algorithm, which seeks to make sure
    538 that at least *one* version is highly available, even if that version is
    539 somebody else's.
    540 
    541 The write-shares-to-peers algorithm is as follows:
    542 
    543  * permute peers according to storage index
    544  * walk through peers, trying to assign one share per peer
    545  * for each peer:
    546    * send testv_and_writev, using "old(seqnum+R) <= our(seqnum+R)" as the test
    547      * this means that we will overwrite any old versions, and we will
    548        overwrite simultaneous writers of the same version if our R is higher.
    549        We will not overwrite writers using a higher seqnum (see sketch below).
    550    * record the version that each share winds up with. If the write was
    551      accepted, this is our own version. If it was rejected, read the
    552      old_test_data to find out what version was retained.
    553    * if old_test_data indicates the seqnum was equal to or greater than our
    554      own, mark the "Simultaneous Writes Detected" flag, which will eventually
    555      result in an error being reported to the writer (in their close() call).
    556    * build a histogram of "R" values
    557    * repeat until the histogram indicates that some version (possibly ours)
    558      has N shares. Use new servers if necessary.
    559    * If we run out of servers:
    560      * if there are at least shares-of-happiness shares of any one version,
    561        we're happy, so return. (the close() might still get an error)
    562      * not happy, need to reinforce something, goto RECOVERY
    563 
    564 RECOVERY:
    565  * read all shares, count the versions, identify the recoverable ones,
    566    discard the unrecoverable ones.
    567  * sort versions: locate max(seqnums), put all versions with that seqnum
    568    in the list, sort by number of outstanding shares. Then put our own
    569    version. (TODO: put versions with seqnum <max but >us ahead of us?).
    570  * for each version:
    571    * attempt to recover that version
    572    * if not possible, remove it from the list, go to next one
    573    * if recovered, start at beginning of peer list, push that version,
    574      continue until N shares are placed
    575    * if pushing our own version, bump up the seqnum to one higher than
    576      the max seqnum we saw
    577    * if we run out of servers:
    578      * schedule retry and exponential backoff to repeat RECOVERY
    579    * admit defeat after some period? presumably the client will be shut down
    580      eventually, maybe keep trying (once per hour?) until then.
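
The "old(seqnum+R) <= our(seqnum+R)" test above amounts to comparing
(seqnum, R) pairs lexicographically. A minimal sketch, using hypothetical
names rather than the actual server API:

  def write_is_allowed(old_seqnum, old_R, our_seqnum, our_R):
      # A higher seqnum always wins; the root hash R breaks ties
      # between simultaneous writers of the same seqnum.
      return (old_seqnum, old_R) <= (our_seqnum, our_R)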
    581 
    582 
    583 
    584 
    585 == Medium Distributed Mutable Files ==
    586 
    587 These are just like the SDMF case, but:
    588 
    589  * we actually take advantage of the Merkle hash tree over the blocks, by
    590    reading a single segment of data at a time (and its necessary hashes), to
    591    reduce the read-time alacrity
    592  * we allow arbitrary writes to the file (i.e. seek() is provided, and
    593    O_TRUNC is no longer required)
    594  * we write more code on the client side (in the MutableFileNode class), to
    595    first read each segment that a write must modify. This looks exactly like
    596    the way a normal filesystem uses a block device, or how a CPU must perform
    597    a cache-line fill before modifying a single word.
    598  * we might implement some sort of copy-based atomic update server call,
    599    to allow multiple writev() calls to appear atomic to any readers.
    600 
    601 MDMF slots provide fairly efficient in-place edits of very large files (a few
    602 GB). Appending data is also fairly efficient, although each time a power of 2
    603 boundary is crossed, the entire file must effectively be re-uploaded (because
    604 the size of the block hash tree changes), so if the filesize is known in
    605 advance, that space ought to be pre-allocated (by leaving extra space between
    606 the block hash tree and the actual data).
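
To illustrate the power-of-2 effect (the tree layout here is an assumption
for illustration, not the normative format): if the block hash tree pads its
ceil(filesize/segsize) leaves up to the next power of two, its size only
changes when the leaf count crosses such a boundary:

  import math

  def block_hash_tree_bytes(filesize, segsize, hashsize=32):
      leaves = max(1, int(math.ceil(float(filesize) / segsize)))
      padded = 1
      while padded < leaves:
          padded *= 2                     # round up to a power of two
      return (2 * padded - 1) * hashsize  # complete binary tree of hashes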
    607 
    608 MDMF1 uses the Merkle tree to enable low-alacrity random-access reads. MDMF2
    609 adds cache-line reads to allow random-access writes.
    610 
    611 == Large Distributed Mutable Files ==
    612 
    613 LDMF slots use a fundamentally different way to store the file, inspired by
    614 Mercurial's "revlog" format. They enable very efficient insert/remove/replace
    615 editing of arbitrary spans. Multiple versions of the file can be retained, in
    616 a revision graph that can have multiple heads. Each revision can be
    617 referenced by a cryptographic identifier. There are two forms of the URI, one
    618 that means "most recent version", and a longer one that points to a specific
    619 revision.
    620 
    621 Metadata can be attached to the revisions, like timestamps, to enable rolling
    622 back an entire tree to a specific point in history.
    623 
    624 LDMF1 provides deltas but tries to avoid dealing with multiple heads. LDMF2
    625 provides explicit support for revision identifiers and branching.
    626 
    627 == TODO ==
    628 
    629 improve allocate-and-write or get-writer-buckets API to allow one-call (or
    630 maybe two-call) updates. The challenge is in figuring out which shares are on
    631 which machines. First cut will have lots of round trips.
    632 
    633 (eventually) define behavior when seqnum wraps. At the very least make sure
    634 it can't cause a security problem. "the slot is worn out" is acceptable.
    635 
    636 (eventually) define share-migration lease update protocol. Including the
    637 nodeid who accepted the lease is useful, we can use the same protocol as we
    638 do for updating the write enabler. However we need to know which lease to
    639 update.. maybe send back a list of all old nodeids that we find, then try all
    640 of them when we accept the update?
    641 
    642  We now do this in a specially-formatted IndexError exception:
    643   "UNABLE to renew non-existent lease. I have leases accepted by " +
    644   "nodeids: '12345','abcde','44221' ."
    645 
    646 confirm that a repairer can regenerate shares without the private key. Hmm,
    647 without the write-enabler they won't be able to write those shares to the
    648 servers.. although they could add immutable new shares to new servers.
  • new file docs/specifications/outline.rst

    diff --git a/docs/specifications/outline.rst b/docs/specifications/outline.rst
    new file mode 100644
    index 0000000..9ec69bf
    - +  
     1==============================
     2Specification Document Outline
     3==============================
     4
     5While we do not yet have a clear set of specification documents for Tahoe
     6(explaining the file formats, so that others can write interoperable
     7implementations), this document is intended to lay out an outline for what
     8these specs ought to contain. Think of this as the ISO 7-Layer Model for
     9Tahoe.
     10
     11We currently imagine 4 documents.
     12
     131.  `#1: Share Format, Encoding Algorithm`_
     142.  `#2: Share Exchange Protocol`_
     153.  `#3: Server Selection Algorithm, filecap format`_
     164.  `#4: Directory Format`_
     17
     18#1: Share Format, Encoding Algorithm
     19====================================
     20
     21This document will describe the way that files are encrypted and encoded into
     22shares. It will include a specification of the share format, and explain both
     23the encoding and decoding algorithms. It will cover both mutable and
     24immutable files.
     25
     26The immutable encoding algorithm, as described by this document, will start
     27with a plaintext series of bytes, encoding parameters "k" and "N", and either
     28an encryption key or a mechanism for deterministically deriving the key from
     29the plaintext (the CHK specification). The algorithm will end with a set of N
     30shares, and a set of values that must be included in the filecap to provide
     31confidentiality (the encryption key) and integrity (the UEB hash).
     32
     33The immutable decoding algorithm will start with the filecap values (key and
     34UEB hash) and "k" shares. It will explain how to validate the shares against
     35the integrity information, how to reverse the erasure-coding, and how to
     36decrypt the resulting ciphertext. It will result in the original plaintext
     37bytes (or some subrange thereof).
     38
     39The sections on mutable files will contain similar information.
     40
     41This document is *not* responsible for explaining the filecap format, since
     42full filecaps may need to contain additional information as described in
     43document #3. Likewise it is not responsible for explaining where to put the
     44generated shares or where to find them again later.
     45
     46It is also not responsible for explaining the access control mechanisms
     47surrounding share upload, download, or modification ("Accounting" is the
     48business of controlling share upload to conserve space, and mutable file
     49shares require some sort of access control to prevent non-writecap holders
     50from destroying shares). We don't yet have a document dedicated to explaining
     51these, but let's call it "Access Control" for now.
     52
     53
     54#2: Share Exchange Protocol
     55===========================
     56
     57This document explains the wire-protocol used to upload, download, and modify
     58shares on the various storage servers.
     59
     60Given the N shares created by the algorithm described in document #1, and a
     61set of servers who are willing to accept those shares, the protocols in this
     62document will be sufficient to get the shares onto the servers. Likewise,
     63given a set of servers who hold at least k shares, these protocols will be
     64enough to retrieve the shares necessary to begin the decoding process
     65described in document #1. The notion of a "storage index" is used to
     66reference a particular share: the storage index is generated by the encoding
     67process described in document #1.
     68
     69This document does *not* describe how to identify or choose those servers;
     70rather, it explains what to do once they have been selected (by the mechanisms
     71in document #3).
     72
     73This document also explains the protocols that a client uses to ask a server
     74whether or not it is willing to accept an uploaded share, and whether it has
     75a share available for download. These protocols will be used by the
     76mechanisms in document #3 to help decide where the shares should be placed.
     77
     78Where cryptographic mechanisms are necessary to implement access-control
     79policy, this document will explain those mechanisms.
     80
     81In the future, Tahoe will be able to use multiple protocols to speak to
     82storage servers. There will be alternative forms of this document, one for
     83each protocol. The first one to be written will describe the Foolscap-based
     84protocol that tahoe currently uses, but we anticipate a subsequent one
     85describing a more HTTP-based protocol.
     86
     87#3: Server Selection Algorithm, filecap format
     88==============================================
     89
     90This document has two interrelated purposes. With a deeper understanding of
     91the issues, we may be able to separate these more cleanly in the future.
     92
     93The first purpose is to explain the server selection algorithm. Given a set
     94of N shares, where should those shares be uploaded? Given some information
     95stored about a previously-uploaded file, how should a downloader locate and
     96recover at least k shares? Given a previously-uploaded mutable file, how
     97should a modifier locate all (or most of) the shares with a reasonable amount
     98of work?
     99
     100This question implies many things, all of which should be explained in this
     101document:
     102
     103* the notion of a "grid", nominally a set of servers who could potentially
     104  hold shares, which might change over time
     105* a way to configure which grid should be used
     106* a way to discover which servers are a part of that grid
     107* a way to decide which servers are reliable enough to be worth sending
     108  shares to
     109* an algorithm to handle servers which refuse shares
     110* a way for a downloader to locate which servers have shares
     111* a way to choose which shares should be used for download
     112
     113The server-selection algorithm has several obviously competing goals:
     114
     115* minimize the amount of work that must be done during upload
     116* minimize the total storage resources used
     117* avoid "hot spots", balance load among multiple servers
     118* maximize the chance that enough shares will be downloadable later, by
     119  uploading lots of shares, and by placing them on reliable servers
     120* minimize the work that the future downloader must do
     121* tolerate temporary server failures, permanent server departure, and new
     122  server insertions
     123* minimize the amount of information that must be added to the filecap
     124
     125The server-selection algorithm is defined in some context: some set of
     126expectations about the servers or grid with which it is expected to operate.
     127Different algorithms are appropriate for different situations, so there
     128will be multiple variants of this document.
     129
     130The first version of this document will describe the algorithm that the
     131current (1.3.0) release uses, which is heavily weighted towards the two main
     132use case scenarios for which Tahoe has been designed: the small, stable
     133friendnet, and the allmydata.com managed grid. In both cases, we assume that
     134the storage servers are online most of the time, they are uniformly highly
     135reliable, and that the set of servers does not change very rapidly. The
     136server-selection algorithm for this environment uses a permuted server list
     137to achieve load-balancing, uses all servers identically, and derives the
     138permutation key from the storage index to avoid adding a new field to the
     139filecap.
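
A sketch of the permuted-list idea (the hash construction here is an
assumption for illustration; the normative algorithm will be defined by this
document)::

 import hashlib

 def permuted_servers(storage_index, server_ids):
     # Every client computes the same per-file ordering, so different
     # files land on different servers (load balancing) without
     # recording the chosen servers in the filecap.
     return sorted(server_ids,
                   key=lambda peerid:
                       hashlib.sha256(storage_index + peerid).digest())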
     140
     141An alternative algorithm could give clients more precise control over share
     142placement, for example by a user who wished to make sure that k+1 shares are
     143located in each datacenter (to allow downloads to take place using only local
     144bandwidth). This algorithm could skip the permuted list and use other
     145mechanisms to accomplish load-balancing (or ignore the issue altogether). It
     146could add additional information to the filecap (like a list of which servers
     147received the shares) in lieu of performing a search at download time, perhaps
     148at the expense of allowing a repairer to move shares to a new server after
     149the initial upload. It might make up for this by storing "location hints"
     150next to each share, to indicate where other shares are likely to be found,
     151and obligating the repairer to update these hints.
     152
     153The second purpose of this document is to explain the format of the file
     154capability string (or "filecap" for short). There are multiple kinds of
     155capabilities (read-write, read-only, verify-only, repaircap, lease-renewal
     156cap, traverse-only, etc). There are multiple ways to represent the filecap
     157(compressed binary, human-readable, clickable-HTTP-URL, "tahoe:" URL, etc),
     158but they must all contain enough information to reliably retrieve a file
     159(given some context, of course). It must at least contain the confidentiality
     160and integrity information from document #1 (i.e. the encryption key and the
     161UEB hash). It must also contain whatever additional information the
     162upload-time server-selection algorithm generated that will be required by the
     163downloader.
     164
     165For some server-selection algorithms, the additional information will be
     166minimal. For example, the 1.3.0 release uses the hash of the encryption key
     167as a storage index, and uses the storage index to permute the server list,
     168and uses an Introducer to learn the current list of servers. This allows a
     169"close-enough" list of servers to be compressed into a filecap field that is
     170already required anyway (the encryption key). It also adds k and N to the
     171filecap, to speed up the downloader's search (the downloader knows how many
     172shares it needs, so it can send out multiple queries in parallel).
     173
     174But other server-selection algorithms might require more information. Each
     175variant of this document will explain how to encode that additional
     176information into the filecap, and how to extract and use that information at
     177download time.
     178
     179These two purposes are interrelated. A filecap that is interpreted in the
     180context of the allmydata.com commercial grid, which uses tahoe-1.3.0, implies
     181a specific peer-selection algorithm, a specific Introducer, and therefore a
     182fairly-specific set of servers to query for shares. A filecap which is meant
     183to be interpreted on a different sort of grid would need different
     184information.
     185
     186Some filecap formats can be designed to contain more information (and depend
     187less upon context), such as the way an HTTP URL implies the existence of a
     188single global DNS system. Ideally a tahoe filecap should be able to specify
     189which "grid" it lives in, with enough information to allow a compatible
     190implementation of Tahoe to locate that grid and retrieve the file (regardless
     191of which server-selection algorithm was used for upload).
     192
     193This more-universal format might come at the expense of reliability, however.
     194Tahoe-1.3.0 filecaps do not contain hostnames, because the failure of DNS or
     195an individual host might then impact file availability (the Introducer,
     196however, does contain DNS names or IP addresses).
     197
     198#4: Directory Format
     199====================
     200
     201Tahoe directories are a special way of interpreting and managing the contents
     202of a file (either mutable or immutable). These "dirnode" files are basically
     203serialized tables that map child name to filecap/dircap. This document
     204describes the format of these files.
     205
     206Tahoe-1.3.0 directories are "transitively readonly", which is accomplished by
     207applying an additional layer of encryption to the list of child writecaps.
     208The key for this encryption is derived from the containing file's writecap.
     209This document must explain how to derive this key and apply it to the
     210appropriate portion of the table.
     211
     212Future versions of the directory format are expected to contain
     213"deep-traversal caps", which allow verification/repair of files without
     214exposing their plaintext to the repair agent. This document will be
     215responsible for explaining traversal caps too.
     216
     217Future versions of the directory format will probably contain an index and
     218more advanced data structures (for efficiency and fast lookups), instead of a
     219simple flat list of (childname, childcap). This document will also need to
     220describe metadata formats, including what access-control policies are defined
     221for the metadata.
  • deleted file docs/specifications/outline.txt

    diff --git a/docs/specifications/outline.txt b/docs/specifications/outline.txt
    deleted file mode 100644
    index 204878e..0000000
    + -  
  • new file docs/specifications/servers-of-happiness.rst

    diff --git a/docs/specifications/servers-of-happiness.rst b/docs/specifications/servers-of-happiness.rst
    new file mode 100644
    index 0000000..7f0029b
    - +  
     1====================
     2Servers of Happiness
     3====================
     4
     5When you upload a file to a Tahoe-LAFS grid, you expect that it will
     6stay there for a while, and that it will do so even if a few of the
     7peers on the grid stop working, or if something else goes wrong. An
     8upload health metric helps to make sure that this actually happens.
     9An upload health metric is a test that looks at a file on a Tahoe-LAFS
     10grid and says whether or not that file is healthy; that is, whether it
     11is distributed on the grid in such a way as to ensure that it will
     12probably survive in good enough shape to be recoverable, even if a few
     13things go wrong between the time of the test and the time that it is
     14recovered. Our current upload health metric for immutable files is called
     15'servers-of-happiness'; its predecessor was called 'shares-of-happiness'.
     16
     17shares-of-happiness used the number of encoded shares generated by a
     18file upload to say whether or not it was healthy. If there were more
     19shares than a user-configurable threshold, the file was reported to be
     20healthy; otherwise, it was reported to be unhealthy. In normal
     21situations, the upload process would distribute shares fairly evenly
     22over the peers in the grid, and in that case shares-of-happiness
     23worked fine. However, because it only considered the number of shares,
     24and not where they were on the grid, it could not detect situations
     25where a file was unhealthy because most or all of the shares generated
     26from the file were stored on one or two peers.
     27
     28servers-of-happiness addresses this by extending the share-focused
     29upload health metric to also consider the location of the shares on
     30the grid. servers-of-happiness looks at the mapping of peers to the shares
     31that they hold, and compares the cardinality of the largest happy subset
     32of those to a user-configurable threshold. A happy subset of peers has
     33the property that any k (where k is as in k-of-n encoding) peers within
     34the subset can reconstruct the source file. This definition of file
     35health provides a stronger assurance of file availability over time:
     36with 3-of-10 encoding and happy=7, a healthy file remains available even
     37if 4 of the 7 happy peers fail, since any 3 survivors can rebuild it.
     38
     39Measuring Servers of Happiness
     40==============================
     41
     42We calculate servers-of-happiness by computing a matching on a
     43bipartite graph that is related to the layout of shares on the grid.
     44One set of vertices is the peers on the grid, and one set of vertices is
     45the shares. An edge connects a peer and a share if the peer will (or
     46does, for existing shares) hold the share. The size of the maximum
     47matching on this graph is the servers-of-happiness value of the upload;
     48as explained below, it is a lower bound on the largest happy peer set.
     49
     50First, note that a bipartite matching of size n corresponds to a happy
     51subset of size n. This is because a bipartite matching of size n implies
     52that there are n peers such that each peer holds a share that no other
     53peer holds. Then any k of those peers collectively hold k distinct
     54shares, and can restore the file.
     55
     56A bipartite matching of size n is not necessary for a happy subset of
     57size n, however (so it is not correct to say that the size of the
     58maximum matching on this graph is the size of the largest happy subset
     59of peers that exists for the upload). For example, consider a file with
     60k = 3, and suppose that every peer holds the same three shares.  Then,
     61since any peer from the original upload can restore the file, if there
     62are 10 peers holding shares, and the happiness threshold is 7, the
     63upload should be declared happy, because there is a happy subset of size
     6410, and 10 > 7. However, since a maximum matching on the bipartite graph
     65related to this layout has only 3 edges, Tahoe-LAFS declares the upload
     66unhealthy. Though not actually unhealthy, a share layout like this is
     67inefficient; with k = 3 and 10 peers each holding three shares, it
     68corresponds to an expansion factor of 10x.
     69bipartite graph matching approach have the property that they correspond
     70to uploads that are either already relatively efficient in their
     71utilization of space, or can be made to be so by deleting shares; and
     72that place all of the shares that they generate, enabling redistribution
     73of shares later without having to re-encode the file.  Also, it is
     74computationally reasonable to compute a maximum matching in a bipartite
     75graph, and there are well-studied algorithms to do that.
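
The following minimal sketch (an illustration, not the Tahoe-LAFS
implementation) computes such a maximum matching with simple augmenting
paths::

 def servers_of_happiness(peer_shares):
     # peer_shares maps each peer to the set of share numbers it holds.
     match = {}  # share number -> peer currently matched to it

     def augment(peer, seen):
         for share in peer_shares[peer]:
             if share in seen:
                 continue
             seen.add(share)
             # take a free share, or rematch its current holder elsewhere
             if share not in match or augment(match[share], seen):
                 match[share] = peer
                 return True
         return False

     return sum(1 for peer in peer_shares if augment(peer, set()))

For example, servers_of_happiness({"p1": set([0]), "p2": set([0])}) is 1:
both peers hold only share 0, so at most one of them can contribute to a
happy subset.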
     76
     77Issues
     78======
     79
     80The uploader is good at detecting unhealthy upload layouts, but it
     81doesn't always know how to make an unhealthy upload into a healthy
     82upload if it is possible to do so; it attempts to redistribute shares to
     83achieve happiness, but only in certain circumstances. The redistribution
     84algorithm isn't optimal, either, so even in these cases it will not
     85always find a happy layout if one can be arrived at through
     86redistribution. We are investigating improvements to address these
     87issues.
     88
     89We don't use servers-of-happiness for mutable files yet; this fix will
     90likely come in Tahoe-LAFS version 1.8.
  • deleted file docs/specifications/servers-of-happiness.txt

    diff --git a/docs/specifications/servers-of-happiness.txt b/docs/specifications/servers-of-happiness.txt
    deleted file mode 100644
    index 67c6d71..0000000
    + -  
  • new file docs/specifications/uri.rst

    diff --git a/docs/specifications/uri.rst b/docs/specifications/uri.rst
    new file mode 100644
    index 0000000..91f8cc2
    - +  
     1==========
     2Tahoe URIs
     3==========
     4
     51.  `File URIs`_
     6
     7    1. `CHK URIs`_
     8    2. `LIT URIs`_
     9    3. `Mutable File URIs`_
     10
     112.  `Directory URIs`_
     123.  `Internal Usage of URIs`_
     13
     14Each file and directory in a Tahoe filesystem is described by a "URI". There
     15are different kinds of URIs for different kinds of objects, and there are
     16different kinds of URIs to provide different kinds of access to those
     17objects. Each URI is a string representation of a "capability" or "cap", and
     18there are read-caps, write-caps, verify-caps, and others.
     19
     20Each URI provides both ``location`` and ``identification`` properties.
     21``location`` means that holding the URI is sufficient to locate the data it
     22represents (this means it contains a storage index or a lookup key, whatever
     23is necessary to find the place or places where the data is being kept).
     24``identification`` means that the URI also serves to validate the data: an
     25attacker who wants to trick you into using the wrong data will be
     26limited in their abilities by the identification properties of the URI.
     27
     28Some URIs are subsets of others. In particular, if you know a URI which
     29allows you to modify some object, you can produce a weaker read-only URI and
     30give it to someone else, and they will be able to read that object but not
     31modify it. Directories, for example, have a read-cap which is derived from
     32the write-cap: anyone with read/write access to the directory can produce a
     33limited URI that grants read-only access, but not the other way around.
     34
     35src/allmydata/uri.py is the main place where URIs are processed. It is
     36the authoritative definition point for all the URI types described
     37herein.
     38
     39File URIs
     40=========
     41
     42The lowest layer of the Tahoe architecture (the "grid") is responsible for
     43mapping URIs to data. This is basically a distributed hash table, in which
     44the URI is the key, and some sequence of bytes is the value.
     45
     46There are two kinds of entries in this table: immutable and mutable. For
     47immutable entries, the URI represents a fixed chunk of data. The URI itself
     48is derived from the data when it is uploaded into the grid, and can be used
     49to locate and download that data from the grid at some time in the future.
     50
     51For mutable entries, the URI identifies a "slot" or "container", which can be
     52filled with different pieces of data at different times.
     53
     54It is important to note that the "files" described by these URIs are just a
     55bunch of bytes, and that **no** filenames or other metadata is retained at
     56this layer. The vdrive layer (which sits above the grid layer) is entirely
     57responsible for directories and filenames and the like.
     58
     59CHK URIs
     60--------
     61
     62CHK (Content Hash Keyed) files are immutable sequences of bytes. They are
     63uploaded in a distributed fashion using a "storage index" (for the "location"
     64property), and encrypted using a "read key". A secure hash of the data is
     65computed to help validate the data afterwards (providing the "identification"
     66property). All of these pieces, plus information about the file's size and
     67the number of shares into which it has been distributed, are put into the
     68"CHK" uri. The storage index is derived by hashing the read key (using a
     69tagged SHA-256d hash, then truncated to 128 bits), so it does not need to be
     70physically present in the URI.
     71
     72The current format for CHK URIs is the concatenation of the following
     73strings::
     74
     75 URI:CHK:(key):(hash):(needed-shares):(total-shares):(size)
     76
     77Where (key) is the base32 encoding of the 16-byte AES read key, (hash) is the
     78base32 encoding of the SHA-256 hash of the URI Extension Block,
     79(needed-shares) is an ascii decimal representation of the number of shares
     80required to reconstruct this file, (total-shares) is the same representation
     81of the total number of shares created, and (size) is an ascii decimal
     82representation of the size of the data represented by this URI. All base32
     83encodings are expressed in lower-case, with the trailing '=' signs removed.
     84
     85For example, the following is a CHK URI, generated from the contents of the
     86architecture.txt document that lives next to this one in the source tree::
     87
     88 URI:CHK:ihrbeov7lbvoduupd4qblysj7a:bg5agsdt62jb34hxvxmdsbza6do64f4fg5anxxod2buttbo6udzq:3:10:28733
     89
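For illustration only (the authoritative parser is src/allmydata/uri.py),
the fields can be split back apart like this::

 def parse_chk(uri):
     assert uri.startswith("URI:CHK:")
     key, ueb_hash, k, n, size = uri[len("URI:CHK:"):].split(":")
     return dict(readkey=key, ueb_hash=ueb_hash,
                 needed_shares=int(k), total_shares=int(n),
                 size=int(size))
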
     90Historical note: The name "CHK" is somewhat inaccurate and continues to be
     91used for historical reasons. "Content Hash Key" means that the encryption key
     92is derived by hashing the contents, which gives the useful property that
     93encoding the same file twice will result in the same URI. However, this is an
     94optional step: by passing a different flag to the appropriate API call, Tahoe
     95will generate a random encryption key instead of hashing the file: this gives
     96the useful property that the URI or storage index does not reveal anything
     97about the file's contents (except filesize), which improves privacy. The
     98URI:CHK: prefix really indicates that an immutable file is in use, without
     99saying anything about how the key was derived.
     100
     101LIT URIs
     102--------
     103
     104LITeral files are also immutable sequences of bytes, but they are so short
     105that the data is stored inside the URI itself. These are used for files of 55
     106bytes or shorter, which is the point at which the LIT URI is the same length
     107as a CHK URI would be.
     108
     109LIT URIs do not require an upload or download phase, as their data is stored
     110directly in the URI.
     111
     112The format of a LIT URI is simply a fixed prefix concatenated with the base32
     113encoding of the file's data::
     114
     115 URI:LIT:bjuw4y3movsgkidbnrwg26lemf2gcl3xmvrc6kropbuhi3lmbi
     116
     117The LIT URI for an empty file is "URI:LIT:", and the LIT URI for a 5-byte
     118file that contains the string "hello" is "URI:LIT:nbswy3dp".
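
A sketch of the encoding (illustrative; see src/allmydata/uri.py for the
real code)::

 import base64

 def lit_uri(data):
     assert len(data) <= 55
     b32 = base64.b32encode(data).decode("ascii")
     return "URI:LIT:" + b32.rstrip("=").lower()

lit_uri("hello") indeed returns "URI:LIT:nbswy3dp", and lit_uri("") returns
"URI:LIT:".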
     119
     120Mutable File URIs
     121-----------------
     122
     123The other kind of DHT entry is the "mutable slot", in which the URI names a
     124container in which data can be placed and retrieved without changing the
     125identity of the container.
     126
     127These slots have write-caps (which allow read/write access), read-caps (which
     128only allow read-access), and verify-caps (which allow a file checker/repairer
     129to confirm that the contents exist, but do not let it decrypt the
     130contents).
     131
     132Mutable slots use public key technology to provide data integrity, and put a
     133hash of the public key in the URI. As a result, the data validation is
     134limited to confirming that the data retrieved matches *some* data that was
     135uploaded in the past, but not *which* version of that data.
     136
     137The format of the write-cap for mutable files is::
     138
     139 URI:SSK:(writekey):(fingerprint)
     140
     141Where (writekey) is the base32 encoding of the 16-byte AES encryption key
     142that is used to encrypt the RSA private key, and (fingerprint) is the base32
     143encoded 32-byte SHA-256 hash of the RSA public key. For more details about
     144the way these keys are used, please see docs/mutable.txt.
     145
     146The format for mutable read-caps is::
     147
     148 URI:SSK-RO:(readkey):(fingerprint)
     149
     150The read-cap is just like the write-cap except it contains the other AES
     151encryption key: the one used for encrypting the mutable file's contents. This
     152second key is derived by hashing the writekey, which allows the holder of a
     153write-cap to produce a read-cap, but not the other way around. The
     154fingerprint is the same in both caps.
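
Schematically (the tag below is a placeholder, not the real one; the exact
tagged-hash construction is specified in docs/mutable.txt)::

 import hashlib

 def derive_readkey(writekey):
     tag = "placeholder-tag:"  # see docs/mutable.txt for the real tag
     return hashlib.sha256(tag + writekey).digest()[:16]

The hash cannot be reversed, so a read-cap holder cannot recover the
writekey.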
     155
     156Historical note: the "SSK" prefix is a perhaps-inaccurate reference to
     157"Sub-Space Keys" from the Freenet project, which uses a vaguely similar
     158structure to provide mutable file access.
     159
     160Directory URIs
     161==============
     162
     163The grid layer provides a mapping from URI to data. To turn this into a graph
     164of directories and files, the "vdrive" layer (which sits on top of the grid
     165layer) needs to keep track of "directory nodes", or "dirnodes" for short.
     166docs/dirnodes.txt describes how these work.
     167
     168Dirnodes are contained inside mutable files, and are thus simply a particular
     169way to interpret the contents of these files. As a result, a directory
     170write-cap looks a lot like a mutable-file write-cap::
     171
     172 URI:DIR2:(writekey):(fingerprint)
     173
     174Likewise directory read-caps (which provide read-only access to the
     175directory) look much like mutable-file read-caps::
     176
     177 URI:DIR2-RO:(readkey):(fingerprint)
     178
     179Historical note: the "DIR2" prefix is used because the non-distributed
     180dirnodes in earlier Tahoe releases had already claimed the "DIR" prefix.
     181
     182Internal Usage of URIs
     183======================
     184
     185The classes in source:src/allmydata/uri.py are used to pack and unpack these
     186various kinds of URIs. Three Interfaces are defined (IURI, IFileURI, and
     187IDirnodeURI) which are implemented by these classes, and string-to-URI-class
     188conversion routines have been registered as adapters, so that code which
189wants to extract e.g. the size of a CHK or LIT URI can do::
     190
     191 print IFileURI(uri).get_size()
     192
193If the URI is not a CHK or LIT URI (for example, if it is for a
     194directory instead), the adaptation will fail, raising a TypeError inside the
     195IFileURI() call.
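
Code which might be handed a directory cap instead can guard the adaptation
by catching this TypeError. A sketch of the pattern, given some cap string
``uri`` (the import path follows the interface locations named in this
document)::

 from allmydata.interfaces import IFileURI, IDirnodeURI

 try:
     print IFileURI(uri).get_size()
 except TypeError:
     # adaptation failed: 'uri' is not a CHK or LIT cap; try the dirnode
     # interface instead (which raises TypeError for non-directory caps)
     dirnode_uri = IDirnodeURI(uri)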
     196
     197Several utility methods are provided on these objects. The most important is
     198``to_string()``, which returns the string form of the URI. Therefore
199``IURI(uri).to_string() == uri`` is true for any valid URI. See the IURI class
     200in source:src/allmydata/interfaces.py for more details.
     201
  • deleted file docs/specifications/uri.txt

    diff --git a/docs/specifications/uri.txt b/docs/specifications/uri.txt
    deleted file mode 100644
    index 5599fa1..0000000
    + -  
    1 
    2 = Tahoe URIs =
    3 
    4 Each file and directory in a Tahoe filesystem is described by a "URI". There
    5 are different kinds of URIs for different kinds of objects, and there are
    6 different kinds of URIs to provide different kinds of access to those
    7 objects. Each URI is a string representation of a "capability" or "cap", and
    8 there are read-caps, write-caps, verify-caps, and others.
    9 
    10 Each URI provides both '''location''' and '''identification''' properties.
    11 '''location''' means that holding the URI is sufficient to locate the data it
    12 represents (this means it contains a storage index or a lookup key, whatever
    13 is necessary to find the place or places where the data is being kept).
    14 '''identification''' means that the URI also serves to validate the data: an
15 attacker who wants to trick you into using the wrong data will be
    16 limited in their abilities by the identification properties of the URI.
    17 
    18 Some URIs are subsets of others. In particular, if you know a URI which
    19 allows you to modify some object, you can produce a weaker read-only URI and
    20 give it to someone else, and they will be able to read that object but not
    21 modify it. Directories, for example, have a read-cap which is derived from
    22 the write-cap: anyone with read/write access to the directory can produce a
    23 limited URI that grants read-only access, but not the other way around.
    24 
    25 source:src/allmydata/uri.py is the main place where URIs are processed. It is
26 the authoritative definition point for all the URI types described
    27 herein.
    28 
    29 == File URIs ==
    30 
31 The lowest layer of the Tahoe architecture (the "grid") is responsible for
    32 mapping URIs to data. This is basically a distributed hash table, in which
    33 the URI is the key, and some sequence of bytes is the value.
    34 
    35 There are two kinds of entries in this table: immutable and mutable. For
    36 immutable entries, the URI represents a fixed chunk of data. The URI itself
    37 is derived from the data when it is uploaded into the grid, and can be used
    38 to locate and download that data from the grid at some time in the future.
    39 
    40 For mutable entries, the URI identifies a "slot" or "container", which can be
    41 filled with different pieces of data at different times.
    42 
    43 It is important to note that the "files" described by these URIs are just a
    44 bunch of bytes, and that __no__ filenames or other metadata is retained at
    45 this layer. The vdrive layer (which sits above the grid layer) is entirely
    46 responsible for directories and filenames and the like.
    47 
48 === CHK URIs ===
    49 
    50 CHK (Content Hash Keyed) files are immutable sequences of bytes. They are
    51 uploaded in a distributed fashion using a "storage index" (for the "location"
    52 property), and encrypted using a "read key". A secure hash of the data is
    53 computed to help validate the data afterwards (providing the "identification"
    54 property). All of these pieces, plus information about the file's size and
    55 the number of shares into which it has been distributed, are put into the
    56 "CHK" uri. The storage index is derived by hashing the read key (using a
    57 tagged SHA-256d hash, then truncated to 128 bits), so it does not need to be
    58 physically present in the URI.
    59 
    60 The current format for CHK URIs is the concatenation of the following
    61 strings:
    62 
    63  URI:CHK:(key):(hash):(needed-shares):(total-shares):(size)
    64 
    65 Where (key) is the base32 encoding of the 16-byte AES read key, (hash) is the
    66 base32 encoding of the SHA-256 hash of the URI Extension Block,
    67 (needed-shares) is an ascii decimal representation of the number of shares
    68 required to reconstruct this file, (total-shares) is the same representation
    69 of the total number of shares created, and (size) is an ascii decimal
    70 representation of the size of the data represented by this URI. All base32
    71 encodings are expressed in lower-case, with the trailing '=' signs removed.
    72 
    73 For example, the following is a CHK URI, generated from the contents of the
    74 architecture.txt document that lives next to this one in the source tree:
    75 
    76 URI:CHK:ihrbeov7lbvoduupd4qblysj7a:bg5agsdt62jb34hxvxmdsbza6do64f4fg5anxxod2buttbo6udzq:3:10:28733
    77 
    78 Historical note: The name "CHK" is somewhat inaccurate and continues to be
    79 used for historical reasons. "Content Hash Key" means that the encryption key
    80 is derived by hashing the contents, which gives the useful property that
    81 encoding the same file twice will result in the same URI. However, this is an
82 optional step: when a different flag is passed to the appropriate API call,
83 Tahoe will generate a random encryption key instead of hashing the file: this gives
    84 the useful property that the URI or storage index does not reveal anything
    85 about the file's contents (except filesize), which improves privacy. The
    86 URI:CHK: prefix really indicates that an immutable file is in use, without
    87 saying anything about how the key was derived.
    88 
    89 === LIT URIs ===
    90 
91 LITeral files are also immutable sequences of bytes, but they are so short
    92 that the data is stored inside the URI itself. These are used for files of 55
    93 bytes or shorter, which is the point at which the LIT URI is the same length
    94 as a CHK URI would be.
    95 
    96 LIT URIs do not require an upload or download phase, as their data is stored
    97 directly in the URI.
    98 
    99 The format of a LIT URI is simply a fixed prefix concatenated with the base32
    100 encoding of the file's data:
    101 
    102  URI:LIT:bjuw4y3movsgkidbnrwg26lemf2gcl3xmvrc6kropbuhi3lmbi
    103 
    104 The LIT URI for an empty file is "URI:LIT:", and the LIT URI for a 5-byte
    105 file that contains the string "hello" is "URI:LIT:nbswy3dp".
    106 
    107 === Mutable File URIs ===
    108 
    109 The other kind of DHT entry is the "mutable slot", in which the URI names a
110 container in which data can be placed, and from which it can be retrieved,
111 without changing the identity of the container.
    112 
    113 These slots have write-caps (which allow read/write access), read-caps (which
    114 only allow read-access), and verify-caps (which allow a file checker/repairer
115 to confirm that the contents exist, but do not let it decrypt the
    116 contents).
    117 
    118 Mutable slots use public key technology to provide data integrity, and put a
    119 hash of the public key in the URI. As a result, the data validation is
    120 limited to confirming that the data retrieved matches _some_ data that was
    121 uploaded in the past, but not _which_ version of that data.
    122 
    123 The format of the write-cap for mutable files is:
    124 
    125  URI:SSK:(writekey):(fingerprint)
    126 
    127 Where (writekey) is the base32 encoding of the 16-byte AES encryption key
    128 that is used to encrypt the RSA private key, and (fingerprint) is the base32
129 encoding of the 32-byte SHA-256 hash of the RSA public key. For more details
130 about the way these keys are used, please see docs/mutable.txt.
    131 
    132 The format for mutable read-caps is:
    133 
    134  URI:SSK-RO:(readkey):(fingerprint)
    135 
    136 The read-cap is just like the write-cap except it contains the other AES
    137 encryption key: the one used for encrypting the mutable file's contents. This
    138 second key is derived by hashing the writekey, which allows the holder of a
    139 write-cap to produce a read-cap, but not the other way around. The
    140 fingerprint is the same in both caps.
    141 
    142 Historical note: the "SSK" prefix is a perhaps-inaccurate reference to
    143 "Sub-Space Keys" from the Freenet project, which uses a vaguely similar
    144 structure to provide mutable file access.
    145 
    146 == Directory URIs ==
    147 
    148 The grid layer provides a mapping from URI to data. To turn this into a graph
    149 of directories and files, the "vdrive" layer (which sits on top of the grid
    150 layer) needs to keep track of "directory nodes", or "dirnodes" for short.
    151 source:docs/dirnodes.txt describes how these work.
    152 
    153 Dirnodes are contained inside mutable files, and are thus simply a particular
    154 way to interpret the contents of these files. As a result, a directory
    155 write-cap looks a lot like a mutable-file write-cap:
    156 
    157  URI:DIR2:(writekey):(fingerprint)
    158 
159 Likewise, directory read-caps (which provide read-only access to the
    160 directory) look much like mutable-file read-caps:
    161 
    162  URI:DIR2-RO:(readkey):(fingerprint)
    163 
    164 Historical note: the "DIR2" prefix is used because the non-distributed
    165 dirnodes in earlier Tahoe releases had already claimed the "DIR" prefix.
    166 
    167 == Internal Usage of URIs ==
    168 
    169 The classes in source:src/allmydata/uri.py are used to pack and unpack these
    170 various kinds of URIs. Three Interfaces are defined (IURI, IFileURI, and
    171 IDirnodeURI) which are implemented by these classes, and string-to-URI-class
    172 conversion routines have been registered as adapters, so that code which
173 wants to extract e.g. the size of a CHK or LIT URI can do:
    174 
    175 {{{
    176 print IFileURI(uri).get_size()
    177 }}}
    178 
179 If the URI is not a CHK or LIT URI (for example, if it is for a
    180 directory instead), the adaptation will fail, raising a TypeError inside the
    181 IFileURI() call.
    182 
    183 Several utility methods are provided on these objects. The most important is
    184 {{{ to_string() }}}, which returns the string form of the URI. Therefore {{{
185 IURI(uri).to_string() == uri }}} is true for any valid URI. See the IURI class
    186 in source:src/allmydata/interfaces.py for more details.
    187