wiki:FileTree

Version 2 (modified by warner, at 2007-08-09T20:32:12Z) (diff)

this is obsolete but still historically valuable

Obsolete

The structures described by the page were removed around June 2007, replaced by the simpler DirectoryNode approach. However, the concepts this page describes are still applicable, and some of this approach may yet live again.

Subtrees

In Tahoe, a virtual drive is made up of a collection of "subtrees". Each subtree has zero or more levels of internal directory nodes. Each directory node is a mapping from child name to either another directory node or to a "child specification string" that references another subtree or an actual file.

There are many different kinds of subtrees. The traditional kind is the !CHKDirectorySubTree, which holds a bunch of directory nodes and gets serialized/uploading using an immutable content-based URI. There is also the !SSKDirectorySubTree, which allows mutability without requiring that the parent subtree be updated each time the contents change. (In a !CHKDirectorySubTree, the identity of the serialized form changes every time the contents change. In an !SSKDirectorySubTree, the identity is constant.)

In addition to directory-bearing subtrees, there are redirection subtrees. These act a bit like symlinks, or slots. One form is the LocalFileRedirection, which looks in a specified local file and pulls a child specification string from it. Another is the QueenRedirection, which sends an identifier to the queen and gets back a child specification string. The root subtree of any given client's space will probably be one of these two types.

Subtrees Delineate Access Rights

Access to files within a subtree is an all-or-nothing affair. If you can read one file inside a subtree, you can read all of the files therein.

To provide access to some pieces of the subtree but not to others, the client or agent or node needs to copy the desired portion into a new subtree. I'm imagining a UI in which the user might somehow drag a folder to a "give a copy of this to someone else" action, and that indicates that the agent ought to break the existing single subtree into two pieces, and give the URI of the lower piece to the someone else. Or a different way of dragging which means that you want the other person to get to see any updates that you make (but not modify anything.. the difference between a single snapshot and continued access). This form would create an !SSKDirectorySubTree (or some sort of redirection node) and give a read capability to the other person. Or a still different way of dragging that means you want the other person to get write access to the shared subdirectory too, which means creating an !SSKDirectorySubTree and granting a write capability to the other user. In all cases the existing subtree would be updated to contain full access to the new child subtree.

Mutability

Subtrees come in three flavors of mutability. The first form is mutable in place: the user has right to modify the contents of the subtree, and the identity of the subtree does not change when they do this. This first form corresponds to an !SSKDirectorySubTree to which the user has a write capability. The second form is mutable by somebody else, but not by this particular user (an !SSKDirectorySubTree to which the user only has a read capability), which means this user may not modify it at all. The third form is not mutable in place, but can be replaced with new contents. After changing the subtree this way, the parent subtree must be modified to reflect the new identity of the child subtree. This corresponds to a !CHKDirectorySubTree.

This means that the user can usefully modify a subtree if either A) it is of the first form (mutable in place), or B) it is in the third form *AND* the user can usefully modify the parent subtree. The portions of the user's virtual drive that they can modify is therefore bounded on the top by !SSKDirectorySubTree or redirection nodes.

Non-Directory Subtrees

There could be a kind of subtree that implements a "versioned directory": it has current contents, but it also retains a record of old versions. It can then present these old versions as subdirectories with names like "yesterday" and "last week".

Some subtrees can also expand other types of files: an Outlook PST file could be represented by a special type of subtree which serializes the messages in separate files, but presents them in a custom directory structure that reflects other metadata extracted from the PST database.

Locating Files

When retrieving a file, the client is presented with a list of path name components. The search process walks through a number of subtrees before (hopefully) finding the target file. Each subtree handles some portion of the path.

In the current implementation, the root subtree is presented with the path and an "opener" object that has the ability to transform a child specification string into a new subtree object. The root subtree pulls off as many path name components as it can (possibly zero if it is only a redirection node), and does whatever lookups are necessary to figure out the child specification of the next subtree. It then uses the opener to acquire a reference to that child subtree, and recurses by passing the remaining path components to the new subtree. Eventually this process results in either the target node or a "file not found" error. The target node might be a directory node inside a subtree (with a method like "list" to get all the child names) or it might be a file node (with a method like "get" to retrieve the contents).

Adding Files

To add a file, we start by adding two entries to the WorkQueue?. The first uploads the file's data and stashes the resulting CHK URI in a named "box". The second performs an "addpath" operation which pulls the URI out of the given box and adds it to the file tree with a particular path.

The addpath operation walks the filetree to find the lower-most enclosing subtree: the last subtree that would be traversed when finding a file at the given path. This process results in a "subpath", which is a rightmost portion of the full path that resides within the last subtree. (the leftmost portion of the full path gets from the root to the subtree, the rightmost portion describes where within the subtree we should find the final node). It then asks that enclosing subtree to modify its internal directory structure to add the target URI at the given subpath, then has the subtree schedule WorkQueue? operations to serialize and upload the resulting directory structure. One of these operations is to perform an "addpath" that puts the serialized subtree's URI at a location specified by the leftmost portion of the full path.

In this fashion, an "addpath" operation effectively recurses upwards until one of the subtrees is modified but does *not* need to update its parent (i.e. it is either an !SSKDirectorySubTree or a redirection node of some sort). Each WorkQueue? operation is written to disk before the previous one is removed, so the whole process can be interrupted and restarted at any time.

Attachments (4)

Download all attachments as: .zip