Version 1 (modified by warner, at 2007-06-29T19:17:16Z)
The term "storage index" is used in Tahoe to refer to a value (generally the output of a SHA-256 hash) which points to a given piece of data. This index is used for two purposes: to select the set of peers that will be queried, and to pass to those peers when retrieving the data.
The data in question may be an erasure-coded share, or the index of a directory node, or something else. When used for [CHKFile CHK files], each file has a separate StorageIndex, which is used to get access to a collection of "share buckets". When used for DirectoryNodes, each dirnode has a separate StorageIndex, but the read-only and read-write views of a given dirnode point to the same StorageIndex.
For distributed data, the StorageIndex is used in the ConsistentPermutation algorithm to prioritize a list of peers. The intent is that the data referenced by the index is most likely to exist on the top-priority peers in this list. The index is then sent to each peer on that list, to ask them if they do indeed have the corresponding data.
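As a rough illustration of how a StorageIndex might prioritize peers, here is a minimal sketch. It assumes the permutation is computed by hashing the storage index together with each peer's identifier and sorting on the result; that rule, the function name `permuted_peers`, and the peer ids are illustrative assumptions, not a description of the actual ConsistentPermutation code.

```python
import hashlib
from typing import List


def permuted_peers(storage_index: bytes, peer_ids: List[bytes]) -> List[bytes]:
    """Return peer_ids ordered by SHA-256(storage_index + peer_id).

    Assumption: this hash-and-sort rule stands in for the real
    ConsistentPermutation algorithm, which may differ in detail.
    """
    return sorted(
        peer_ids,
        key=lambda pid: hashlib.sha256(storage_index + pid).digest(),
    )


# Hypothetical storage index and peer ids, for illustration only.
storage_index = hashlib.sha256(b"example data").digest()
peers = [b"peer-a", b"peer-b", b"peer-c", b"peer-d"]

# The same index always yields the same ordering, so an uploader and a
# later downloader would query the same top-priority peers first.
ranked = permuted_peers(storage_index, peers)
```

Because the ordering depends on the index, different indexes spread their data over different peers, while any client holding a given index can recompute the same list.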
For centralized data, the StorageIndex is simply sent to the server which hosts that data, where it is generally turned into a string and used to locate a file or directory on disk, which contains the data in question.
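The server-side lookup could look something like the sketch below. The base32 encoding, the directory layout, and the name `bucket_path` are assumptions chosen for illustration; the text above only says that the index is turned into a string and used to locate a file or directory.

```python
import base64
import os


def bucket_path(storage_root: str, storage_index: bytes) -> str:
    """Map a binary storage index to a path under storage_root.

    Assumption: lowercase, unpadded base32 is used as the string form.
    Any filesystem-safe encoding would serve the same purpose.
    """
    name = base64.b32encode(storage_index).decode("ascii").lower().rstrip("=")
    return os.path.join(storage_root, name)


# Hypothetical usage: a 32-byte index becomes a 52-character directory name.
path = bucket_path("storage", b"\x00" * 32)
```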
In capability terms, the StorageIndex represents the authority to see the encrypted form of the corresponding data.
In earlier designs, the VerifierId was used for this purpose, but we've since realized that this is not always desirable. In particular, it requires that we know the full contents of the file before we can allocate buckets, whereas we might be willing to give up convergence to reduce the memory+storage footprint of a web-based streaming upload. Put another way: earlier releases always set the StorageIndex equal to the VerifierId, while newer releases are free to use whatever value we like.
In practice, to reduce the amount of data we need to keep around in the URI, the StorageIndex is derived by hashing some stronger capability. For example, for [CHKFile CHKFiles], the StorageIndex is the hash of the readkey, so that anyone who knows the decryption key is also able to retrieve the encrypted data that it operates upon. However, a verifier (who only knows the StorageIndex) is only able to deal with the encrypted data, not the plaintext. Likewise, for DirectoryNodes, the StorageIndex is derived by hashing the readkey, which is itself derived by hashing the writekey. This establishes a chain of successively-weaker capabilities.
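The chain of successively-weaker capabilities can be sketched as follows. The tag strings, the `tagged_hash` helper, and the key sizes are hypothetical and chosen only to make the example self-contained; the real derivation may use different tags, truncation, or encodings.

```python
import hashlib


def tagged_hash(tag: bytes, value: bytes) -> bytes:
    """SHA-256 over a purpose tag plus the value (tag format is an assumption)."""
    return hashlib.sha256(tag + b":" + value).digest()


def dirnode_caps(writekey: bytes):
    """Derive (readkey, storage_index) from a dirnode writekey."""
    readkey = tagged_hash(b"readkey", writekey)
    storage_index = tagged_hash(b"storage_index", readkey)
    return readkey, storage_index


def chk_storage_index(readkey: bytes) -> bytes:
    """For a CHK file, the storage index is a hash of the readkey."""
    return tagged_hash(b"storage_index", readkey)


# Hypothetical writekey: whoever holds it can derive everything below it.
writekey = b"w" * 16
readkey, storage_index = dirnode_caps(writekey)
```

Since each step is a one-way hash, a holder of the StorageIndex cannot recover the readkey, and a holder of the readkey cannot recover the writekey, which is what makes each derived value a strictly weaker capability.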