[tahoe-dev] future Tahoe ideas
Brian Warner
warner-tahoe at allmydata.com
Tue Mar 24 12:11:04 PDT 2009
I just wanted to do a quick brain-dump on what my personal plans are
for Tahoe over the next few months: just to publish the projects that
are sitting on my longer-term todo list. Many of these have tickets;
I'll add tickets for the ones that don't.
* overhaul immutable upload (no format or protocol changes):
  * parallelize peer-selection work, instead of spending one RTT per
    server
  * handle existing shares better: the current uploader usually ends up
    placing duplicate shares and doubling up shares on the same server.
    Any evidence of a previous upload should trigger an immediate
    search for all existing shares, and new-share placement should be
    smarter.
  * pipeline segments, maybe just two deep, to improve performance
  * merge small write requests, in particular the various hashes. Small
    files should use a single remote write request.
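As a rough illustration of the parallel peer-selection item above: the
sketch below uses Python's asyncio (not the Twisted/Foolscap machinery
Tahoe is built on) to send every query at once, so the total wait is
about one round trip instead of one per server. ask_server and its
delay parameter are hypothetical stand-ins for a real remote query.

```python
import asyncio

async def ask_server(server, delay):
    # Hypothetical stand-in for a remote "will you hold a share?" query.
    # The sleep simulates one network round trip.
    await asyncio.sleep(delay)
    return server, True

async def select_peers_parallel(servers, delay=0.05):
    # Issue all queries concurrently; gather() returns results in the
    # same order as the input, regardless of completion order.
    results = await asyncio.gather(*(ask_server(s, delay) for s in servers))
    return [server for server, accepted in results if accepted]

if __name__ == "__main__":
    print(asyncio.run(select_peers_parallel(["s1", "s2", "s3"])))
```

A serial loop over the same servers would take roughly
len(servers) * delay; the gathered version takes roughly delay.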
* overhaul immutable download (no format or protocol changes):
  * download/populate minimal hash chain instead of whole hash tree (we
    can't currently download 10GB files because our alacrity is longer
    than a lot of HTTP clients' timeouts)
  * rewrite in terms of random-access, Producer/Consumer, prefetch one
    segment. Remove big Deferred chain, replace with state machine.
  * merge small read requests. Small files should use a single remote
    read.
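To make the "minimal hash chain" item concrete, here is a small sketch
of which hash-tree nodes are needed to validate a single segment. It
assumes a complete binary tree stored as a flat array (root at index 0,
children of node n at 2n+1 and 2n+2) with a power-of-two number of
leaves; the function name and layout are illustrative and may not match
Tahoe's real share format.

```python
def needed_hashes(num_leaves, leaf_index):
    # Return the node indices (the "uncle chain") required to check one
    # leaf against the root, without fetching the whole tree.
    # Assumes num_leaves is a power of two.
    first_leaf = num_leaves - 1
    node = first_leaf + leaf_index
    needed = []
    while node > 0:
        # Odd indices are left children, even indices are right children.
        sibling = node + 1 if node % 2 == 1 else node - 1
        needed.append(sibling)
        node = (node - 1) // 2  # move up to the parent
    return needed

if __name__ == "__main__":
    print(needed_hashes(8, 5))
```

For a tree with N leaves this fetches log2(N) hashes per segment
instead of all 2N-1 nodes, which is what shortens the wait before the
first validated byte.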
* overhaul immutable repairer: improve share-placement, add
share-rebalancing, consider adding peer-to-peer share transfer
(requires protocol additions)
* consider changing immutable lease-renewal to expire duplicate
shares
* new uploads should be a special case of repair, in which there are
no existing shares and the plaintext is available locally
* new immutable upload format/protocol:
  * separate "server-selection index" from "storage index", use UEB
    hash as storage index, enable improved server-side local share
    verification (#654)
  * add slow-crawler to verify all local shares on a periodic basis
  * add "upload identifier" to storage-server protocol, allow resumed
    uploads, allow storage index to be decided at the end, enable
    streaming uploads
  * add copy-share-from-peer method to storage-server protocol (for
    share rebalancing that doesn't use the controller's bandwidth)
  * new human-readable cap format with less baggage, smaller UEB hash
  * add binary cap format
* new mutable format/protocol:
  * make it easier for servers to refuse mutable shares, to honor the
    'read-only' setting better
  * DSA-based mutable files, traversal caps, pubkey-as-storage-index
  * new human-readable cap format: smaller. new binary cap format
  * consider storage-server validity check to replace write-enabler
    (accept write iff new share would be valid and seqnum is higher
    than before): measure CPU cost for server.
  * MDMF (medium-sized mutable files: modify one segment without
    touching other segments). LDMF (large-sized: revlog based,
    append-only backend shares, efficient insert/delete span, revision
    graph, readcaps for individual revisions, writecap for the whole
    thing).
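The validity-check rule mentioned above ("accept write iff new share
would be valid and seqnum is higher than before") can be sketched as a
single predicate. The verify callback is a hypothetical placeholder for
whatever signature check the share format would define; nothing here is
an existing Tahoe API.

```python
def accept_write(current_seqnum, new_seqnum, share_bytes, signature, verify):
    # Accept only if the sequence number strictly increases AND the
    # share passes the (hypothetical) signature check. The cheap
    # comparison runs first, so stale writes cost no crypto work.
    if new_seqnum <= current_seqnum:
        return False
    return bool(verify(share_bytes, signature))

if __name__ == "__main__":
    # Toy verifier for demonstration only; a real one would check a
    # public-key signature over the share contents.
    toy_verify = lambda data, sig: sig == b"good"
    print(accept_write(3, 4, b"share", b"good", toy_verify))
```

The CPU cost the item asks about would sit almost entirely inside
verify, which is why it is worth measuring on the server.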
* new directory format:
  * traversal caps
  * faster to parse: encrypt each column separately. Maybe one
    unencrypted column with [(offset,size), ..] references to the other
    columns, then one column each for names, traversalcaps, readcaps,
    writecaps, metadata. Use binary caps.
  * new human-readable cap format: smaller. new binary cap format.
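One possible reading of the column-oriented directory layout, as a
sketch: an unencrypted index of (offset, size) pairs followed by one
blob per column, so a reader can pull out a single column without
parsing the rest. The 32-bit big-endian index encoding and the function
names are assumptions made for illustration, and per-column encryption
is left out entirely (each blob is treated as already-encrypted bytes).

```python
import struct

# Column order taken from the list above; fixed so the index needs no names.
COLUMNS = ("names", "traversalcaps", "readcaps", "writecaps", "metadata")
INDEX_ENTRY = struct.Struct(">II")  # (offset, size), assumed encoding

def pack_columns(columns):
    # Build the index and the concatenated column blobs.
    index = b""
    body = b""
    for name in COLUMNS:
        blob = columns[name]
        index += INDEX_ENTRY.pack(len(body), len(blob))
        body += blob
    return index + body

def read_column(packed, name):
    # Look up one column via its index entry; other columns are untouched.
    i = COLUMNS.index(name)
    offset, size = INDEX_ENTRY.unpack_from(packed, INDEX_ENTRY.size * i)
    start = INDEX_ENTRY.size * len(COLUMNS) + offset
    return packed[start:start + size]

if __name__ == "__main__":
    cols = {name: name.encode() for name in COLUMNS}
    print(read_column(pack_columns(cols), "readcaps"))
```

Listing a directory would then only need the names column, and only a
holder of the matching key could decrypt the writecaps column.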
* immutable dirnodes: may contain only filecaps and immutable-dircaps.
Put stats information in the dirnode. Add tool to create "virtual
CDs".
* enhance "tahoe backup" CLI tool to share old directories. Create
immutable dirnodes when available.
* enhance "tahoe cp" to use backupdb. Use backupdb in tahoe-to-local
copies. Enable cron-driven "tahoe cp -r" to do minimal work.
* Accounting. Add management tools, status web pages.
* use ECDSA pubkeys to manage leases instead of shared secrets (each
lease is stored with a verifying key, lease renewal is based upon
signed messages instead of matching renewal-secrets)
* use Accounting privileges to manage leases instead of pubkeys (the
right to control a label includes the right to manage leases with
that label. Individual lease records could still have a pubkey for
delegation to a renewal agent).
* Lease-tracking cache database. One-shot share crawler to populate or
regenerate DB. Once populated, use it for DYHB queries instead of
filesystem access.
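A minimal sketch of the lease-tracking cache idea, using Python's
sqlite3 module: a table of (storage index, share number) rows that a
crawler would populate, so a DYHB-style "which shares do you hold?"
query becomes an indexed lookup rather than a directory scan. The
schema and function names are hypothetical, and lease records
themselves are omitted.

```python
import sqlite3

def open_share_db(path=":memory:"):
    # One row per locally held share; the primary key doubles as the index.
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS shares ("
        " storage_index TEXT NOT NULL,"
        " shnum INTEGER NOT NULL,"
        " PRIMARY KEY (storage_index, shnum))"
    )
    return db

def record_share(db, storage_index, shnum):
    # Called by the (hypothetical) one-shot crawler, or on each new upload.
    db.execute("INSERT OR IGNORE INTO shares VALUES (?, ?)",
               (storage_index, shnum))

def shares_held(db, storage_index):
    # Answer the query from the DB instead of touching the filesystem.
    rows = db.execute(
        "SELECT shnum FROM shares WHERE storage_index = ? ORDER BY shnum",
        (storage_index,),
    )
    return [row[0] for row in rows]

if __name__ == "__main__":
    db = open_share_db()
    record_share(db, "si-example", 2)
    print(shares_held(db, "si-example"))
```

Because the DB is only a cache, it can be deleted and rebuilt by
re-running the crawler over the share directories.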
* non-Foolscap-based storage server. Consider one embedded in HTTP,
try to reuse connections. Make life easier for implementors.
Removing the write-enabler and renew-secrets would remove the need
for transport-layer confidentiality. Consider a
cert-chain+signed-request message, messageid for responses, replay
protection.
* imagine an Apache mod_tahoe_server, mod_tahoe_client
* imagine a firefox extension
* Reorganize node classes, improve Service tree layout. Node should be
the parent, with StorageServer and Client as children. Replace
"tahoe create-client" with "tahoe create client" and "tahoe create
client,storage,helper" (the client is just one thing you might
create, the construction command takes a list).
* create plugin architecture. Add an App Store (hey, everybody's
doing it)
* local file access through webapi, based upon a ~/.tahoe/private/
-based secret. Enable JS tool to drag+drop between local disk and
tahoe FS. Enable web-based configuration of backup process. Add CLI
command to launch the JS tool with the secret from .tahoe/private,
encourage folks to bookmark it. Consider a "login page" to bounce
user to that bookmark.
* improve automated performance testing: compare "tahoe cp -r" against
tar+netcat, scp -r, rsync.
* add content-identifying metadata: hashes, rsync signatures
* improve "tahoe cp" to capture local metadata, like "tahoe backup"
does. Tolerate/capture symlinks.
* webapi to retrieve subtree as .tar/.tar.gz/.zip files. Represent
cycles as symlinks. Populate owner/mode from "tahoe cp/backup"
metadata.
* add symlinks: mutable redirection file, special cap format, webapi
GET automatically follows. And/or establish convention for "sharing
slots", which an uploader can give to a downloader before the upload
is finished, to indicate upload progress, eventual filecap, and
subsequent revocation.
* webapi for verifycaps: GET returns ciphertext, PUT accepts
ciphertext. Enable checking+repair. Imagine JS code to perform
decryption. webapi for traversalcaps.
* improved Introducer. Use Introducer for disk-watcher,
stats-gatherer, helpers. Distributed Introducer, gossip-based
dissemination.
* consider improving storage-server-side logging: timestamp, client
ipaddr, remote tubid, storageop, size.
* maintenance daemon: configure with rootcaps (or .com DB to pull
rootcaps from), it does: manifest, check, repair, lease-renewal,
stats. Give it a squad of worker nodes and configure relative
priorities. Web status pages with progress, expected cycle times,
queue depths, account stats, etc.
Whew! That was quite a list. I guess the next step is to go through and
locate or add ticket numbers for each.
Comments welcome!
-Brian