The (November) 2016 Tahoe-LAFS Summit
This is a loosely organized gathering of Tahoe-LAFS developers, enthusiasts, and perhaps some hecklers.
TODO:
- need the updated "servers of happiness" doc
- need to actually make the changes we talked about to Rain Hill 3
Dates / Location
Tuesday+Wednesday 2016-11-08 and 2016-11-09, 9am-5pm.
We're meeting in the "Board Room" of the Mechanics Institute Library, 57 Post St, San Francisco, CA (next to the Montgomery St BART station). When you arrive, take the elevator to the 4th(?) floor, and there should be signs. If you tell the front-desk staff that you're attending a meeting in the Board Room, they should be able to direct you. Don't arrive before 9am; I need to get there first and check us in.
The room has wifi, a whiteboard paper flip-board, and a projector screen. Brian will bring a small projector. There are plenty of restaurants and bars nearby for later in the evening.
We'll try to set up videochat so remote folks can join in the fun: contact us on IRC (#tahoe-lafs on freenode) and we'll make something work.
Attendees
(please add your name!)
For sure:
- Brian
- Daira
- meejah
- Zooko
- David
- Liz
Nope:
- str4d
Schedule
- Tuesday AM: applications, use-cases, productization, integration (with other apps)
- Tuesday PM: accounting, provisioning (including magic-wormhole, allow/deny storage servers), new GUI/WUI/CLI/API
- over beers in the evening: Sphinx/remailer crypto
- Wednesday AM: magic-folder -ish protocols, refresh our brains on #1382 (peer-selection / servers-of-happiness)
- Wednesday PM: new caps / encoding formats (chacha20, rainhill/elk-point, etc), mutable 2-phase commit, storage protocols, deletion/revocation
Agenda Items
(please add things you want to talk about)
- 1.12 release items
- (daira) 2-phase commit
- (warner) (meejah) (zooko) Accounting
- (warner) compelling applications
- (meejah) magic-wormhole based "setup" flow
- (meejah) integration (with other applications)
- write down use-cases
- e.g. what sorts of grids are there etc.
- (meejah) GUI (/WUI/CLI) things (e.g. moar JSON endpoints, ...)
- (meejah) allow/deny storage servers (i.e. I want a grid where only "my" storage servers are used)
- (zooko) #1382 (We don't need to talk about it, we just need to do it. Brian and I can sit elbow to elbow until it is done, if he wants. ☺)
- (david) asymmetric crypto "caps"
- (david) chacha20 crypto caps
- (david) (meejah) data structures and/or caps that support group-revocation schemes via a threshold of valid signatures
- (meejah) general "deletion" stuff: the different use-cases/scenarios and brainstorm ways to do this
- (meejah) magic-folder datamodel improvements (e.g. "leif's design" etc)
- (david) high-level mixnet discussion of attacks and mitigations: n-1 attack mitigation via integration with reputation systems for increased reliability, or heartbeat onions addressed back to the sender to verify mix reliability, etc.
- (david) code review of pylioness and go-lioness; clean up the API design for parameterizing crypto primitives and thereby genericizing the LIONESS cipher construct
- (david) API design review of sphinxmixcrypto, a fork of Ian's Sphinx reference Python code, modified to be more PEP 8 compliant and to parameterize the crypto primitives
- (david) review of crypto-primitive selection for the Sphinx mixnet packet header format: offered security bits versus packet-header overhead, etc.
- (david) post-quantum crypto modification to the Sphinx mixnet packet format
- (david) formal verification of cryptographic protocols?
mixnet reading list: http://freehaven.net/anonbib/cache/mix-acc.pdf http://freehaven.net/anonbib/cache/danezis:wpes2003.pdf http://freehaven.net/anonbib/cache/DBLP:conf/sp/DanezisG09.pdf http://freehaven.net/anonbib/cache/trickle02.pdf
Raw Notes
- https://pad.lqdn.fr/p/tahoe-lafs-summit-2016
- Attendees: daira, dawuud, meejah, liz, warner, zooko, secorp. remote: exarkun
Tuesday AM: applications, use-cases, productization, integration with other apps
- use cases
- travelling through dangerous places: erase the laptop first, travel, then restore your home directory
- lawyers, business travel rules: to guarantee client confidentiality, they forbid proprietary data from being accessible on stealable devices
- journalists: even the suggestion of encrypted data on a laptop could be dangerous in some regimes
- medical information
- whistleblowing
- digital will: longer-term preservation of data
- erasure-coding slightly more relevant, for long-term reliability
- repair service is more relevant
- backup
- sysadmin/devops secret/credential management
- password manager among an ops team, also ssh keys, AWS creds
- who gets to see what: admin control
- could include revocation management, integration with AWS/etc (automatically roll creds when a user is revoked)
- business information: sensitive client data protection
- lawyers, organizations, medical records, activists, journalists
- technical secrets/proprietary-information
- sometimes run server yourself, sometimes pay commodity/cloud provider, or friendnet
- generalized communication tool: Slack-like UI, chat, file-sharing, directory-syncing
- enterprise document sharing
- some folks use SVN for this
- git-over-tahoe
- back up git repos
- https://git-annex.branchable.com/special_remotes/tahoe/
- use git to share, but backed by tahoe
- on a personal VPS
- can do a Dropbox-like thing
- start with at least 3 participants
- slack-like in-band file-sharing
- new chat app which includes file-sharing UI: maybe base on RocketChat?
- plugins for existing apps to share large files via tahoe
- Thunderbird large-file-attachment upload
- gmail suggesting attachments go into Google Docs instead of embedding in email
- basic key-value database
- would probably need to emulate an existing API (etcd? gconf?)
- backup of ~/.config/* (LSB?)
- features that (some of) these use cases need
- better multi-writer support
- existing tools
- subversion-based enterprise document-sharing
- separate fs-explorer app, "check-out" button, copies document to tempdir and launches application. "check-in" copies it back into SVN
- TortoiseSVN
- RocketChat: open-source Slack-alike
- sketching out enterprise document-sharing tool
- provisioning:
- admin gives app, and maybe a provisioning string, to each client
- client installs app
- on windows, maybe app reads from windows registry, maybe user does name+password, to get provisioning string installed
- daira doesn't like this
- alan fairless from spideroak suggested this
- let's not do this *until* somebody asks for it. and maybe after they pay for it.
- maybe admin mints a new copy of each app with the provisioning data baked in
- probably confusing if user A shares their app with user B
- type in a provisioning string (provided by admin) maybe in argv
- provisioning string: maybe a full JSON file, maybe a (meta)filecap, maybe magic-wormhole invitation code (hypothetical sketch below)
- provides:
- grid information: which servers to contact
- accounting: authority to write (maybe also read) to storage servers
- initial shared directory cap
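A purely hypothetical sketch of what such a provisioning string could look like as a JSON blob; every field name below is invented for illustration, not an agreed format:

```python
# Hypothetical provisioning blob a client might receive (e.g. via a
# magic-wormhole invitation). All field names are illustrative only.
import json

provisioning = {
    "grid": {
        # grid information: which servers (and/or introducers) to contact
        "introducers": ["pb://tubid@tcp:introducer.example.org:12345/swissnum"],
        "servers": {
            "v0-exampleserverid": {"anonymous-storage-FURL": "pb://tubid@tcp:server.example.org:3456/swissnum"},
        },
    },
    # accounting: authority to write (and maybe read) to those servers
    "accounting": {"write-token": "example-token"},
    # initial shared directory cap
    "rootcap": "URI:DIR2:examplewritekey:examplefingerprint",
}

print(json.dumps(provisioning, indent=2))
```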
- one-shot (single-file, single-directory) sharing case - "all in one configuration"
- need a string that includes: grid info, read (maybe write) authority, readcap
- in IPFS and other "one true grid" architectures, this is a single hash
- how to deliver the access authority?
- friendnet/accounts vs agoric/payments
- long digression about pay-per-read as DDoS mitigation ("put a nickel in your router each month")
- grid info / gridcaps
- also see https://tahoe-lafs.org/trac/tahoe-lafs/ticket/403
- "grid id": signing keypair, metagrid/DHT holding signed grid-membership rosters
- maybe tahoe-lafs.org runs the durable seeds for the DHT
- contents could be equal to servers.yaml, maybe include introducers
- server operators: tend to allow up to N bytes for free, and only then need to charge (or even pay attention)
- integration with existing apps
- their main focus is not document storage/sharing, but they could use a plugin to help with it
- push the data to somewhere more convenient
- add nice crypto feels
- tahoe is generally not visible to those users
- thunderbird: file attachments
- slack: drag file into chat window
- accounting priorities:
- first: permissions: should a given client send shares to a given server, should a server accept shares from a given client
- second: measuring usage
- third: limiting: cut someone off when they're using too much (mark read-only, or delete all data)
- fourth: in-band payment
Tuesday PM: accounting, provisioning (including magic-wormhole, allow/deny storage servers), new GUI/WUI/CLI/API
- zooko's proposed sequence: what's the simplest thing that would work, then identify the likely attacks, then figure out the next step
- 1: everything is free
- attack: spam, freeloaders, tragedy-of-commons
- 2: servers charge for upload and download. storage (once uploaded) is free. Assume payment efficiency is good enough to allow one-payment-per-operation. No global reputation system, but individual clients remember servers
- global pool of servers, any server can add themselves to this advertisement list
- clients (for each upload) use 10 known-good old servers and 10 new-unknown servers from the list
- servers have an advantage: evil-server behavior is to accept the upload fee and then run away
- server can charge enough for the upload to pay the data-retention costs for some amount of time, then if nobody has downloaded/paid for it, delete the data
- server is never at a disadvantage
- client disadvantage is: if they have known-good servers, then up to half (10/(10+10)) of their money goes to evil servers
- the more new servers they use, the faster they can find good ones
- if client pays for download at the end, client has advantage (they can download and then not pay)
- if client pays at the beginning of download, server has advantage (they can accept payment and then send random data)
- maybe do incremental payment: XYZ btc per chunk of data (toy sketch below)
- or some kind of partially-refundable deposit
- possible next step: payment amortization
- every time you send a coin, include a pubkey, establish a deposit. later if you need to ask the same server to do something, reference the deposit
- another possible next step: ask one of the servers that you've already paid to find the share and download it (and pay for it) for you
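A toy, in-memory sketch of the pay-per-chunk idea above: neither side is ever exposed for more than one chunk's worth of value. The class names, chunk size, and price are all made up, and real payments, retries, and share verification are omitted:

```python
# Toy pay-per-chunk download loop; everything here is hypothetical.
CHUNK_SIZE = 2 ** 16      # bytes
PRICE_PER_CHUNK = 1       # e.g. 1 zatoshi

class ToyServer:
    def __init__(self, shares):
        self.shares = shares          # {storage_index: bytes}
        self.balance = 0

    def accept_payment(self, amount):
        self.balance += amount

    def read_chunk(self, storage_index, chunk_num):
        data = self.shares.get(storage_index, b"")
        start = chunk_num * CHUNK_SIZE
        return data[start:start + CHUNK_SIZE]

class ToyClient:
    def __init__(self, funds):
        self.funds = funds
        self.bad_servers = set()

    def download(self, server, storage_index, num_chunks):
        data = b""
        for chunk_num in range(num_chunks):
            self.funds -= PRICE_PER_CHUNK                        # pay first...
            server.accept_payment(PRICE_PER_CHUNK)
            chunk = server.read_chunk(storage_index, chunk_num)  # ...then fetch
            if not chunk:
                # server took the payment but returned nothing: we only lost
                # one chunk's worth, and we stop dealing with this server
                self.bad_servers.add(server)
                break
            data += chunk
        return data

server = ToyServer({b"si1": b"x" * (3 * CHUNK_SIZE)})
client = ToyClient(funds=100)
print(len(client.download(server, b"si1", 3)), client.funds, server.balance)
```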
- why do agoric/pay-for-service over choose-a-grid/account/relationship-based storage?
- sharing is easier when there's less context/hierarchy ("one true grid" is the best for sharing)
- one-true-grid is easier for clients to connect to (fewer things to provision), easier for clients to understand (one fewer concept to learn)
- OH: "how spoffy do you want it to be?" "that's spiffy!"
- define "spiffy" (resiliency/redundancy) as the opposite of "spoffy"
- "One True Grid" OTGv1
- 5 introducers run by 5 different orgs, introducers.yaml points to all of them
- anybody can run a server (which charges for uploads/downloads as above)
- clients learn about all servers
- one predictable problem: once too many servers appear, clients are talking to too many of them
- once too many clients appear, servers are talking to too many, cannot accept new clients
- idea (warner): introducer charges both clients and servers, charges more when more of them connect (e.g. client_price = N * len(clients); toy sketch below)
- idea (zooko): reject clients after some fixed limit
- introducer could be moved to HTTP, probably scale just fine. client->server foolscap connections are the problem
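A minimal sketch of warner's congestion-pricing idea above, assuming the price for the next client simply scales with how many clients are already connected (N is an arbitrary base price, not part of any agreed design):

```python
# Congestion pricing for the introducer: the more clients are already
# connected, the more the next one pays. N is a made-up base price.
def client_price(connected_clients, N=1):
    return N * len(connected_clients)

# an empty introducer is cheap, a crowded one gets expensive
print(client_price([]))                                      # 0
print(client_price(["client-%d" % i for i in range(1000)]))  # 1000
```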
- how do we tell that we're overloaded?
- servers running out of memory
- requests taking too long to complete
- clients unable to reach servers
- clients running out of memory
- how to limit growth?
- closed beta? issue tickets, one batch at a time, tokens that clients/servers must deliver to introducer
- or introducers are all under our control, they reject requests after some number
- or give tokens to people who pay a nominal BTC/ZEC fee, and the fee grows when we near the scaling limit
- again, how to tell that we're overloaded
- have servers report metrics to introducers: current-client-connections, response times, rate of requests
- have clients report request rates/success-rates
- servers pay (introducers) to get published
- if there are too many servers, clients are overloaded: this throttles it
- money goes to tahoe project, to pay programmers to write code to fix the congestion problem
- since this particular problem needs to be fixed in code/architecture
- clients pay servers per connection?
- servers advertise price (via introducer)
- v1: servers pay tahoe to be advertised, clients get tokens (first N are free, then a nominal charge) to use introducer
- servers accept anybody who learns about them, clients connect to anyone they learn about
- raspberry pi with a barcode printer running a storage server that dumps a pile of ZEC private keys on your living-room floor
- server price curve, client price curve: how to achieve stable/convergent share placement? (toy price-adjustment sketch after this list)
- (zooko): if a server hasn't received requests in a while, lower the price. if it receives lots of requests, raise the price.
- maybe track uploads and downloads separately
- if link is saturated and requests can't get through, it will look like no requests -> lower price -> more traffic -> oops
- principle: if you're failing, you probably can't tell. only other people can tell, and they might not be incentivized to tell you
- if you're getting paid, you should raise your prices
- admin configures a lower bound on price (based on e.g. their S3 costs)
- (zooko): don't even do that, let server admins decide at the end of the month whether they made money or not, whether to continue or not
- (warner): eek, unbounded S3 costs, server admins need to be responsible (write extra limiting code, don't use S3, find a pre-paid cloud provider)
- server starts with a completely random price. whee!
- clients: ignore top 10% or 50% of server prices
- is mostly convergent, only increases search cost by 10/50%
- deposit a nickel (in BTC), pay whatever the servers ask
- (warner): put half the shares on the cheapest server, half on the "right" (convergent-placement) servers
- pay half up front, half when upload/download is complete
- (meejah): start with an arbitrary (org-selected) price (maybe 2x S3) (maybe absolute minimum: 1 satoshi per something)
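A toy version of the price-adjustment feedback loop sketched above: raise the price when requests are coming in, lower it when things look idle, and never drop below an admin-configured floor (e.g. derived from S3 costs). The threshold and adjustment factor are invented, and it inherits the saturated-link pitfall noted above (no requests getting through looks the same as no demand):

```python
# Toy price feedback loop; threshold, factor, and starting price are made up.
def adjust_price(current_price, requests_last_period, floor_price,
                 busy_threshold=100, factor=1.1):
    if requests_last_period > busy_threshold:
        new_price = current_price * factor   # we're getting paid: raise the price
    else:
        new_price = current_price / factor   # looks idle: lower the price
    return max(new_price, floor_price)       # admin-configured lower bound

price = 10.0   # e.g. start at roughly 2x our S3 cost, in whatever unit we charge
for requests in [0, 5, 500, 500, 20]:
    price = adjust_price(price, requests, floor_price=5.0)
    print(round(price, 2))
```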
- run an experiment, figure out rough max concurrent connection, call that M
- for server tokens, pay 1 zatoshi for first 0.5*M tokens, then start paying more to limit congestion
- because we believe max-concurrent-connections will be the first bottleneck, also it's probably a crashy/non-scaling limit (accepted load drops drastically once capacity is hit)
- v1: server requires 1 zatoshi up front, 1 at end, for both uploads and downloads
- if txn fee is 0.1 pennies, you get like 5000 operations for a $5 investment (quick arithmetic check below)
- v2: amortize by establishing a deposit, send pubkey+minimal money (10x txn fee). spend complexity on protocol to stop spending money on miners
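A quick arithmetic check of the "5000 operations for $5" figure, assuming the 1-zatoshi payloads are negligible next to the transaction fee:

```python
# The per-operation cost is dominated by the transaction fee.
txn_fee_usd = 0.001   # "0.1 pennies"
budget_usd = 5.00
print(round(budget_usd / txn_fee_usd))   # ~5000 fee-bearing payments
```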
- over beers later:
- "deposit" / short-term not-necessarily-named "accounts": useful for both amortizing payment fees, settlement time, and for friendnet (preauthorized pubkeys)
- would there still be leases?
- what "serious" use case would tolerate the uncertainty of storage without some kind of SLA or expected lease period?
- over breakfast later:
- OneTrueGrid is one product, other things (with explicit provisioning) "powered by Tahoe" for more durable/"professional" applications
- "ThePublicGrid" "OnePublicGrid"
Wednesday AM: magic-folder -ish protocols, refresh our brains on #1382 (peer-selection / servers-of-happiness)
- #1382 "servers of happiness"
- current error message is... bad
- (exarkun) audience is someone who has just tried an upload, which failed. are they in a position to understand and act upon it?
- can we make the error message more actionable?
- "i only see N servers" or "only N servers were willing to accept shares", and "but you asked me to require H"
- maybe use a Foolscap "incident" to report this (in a managed environment) to an admin
- especially if the admin is the only one who can fix it
- "I was unable to place shares with enough redundancy (N=x/k=x/H=x/etc)"
- searching for that phrase should get people to the tahoe docs that explain the issue and expand N/k/H/etc
- rewriting the algorithm spec (warner's paraphrase; rough sketch after this list)
- find all the pre-existing shares on readonly servers. choose one of the best mappings, call it M1
- find all new pre-existing shares on readwrite servers (ignore shares that are in M1, since we can't help anything by placing those shares in additional places). choose one of the best mappings of this, call it M2
- find all potential placements of the remaining shares (to readwrite servers that aren't used in M2). choose one of the best mappings of this, call it M3. Prefer earlier servers.
- renew M1+M2, upload M3
- see also: https://github.com/warner/tahoe-lafs/commit/7a46e6047df73372c558339ef8e008537df00422
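A very rough, greedy sketch of the M1/M2/M3 outline above. The real #1382 work picks "one of the best mappings" with a maximum bipartite matching; this only illustrates the three-phase structure, and all the names are invented:

```python
# Greedy illustration of the M1/M2/M3 placement phases; not the real algorithm.
def place_shares(all_shares, readonly_servers, readwrite_servers, existing):
    """
    all_shares: set of share numbers to place
    readonly_servers, readwrite_servers: server ids, in preference order
    existing: dict of server id -> set of share numbers already held there
    returns (M1, M2, M3), each a dict of share number -> server id
    """
    # M1: pre-existing shares on readonly servers
    M1 = {}
    for server in readonly_servers:
        for sh in existing.get(server, set()):
            M1.setdefault(sh, server)

    # M2: pre-existing shares on readwrite servers, ignoring shares already in M1
    M2 = {}
    for server in readwrite_servers:
        for sh in existing.get(server, set()):
            if sh not in M1:
                M2.setdefault(sh, server)

    # M3: remaining shares go to readwrite servers not already used in M2,
    # preferring earlier servers (doubling-up when servers run out is omitted)
    remaining = sorted(all_shares - set(M1) - set(M2))
    used = set(M2.values())
    unused = [s for s in readwrite_servers if s not in used]
    M3 = dict(zip(remaining, unused))

    # caller renews M1+M2 and uploads M3
    return M1, M2, M3

print(place_shares({0, 1, 2, 3},
                   readonly_servers=["ro1"],
                   readwrite_servers=["rw1", "rw2", "rw3"],
                   existing={"ro1": {0}, "rw1": {1}}))
```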
- options for PR140 (#573): https://github.com/tahoe-lafs/tahoe-lafs/pull/140
- the import weirds us out
- twisted plugins: the function declares ISomething, config file stores qualified name
- maybe hard-code a list of algorithms, tahoe.cfg specifies a name, big switch statement (sketch after this list)
- the goal of #573 is to enable more kinds of placement, e.g. "3 shares per colo, no more than 1 share per rack"
- probably needs to merge with #1382: one plugin that does both
- needs to do network calls, so it can't be synchronous
- current PR140 is sync
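A sketch of the "hard-coded list of algorithms, tahoe.cfg names one of them" option, as opposed to twisted plugins. The algorithm names and placeholder functions are invented, and per the note above a real interface would have to be asynchronous (e.g. return a Deferred) rather than a plain call:

```python
# Hard-coded name -> placement-algorithm registry, selected by a tahoe.cfg value.
def simple_placement(peers, shares):
    ...   # existing upload-strategy code would live here

def happiness_placement(peers, shares):
    ...   # servers-of-happiness placement (#1382) would live here

PLACEMENT_ALGORITHMS = {
    "simple": simple_placement,
    "servers-of-happiness": happiness_placement,
}

def get_placement_algorithm(name_from_tahoe_cfg):
    try:
        return PLACEMENT_ALGORITHMS[name_from_tahoe_cfg]
    except KeyError:
        raise ValueError("unknown placement algorithm: %r" % (name_from_tahoe_cfg,))
```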
Wednesday PM: new caps / encoding formats (chacha20, rainhill/elk-point, etc), mutable 2-phase commit, storage protocols, deletion/revocation
- looking at Rainhill: https://tahoe-lafs.org/trac/tahoe-lafs/wiki/NewCaps/Rainhill
- needs 2+1 passes: one to compute keys, a second to encrypt+encode and produce the SI, a third to push the actual shares
- we (though not zooko) are probably ok with non-streaming / save-intermediates-to-disk these days, because of SSDs
- diagram/protocol needs updating to:
- omit plaintext hash tree (assume decryption function works correctly)
- include/explain ciphertext hash tree, share hash tree
- show information/encoding/decoding flow (swirly arrows)
- maybe we can throw out P (needed for diversity/multicollision defense?)
- deletion:
- long time ago, we discussed "deletecap -> readcap -> verifycap"
- or for mutables: petrifycap -> writecap -> readcap -> verifycap
- zooko preferred petrifycap==writecap
- use cases:
- I screwed up: upload of sensitive data, omg delete now
- short-term sharing, which then expires
- digital will, revoke with confirmation of was-read or never-read
- share with group
- what should the server do if one person wants to delete it and another wants to keep it
- either preservationist wins or deletionist wins
- uncontested deletion should just work
- zooko's old mark-and-sweep explicit-deletion (as opposed to timed GC) idea #1832
- for each rootcap, build a manifest of childcaps. give whole set to storage server, then server immediately deletes anything removed from that manifest (zooko is uncertain this is accurate)
- actually: client fetches a "garbage collection marker" from the server. then client adds storage-index values to that marker. after adding everything they like, they say "flush everything not included in this marker", and the server deletes them (toy sketch below)
- markers are scoped to some sort of (accounting identifier, rootcap/machine identifier) pair
- still some race conditions, but probably fail-safe (fail-preservationist)
- this approach has another use: if I could push a list of verifycaps to my local server, it could download any that aren't present, allowing a download-then-open workflow that works much better on low-bandwidth or flaky internet connections. I think this is a very important use-case that's currently missed.
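A toy, in-memory sketch of the mark-and-sweep flow described above: fetch a marker, add the storage-index values you still care about, then ask the server to flush everything else in that scope. All class and method names are invented; a real server would only sweep shares owned by the marker's (accounting id, rootcap/machine id) scope:

```python
# Toy mark-and-sweep server plus the client-side marking loop.
class ToyGCServer:
    def __init__(self):
        self.shares = {}    # storage index -> share data (one scope's worth)
        self.markers = {}   # (account_id, root_id) -> set of storage indexes

    def get_marker(self, account_id, root_id):
        key = (account_id, root_id)
        self.markers[key] = set()
        return key

    def add_to_marker(self, marker, storage_index):
        self.markers[marker].add(storage_index)

    def flush(self, marker):
        # delete everything in this scope that was NOT named in the marker
        keep = self.markers.pop(marker)
        for si in list(self.shares):
            if si not in keep:
                del self.shares[si]

# client side: walk the rootcap's manifest, mark each storage index, flush
server = ToyGCServer()
server.shares = {"si-1": b"...", "si-2": b"...", "si-3": b"..."}
marker = server.get_marker(account_id="acct-1", root_id="machine-A")
for si in ["si-1", "si-3"]:     # still reachable from the rootcap
    server.add_to_marker(marker, si)
server.flush(marker)            # "si-2" gets deleted
print(sorted(server.shares))    # ['si-1', 'si-3']
```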
Concurrent writes: ideas from the discussion
- 2PC might help in some cases:
- the servers written to by various clients overlap
- either because servers-of-happiness is more than half of the size of the grid
- or because of some algorithm making the clients choose the same servers for writing
- servers do compare-and-swap, so a write by another node in the middle of a read/modify/write cycle isn't silently overwritten
- a partition of servers-of-happiness size will cause split-brain behaviour (bad for large grids!)
- locking (either for writes, or for certain operations for more concurrency-aware caps)
- we can have a list of nodes that act as lock arbitrators; the locking has to succeed on more than half of them (sketch below)
- if the quorum goes down then the files become read-only unless an unsafe write is manually forced
- we could maintain this list per-grid or store it in the cap itself
- locking could use the write keypair as identifier to perform locking on
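A minimal sketch of the majority-locking idea above: a write proceeds only if strictly more than half of the configured arbitrator nodes grant the lock, and the lock id could be derived from the mutable file's write keypair. The arbitrator interface here is hypothetical:

```python
# Toy quorum locking among arbitrator nodes.
class ToyArbitrator:
    def __init__(self, up=True):
        self.up = up
        self.locks = set()

    def try_lock(self, lock_id):
        if not self.up or lock_id in self.locks:
            return False
        self.locks.add(lock_id)
        return True

    def release(self, lock_id):
        self.locks.discard(lock_id)

def acquire_write_lock(arbitrators, lock_id):
    granted = [a for a in arbitrators if a.try_lock(lock_id)]
    if len(granted) * 2 > len(arbitrators):
        return granted      # quorum reached: caller may perform the write
    # no quorum (e.g. too many arbitrators down): release and give up -- the
    # file is effectively read-only unless an unsafe write is manually forced
    for a in granted:
        a.release(lock_id)
    return None

nodes = [ToyArbitrator(), ToyArbitrator(), ToyArbitrator(up=False)]
print(acquire_write_lock(nodes, "writekey-abc") is not None)   # True: 2 of 3 granted
```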
- concurrency-aware capabilities
- message queue (insertcap / retrievecap) could be realized by a keypair (toy sketch below)
- storage servers store all writes (unordered), each having a UUID
- reader removes processed messages using this UUID (optionally locking if there's more than one reader)
- use-case: a single message recipient that treats messages as pull requests and manages a piece of mutable data by itself as the sole writer
- use-case: email-like inbox with encryption
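A toy, in-memory version of the insertcap/retrievecap queue sketched above: writers append blobs (each gets a UUID), and the reader lists them in no particular order and deletes the ones it has processed. The class and method names are invented:

```python
# Toy message-queue server for the insertcap/retrievecap idea.
import uuid

class ToyQueueServer:
    def __init__(self):
        self.messages = {}    # uuid -> encrypted blob

    def insert(self, blob):           # would require the insert-cap
        msg_id = uuid.uuid4()
        self.messages[msg_id] = blob
        return msg_id

    def retrieve_all(self):           # would require the retrieve-cap
        return dict(self.messages)    # unordered

    def remove(self, msg_id):         # reader deletes processed messages
        self.messages.pop(msg_id, None)

q = ToyQueueServer()
q.insert(b"encrypted pull-request 1")
q.insert(b"encrypted inbox message 2")
for msg_id, blob in q.retrieve_all().items():
    # ... decrypt and process blob ...
    q.remove(msg_id)
print(len(q.retrieve_all()))   # 0
```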
- append-only sets/dirs
- use-case: backup storage (until you run out of space and need to erase old ones)
- rotation can be done by creating a new append-cap
- CRDT storage
- always needs to present the client with all the updates; storage can't merge encrypted data
- client could potentially merge the updates (toy merge sketch after this list)
- merge them all using locking
- merge particular ones using UUIDs
- use-case: directories, various application data (e.g. CalDAV/CardDAV, possibly IMAP-like storage with flags)
- we can look at how e.g. Coda (http://www.coda.cs.cmu.edu/) deals with it (it supports merging back offline-modified cached directories and files)
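A toy illustration of "the storage server can't merge the encrypted updates, but the client can": the client decrypts every (UUID, update) pair and merges directory entries with a simple last-writer-wins rule keyed on (timestamp, UUID). This is entirely hypothetical, and a real design would want a proper CRDT rather than last-writer-wins:

```python
# Client-side merge of per-entry directory updates, last-writer-wins.
def merge_directory(decrypted_updates):
    """decrypted_updates: iterable of (uuid, op) where op is a dict with
    'timestamp', 'name', 'action' ('put' or 'delete'), and maybe 'childcap'."""
    winners = {}   # name -> ((timestamp, uuid), childcap-or-None)
    for update_id, op in decrypted_updates:
        key = (op["timestamp"], update_id)
        name = op["name"]
        if name not in winners or key > winners[name][0]:
            cap = op.get("childcap") if op["action"] == "put" else None
            winners[name] = (key, cap)
    # deletions win by leaving no entry behind
    return {name: cap for name, (_, cap) in winners.items() if cap is not None}

updates = [
    ("uuid-1", {"timestamp": 10, "name": "notes.txt", "action": "put", "childcap": "URI:CHK:a"}),
    ("uuid-2", {"timestamp": 12, "name": "notes.txt", "action": "put", "childcap": "URI:CHK:b"}),
    ("uuid-3", {"timestamp": 11, "name": "old.txt", "action": "delete"}),
]
print(merge_directory(updates))   # {'notes.txt': 'URI:CHK:b'}
```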