== Web Control-Panel ==
- explained goals to drewp; he found the full-page iframe weird (removing
  an insecure-looking URL from the address bar to hide the fact that
  we're using an insecure-looking URL for the iframe?)

== leasedb crawler ==
- leasedb: SQLite database which holds information about each share, all
  leases, and all Accounts. Used for fast computation of quotas and
  thresholds, and for enforcement of server policy. Might be modified
  externally (give ourselves room to use an external expiration/policy
  process).
- goals: discover manually-added/new shares (including at bootstrap and
  after deletion of a corrupt DB), discover manually-deleted shares,
  tolerate backends which only offer async operations (deletion,
  share-is-present queries)
- approach: have a slow Crawler which notices new/old shares and updates
  the DB. New (manually-added) shares get a "starter lease" to keep the
  share alive until normal Account leases are established.
- expiration: done periodically; remove all leases that have expired,
  then check their shares to see if numleases==0, and if so remove those
  shares
- concerns: race conditions between manually-added shares and expired
  leases, or between manually-deleted shares and added leases
- solution: a "garbage" flag on the share-table entry
- four state bits per Share entry: S=share-is-present,
  ST=share-table-has-entry, L=leases-present, G=share-table-is-garbage
  (transitions sketched below)
- event-triggered, either by a crawler tick, by the backend responding to
  a deletion request, or by the backend responding to an
  is-the-share-there query
- if S but !ST !L !G, it's a new share: add ST, add a starter lease to L
- if S and ST but !L !G, set G and send a deletion request
- if ST and G but !S, that means we just deleted it, so clear ST and G
- if an upload request arrives while G is set, defer: we cannot accept
  the upload until the share is really gone, and will be able to accept
  it again in the future
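
A minimal Python sketch of the four state bits and the transitions above;
the names here (ShareState, backend.delete_share) are illustrative, not
the actual leasedb/crawler API:

{{{
class ShareState(object):
    def __init__(self):
        self.S = False   # backend reports the share is present
        self.ST = False  # the share-table has an entry for this share
        self.L = False   # at least one lease is present
        self.G = False   # the share-table entry is marked as garbage

    def on_event(self, backend):
        # run on a crawler tick, or when the backend answers a deletion
        # or is-the-share-there query
        if self.S and not (self.ST or self.L or self.G):
            # new (manually-added) share: record it and give it a
            # starter lease so it survives until real leases arrive
            self.ST = True
            self.L = True   # starter lease
        elif self.S and self.ST and not (self.L or self.G):
            # no leases left: mark as garbage and ask the (possibly
            # async) backend to delete the share
            self.G = True
            backend.delete_share(self)
        elif self.ST and self.G and not self.S:
            # the deletion completed: forget the entry
            self.ST = False
            self.G = False

    def can_accept_upload(self):
        # while G is set, the old share is still being deleted, so a new
        # upload must be deferred until the share is really gone
        return not self.G
}}}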

== Kevan: #1382 share-placement algorithm ==
- the current algorithm doesn't always find a placement that meets the H
  (servers-of-happiness) criterion
- there are situations (covered in unit tests) where a valid placement
  exists, but the algorithm cannot find it
- first refactoring step: consolidate share-placement into a single object
  - you send in data about the state of the world, it does some
    computation, then tells you what you need to do, including asking
    for more servers
- Kevan has a better placement algorithm built; his master's thesis
  (almost done) proves it is sound and complete
- the new algorithm effectively starts with a full graph and trims edges
  until only the necessary ones are left (see the matching sketch below)
- brian is concerned about performance, and doesn't want this to limit
  our ability to scale to thousands of servers. His performance target:
  200 shares, 1000 servers, the algorithm should complete within a second
- davidsarah says no problem
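
The happiness criterion can be viewed as a maximum bipartite matching
between shares and the servers willing to hold them. The sketch below is
not Kevan's algorithm, just a minimal illustration of that matching view
(names and the toy edge set are hypothetical):

{{{
def max_matching(shares, candidate_servers):
    """shares: iterable of share numbers.
    candidate_servers: dict mapping share -> set of servers that could
    accept it.  Returns a dict share -> server forming a maximum
    matching (standard augmenting-path method)."""
    matched = {}   # server -> share currently assigned to it

    def try_place(share, seen):
        for server in candidate_servers.get(share, ()):
            if server in seen:
                continue
            seen.add(server)
            # use a free server, or re-place the share that occupies it
            if server not in matched or try_place(matched[server], seen):
                matched[server] = share
                return True
        return False

    for share in shares:
        try_place(share, set())
    return dict((share, server) for server, share in matched.items())

# happiness = number of distinct servers in the matching; placement
# succeeds only if happiness >= H.  Toy example: 10 shares, but each can
# only go to one of 4 servers, so happiness is 4.
edges = dict((s, set(["srv%d" % (s % 4)])) for s in range(10))
print(len(max_matching(range(10), edges)))
}}}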

== David-Sarah: Rainhill ==
- first explained the current CHK design
- Brian explained current dirnodes, transitive-readonly, super-encryption
  of the writecap column, and the utility of deep-verify-caps
- David-Sarah then explained "Rainhill-3x", an incremental step
  (immutable-only) which has a readcap and a verifycap, and went through
  the security requirements on each piece
  - the length of various intermediate fields directly affects the
    ability to perform various attacks
  - finding a collision on the readcap would enable variations of
    Christian Grothoff's "Two-Face" attack (the one fixed in v1.2.0)
  - finding a pre-image on the readcap would allow an attacker to create
    new files that matched an existing readcap
  - finding a pre-image on the verifycap (to the extent that can be
    checked by the server) would enable "roadblock" attacks, where
    attackers could fill SI slots with junk data and prevent the upload
    of real shares
- then expanded to the full Rainhill-3 (although still just immutable,
  not mutable)
- lots of discussion. We've been over a lot of these issues before, two
  or three years ago, so much of it was paging tertiary memories back
  into our brains. Some concerns:
  - upload flow vs convergence: in most short-cap approaches, the full SI
    isn't known until the end of the upload. That makes it hard to
    discover pre-existing shares. A smaller number of bits (the hash of
    P, if I remember the diagram correctly) can be stored in the share
    and queried at the start of upload
    - the server would need to retain a table mapping P to the full SI
  - everyone acknowledged the tradeoff/exclusivity between convergence
    and streaming (one-pass) upload; the goal is to let the uploader
    choose which they want (see the sketch below)
  - integrity checks on the decrypted write-column (or read-column, in
    the presence of deep-verify). In simpler designs, having a plaintext
    hash tree (with the merkle root encrypted by the readcap, to prevent
    Drew Perttula's partial-guessing attack) also lets us detect failures
    in the encryption code (e.g. a fencepost error in AES CTR mode
    causing corrupted decryption). It would be nice to have similar
    protection against decryption failures of each separate column. We
    concluded that the current Rainhill design doesn't have that, and
    readers would need to compare e.g. the writecap-column contents
    against the readcap-column as a verification step.
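
A minimal sketch of the convergence-vs-streaming exclusivity noted above;
this is not the actual CHK/Rainhill key derivation, and the hash choice
and "convergence secret" handling are illustrative assumptions:

{{{
import hashlib, os

def convergent_key(plaintext, convergence_secret=b""):
    # convergent mode: the key is derived from the file's own contents,
    # so identical files converge to identical shares/SI, but the whole
    # plaintext must be read once BEFORE encryption can begin (two-pass)
    h = hashlib.sha256()
    h.update(convergence_secret)
    h.update(plaintext)
    return h.digest()[:16]

def streaming_key():
    # streaming mode: a random key lets the uploader encrypt and push
    # data in a single pass, but two uploads of the same file no longer
    # converge to the same storage index
    return os.urandom(16)

# the uploader must pick one mode or the other; it cannot have both,
# which is the exclusivity discussed above
}}}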

== add-only caps ==
- lots of discussion about how an "add-record" cap would work
- lattice of caps: petrify->write, write->add, add->verify, write->read,
  read->deepverify, deepverify->verify
- preventing rollback
  - the overall goal is to enable a single honest server, or a single
    client-retained value, to prevent rollback
  - also to prevent selective forgetfulness: after adding record A and
    then record B, the server should not be able to pretend it has only
    heard of record B
  - the general idea was a form of Lamport vector clocks, but with a
    hash of the current set of known records instead of counters. The
    client might get the current state hash from server 1, send an
    update to server 2 which includes that hash (and get S2's hash),
    and so on around the ring through server N, then make one last
    addition to server 1 (including SN's hash). To roll back, an
    attacker would need to compromise or block access to lots of
    servers. (brian's mental image is a cylindrical braid/knit: servers
    lie on the circle, time/causal-message-ordering extends
    perpendicular to the circle, each {{{S[n].add(S[n-1].gethash())}}}
    message adds another stitch, which would have to be unwound to
    unlink an addition; see the sketch below). OTOH we want to avoid
    making it necessary to talk to all servers to retrieve/verify the
    additions
  - the general sense was that Bitcoin's block chain is somehow related.
    Maybe servers could sign additions.
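
A minimal sketch of the ring-of-stitches idea; the add-only cap protocol
is not designed yet, so the class and method names here are hypothetical:

{{{
import hashlib

class AddOnlyServer(object):
    def __init__(self):
        self.records = []
        self.state_hash = b"\x00" * 32

    def get_hash(self):
        return self.state_hash

    def add(self, record, prev_server_hash):
        # each addition binds in the previous server's current state
        # hash, adding another "stitch" that links the servers together
        h = hashlib.sha256()
        h.update(self.state_hash)
        h.update(prev_server_hash)
        h.update(record)
        self.records.append((record, prev_server_hash))
        self.state_hash = h.digest()
        return self.state_hash

# the client walks the ring: S[n].add(record, S[n-1].get_hash())
servers = [AddOnlyServer() for _ in range(4)]
prev = servers[-1].get_hash()
for s in servers + [servers[0]]:   # the extra step closes the loop
    prev = s.add(b"record-A", prev)
# to silently drop record-A, an attacker must now rewrite the chained
# hashes on every server in the ring, not just one
}}}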

== general project direction ==
- various use cases, competitors, developer interests
- Tahoe is slower than many new/popular distributed storage systems:
  Riak / Cassandra / Mongo / S3(?) / etc.
  - many of these didn't exist when we started; now that other folks
    have developed the technology, we could take advantage of it
- core competencies/advantages of Tahoe:
  - provider-independent security
  - ease of self-reliant setup (running a personal storage server), vs
    establishing an S3 account or building an HBase server
  - fine-grained sharing
- disadvantages (relative to some competitors, not all):
  - slower
  - not really POSIX, requires application-level changes
  - mutable-file consistency management
  - perceived complexity (either by users or by developers)
- potentially interesting hybrids:
  - use S3 for reliability, k=N=1, but with an encryption/integrity
    frontend (this is what LAE offers, but it could be more built-in)
  - use Dropbox for reliability and distribution, but with an
    encryption/integrity frontend
- different developers have different because-it's-cool interests:
  - brian: invitation-based grid setup, easy commercial backends,
    universally-retrievable filecaps with/without a browser plugin,
    Caja-based confined browser-resident web-app execution, confined
    remote code execution services, bitcoin-based social accounting,
    eventually working back up to most of the original Mojo Nation
    vision
  - zooko?: LAE
  - davidsarah?: mentioned Rainhill, mutable-file consistency
  - jmoore?: add-only caps for logging