#795 new enhancement

add-only sets

Reported by: warner Owned by:
Priority: major Milestone: undecided
Component: code-mutable Version: 1.5.0
Keywords: newcaps revocation research Cc: jeremy@…, leif@…
Launchpad Bug:

Description (last modified by zooko)

It's a long-term goal, but it would be nice to have append-only mutable files in Tahoe. Specifically, that means some sort of mutable slot that has the following caps:

  • writecap (permits arbitrary modification)
  • appendcap (permits appending new data, not removing existing data)
  • readcap (permits reading arbitrary data)
  • verifycap

Note that the appendcap and readcap are peers: neither can be derived from the other.

       writecap
       |      |
       v      v
 appendcap   readcap
       |      |
       v      v
       verifycap

This requires some tricky encryption, and will rely upon the servers to enforce certain types of message updates. (this means that a sufficiently-large set of colluding servers will be able to roll back appended messages until someone with the writecap comes along and flattens them all into the base file). I've got some notes on this scheme somewhere, which I'll eventually transcribe into this ticket. The basic approach is that the appendcap gets you a signing key, and the server will accept signed append messages, and put them next to all of the other messages. The writecap gets you a different signing key, with which the servers will accept arbitrary rewrite operations. I also have notes on revocation schemes (where you can create multiple "subwritecaps" for an object, but revoke them later), which involves even more sets of signing keys.

This could be integrated with LDMF mutable files, in which the servers hold a series of append-only revision nodes, forming a version-history graph (which I still think should be based on Mercurial's "revlog" format). In this case, writecap holders get to add revision nodes, but not remove older ones.

Change History (17)

comment:1 Changed at 2009-08-22T23:28:17Z by warner

oh, and of course, if the appendcap truely doesn't give you the ability to read any data, then this needs a public encryption key (like RSA or El-Gamal, not DSA). Each "append" message would have the data encrypted with a randomly-generated symmetric key, and then the key would be encrypted to the readcap's RSA decryption privkey.

There might be some other sort of "append-and-read-cap", which gives you both the ability to append messages and to read the existing messages (but not to remove anything: that is reserved for the writecap holder). I can imagine use-cases for both. This sort of cap would have a more straight-line derivation: writecap -> append-and-read-cap -> readcap.

comment:2 follow-up: Changed at 2009-08-23T00:45:17Z by warner

More notes:

Basically, each append-only-able file would have three keypairs:

  • A: the whole-file-changing DSA keypair (As/Av)
  • B: an appending DSA keypair (Bs/Bv)
  • C: an appending RSA (encryption) keypair (Ce/Cd)

The writecap would get you everything, the appendcap gets you Bs+Ce, and the readcap gets you just the three verification/decryption keys Av+Bv+Cd. The server must know (and enforce with) the verification keys Av+Bv.

The server would be responsible for:

  • only allowing full-file rewrites that are signed by As
  • accepting "append" messages signed by Bs and appending them to the file (or putting them in a unique file in the share directory, or something)

Andy the Appendcap-holder would need to encrypt his message with Ce, sign it with Bs, then send it to the servers. Rose the readcap-holder would check all the signature and then use Cd to decrypt the append messages. Val the verifycap-holder needs to be able to check the hashes and signatures on the append messages, but not decrypt them (so we need encrypt-then-sign, instead of sign-then-encrypt). And Rob the Repairer needs to be able to make new shares out of everything, so the body being signed should contain the share hash tree root. Each append message will be erasure-coded separately.

Every once in a while, Wally the writecap holder might merge all the append messages down into the "base object". Until this happened, a colluding set of servers could discard arbitrary append messages (meaning the file would not be monotonically increasing). There might be ways to have each message include the hash of the previous ciphertext to detect this sort of thing (forcing the server to discard all-or-nothing, restoring monotonicity, but introducing race conditions).

Maybe "add-only collection" would be a better model for this, instead of implying that the object contains a linear sequence of bytes. In fact, we might say that the objects stored in this collection are arbitrary strings, with IDs based upon hashing their contents, and make the append(X) operation be idempotent, and give readers an unsorted set of these strings. This would make things like the add-only directory easier to model.

To turn this into a dirnode, the append message would probably contain a "proposed child name" and child readcap/writecap. If/when Wally merges down the file, any name conflicts would be resolved (by appending "(1)", or whatever) then.

Preserving transitive-readonlyness probably requires that both writecap holders and appendcap holders (Wally and Andy) be able to get to the same symmetric key (the child-writecap superencryption key). One noteable difference with the normal mutable-file layout is that this child-writecap key must *not* let you derive the read key. Rose the readcap holder can see the (encrypted) child-writecap column but can't decrypt it. Andy the appendcap holder can't see the child-writecap column ciphertext, but if he could, he could decrypt it (since he's supposed to be able to put child-writecaps somewhere that Wally can read them). This reveals a collusion attack: Andy and Rose could work together to learn the child writecaps.

So child-writecaps in append messages must be encrypted to the writecap holder, and not to anybody else. This suggests we need another encryption keypair, De/Dd?, in which the writecap holder gets Dd, and everybody else gets De (but only Andy will use it, because those messages must also be signed by Bs, held only by Andy and Wally). This can't be a combined sign+encryption key, which might save space, since Val and Rob need Bv to verify those append messages later.

Last edited at 2013-08-13T18:54:05Z by zooko (previous) (diff)

comment:3 in reply to: ↑ 2 Changed at 2009-09-01T23:56:10Z by davidsarah

Replying to warner:

Every once in a while, Wally the writecap holder might merge all the append messages down into the "base object". Until this happened, a colluding set of servers could discard arbitrary append messages (meaning the file would not be monotonically increasing). There might be ways to have each message include the hash of the previous ciphertext to detect this sort of thing (forcing the server to discard all-or-nothing, restoring monotonicity, but introducing race conditions).

... or if we embrace the add-only collection abstraction, a hash of some subset of the existing entries.

Maybe "add-only collection" would be a better model for this, instead of implying that the object contains a linear sequence of bytes. In fact, we might say that the objects stored in this collection are arbitrary strings, with IDs based upon hashing their contents, and make the append(X) operation be idempotent, and give readers an unsorted set of these strings. This would make things like the add-only directory easier to model.

This abstraction makes a lot of sense to me. I had been thinking of add-only directories as being implemented in terms of append-only files, but actually an add-only capability for a set of byte strings is directly applicable to all of the use cases I can think of. For example,

  • to implement the immutable backup scenario, write each backup as a set of new files (ideally using immutable directories, although you could make do without them). Since they are not attached to an existing directory structure, this does not require any authority other than the necessary storage quota. Then, use an add-only set to record the root caps for each backup. If the backups are incremental, make them dependent on (by including a hash of) previous backups. The read key for the set is kept off-line, or generated from a passphrase.
  • to implement a tamper-resistant "log file", use an add-only set to represent the set of timestamped log entries (perhaps entries at about the same time could be batched for efficiency).
  • an add-only cap can represent the authority to submit a transaction, with eventual consistency. This can represent any change to application-level data structures, not just an addition. Multiple holders of the add-only cap can submit transactions without being able to interfere with each other.

The set abstraction also has the advantage of not appearing to give consistency properties that are actually unimplementable. When you read the set, you get some subset of its entries. Additions to the set are only eventually seen by readers. You can make an addition dependent on some of the existing entries, in which case anyone who sees that entry is guaranteed to also see its dependent entries. This sounds like a robust abstraction that you could imagine building useful higher-level distributed systems on top of.

Can anyone think of important use cases where the add-only set semantics are not sufficient?

comment:4 Changed at 2009-10-28T03:31:19Z by davidsarah

  • Keywords newcaps added

Tagging issues relevant to new cap protocol design.

comment:5 Changed at 2010-03-07T07:15:22Z by jsgf

  • Cc jeremy@… added

comment:6 Changed at 2010-07-12T17:47:58Z by zooko

See also #796 (write-only backup caps). I'm not sure but think this ticket and #796 are subtly different use cases that might hopefully be addressed with a single mechanism.

comment:7 Changed at 2010-07-12T17:48:16Z by zooko

See also the notes about append-only caps in NewCapDesign.

comment:8 Changed at 2010-07-17T06:58:56Z by davidsarah

Nikita Borisov wrote on tahoe-dev:

The basic structure we're thinking of for our decentralized social networking is to have essentially objects with "annotations" (objects). Then, for example, a Facebook profile would be an object that describes the owner, with annotations corresponding to wall posts. Each wall post could further be annotated by comments, and possibly comments on comments, etc. etc.

The access control rules I'd like to have is that, for each object, there are the standard read/write/verify capabilities, as in Tahoe, but also an annotate capability, which allows you to add an annotation. This may be something that could be implemented with an append capability, but you would also need to ensure some sort of structural integrity; i.e., if an annotation is, say, a read capability, you would not want someone to be able to append half a capability; the annotation must be self-contained.

This sounds like a very good fit to the add-only collection approach.

comment:9 Changed at 2012-09-10T19:54:41Z by zooko

  • Keywords revocation added

comment:10 Changed at 2013-01-22T14:23:59Z by zooko

  • Keywords research added

comment:11 Changed at 2013-04-11T05:59:01Z by zooko

  • Summary changed from append-only files to add-only sets

comment:12 Changed at 2013-04-11T22:49:54Z by leif

  • Cc leif@… added

comment:13 Changed at 2013-11-13T07:06:06Z by zooko

  • Description modified (diff)

Here's my rendition of our discussion of add-only sets at the Tahoe-LAFS Summit today. (As usual, I altered and embellished this story significantly while writing it down, and people who were present to witness the original discussion are invited to chime in.)

An add-only cap doesn't have to also be a write-only cap. It might be good for some use cases that you can give someone a cap that lets them read the whole set, and add elements into the set, without letting them remove elements or change previously-added elements. It might be good in some other use cases to have an "add-only&write-only" cap, which allows you to add elements into the set but doesn't let you read elements of the set, nor remove nor change previously-added elements. We agreed to focus on the former case for now, because it is easier to design and implement a solution to it. (See #796 for discussion of write-only caps.)

We agreed to forget about erasure-coding, which makes an already-confusing problem (how to implement add-only sets without allowing a few malicious servers, adders, or set-repairers to perform rollback attack or selection attack), into a very-confusing problem that exceeded my brain's ability to grapple with it.

So, for now, assume that add-only sets don't use erasure-coding at all.

Now, the basic design we came up with is like this. I'll explain it in multiple passes of successive refinement of the design.

FIRST PASS: DESIGN "0"

An authorized adder (someone who holds an add-cap) can generate "elements", which are bytestrings that can be added into the set. (I mispronounced "elements" as "elephants" at one point, and from that point forward the design was expressed in terms of a circus act involving elephants.)

Elephants have an identity as well as a value (bytestring), so:

    aos = DistributedSecureAddOnlySet()
    aos.add_elephant(b"\xFF"*100)
    aos.add_elephant(b"\xFF"*100)

results in aos containing two elephants, not one, even though each elephant has the same value (the bytestring with one hundred 0xFF bytes in it).

aos.add_elephant() generates a random 256-bit nonce to make this elephant different from any other elephant with the same value. I call this "putting a tag on the elephant's ear" — a "tagged elephant" is a value plus a nonce. Even if two elephants are identical twins, they can be distinguished by the unique nonce written on their ear-tags.

aos.add_elephant() then puts a digital signature on the tagged-elephant (using the add-only-cap, which contains an Ed25519 private key), and sends a copy of the tagged-elephant to every one of N different servers. Putting a digital signature on a tagged-elephant is called "wrapping a net around it".

A reader downloads all the tagged-elephants from all the servers, checks all the signatures, takes the union of the results, and returns the resulting set of elephants.

Design "A" relies on at least one of the servers that you reach to save you from rollback or selection attacks. Such a server does this by knowing, and honestly serving up to you, a fresh and complete set of tagged-elephants. “Rollback” is serving you a version of the set that existed at some previous time, so the honest server giving you a copy of the most recent set protects you from rollback attack. “Selection” is omitting some elephants from the set, so the honest server giving you a complete copy of the set protects you from selection attack.

SECOND PASS: DESIGN "1"

We can extend Design "0" to make it harder for malicious servers to perform selection attacks on readers, even when the reader doesn't reach an honest server who has a complete copy of the most recent set.

The unnecessary vulnerability in Design "0" is that each tagged-elephant is signed independently of the other tagged-elephants, so malicious servers can deliver some tagged-elephants to a reader and withhold other tagged-elephants, and the reader will accept the resulting set, thus falling for a selection attack. To reduce this vulnerability, adders will sign all of the current tagged-elephants along with their new tagged-elephant with a single signature. More precisely, let the "identity" of a tagged-elephant be the secure hash of the tagged-elephant (i.e. the secure hash of the nonce concatenated with the value). The signature on a new tagged-elephant covers the identity of that tagged-elephant, concatenated with the identities of all extant tagged-elephants, under a single signature. In circus terms, you add the new tagged-elephant into a pile of tagged-elephants and throw a net over the entire pile, including the new tagged-elephant.

Now, malicious servers can't omit any of the older tagged-elephants without also omitting the new tagged-elephant. Readers will not accept the new tagged-elephant unless they also have a copy of all of the other tagged-elephants that were signed with the same signature. This limits the servers's options for selection attacks.

THIRD PASS: DESIGN "2"

We can refine Design "1" to make it cleaner and more CPU-efficient and network-efficient. This will also lay the groundwork for an efficient network protocol.

The unnecessary "dirtiness" in Design "1" is that the digital signatures on older tagged-elephants become extraneous once you add a new digital signature. We have a mass of tagged-elephants, we throw a net over the whole mass, then later when we add a new tagged-elephant to the pile, we throw a new net on top of the new (slightly larger) pile. Now the underlying net has become redundant: once you've verified the signature of the outermost net, there is no need to check the signature of the inner net. In fact, if one implementation checks the signature of the inner net and another implementation does not check it, then a malicious adder colluding with a malicious server could cause the implementations to differ in their results, by putting an invalid net (an invalid signature) topped by a new tagged-elephant with a valid net. (Daira was the one who noticed that issue.)

To make this cleaner and more efficient, we will never put a net around a net, and instead we'll keep each tagged-elephant in a box. When you want to add a new tagged-elephant to a set, you rip off and throw away any extant nets, then you put the new tagged-elephant in a box which is nailed on top of the previous topmost box. Then you wrap a net around the new topmost box. "Nailing" box Y on top of box X means taking the secure hash of box X and appending that to box Y (before signing box Y). A "box" is a tagged-elephant concatenated with any number of "nails", each of which is the secure hash of a previous box.

(Note that you can sometimes have two or more boxes precariously perched at the top of a stack, when two adders have simultaneously added a box before each saw the other's new box. That's okay — the next time an adder adds a box on top of this stack, he'll nail his new box to each of the previous topmost boxes.)

Boxes are a lot more CPU-efficient than nets, and more importantly nobody (neither readers, adders, nor servers) needs to revisit a lower-down box in order to add a new top-most box. Once you nail box Y on top of box X, then you can later add box Z just by taking the hash of box Y, without revisiting box X.

Note that we need two different secure hashes here: one is the identity of a tagged-elephant, which is the secure hash of: the nonce concatenated with the value. The other is the hash of the box, which is the secure hash of: the identity of a tagged-elephant concatenated with the hashes of any previous boxes. We need the identity of a tagged-elephant for finding out whether a certain tagged-elephant already exists in a stack (regardless of what position it occupies within that stack), and we need the hash of the box for efficiently verifying that all the tagged-elephants in a stack were approved by an authorized adder.

This also leads to the efficient network protocol: an adder can remember (cache) the Directed Acyclic Graph of boxes which a given server previously told the adder about. When the adder wants to add a new tagged-elephant or a set of new tagged-elephants to that server, he can send just the boxes which would be new to that server, assuming that the server hasn't learned anything new since the last time they talked. Readers can do likewise, remembering what each server previously told them about, and asking the server to just tell them about things that are not already covered the topmost box(es) that the reader already knows about.

CONCLUSION

Okay, that's it! I think Design "2" is a good one. It has good security against rollback or selection attacks by malicious servers (assuming some kind of whitelisting of servers! Which is ticket #467 and is not yet implemented.) And, it doesn't go too far over the top in terms of complexity; it seems more intuitive to me than (my vague memories of) previous attempts to design add-only sets for LAFS.

(By the way, there are a few other possible ways to strengthen defenses against rollback attack, which we've previously considered in the context of mutable files, but they probably also apply to add-only sets.)

I'm excited about this design being feasible, because I think add-only sets could be a critical building block in valuable use-cases such as secure logging, secure email, secure backup, and more.

Version 0, edited at 2013-11-13T07:06:06Z by zooko (next)

comment:14 Changed at 2013-11-29T15:12:18Z by joepie91

While #796 is my particular usecase, this seems to be the canonical thread for the base implementation, so I'll comment here; I was also looking for write-only caps (in particular append-only directories where you can only append files and not change, move/rename or remove files), for a write-only backup mechanism where no single server has the ability to remove any backups.

In my case, the threat model would be an attacker gaining (full or partial) access to a server that is being automatically backed up with Duplicity, intending to irrevocably wipe all data he can. A write-only cap would make the attacker unable to remove existing backups; only I personally would be able to do so, using a writecap that is not stored on any server.

I should also note that there'd be a potential confidentiality issue with non-read write-only caps (that I expect at least zooko to have already thought of): if you have a cap that is supposedly intended to prevent any discovery of data (including a directory listing), how would you reject an attempt at overwriting an entry (ie. a conflict) without revealing that the original entry exists, thus causing an information leak and confidentiality breach?

Either way, I very strongly +1 this write-only-cap suggestion - and I believe 'appendcap' would be an apt name for it :)

comment:15 Changed at 2014-07-25T18:06:26Z by nejucomo

We're chatting about use cases during Tesla Coils & Corpses. These use cases imply slightly different semantics and I imply this by the capability names in environment variables:

  1. syslog case
  • install tahoe, then run tahoe syslog-collector $APPEND_ONLY_CAP
  • elsewhere someone can do tahoe get-records $READ_RECORD_CAP (-note this is not a byte-oriented read.)
  • Even when the syslog host is compromised, the attacker cannot remove "sufficiently" written records.
  1. backup case
  • tahoe backup - Similar to syslog, except the records refer to immutable directories. Also, maybe ordering is less important?
  1. drop inbox
  • A directory-like thing where multiple writers may add new entries, but never overwrite entries.
  • For example, the names are universally unique opaque identifiers which fit into users' notions of filesystem names so they can "naturally" do things like ls, tar, cp etc...
  • This could be used to have a grid-wide publication capability distributed next to the introducer furl. (This ignores storage accounting issues.)

I've missed many spoken use cases while writing this.

edit: Minor markup changes to list delimiters and dropped some parentheses.

Last edited at 2014-07-25T18:47:17Z by nejucomo (previous) (diff)

comment:16 Changed at 2014-10-28T04:27:59Z by zooko

There was a really good discussion about this in a Tesla Coils & Corpses meeting, and I wrote pretty good notes from that meeting here: pipermail/tahoe-dev/2014-August/009141.html

comment:17 Changed at 2018-12-11T04:05:02Z by tlhonmey

I needed something like this to prevent ransomware from dumping/corrupting my backups.

Not wanting to let the perfect be the enemy of the good, I wrote a terrible tool that does the job with existing data types.

#! /bin/bash

if [[ -z "$1" ]]; then
  #no data, create new store
  WCAP=`tahoe mkdir`
  LASTNODE=`tahoe mkdir`
  tahoe ln $LASTNODE $WCAP/lastnode
  NEXTNODE=`tahoe ls --readonly-uri $WCAP | grep lastnode | awk '{ print $2 }'`
  tahoe ln $NEXTNODE $WCAP/nextnode
  echo $WCAP
elif [[ -z "$2" ]]; then
  #no data to push, so read chain
  walknodes() {
    tahoe ls $1 | grep data > /dev/null && tahoe get $1/data && walknodes $1/nextnode
  }
  walknodes $1/nextnode
else
  #new data, push chunk and increment
  WCAP="$1"
  DATA="$2"
  NEWNODE=`tahoe mkdir`
  DATANODE=`echo "$2" | tahoe put `
  echo "linking data"
  tahoe ln $DATANODE $WCAP/lastnode/data
  echo "linking nextnode"
  tahoe ln $NEWNODE $WCAP/lastnode/nextnode
  echo "linking lastnode"
  tahoe rm $WCAP/lastnode
  tahoe ln $NEWNODE $WCAP/lastnode
fi

Now obviously it's slow, and using raw URIs on the command line isn't the most secure thing in the world, but it only stores the write cap for the end of the linked list, so any data recorded prior to an intruder getting in to where they can log writecaps is effectively immutable. Store the first data node's URI as your immutable recovery start point.

Note: See TracTickets for help on using tickets.