[tahoe-dev] switching from introducers to gossip?
Brian
warner at lothar.com
Mon Jul 2 00:51:30 UTC 2012
On 6/14/12 1:02 PM, Zooko Wilcox-O'Hearn wrote:
> Brian has been posting patches that move away from using introducers
> at all in favor of "gossip". Now if I understand correctly, gossip is
> simply "every node is an introducer (in addition to whatever other
> jobs it does)".
Yeah, the general idea is that all nodes provide the "grid-control"
service, in addition to the "storage" service they might be providing
right now. Nodes announce both "grid-control" and "storage" via the same
introducer Announcements as before. The old Introducer becomes a node
that only provides "grid-control", on a pre-published FURL.
"grid-control" lets you publish Announcements (either your own or ones
you're forwarding from others), and subscribe to the same.
Once that is in place, and we have some code to prevent infinite
flooding loops, there are several different approaches you could take:
* fully-connected mesh: every node makes a Foolscap connection to every
grid-control provider it hears about, subscribes to all announcements,
and publishes any announcements that the other side doesn't already
know about.
* opportunistic: clients only connect to storage servers, and storage
servers don't make outbound connections to anybody, but if you *do*
happen to be connected to someone who also offers "grid-control", then
connect to their grid-control object too and exchange Announcements.
* cluster-of-Introducers: normal nodes don't offer grid-control, but
multiple Introducers do, and all of them know about each other. All
nodes connect to all grid-control providers (which means all
Introducers).
* one Introducer: this is just a degenerate cluster-of-Introducers
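
A minimal sketch of the "prevent infinite flooding loops" idea, assuming
each Announcement can be keyed by (announcer id, service name) and carries
a sequence number. The class name, field names, and the in-process
`publish` call are hypothetical stand-ins for illustration, not the
Foolscap interface or the actual patches. A node forwards an announcement
only the first time it learns something new, so a cycle of connections
stops re-forwarding once every node has seen it:

```python
class GossipNode:
    """Toy node offering a hypothetical 'grid-control' service."""

    def __init__(self, name):
        self.name = name
        self.peers = []   # nodes we hold a (simulated) connection to
        self.known = {}   # (announcer_id, service) -> (seqnum, payload)

    def connect(self, other):
        # Connections are symmetric in this toy model.
        if other not in self.peers:
            self.peers.append(other)
            other.peers.append(self)

    def publish(self, announcer_id, service, seqnum, payload):
        """Accept an announcement; forward it only if it was news to us."""
        key = (announcer_id, service)
        current = self.known.get(key)
        if current is not None and current[0] >= seqnum:
            return False  # already known: drop it, which breaks the loop
        self.known[key] = (seqnum, payload)
        for peer in self.peers:
            peer.publish(announcer_id, service, seqnum, payload)
        return True
```

Connecting four such nodes in a ring and publishing once on any of them
leaves all four holding the announcement. Publishing the same sequence
number again returns False everywhere and is not forwarded.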
> Hrm. This idea of gossip conflicts with my idea that each server
> should attempt to connect to all clients -- and only to clients -- and
> that each client should attempt to connect to all servers -- and only
> to servers (#344, #1086).
I think we can probably accommodate that. I'm optimizing for our two main
use cases: friendnet and paid-service.
In the friendnet, nearly all nodes are both a client *and* a server.
Client-only nodes (like the one I occasionally connect to VG2 to
investigate bug reports), or server-only nodes (imagine a paid storage
server, the "rent-a-friend" idea I've talked about before) are rare. So
ruling out C->C or S->S connections doesn't change very much.
In the paid-service case (allmydata), we don't want clients talking to
each other (they're all behind NAT anyways). But we could allow S->S
connections without problems, and if all servers know about all other
servers, then we could add new servers to the grid by just connecting
them to at least one existing server, and knowledge of them would flood
quickly and reliably to everyone else.
> It would also interact somewhat poorly with #444
Note that we don't need active+online connections to all other nodes all
the time. Connecting with less than 100% duty cycle would still get the
information distributed eventually. What I'm really expecting is that
we'll use Zooko's clever log-scaling flooding techniques (from Mnet) to
limit the amount of traffic and connections but still achieve
rapid+reliable diffusion of knowledge.
> In fact, why do we need to switch from introducers to gossip at all?
> Could we finish the rest of the #466 new-introduction-protocol and
> related accounting infrastructure while leaving the current
> centralized introducer (or the #68 multiple introducers) alone?
They aren't interdependent, for sure. Now that #466 is in trunk, we've
got a handle on Announcements (i.e. the node key that signs each one) so
recipients can make decisions about whether they'll accept the thing
being introduced or not, independently of the channel by which they
received the announcement. *That* is important to unlock alternate
introduction topologies: without signed announcements, the only form of
grid control you can get is to limit who gets access to the Introducer
(as the VG2 folks accomplish by changing the introducer.furl each time
it is accidentally leaked). But with signed announcements, you don't
need control over the channel to retain control over which servers your
client uses, or over which clients your server will serve. You could
even safely use a single massive universe-spanning broadcast channel, if
you could make it efficient enough.
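
A minimal sketch of that point, assuming an announcement carries a node
key, a message, and a signature. The dict field names, the
`trusted_keys` set, and the injected `verify` callable are all
hypothetical; a real node would plug in its actual public-key signature
check. The acceptance decision depends only on who signed the
announcement, never on which channel delivered it:

```python
def accept_announcement(announcement, trusted_keys, verify):
    """Decide whether to use an announcement, however it arrived.

    announcement: dict with hypothetical fields 'node_key', 'message'
                  and 'signature'
    trusted_keys: set of node keys this node has chosen to accept
    verify:       callable(node_key, message, signature) -> bool,
                  standing in for a real public-key signature check
    """
    node_key = announcement["node_key"]
    if node_key not in trusted_keys:
        return False  # signed by someone we have not chosen to use
    # A trusted key is not enough: the signature must actually check out.
    return bool(verify(node_key,
                       announcement["message"],
                       announcement["signature"]))
```

Note that the function takes no argument describing the channel (an
Introducer, a gossiping peer, or a broadcast), which is the property
being described.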
And the first steps of Accounting don't require changes to introduction
at all. These steps will enable tracking of who-uses-what, and manual
control (probably by pasting nodeids into tahoe.cfg) over both
which-servers-should-I-use and which-clients-should-I-accept. This needs
signed announcements (to get a strong nodeid of a server) and signed
accounting-facet-of-storage-server FURLification messages (so clients
can demonstrate control of a key). The main question is whether nodes
which are both clients and servers should have a single key, or two
separate keys (I prefer a single key, because it makes reciprocal
storage-permission grants easier).
The second steps of Accounting, where we try to make things easy and
automatic for our common use cases, are where we start getting into my
Invitation scheme, and where gossip becomes more interesting. What I
really want is to make it super-easy for a new user to get their node
running and connected to their friend's existing grid. And, more
importantly, for that *first* friend to set up that grid.
Imagine for a moment that we have a nicely-packaged OS X or Debian app,
already distributed via the Mac App Store or through apt/etc. And also
imagine that we've got UPnP working (or something equivalent, maybe
involving a relay or some helper service that we run), so NAT isn't a
problem. Then this is my goal:
The first friend (Alice) hears about Tahoe from her favorite blog, and
installs it with her favorite package manager. She launches it for the
first time, and it asks "start your own grid, or join someone
else's?", and she picks "start your own". Her node starts up,
establishes an external IP address, sets itself up to restart at
reboot, and announces that Alice is now the proud member of a 1-node
grid, and that she should invite a few friends to join before she'll
get more than educational value out of the system. She hits the
"Invite A Friend" button, types Bob's (pet)name and email address into
the box, and the node sends Bob a message with links to the
application, instructions, and an invitation code.
Bob gets Alice's email, downloads+installs the app, and pastes in the
invitation code. The next thing he sees is a picture of the two-node
grid, with the Alice and Bob nodes labeled, and he can upload files
and either retrieve them locally or share them with Alice.
Later, Alice and Bob invite other people to join in their grid. The
only grid-specific coordinates that each new member needs is a
single-use invitation code like "d77hbsmkgeufjpwacu3ywkbwem".
Eventually, Alice leaves the grid, but her departure doesn't affect
the remaining members: they can still connect and exchange shares as
usual.
All grid members get a control panel where they can see who else is
using their storage, allow/deny access, and control where their own
node places shares. By default, anyone who gets invited to join the
grid gets full access to storage on all members' servers, but access
can be revoked at any time.
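
A minimal sketch of single-use invitation codes like the one in the story
above, assuming a code is just 128 random bits rendered as lowercase
unpadded base32 (which gives 26 characters, the same length and alphabet
as the example). How the Invitation scheme actually derives or redeems
codes is not specified here; the function and class names are
hypothetical:

```python
import base64
import os


def make_invitation_code(nbytes=16):
    """Return a random code as lowercase base32 without '=' padding."""
    raw = os.urandom(nbytes)
    return base64.b32encode(raw).decode("ascii").rstrip("=").lower()


class InvitationBook:
    """Tracks outstanding codes so that each can be redeemed only once."""

    def __init__(self):
        self.outstanding = set()

    def issue(self):
        code = make_invitation_code()
        self.outstanding.add(code)
        return code

    def redeem(self, code):
        # Accept a code at most once, then forget it.
        if code in self.outstanding:
            self.outstanding.remove(code)
            return True
        return False
```

In this toy model the inviting node calls `issue()` when the "Invite A
Friend" button is pressed, and `redeem()` succeeds for the first node
that presents the code and fails for any later attempt.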
The corresponding story with an AllMyData-like paid-service is:
Alice visits allmydata.com, signs up for the service with a credit
card, downloads the client app and gets an invitation code for her
account. She pastes the invitation code into the "accept an
invitation" box when her app starts up.
Her app connects to all AllMyData storage servers and is allowed
storage access. New servers can be added without Alice's involvement.
Any subset of the servers can go away without affecting her ability to
connect to (or learn about) the rest.
To support those stories, I don't want Alice (or AllMyData) to be
running a single Introducer, or even a cluster of Introducers. Alice,
Bob, and the other members of the friendnet should *all* be helping each
other connect to the rest of their grid. Otherwise they have to pay
attention to how many Introducers are present, and who's responsible for
them, and make sure there are enough left available to accommodate
changes.
Does that help explain my interest in gossip-based introduction?
cheers,
-Brian