#1765 new enhancement

gossip-introducer should forget about old nodes somehow

Reported by: warner
Owned by: warner
Priority: normal
Milestone: soon
Component: code-nodeadmin
Version: 1.9.1
Keywords: gossip introduction
Cc:
Launchpad Bug:

Description (last modified by daira)

Just a note-to-self: when #68 gets working, and decentralized gossip-based introduction is implemented, we should make sure the announcements are:

  • 1: refreshed periodically
  • 2: dropped by clients when they're stale

The idea is that a server who has left the grid permanently should eventually be forgotten by everyone else. Gossip never forgets (even if you forget it locally, you'll be reminded by your cohorts, and if you don't remember what you forgot, you'll fail to forget it again).

The simplest way to accomplish this is with a timestamp in the announcement, and to prune entries more than maybe a month old. (But wait a few minutes after startup before pruning, so that a node that has been offline for several months still has a chance to connect to somebody and fetch fresh announcements.)
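A minimal sketch of that pruning rule, assuming announcements are kept as a simple mapping from server id to claimed timestamp. The function name, the constants, and the data layout are illustrative assumptions, not the Introducer Client's actual code:

```python
MAX_AGE = 30 * 24 * 3600   # "maybe a month", in seconds (assumed value)
STARTUP_GRACE = 5 * 60     # "a few minutes" after startup (assumed value)


def prune_stale(announcements, now, started_at,
                max_age=MAX_AGE, grace=STARTUP_GRACE):
    """Return the announcements that should be kept.

    `announcements` maps a server id to the timestamp carried in its
    announcement (seconds since the epoch, as claimed by the sender).
    `now` and `started_at` are read from the local clock.
    """
    if now - started_at < grace:
        # Too soon after startup: keep everything, so a long-offline
        # node can still reach somebody and fetch fresh announcements.
        return dict(announcements)
    return {sid: ts for sid, ts in announcements.items()
            if now - ts <= max_age}
```

Note that the comparison `now - ts` mixes the sender's clock with the local clock, which is exactly the cross-node comparison discussed in the next paragraph.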

We aren't usually keen on timestamps, in particular comparing time from different nodes (in this case, the announcement's timestamp plus one month versus the client's clock). But I think this would be a reasonable use of clocks. As of yesterday, the announcement record includes a timestamp, named "seqnum" (so named because I didn't want to make any claims about its use as a timestamp, but merely as a mostly-monotonically increasing number, used to decide when one announcement may replace another).

Maybe I should rename that to "when" or "announcement-time"?

The Introducer Client still needs code to refresh its announcements periodically (once a week would be fine). Currently it only refreshes them at node boot, and we don't want live-and-connected nodes with good uptime to start being ignored merely because they weren't rebooted frequently enough.

Change History (10)

comment:1 Changed at 2012-06-12T23:52:03Z by davidsarah

+1 for renaming "seqnum" to "announcement-time".

comment:2 Changed at 2012-06-13T03:52:21Z by zooko

I would be kind of sad to make tahoe-lafs require synchronization between clocks of different computers. As far as I know, it doesn't currently do so. There isn't any way to be sure that your computer's clock is synchronized with the clock of another computer (the one you are gossiping with), except by relying on a trusted third party -- an NTP server.

Except, the above is no longer true, now that Bitcoin exists. So I retract my longstanding objection against relying on synchronized clocks, and replace it with a suggested policy that the only remote-clock-synchronization protocol that a tahoe-lafs node is allowed to rely on is the Bitcoin blockchain.

P.S. Also in all seriousness I don't like the proposed design that much. Not only the part about requiring clock synchronization (and by the way in practice, clocks are often more than a month out of sync with each other, especially in some of the "different" deployment targets that people are increasingly interested in, such as embedded systems and Windows clients). I am concerned about relying on that, because our defenses against data deletion, rollback attack on mutables, and (hopefully in the future) unadd-attack on add-only-sets rely on the client connecting to a sufficient number of good servers. This seems to add another path by which accident or malice could prevent clients from connecting to good servers, which I think deserves careful risk analysis, both now and whenever we change the server-selection behavior.

But in addition to that, also the part about waiting for "a few minutes after starting up" sounds kind of fragile.

Let me try to think of a reasonable alternative to consider. What do you think of this:

  1. When telling other people gossip about servers, you don't tell them about servers that you aren't currently connected to.
  2. Remember the fact that you were unable to connect to a server last time you tried. When you start up, don't try reconnecting to that guy right away until you've finished trying to reconnect to more-likely-to-work ones. (Because of a bug that is really important on Windows: #605 (two-hour delay to connect to a grid from Win32, if there are many storage servers unreachable))
  3. If it has been more than a month on your local clock since you were able to connect to that guy, and you are currently able to connect to lots of other guys, then forget about that guy.

We need to carefully revisit 3 when changing anything to do with server selection, but at least there is less of a path for remote attackers to manipulate this than with the remote-clock-synchronization approach.
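Rules 1 and 3 can be sketched as below, using only the local clock. This is a rough illustration under stated assumptions: `ServerRecord`, `MONTH`, and `MIN_CONNECTED` (a stand-in for "lots of other guys") are hypothetical names and values, not existing Tahoe-LAFS code. Rule 2, which concerns connection ordering, is not shown.

```python
from dataclasses import dataclass

MONTH = 30 * 24 * 3600   # assumed length of "a month", in seconds
MIN_CONNECTED = 3        # assumed threshold for "lots of other guys"


@dataclass
class ServerRecord:      # hypothetical record, not a Tahoe-LAFS class
    server_id: str
    connected: bool          # are we connected to it right now?
    last_connected: float    # local-clock time of last successful connection


def servers_to_gossip(records):
    """Rule 1: only pass on servers we are currently connected to."""
    return [r.server_id for r in records if r.connected]


def forget_old(records, now):
    """Rule 3: drop servers unreachable for over a month on the local
    clock, but only while we are connected to enough other servers."""
    n_connected = sum(1 for r in records if r.connected)
    if n_connected < MIN_CONNECTED:
        return list(records)   # poorly connected: forget nothing
    return [r for r in records
            if r.connected or now - r.last_connected <= MONTH]
```

Both timestamps in `forget_old` come from the same machine, so no remote clock is ever consulted.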

What do you say? This sounds not much more complicated than the initial proposal, and maybe less complicated. It is certainly less complicated if you include the fact that you have to think about the clock-synchronization protocol in that one and you don't in this one. Does this proposal satisfy the same values as the initial post does -- i.e. not letting dead servers pile up indefinitely in the gossip network?

comment:3 follow-ups: Changed at 2012-06-13T05:22:30Z by warner

  • Summary changed from gossip-introducer should include timeouts to gossip-introducer should forget about old nodes somehow

Great response!

I would be kind of sad to make tahoe-lafs require synchronization between clocks of different computers. As far as I know, it doesn't currently do so.

Yeah, I'm not keen on requiring synchronized clocks either. I was considering how we might have the recipient note the difference between their local clock and the sender's clock (or however that'd map to the flooded announcement scheme, where messages are being delivered by third parties minutes or days after they were created) and using that to correct for a static offset in future messages. But that feels fragile.

  1. When telling other people gossip about servers, you don't tell them about servers that you aren't currently connected to.
  2. Remember the fact that you were unable to connect to a server last time you tried. When you start up, don't try reconnecting to that guy right away until you've finished trying to reconnect to more-likely-to-work ones. (Because of a bug that is really important on Windows: #605 (two-hour delay to connect to a grid from Win32, if there are many storage servers unreachable))
  3. If it has been more than a month on your local clock since you were able to connect to that guy, and you are currently able to connect to lots of other guys, then forget about that guy.

Hey, that sounds great! Let's see, the first rule prevents the "persistent nonsense" problem, as long as any grid-control-only nodes (i.e. what the Introducer becomes in the new gossip world) follow this rule too. The only concern I can think of is that partial connectivity might prevent a new client from learning about nodes that they could normally connect to. In particular, could this interact with NAT in some way that might produce a less-connected grid than our current central Introducer? I don't think so, but I'd have to study it more.

The second rule is really about implementing connection throttling, which might want to be a Foolscap feature (maybe expressed as tub.setOption("pending-connection-limit", 10) or similar), and then asking for connections in a specific order (most-recently-seen first). Seems like a good idea, but not as critical as the other two.
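To make the throttling idea concrete, here is a sketch that tries servers most-recently-seen first while capping the number of pending attempts. It uses asyncio purely for illustration; Foolscap is built on Twisted, the "pending-connection-limit" option above is only a suggestion and does not exist, and `connect_all` and its `connect` callback are hypothetical names:

```python
import asyncio


async def connect_all(servers, connect, limit=10):
    """Attempt connections most-recently-seen first, at most `limit` at once.

    `servers` is a list of (server_id, last_seen) pairs; `connect` is an
    async callable supplied by the caller (hypothetical, for illustration).
    Returns (server_id, result) pairs, with result None on failure.
    """
    sem = asyncio.Semaphore(limit)
    ordered = sorted(servers, key=lambda s: s[1], reverse=True)

    async def attempt(server_id):
        async with sem:   # blocks while `limit` attempts are pending
            try:
                return server_id, await connect(server_id)
            except Exception:
                return server_id, None

    return await asyncio.gather(*(attempt(sid) for sid, _ in ordered))
```

With a small limit, servers that were unreachable last time end up at the back of the queue, so they cannot delay connections to the ones more likely to work.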

The third rule prevents local nonsense from sticking around forever. It also ties into a more general "connection history" mechanism that I think we want: something to hold historic uptime, RTT, speeds, and overall reliability for each server we know about. This could be used to decide how long to wait for a response from the server before declaring it "overdue" (and switching to an alternate), and could eventually be published and aggregated to provide some sort of collaborative reliability-prediction metric to influence share placement or even storage prices (servers that everyone agrees have been highly available might command higher fees).

I like it! I'll update this ticket to reflect the new scheme.

Would you still be in favor of changing the Announcement field from "seqnum" to "announcement-time", even if we don't plan to use it for that purpose? The specific purpose of that field (which is inside the signed announcement body) is to prevent replay and rollback attacks (feeding an old announcement into some client in the hopes of changing their behavior in some useful way).

The publishing node could indeed just use a sequence number (incremented by one for each new message), but:

  • the counter would need to be stored and recovered safely, such as when rebuilding the node after a hard drive failure, otherwise peers would not believe new announcements until the new node's counter naturally incremented beyond the other values.
  • This would require periodic backup copies of the counter. In contrast, the other information needed to rebuild a node (node.privkey, node.pem) would be static.

I can imagine arguments against using time.time() instead of an actual counter:

  • more entropy for a de-anonymizing attacker to correlate
  • providing a potentially high-resolution timestamp (the current code uses all significant digits of time.time(), frequently microseconds) that might reveal time consumed during boot, which might help a timing attack on e.g. key generation or signature generation.
  • timequakes causing temporary disbelief of new announcements, requiring periodic refresh to make sure the disbelief is eventually overcome (imagine setting your clock back a day and then rebooting: you need to have at least one announcement more than one day after reboot to catch up)

Oh, wait, here's an idea: use a counter, remember it somewhere like NODEDIR/private/announcement.counter, initialize it to zero upon node creation. But: listen for your own announcements too. If you hear a valid announcement with a higher seqnum than what you're currently publishing, increase your counter to match. (if the announcement is different than what you're currently publishing, increase it one more.. that ought to converge).
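Roughly, that could look like the sketch below. The class and method names are hypothetical, and signature checking is assumed to happen before `heard_own_announcement` is called:

```python
import os


class AnnouncementCounter:
    """Hypothetical helper; the file name follows the idea above."""

    def __init__(self, nodedir):
        self.path = os.path.join(nodedir, "private", "announcement.counter")
        self.value = self._load()

    def _load(self):
        try:
            with open(self.path) as f:
                return int(f.read().strip())
        except (FileNotFoundError, ValueError):
            return 0   # fresh node, or the counter file was lost

    def _save(self):
        os.makedirs(os.path.dirname(self.path), exist_ok=True)
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            f.write(str(self.value))
        os.replace(tmp, self.path)   # replace the old file in one step

    def heard_own_announcement(self, seqnum, body, current_body):
        """Call for each valid announcement heard under our own key."""
        if seqnum > self.value:
            self.value = seqnum      # catch up with what peers believe
            if body != current_body:
                self.value += 1      # our current content must win
            self._save()
        return self.value
```

A rebuilt node would start again from zero, hear its old announcements from peers, and step past their seqnum without needing a backup of the counter.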

What do you think about that? And, given your thoughts about that, what are your new thoughts about seqnum vs announcement-time? Can you think of any reason that we'd really like actual (possibly erroneous and/or malicious) wallclock values in Announcements?

comment:4 in reply to: ↑ 3 Changed at 2012-06-14T03:01:42Z by davidsarah

Replying to warner:

Replying to zooko:

  1. When telling other people gossip about servers, you don't tell them about servers that you aren't currently connected to.
  2. Remember the fact that you were unable to connect to a server last time you tried. When you start up, don't try reconnecting to that guy right away until you've finished trying to reconnect to more-likely-to-work ones. (Because of a bug that is really important on Windows: #605 (two-hour delay to connect to a grid from Win32, if there are many storage servers unreachable))
  3. If it has been more than a month on your local clock since you were able to connect to that guy, and you are currently able to connect to lots of other guys, then forget about that guy.

Hey, that sounds great!

[...]

I like it! I'll update this ticket to reflect the new scheme.

+1

I can imagine arguments against using time.time() instead of an actual counter:

  • more entropy for a de-anonymizing attacker to correlate
  • providing a potentially high-resolution timestamp (the current code uses all significant digits of time.time(), frequently microseconds) that might reveal time consumed during boot, which might help a timing attack on e.g. key generation or signature generation.
  • timequakes causing temporary disbelief of new announcements, requiring periodic refresh to make sure the disbelief is eventually overcome (imagine setting your clock back a day and then rebooting: you need to have at least one announcement more than one day after reboot to catch up)

If you use (time of last restart, # of announcements since restart) ordered lexicographically, that would solve the first two problems. It wouldn't solve the timequake problem: if you restarted the server at a local time earlier than the local time of some previous restart, you wouldn't recover until you restarted again.
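As a small illustration (the `newer` helper and the sample values are made up), Python tuples already compare lexicographically, so the ordering and the remaining timequake problem can be shown directly:

```python
def newer(a, b):
    """True if announcement version `a` should replace version `b`.

    Each version is a (restart_time, count_since_restart) pair; Python
    compares tuples element by element, i.e. lexicographically.
    """
    return a > b


first_boot = (1339600000, 0)

# Same restart, later announcement: accepted.
same_boot_later = newer((1339600000, 1), first_boot)

# Later restart: accepted, even though its count starts again at 0.
later_boot = newer((1339700000, 0), (1339600000, 5))

# Timequake: a restart at an earlier local time is rejected no matter
# how many announcements follow it, until the node restarts again.
after_timequake = newer((1339500000, 9), first_boot)
```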

comment:5 in reply to: ↑ 3 Changed at 2012-06-14T20:01:03Z by zooko

Replying to warner:

The only concern I can think of is that partial connectivity might prevent a new client from learning about nodes that they could normally connect to. In particular, could this interact with NAT in some way that might produce a less-connected grid than our current central Introducer? I don't think so, but I'd have to study it more.

Hrm. This idea of gossip conflicts with my idea that each server should attempt to connect to all clients -- and only to clients -- and that each client should attempt to connect to all servers -- and only to servers (#344, #1086).

It would also interact somewhat poorly with #444

In fact, why do we need to switch from introducers to gossip at all? Could we finish the rest of the #466 new-introduction-protocol and related accounting infrastructure while leaving the current centralized introducer (or the #68 multiple introducers) alone?

I think this discussion needs to move to the mailing list...

comment:6 Changed at 2012-06-14T20:11:11Z by warner

moved the discussion about whether to use sequence numbers (and how to recover from quakes) to #1767. Leaving the discussion about gossip and how-to-forget here, since they aren't as time-critical as #1767 (which I want to get resolved for 1.10)

comment:8 Changed at 2012-06-14T22:49:21Z by warner

Replying to zooko:

Hrm. This idea of gossip conflicts with my idea that each server should attempt to connect to all clients -- and only to clients -- and that each client should attempt to connect to all servers -- and only to servers (#344, #1086).

It would also interact somewhat poorly with #444

Hm. We could set it up so that grid-control announcements flow along all sorts of connections (instead of having nodes subscribe to a specific "grid-control" servicename). Then servers would learn about other servers even though they don't connect to each other, and new clients could learn about all servers from any one server. That might make it hard to prune uninteresting/bogus data, though (i.e. throw out records for things you don't care about, and rely on that mechanism to keep the overall dataset smaller).

Remember that "learning about node X" doesn't mean "connecting to node X". The Announcements are just data, they can be transported by anything (including some designated node that just gathers and serves up the current announcement list on demand). No long-term connections necessary.

Could we finish the rest of the #466 new-introduction-protocol and related accounting infrastructure while leaving the current centralized introducer (or the #68 multiple introducers) alone?

Yeah, sure, that's the plan. I'm just anticipating the future.

In fact, why do we need to switch from introducers to gossip at all?

Well, I think it'd be more robust, and would make grid setup easier. If we can embed a default relay (hosted on tahoe-lafs.org somewhere), then joining an existing grid could be as easy as:

  • install tahoe
  • run "tahoe [create+run+accept] INVITECODE"

And that gets you all of the following:

  • your new node learns all the servers it needs, no large introducer.furl to manage
  • learning about new servers that are added in the future
  • acquiring storage rights on those servers (traceable to you and to the person who invited you, which I think is Ostrom-ideal)
  • granting storage rights on your server to others

and nobody else ever had to set up an Introducer either.

I'm not in favor of multiple-introducers (specifically the introducer.furls GSoC design) because I think introducers are a nuisance to set up, FURLs are a nuisance to transfer, and multiple introducers would still be multiple-points-of-centralization (instead of being properly decentralized like the title of #68 suggests). Multiple-introducers are an easier short-term target, but we've done without them for years now, so I'd rather push forwards on a better solution than add complication and maintenance burden for a partial solution.

I think this discussion needs to move to the mailing list...

Yeah, good idea. I'll try to write up more about the gossip/invitation scheme tonight.

comment:9 Changed at 2012-11-22T01:15:43Z by davidsarah

  • Keywords gossip introduction added

comment:10 Changed at 2016-03-31T14:34:28Z by daira

  • Description modified (diff)

Something like this is being worked on in #467.
