#4097 new defect

1.19.0 node connection issues.

Reported by: tlhonmey
Owned by:
Priority: normal
Milestone: undecided
Component: unknown
Version: n/a
Keywords:
Cc:
Launchpad Bug:

Description

I recently decided to update my grid, which was running a mix of 1.14, 1.15, and 1.17. I upgraded one of the nodes to 1.19, and it started complaining about SSL bad certificate issues when trying to communicate with the other nodes.

After some discussion with meejah on IRC, it seemed like the best way to deal with the certificate mismatches was to just rebuild the grid, and then copy in the old storage folder.

After rebuilding the grid, things are... strange.

The introducer node can talk to everyone. That's good. Node No. 1, which runs on the same machine as the introducer but on a different port, can talk to everyone as well. That's good.

All the other nodes in the grid can only talk to one or maybe two other nodes, and for some reason that doesn't necessarily include themselves.

What's more, the helpful connection error report on the web status page has been replaced with opaque stack traces -- without even any line breaks -- like:

failure: [Failure instance: Traceback: <class 'allmydata.util.deferredutil.MultiFailure'>: /home/user/.local/lib/python3.12/site-packages/twisted/internet/defer.py:916:errback /home/user/.local/lib/python3.12/site-packages/twisted/internet/defer.py:984:_startRunCallbacks /home/user/.local/lib/python3.12/site-packages/twisted/internet/defer.py:1078:_runCallbacks /home/user/.local/lib/python3.12/site-packages/twisted/internet/defer.py:1949:_gotResultInlineCallbacks --- <exception caught here> --- /home/annie/.local/lib/python3.12/site-packages/twisted/internet/defer.py:1078:_runCallbacks /home/user/.local/lib/python3.12/site-packages/twisted/internet/defer.py:809:convertCancelled /home/user/.local/lib/python3.12/site-packages/twisted/internet/defer.py:292:_cancelledToTimedOutError /home/user/.local/lib/python3.12/site-packages/twisted/python/failure.py:481:trap /home/user/.local/lib/python3.12/site-packages/twisted/python/failure.py:505:raiseException /home/user/.local/lib/python3.12/site-packages/twisted/internet/defer.py:1999:_inlineCallbacks /home/user/.local/lib/python3.12/site-packages/twisted/python/failure.py:519:throwExceptionIntoGenerator /home/user/.local/lib/python3.12/site-packages/allmydata/storage_client.py:1348:_pick_server_and_get_version /home/user/.local/lib/python3.12/site-packages/allmydata/storage_client.py:1338:get_istorage_server ]

The stdout of the half-connected nodes contains nothing but messages about factories being started and stopped, with no real indication of why.

Meejah seemed to think this may have something to do with GBS. I'd be happy to do some diagnostics if there's some way we can coax something useful out of the system.

Change History (1)

comment:1 Changed at 2024-04-09T19:09:05Z by meejah

One thing to try, in case it's "something GBS related" or something HTTP related, would be to turn off GBS. In tahoe.cfg you can do this in both the [storage] and [client] sections with the line force_foolscap = true.
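
For reference, a minimal sketch of what that change might look like in a node's tahoe.cfg. GBS is "Great Black Swamp", the HTTP-based storage protocol, and this setting asks the node to use Foolscap instead. The surrounding layout is an assumption: a real tahoe.cfg will already have other options in these sections, and the line should be added to the existing [client] and [storage] sections rather than to second copies of them.

```ini
# tahoe.cfg in the node's base directory (other existing options omitted)

[client]
# Talk to storage servers over Foolscap rather than GBS (HTTP)
force_foolscap = true

[storage]
# Offer this node's storage service over Foolscap only
force_foolscap = true
```

The node presumably needs to be restarted before it picks up the change, and it is probably worth applying it to every upgraded node so that both ends of each connection agree on the protocol.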
