Opened at 2014-03-22T17:48:54Z
Closed at 2017-08-15T13:04:53Z
#2204 closed defect (fixed)
cloud branch: allmydata.storage.leasedb.NonExistentShareError
Reported by: | CyberAxe | Owned by: | exarkun |
---|---|---|---|
Priority: | normal | Milestone: | undecided |
Component: | code-storage | Version: | cloud-branch |
Keywords: | renew leases leasedb blocks-cloud-merge | Cc: | |
Launchpad Bug: |
Description (last modified by exarkun)
processing 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9' skipping 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9\0_0.jpg'.. skipping 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9\0_1.jpg'.. skipping 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9\1_0.jpg'.. skipping 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9\1_1.jpg'.. re-using old directory for 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9' re-using old directory for 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files' re-using old directory for 'c:\BackMeUp\WesternColorado_files\images' re-using old directory for 'c:\BackMeUp\WesternColorado_files' creating directory for 'c:\BackMeUp\' Traceback (most recent call last): File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 156, in run rc = runner(sys.argv[1:], install_node_control=install_node_control) File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 141, in runner rc = cli.dispatch[command](so) File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\cli.py", line 574, in backup rc = tahoe_backup.backup(options) File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 325, in backup return bu.run() File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 123, in run put_child(archives_url, now, new_backup_dircap) File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 57, in put_child raise HTTPError("Error during put_child", resp) HTTPError: Error during put_child: 500 Internal Server Error "Traceback (most recent call last): Failure: allmydata.mutable.common.NotEnoughServersError: ('Publish ran out of good servers, last failure was: [Failure instance: Traceback ( failure with no frames): <class \\'foolscap.tokens.RemoteException\\'>: <RemoteException around \\'[CopiedFailure instance: Traceback from r emote host -- Traceback (most recent call last):\\n File \"/usr/lib/python2.7/dist-packages/foolscap/eventual.py\", line 26, in _turn\\n cb(*args, **kwargs)\\n File \"/home/customer/LAFS_source/src/allmydata/util/deferredutil.py\", line 55, in _with_log\\n op(res)\\n Fil e \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 382, in callback\\n self ._startRunCallbacks(result)\\n File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\ ", line 490, in _startRunCallbacks\\n self._runCallbacks()\\n--- <exception caught here> ---\\n File \"/usr/local/lib/python2.7/dist-pac kages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 577, in _runCallbacks\\n current.result = callback(current.re sult, *args, **kw)\\n File \"/home/customer/LAFS_source/src/allmydata/storage/backends/base.py\", line 170, in _update_lease\\n account. add_or_renew_default_lease(self.storage_index, shnum)\\n File \"/home/customer/LAFS_source/src/allmydata/storage/account.py\", line 58, in add_or_renew_default_lease\\n return self.add_or_renew_lease(storage_index, shnum, renewal_time, expiration_time)\\n File \"/home/custom er/LAFS_source/src/allmydata/storage/account.py\", line 63, in add_or_renew_lease\\n renewal_time, expiration_time)\\n File \"/home/cust omer/LAFS_source/src/allmydata/storage/leasedb.py\", line 267, in add_or_renew_leases\\n raise NonExistentShareError(si_s, shnum)\\nallmy data.storage.leasedb.NonExistentShareError: can\\'t find SI=\\'o2ofk2psfahdhyf5uq3nqdmrwe\\' shnum=0 in \`shares\` table\\n]\\'>\\n]', None) "
Extracting and reformatting the traceback contained in that error:
Traceback (most recent call last): File "/usr/lib/python2.7/dist-packages/foolscap/eventual.py", line 26, in _turn cb(*args, **kwargs) File "/home/customer/LAFS_source/src/allmydata/util/deferredutil.py", line 55, in _with_log op(res) File "/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py", line 382, in callback self._startRunCallbacks(result) File "/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py", line 490, in _startRunCallbacks self._runCallbacks() --- <exception caught here> --- File "/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 577, in _runCallbacks current.result = callback(current.result, *args, **kw) File "/home/customer/LAFS_source/src/allmydata/storage/backends/base.py", line 170, in _update_lease account.add_or_renew_default_lease(self.storage_index, shnum) File "/home/customer/LAFS_source/src/allmydata/storage/account.py", line 58, in add_or_renew_default_lease return self.add_or_renew_lease(storage_index, shnum, renewal_time, expiration_time) File "/home/customer/LAFS_source/src/allmydata/storage/account.py", line 63, in add_or_renew_lease renewal_time, expiration_time) File "/home/customer/LAFS_source/src/allmydata/storage/leasedb.py", line 267, in add_or_renew_leases raise NonExistentShareError(si_s, shnum) allmydata.storage.leasedb.NonExistentShareError: can't find SI='o2ofk2psfahdhyf5uq3nqdmrwe' shnum=0 in `shares` table
Change History (11)
comment:1 Changed at 2014-03-25T02:08:18Z by CyberAxe
comment:2 Changed at 2014-04-15T16:06:30Z by zooko
- Summary changed from in add_or_renew_leases\\n raise NonExistentShareError(si_s, shnum)\\nallmy to data.storage.leasedb.NonExistentShareError
comment:3 Changed at 2014-04-15T16:07:55Z by zooko
- Keywords leasedb added; NonExistenShareError removed
comment:4 Changed at 2014-04-15T16:22:31Z by daira
- Component changed from unknown to code-storage
- Keywords leases added; lease removed
- Summary changed from data.storage.leasedb.NonExistentShareError to cloud branch: allmydata.storage.leasedb.NonExistentShareError
- Version changed from 1.10.0 to cloud-branch
comment:5 Changed at 2014-04-15T16:23:47Z by daira
- Keywords cloud-merge added
comment:6 Changed at 2014-04-15T16:24:55Z by daira
Possible duplicate of #1921.
comment:7 Changed at 2014-05-15T15:59:20Z by daira
- Keywords blocks-cloud-merge added; cloud-merge removed
comment:8 Changed at 2017-08-08T16:25:41Z by exarkun
- Description modified (diff)
comment:9 Changed at 2017-08-08T16:26:52Z by exarkun
- Owner changed from daira to exarkun
- Status changed from new to assigned
comment:10 Changed at 2017-08-10T19:43:34Z by exarkun
I've encountered this now as well.
The comment about starting with a new SQLite3 lease database is appreciated - that agrees with my situation and supports my hypothesis about what's going wrong here.
My understanding of the shares table of the lease database is that it is populated in one of two ways:
- A new share is created.
- The accounting crawler finds a share not in the lease database.
The code is all meant to tolerate lose of the lease database, too.
When _locked_testv_and_readv_and_writev is done applying writes, it attempts to add_or_renew_default_lease for the share (via the account). This eventually calls down to add_or_renew_leases on the lease database. add_or_renew_leases requires that the share being leased already exists in the shares table.
So, if the lease database is ever lost, there is a period while the accounting crawler is re-populating the shares table when it is not possible to write to mutable files.
Because the problem depends on both the implementation of mutable storage, the accounting system, the lease database, and on an external event causing the loss of the lease database, it's not clear to me what testing approach will work.
Also, the implementation of _locked_testv_and_readv_and_writev is sufficiently complex that the idea of touching it at all is pretty scary. It seems like a reasonable fix would be to cause it to _always_ account.add_share before _update_lease (rather than the current behavior of only account.add_shareing if shnum not in sharemap).
comment:11 Changed at 2017-08-15T13:04:53Z by exarkun
- Resolution set to fixed
- Status changed from assigned to closed
Fixed in the #2237 branch(es):
I started with a new SQLite.db but get the same result.