Opened at 2014-06-27T18:59:33Z
Closed at 2020-10-30T12:35:44Z
#2251 closed defect (wontfix)
Database full or disk full error, stops backups
| Reported by: | CyberAxe | Owned by: | daira |
|---|---|---|---|
| Priority: | normal | Milestone: | undecided |
| Component: | code-storage | Version: | 1.10.0 |
| Keywords: | database full space logging | Cc: | zooko |
| Launchpad Bug: | | | |
Description (last modified by daira)
http://sebsauvage.net/paste/?86d37574a47fd008#Xcj23YbPlxSMZ63yIVfune6X+kxkHQSCQ9Hm7kJmqkI=
[...]

```
C:\allmydata-tahoe-1.10.0\bin>Backup-Now
C:\allmydata-tahoe-1.10.0\bin>ECHO OFF
Traceback (most recent call last):
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 156, in run
    rc = runner(sys.argv[1:], install_node_control=install_node_control)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 141, in runner
    rc = cli.dispatch[command](so)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\cli.py", line 574, in backup
    rc = tahoe_backup.backup(options)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 325, in backup
    return bu.run()
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 118, in run
    new_backup_dircap = self.process(options.from_dir)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 211, in process
    newdircap = mkdir(create_contents, self.options)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 47, in mkdir
    raise HTTPError("Error during mkdir", resp)
HTTPError: Error during mkdir: 500 Internal Server Error
"Traceback (most recent call last):
  File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\foolscap-0.6.4-py2.7.egg\\foolscap\\call.py\", line 753, in receiveClose
    self.request.fail(f)
  File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\foolscap-0.6.4-py2.7.egg\\foolscap\\call.py\", line 95, in fail
    self.deferred.errback(why)
  File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\twisted-12.3.0-py2.7-win-amd64.egg\\twisted\\internet\\defer.py\", line 422, in errback
    self._startRunCallbacks(fail)
  File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\twisted-12.3.0-py2.7-win-amd64.egg\\twisted\\internet\\defer.py\", line 489, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\twisted-12.3.0-py2.7-win-amd64.egg\\twisted\\internet\\defer.py\", line 576, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File \"c:\\allmydata-tahoe-1.10.0\\src\\allmydata\\immutable\\upload.py\", line 604, in _got_response
    return self._loop()
  File \"c:\\allmydata-tahoe-1.10.0\\src\\allmydata\\immutable\\upload.py\", line 516, in _loop
    return self._failed(msg)
  File \"c:\\allmydata-tahoe-1.10.0\\src\\allmydata\\immutable\\upload.py\", line 617, in _failed
    raise UploadUnhappinessError(msg)
allmydata.interfaces.UploadUnhappinessError: server selection failed for <Tahoe2ServerSelector for upload 66ww6>: shares could be placed or found on only 0 server(s). We were asked to place shares on at least 1 server(s) such that any 1 of them have enough shares to recover the file.
(placed 0 shares out of 1 total (1 homeless), want to place shares on at least 1 servers such that any 1 of them have enough shares to recover the file, sent 1 queries to 1 servers, 0 queries placed some shares, 1 placed none (of which 0 placed none due to the server being full and 1 placed none due to an error))
(last failure (from <ServerTracker for server xfia4afr and SI 66ww6>) was: [Failure instance: Traceback (failure with no frames): <class 'foolscap.tokens.RemoteException'>: <RemoteException around '[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):
  File \"/home/customer/LAFS_source/src/allmydata/storage/server.py\", line 258, in <lambda>
    canary))
  File \"/home/customer/LAFS_source/src/allmydata/storage/backends/cloud/cloud_backend.py\", line 175, in make_bucket_writer
    d.addCallback(lambda ign: BucketWriter(account, immsh, canary))
  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 306, in addCallback
    callbackKeywords=kw)
  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 295, in addCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 577, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File \"/home/customer/LAFS_source/src/allmydata/storage/backends/cloud/cloud_backend.py\", line 175, in <lambda>
    d.addCallback(lambda ign: BucketWriter(account, immsh, canary))
  File \"/home/customer/LAFS_source/src/allmydata/storage/bucket.py\", line 28, in __init__
    share.get_allocated_data_length(), SHARETYPE_IMMUTABLE)
  File \"/home/customer/LAFS_source/src/allmydata/storage/account.py\", line 54, in add_share
    self._leasedb.add_new_share(storage_index, shnum, used_space, sharetype)
  File \"/home/customer/LAFS_source/src/allmydata/storage/leasedb.py\", line 176, in add_new_share
    (si_s, shnum, prefix, backend_key, used_space, sharetype, STATE_COMING))
sqlite3.OperationalError: database or disk is full
]'> ])
"
```
This happened while trying to back up one very small test file. I have 38 GB of free space locally, on Windows.
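The bottom of the remote traceback is the real failure: leasedb's INSERT died with `sqlite3.OperationalError: database or disk is full`. As a minimal sketch (not tahoe-lafs code; the table schema here is made up), SQLite raises that exact error whenever it cannot allocate another page. `PRAGMA max_page_count` stands in for a genuinely full disk:

```python
# Minimal sketch reproducing the server-side error using only stock sqlite3.
# Capping the database page count simulates a full disk, so the INSERT
# fails the same way leasedb.py did in the traceback above.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shares (storage_index TEXT, shnum INTEGER)")
conn.execute("PRAGMA max_page_count = 2")  # freeze the db at its current size
try:
    conn.executemany("INSERT INTO shares VALUES (?, ?)",
                     [("66ww6", n) for n in range(100000)])
except sqlite3.OperationalError as e:
    print(e)  # -> database or disk is full
```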
Change History (7)
comment:1 Changed at 2014-06-30T19:48:09Z by zooko

This is probably related to #2254 -- it is happening on the same server as #2254 is and both error messages have to do with the server's filesystem being full.
comment:2 Changed at 2014-06-30T22:07:43Z by Zancas
After the issue was reported to me, I began investigating the state of CyberAxe's server. My investigation began at about 2014-06-27T19:45Z.
I ssh'd into the server and ran:
```
Last login: Mon Apr  7 21:32:11 2014 from 184-96-237-182.hlrn.qwest.net
ubuntu@ip-10-185-214-61:~$ df
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/xvda1       8256952 7838464         0 100% /
udev              299036       8    299028   1% /dev
tmpfs              60948     160     60788   1% /run
none                5120       0      5120   0% /run/lock
none              304720       0    304720   0% /run/shm
```
Here's how big the storageserver/storage DBs were:
```
customer@ip-10-185-214-61:~/storageserver/storage$ ls -l ./*
-rw------- 1 customer customer        0 May  1 09:05 ./accounting_crawler.state
-rw------- 1 customer customer    16384 Apr 30 14:27 ./bucket_counter.state
-rw-r--r-- 1 customer customer 21267456 Apr 14 05:16 ./leasedb.sqlite
-rw-r--r-- 1 customer customer    32768 May 26 08:46 ./leasedb.sqlite-shm
-rw-r--r-- 1 customer customer  1246104 May 25 08:55 ./leasedb.sqlite-wal
```
Here's what I pasted into IRC as the result of an experiment using 'du':
```
<zancas> customer@ip-10-185-214-61:~$ du -sh {storageserver,introducer}/logs  [14:01]
<zancas> 6.1G    storageserver/logs
<zancas> 20K     introducer/logs
<zooko> aha
<zancas> customer@ip-10-185-214-61:~/storageserver/logs$ pwd
<zancas> /home/customer/storageserver/logs
<zancas> customer@ip-10-185-214-61:~/storageserver/logs$ ls -l twistd.log.999
<zancas> -rw------- 1 customer customer 1000057 Apr 26 20:43 twistd.log.999
```
I bzip2'd and tarred the log files, and scp'd them to my local dev machine.
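The twistd.log.999 above, at roughly 1 MB, suggests size-based rotation with no cap on how many rotated files are kept, which is how the logs reached 6.1G. As a hedged sketch (not the actual tahoe-lafs logging setup), Twisted's LogFile takes a maxRotatedFiles argument that bounds the total space logs can consume:

```python
# Sketch only: wire a size-rotated, count-capped log file into Twisted's
# legacy logging. maxRotatedFiles is the setting whose absence lets
# rotated logs accumulate without bound.
from twisted.python.log import FileLogObserver, addObserver
from twisted.python.logfile import LogFile

log_file = LogFile(
    "twistd.log",
    "/home/customer/storageserver/logs",  # directory from the du output above
    rotateLength=1000000,  # rotate at ~1 MB, matching the sizes seen above
    maxRotatedFiles=10,    # keep at most 10 old logs (the missing cap)
)
addObserver(FileLogObserver(log_file).emit)
```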
Here's a 'df' I ran on Saturday, before informing CyberAxe that we had a fix to test:
```
customer@ip-10-185-214-61:~/storageserver/logs$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      7.9G  1.5G  6.1G  19% /
udev            293M  8.0K  293M   1% /dev
tmpfs            60M  160K   60M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            298M     0  298M   0% /run/shm
```
Here's the result of a df run at about 2014-06-30T22:09:30Z:
```
customer@ip-10-185-214-61:~/storageserver/logs$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      7.9G  1.5G  6.1G  20% /
udev            293M  8.0K  293M   1% /dev
tmpfs            60M  160K   60M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            298M     0  298M   0% /run/shm
```
If I understand correctly, the second df run above occurred *after* CyberAxe attempted another backup, which failed as described in ticket:2254.
comment:3 Changed at 2014-07-11T17:26:50Z by CyberAxe
I tested this and can now back up/restore files, after Zancas cleaned out the server's log files.
comment:4 Changed at 2014-07-24T00:25:47Z by daira
- Cc daira removed
- Component changed from unknown to code-frontend-cli
- Description modified (diff)
- Keywords file removed
comment:5 Changed at 2014-07-24T00:26:46Z by daira
- Component changed from code-frontend-cli to code-storage
- Keywords logging added
comment:6 Changed at 2014-07-24T00:29:06Z by daira
The underlying problem is #1927.
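A generic illustration of the missing safeguard (names are hypothetical, not tahoe-lafs's actual code, though the storage server's reserved_space setting expresses a similar idea): a server can refuse an allocation when free space minus a reserved margin is too small, so a full disk surfaces as a clean refusal rather than a sqlite3.OperationalError mid-write:

```python
# Hypothetical reserved-space guard for a Unix storage server; os.statvfs
# is Unix-only. This is a sketch of the concept, not an implementation
# from the tahoe-lafs codebase.
import os

def available_space(path, reserved_space):
    """Bytes usable for new shares on the filesystem holding `path`,
    holding back `reserved_space` bytes for the OS, logs, and leasedb."""
    st = os.statvfs(path)
    free = st.f_bavail * st.f_frsize  # free bytes available to non-root users
    return max(0, free - reserved_space)

def allocate_share(path, share_size, reserved_space=10**9):
    # Refuse up front rather than failing partway through a write.
    if available_space(path, reserved_space) < share_size:
        raise IOError("storage server out of space for a %d-byte share"
                      % share_size)
```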
comment:7 Changed at 2020-10-30T12:35:44Z by exarkun
- Resolution set to wontfix
- Status changed from new to closed
The established line of development on the "cloud backend" branch has been abandoned. This ticket is being closed as part of a batch-ticket cleanup for "cloud backend"-related tickets.
If this is a bug, it is probably genuinely no longer relevant. The "cloud backend" branch is too large and unwieldy to ever be merged into the main line of development (particularly now that the Python 3 porting effort is significantly underway).
If this is a feature, it may be relevant to some future efforts - if they are sufficiently similar to the "cloud backend" effort - but I am still closing it because there are no immediate plans for a new development effort in such a direction.
Tickets related to the "leasedb" are included in this set because the "leasedb" code is in the "cloud backend" branch and fairly well intertwined with the "cloud backend". If there is interest in lease implementation change at some future time then that effort will essentially have to be restarted as well.