#2251 closed defect (wontfix)

Database full or disk full error, stops backups

Reported by:   CyberAxe
Owned by:      daira
Priority:      normal
Milestone:     undecided
Component:     code-storage
Version:       1.10.0
Keywords:      database full space logging
Cc:            zooko
Launchpad Bug:

Description (last modified by daira)

http://sebsauvage.net/paste/?86d37574a47fd008#Xcj23YbPlxSMZ63yIVfune6X+kxkHQSCQ9Hm7kJmqkI=

[...]
C:\allmydata-tahoe-1.10.0\bin>Backup-Now

C:\allmydata-tahoe-1.10.0\bin>ECHO OFF
Traceback (most recent call last):
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 156, in run
    rc = runner(sys.argv[1:], install_node_control=install_node_control)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 141, in runner
    rc = cli.dispatch[command](so)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\cli.py", line 574, in backup
    rc = tahoe_backup.backup(options)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 325, in backup
    return bu.run()
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 118, in run
    new_backup_dircap = self.process(options.from_dir)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 211, in process
    newdircap = mkdir(create_contents, self.options)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 47, in mkdir
    raise HTTPError("Error during mkdir", resp)
HTTPError: Error during mkdir: 500 Internal Server Error
"Traceback (most recent call last):
  File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\foolscap-0.6.4-py2.7.egg\\foolscap\\call.py\", line 753, in receiveClose
    self.request.fail(f)
  File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\foolscap-0.6.4-py2.7.egg\\foolscap\\call.py\", line 95, in fail
    self.deferred.errback(why)
  File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\twisted-12.3.0-py2.7-win-amd64.egg\\twisted\\internet\\defer.py\", line 42
2, in errback
    self._startRunCallbacks(fail)
  File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\twisted-12.3.0-py2.7-win-amd64.egg\\twisted\\internet\\defer.py\", line 48
9, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\twisted-12.3.0-py2.7-win-amd64.egg\\twisted\\internet\\defer.py\", line 57
6, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File \"c:\\allmydata-tahoe-1.10.0\\src\\allmydata\\immutable\\upload.py\", line 604, in _got_response
    return self._loop()
  File \"c:\\allmydata-tahoe-1.10.0\\src\\allmydata\\immutable\\upload.py\", line 516, in _loop
    return self._failed(msg)
  File \"c:\\allmydata-tahoe-1.10.0\\src\\allmydata\\immutable\\upload.py\", line 617, in _failed
    raise UploadUnhappinessError(msg)
allmydata.interfaces.UploadUnhappinessError: server selection failed for
<Tahoe2ServerSelector for upload 66ww6>: shares could be placed or found on
only 0 server(s). We were asked to place shares on at least 1 server(s) such
that any 1 of them have enough shares to recover the file. (placed 0 shares
out of 1 total (1 homeless), want to place shares on at least 1 servers such
that any 1 of them have enough shares to recover the file, sent 1 queries to
1 servers, 0 queries placed some shares, 1 placed none (of which 0 placed
none due to the server being full and 1 placed none due to an error))
(last failure (from <ServerTracker for server xfia4afr and SI 66ww6>) was:
[Failure instance: Traceback (failure with no frames):
<class 'foolscap.tokens.RemoteException'>: <RemoteException around
'[CopiedFailure instance: Traceback from remote host -- Traceback (most
recent call last):
  File \"/home/customer/LAFS_source/src/allmydata/storage/server.py\", line 258, in <lambda>
    canary))
  File \"/home/customer/LAFS_source/src/allmydata/storage/backends/cloud/cloud_backend.py\", line 175, in make_bucket_writer
    d.addCallback(lambda ign: BucketWriter(account, immsh, canary))
  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 306, in addCallback
    callbackKeywords=kw)
  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 295, in addCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 577, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File \"/home/customer/LAFS_source/src/allmydata/storage/backends/cloud/cloud_backend.py\", line 175, in <lambda>
    d.addCallback(lambda ign: BucketWriter(account, immsh, canary))
  File \"/home/customer/LAFS_source/src/allmydata/storage/bucket.py\", line 28, in __init__
    share.get_allocated_data_length(), SHARETYPE_IMMUTABLE)
  File \"/home/customer/LAFS_source/src/allmydata/storage/account.py\", line 54, in add_share
    self._leasedb.add_new_share(storage_index, shnum, used_space, sharetype)
  File \"/home/customer/LAFS_source/src/allmydata/storage/leasedb.py\", line 176, in add_new_share
    (si_s, shnum, prefix, backend_key, used_space, sharetype, STATE_COMING))
sqlite3.OperationalError: database or disk is full
]'>
])
"

This happened while trying to back up one very small test file. I have 38 GB of free space locally, on Windows.
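
For context, the innermost failure above is SQLite's generic OperationalError. Here is a minimal, self-contained sketch (not Tahoe code; the shares table and the error translation are hypothetical) of how a lease-style INSERT surfaces the disk-full condition and how it could be turned into a clearer storage-server error than a bare 500:

import sqlite3

DISK_FULL = "database or disk is full"  # SQLite's message for SQLITE_FULL

def add_share_record(conn, si_s, shnum, used_space):
    # Hypothetical stand-in for leasedb's add_new_share(); the `shares`
    # table is illustrative, not Tahoe's actual schema.
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO shares (storage_index, shnum, used_space)"
                " VALUES (?, ?, ?)",
                (si_s, shnum, used_space))
    except sqlite3.OperationalError as e:
        if DISK_FULL in str(e):
            # Translating this into a distinct, catchable error would let
            # the client report "server out of space" instead of an opaque
            # 500 Internal Server Error.
            raise RuntimeError("storage server is out of disk space")
        raise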

Change History (7)

comment:1 Changed at 2014-06-30T19:48:09Z by zooko

This is probably related to #2254: it is happening on the same server as #2254, and both error messages have to do with the server's filesystem being full.

comment:2 Changed at 2014-06-30T22:07:43Z by Zancas

After the issue was reported to me, I began investigating the state of CyberAxe's server, starting at approximately 2014-06-27T19:45Z.

I ssh'd into the server and ran:

Last login: Mon Apr  7 21:32:11 2014 from 184-96-237-182.hlrn.qwest.net
 ubuntu@ip-10-185-214-61:~$ df
 Filesystem     1K-blocks    Used Available Use% Mounted on
 /dev/xvda1       8256952 7838464         0 100% /
 udev              299036       8    299028   1% /dev
 tmpfs              60948     160     60788   1% /run
 none                5120       0      5120   0% /run/lock
 none              304720       0    304720   0% /run/shm
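
The root filesystem was at 100%, which is exactly the condition the leasedb INSERT tripped over. A server-side guard could refuse allocations when free space runs low; a minimal sketch using only the standard library (the reserve threshold is an arbitrary example, not a Tahoe setting):

import shutil

MIN_FREE = 100 * 1024 * 1024  # illustrative reserve: 100 MiB

def has_room_for(path, requested_bytes):
    # True if the filesystem holding `path` can absorb the write while
    # keeping MIN_FREE bytes in reserve (shutil.disk_usage is Python 3.3+).
    return shutil.disk_usage(path).free - requested_bytes >= MIN_FREE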

Here's how big the database files under storageserver/storage were:

customer@ip-10-185-214-61:~/storageserver/storage$ ls -l ./*
-rw------- 1 customer customer        0 May  1 09:05 ./accounting_crawler.state
-rw------- 1 customer customer    16384 Apr 30 14:27 ./bucket_counter.state
-rw-r--r-- 1 customer customer 21267456 Apr 14 05:16 ./leasedb.sqlite
-rw-r--r-- 1 customer customer    32768 May 26 08:46 ./leasedb.sqlite-shm
-rw-r--r-- 1 customer customer  1246104 May 25 08:55 ./leasedb.sqlite-wal
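
The -shm and -wal siblings show the leasedb runs in SQLite's WAL mode; once the disk filled, the write-ahead log could no longer be checkpointed back into the main file. A sketch of manually checkpointing and compacting such a database, assuming exclusive access (TRUNCATE checkpoint mode needs SQLite 3.8.8+, and VACUUM itself needs scratch space, so this only helps after some space has been freed):

import sqlite3

def compact_leasedb(path):
    # Fold the write-ahead log back into the main database file,
    # then rebuild the file to release free pages back to the OS.
    conn = sqlite3.connect(path)
    try:
        conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
        # VACUUM temporarily needs roughly the database's size in
        # scratch space, so run it only once the disk has headroom.
        conn.execute("VACUUM")
    finally:
        conn.close()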

Here's what I pasted into IRC after measuring the log directories with 'du':

<zancas> customer@ip-10-185-214-61:~$ du -sh {storageserver,introducer}/logs
                                                                        [14:01]
<zancas> 6.1G    storageserver/logs
<zancas> 20K     introducer/logs
<zancas> 
<zooko> aha
<zancas> customer@ip-10-185-214-61:~/storageserver/logs$ pwd
<zancas> /home/customer/storageserver/logs
<zancas> customer@ip-10-185-214-61:~/storageserver/logs$ ls -l twistd.log.999
<zancas> -rw------- 1 customer customer 1000057 Apr 26 20:43 twistd.log.999

I tar'd and bzip2'd the log files, and scp'd the archive to my local dev machine.
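
For reference, the same bundling can be done from Python's standard library (the directory path is the one from this ticket; the archive name is arbitrary):

import tarfile

# Bundle the rotated logs into one bzip2-compressed archive before
# copying it off the machine with scp.
with tarfile.open("storageserver-logs.tar.bz2", "w:bz2") as tar:
    tar.add("/home/customer/storageserver/logs", arcname="logs")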

Here's a 'df' I ran on Saturday, before informing CyberAxe that we had a fix to test:

customer@ip-10-185-214-61:~/storageserver/logs$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      7.9G  1.5G  6.1G  19% /
udev            293M  8.0K  293M   1% /dev
tmpfs            60M  160K   60M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            298M     0  298M   0% /run/shm

Here's the result of a df run at about 2014-06-30T22:09:30Z:

customer@ip-10-185-214-61:~/storageserver/logs$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      7.9G  1.5G  6.1G  20% /
udev            293M  8.0K  293M   1% /dev
tmpfs            60M  160K   60M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            298M     0  298M   0% /run/shm
customer@ip-10-185-214-61:~/storageserver/logs$ 

If I understand correctly, the second df run above occurred *after* CyberAxe attempted another backup, which failed as described in ticket:2254.

comment:3 Changed at 2014-07-11T17:26:50Z by CyberAxe

I tested this and can now back up and restore files, after Zancas cleaned out the server's log files.

comment:4 Changed at 2014-07-24T00:25:47Z by daira

  • Cc daira removed
  • Component changed from unknown to code-frontend-cli
  • Description modified (diff)
  • Keywords file removed

comment:5 Changed at 2014-07-24T00:26:46Z by daira

  • Component changed from code-frontend-cli to code-storage
  • Keywords logging added

comment:6 Changed at 2014-07-24T00:29:06Z by daira

The underlying problem is #1927.

comment:7 Changed at 2020-10-30T12:35:44Z by exarkun

  • Resolution set to wontfix
  • Status changed from new to closed

The established line of development on the "cloud backend" branch has been abandoned. This ticket is being closed as part of a batch-ticket cleanup for "cloud backend"-related tickets.

If this is a bug, it is probably genuinely no longer relevant. The "cloud backend" branch is too large and unwieldy to ever be merged into the main line of development (particularly now that the Python 3 porting effort is significantly underway).

If this is a feature, it may be relevant to some future efforts - if they are sufficiently similar to the "cloud backend" effort - but I am still closing it because there are no immediate plans for a new development effort in such a direction.

Tickets related to the "leasedb" are included in this set because the "leasedb" code lives on the "cloud backend" branch and is fairly well intertwined with it. If there is interest in changing the lease implementation at some future time, that effort will essentially have to be restarted as well.

Note: See TracTickets for help on using tickets.