Opened at 2012-11-18T06:42:54Z
Last modified at 2012-11-18T07:06:26Z
#1865 new defect
backup has over-general exception swallowing that leads to assertion failures.
Reported by: | nejucomo | Owned by: | davidsarah |
---|---|---|---|
Priority: | normal | Milestone: | undecided |
Component: | unknown | Version: | 1.9.2 |
Keywords: | error tahoe-backup | Cc: | |
Launchpad Bug: |
Description
Symptom
I had a long-running tahoe backup command which I suspended. Then I started another tahoe backup and saw this traceback:
Traceback (most recent call last): File "/home/n/virtualenvs/default/bin/tahoe", line 8, in <module> load_entry_point('allmydata-tahoe==1.9.2', 'console_scripts', 'tahoe')() File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/runner.py", line 113, in run rc = runner(sys.argv[1:], install_node_control=install_node_control) File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/runner.py", line 99, in runner rc = cli.dispatch[command](so) File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/cli.py", line 569, in backup rc = tahoe_backup.backup(options) File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/tahoe_backup.py", line 325, in backup return bu.run() File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/tahoe_backup.py", line 118, in run new_backup_dircap = self.process(options.from_dir) File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/tahoe_backup.py", line 188, in process childcap = self.process(childpath) File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/tahoe_backup.py", line 194, in process childcap, metadata = self.upload(childpath) File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/tahoe_backup.py", line 313, in upload bdb_results.did_upload(filecap) File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/backupdb.py", line 134, in did_upload self.mtime, self.ctime, self.size) File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/backupdb.py", line 276, in did_upload_file fileid = self.get_or_allocate_fileid_for_cap(filecap) File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/backupdb.py", line 270, in get_or_allocate_fileid_for_cap assert foundrow AssertionError
Diagnosis:
After examining the code with much help from Zooko, I tweaked the installed source like this:
--- ./allmydata-tahoe-1.9.2/src/allmydata/scripts/backupdb.py 2012-05-14 03:07:24.000000000 -0700 +++ /home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/backupdb.py 2012-11-17 22:15:15.000000000 -0800 @@ -261,7 +261,8 @@ c = self.cursor try: c.execute("INSERT INTO caps (filecap) VALUES (?)", (filecap,)) - except (self.sqlite_module.IntegrityError, self.sqlite_module.OperationalError): + except (self.sqlite_module.IntegrityError, self.sqlite_module.OperationalError), e: + print 'HACKY DEBUG HACK: %r %r' % (e, e.args) # sqlite3 on sid gives IntegrityError # pysqlite2 on dapper gives OperationalError pass
I reran the backup command to see this:
HACKY DEBUG HACK: OperationalError('attempt to write a readonly database',) ('attempt to write a readonly database',) Traceback (most recent call last): File "/home/n/virtualenvs/default/bin/tahoe", line 8, in <module> load_entry_point('allmydata-tahoe==1.9.2', 'console_scripts', 'tahoe')() File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/runner.py", line 113, in run rc = runner(sys.argv[1:], install_node_control=install_node_control) File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/runner.py", line 99, in runner rc = cli.dispatch[command](so) File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/cli.py", line 569, in backup rc = tahoe_backup.backup(options) File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/tahoe_backup.py", line 325, in backup return bu.run() File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/tahoe_backup.py", line 118, in run new_backup_dircap = self.process(options.from_dir) File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/tahoe_backup.py", line 188, in process childcap = self.process(childpath) File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/tahoe_backup.py", line 194, in process childcap, metadata = self.upload(childpath) File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/tahoe_backup.py", line 313, in upload bdb_results.did_upload(filecap) File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/backupdb.py", line 134, in did_upload self.mtime, self.ctime, self.size) File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/backupdb.py", line 277, in did_upload_file fileid = self.get_or_allocate_fileid_for_cap(filecap) File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/backupdb.py", line 271, in get_or_allocate_fileid_for_cap assert foundrow AssertionError
backupdb's get_or_allocate_fileid_for_cap() is swallowing exceptions with the intent of allowing key collisions to implement the semantics of: "Insert, but if it's already there, that's ok." However, it is actually swallowing a more general set of exceptional conditions. In this case, a lock is held by the suspended backup process so the new process times out.
Resolution:
Zooko suggests this fix:
- Ensure the sql commands are in a transaction (See caveat below)
- query for the cap to get the fileid
- if it is not found:
- insert it without swallowing exceptions
- query for the cap to get the fileid
Caveat:
The transaction scope of the db backend is important. Zooko did some quick searching to determine sqlite3 is table-scoped which should be safe, but in order to support other databases more research needs to be done to determine if this approach is feasible.
Here's a patch which I think will do the right thing while not catching exceptions that it shouldn't catch. This is safe in sqlite, because sqlite always locks whole tables. For databases with finer-grained transactions, it would be necessary to use isolation level "serializable" or stricter for this to be safe.