Opened at 2010-01-13T18:44:55Z
Closed at 2010-02-15T19:38:43Z
#899 closed defect (fixed)
UncoordinatedWriteError on prod grid
Reported by: | zooko | Owned by: | kmarkley86 |
---|---|---|---|
Priority: | major | Milestone: | undecided |
Component: | code-mutable | Version: | 1.5.0 |
Keywords: | availability reliability upload | Cc: | kmarkley86 |
Launchpad Bug: |
Description
Kyle Markley reported this on the tahoe-dev list:
http://allmydata.org/pipermail/tahoe-dev/2010-January/003554.html
It could be related to #540, #877, or #893.
I'll ask Kyle to supply more diagnostic info on this ticket.
Attachments (2)
Change History (12)
comment:1 Changed at 2010-01-13T18:45:13Z by zooko
- Owner set to kmarkley86
comment:2 Changed at 2010-01-13T19:41:35Z by davidsarah
- Keywords upload added
Changed at 2010-01-14T06:32:41Z by kmarkley86
comment:3 Changed at 2010-01-14T06:34:19Z by kmarkley86
- Cc kmarkley86 added
allmydata-tahoe: 1.5.0, foolscap: 0.4.2, pycryptopp: 0.5.17, zfec: 1.4.5, Twisted: 8.2.0, Nevow: 0.9.33-r17222, zope.interface: 3.5.2, python: 2.6.2, platform: OpenBSD-4.6-amd64-Genuine_Intel-R-_CPU_000_@_2.93GHz-64bit-ELF, sqlite: 3.6.13, simplejson: 2.0.9, argparse: 0.9.1, pyOpenSSL: 0.9, pyutil: 1.3.34, zbase32: 1.1.1, setuptools: 0.6c12dev, pysqlite: 2.4.1
Mutable File Publish Status
- Started: 00:04:12 13-Jan-2010
- Storage Index: mcw73tlgpejftxf55c5bjmiczi
- Helper?: No
- Current Size: 470
- Progress: 20.0%
- Status: UncoordinatedWriteError
Retrieve Results
- Encoding: 3 of 10
- Sharemap:
  o 0 -> Placed on [ehnfmjtc]
  o 4 -> Placed on [5q4fx2pb]
  o 5 -> Placed on [ctchgzgn]
- Timings:
  o Total: 1.24s (380Bps)
    + Setup: 581us
    + Encrypting: 37us (12.40MBps)
    + Encoding: 55us (8.53MBps)
    + Packing Shares: 9.0ms (52.1kBps)
      # RSA Signature: 8.0ms
    + Pushing: 1.23s (383Bps)
  o Per-Server Response Times:
    + [ctchgzgn]: 77ms
    + [ehnfmjtc]: 67ms
    + [fjsasmll]: 1.18s
    + [gi3daw4h]: 1.12s
    + [xc3w2uzy]: 1.19s
    + [5q4fx2pb]: 1.18s
    + [6m245fmk]: 103ms
comment:4 Changed at 2010-01-14T15:23:14Z by zooko
Andrej Falout couldn't attach his incident reports to this ticket because trac doesn't let you upload attachments larger than 1,000,000 bytes. I bunzip2'ed them and 7z'ed them and they came out half as big, so here they are.
Changed at 2010-01-14T15:23:53Z by zooko
comment:5 Changed at 2010-01-14T15:24:40Z by zooko
Oh, and I reconfigured trac to allow attachments of up to 10 MB.
comment:6 Changed at 2010-01-17T01:25:39Z by kmarkley86
I'm continuing to hit this UncoordinatedWriteError very frequently on the production grid. I think it happens most often when creating directories. I can provide lots of additional incident reports if that would be useful.
This has made it almost impossible for me to run a 'tahoe backup' command to the production grid; should the priority of this ticket be raised?
comment:7 Changed at 2010-01-17T04:20:15Z by zooko
allmydata.com is continuing to repair servers and configuration issues on the allmydata.com prod grid, so that might be the way that your problem gets solved. However, at the very least your Tahoe-LAFS client is reporting something with a wrong error message. It may also be buggy in some way that leads to this problem.
One thing that you could do that would help is to try the same thing with a newer version of Tahoe-LAFS. Could you try installing the latest version http://allmydata.org/source/tahoe/tarballs/?C=M;O=D , per these install instructions: http://allmydata.org/source/tahoe/trunk/docs/install.html ?
comment:8 Changed at 2010-01-17T17:23:08Z by kmarkley86
I haven't seen one of these errors since upgrading from tahoe 1.5.0 to 1.5.0-r4160. Between that and general repair of the grid, the problem has gone away for me.
comment:9 Changed at 2010-01-26T20:00:54Z by warner
I glanced through a couple of these Incidents, and all the ones I looked at were that artifact that we fixed, in which DeadReferenceError is logged too severely by accident (the one where the ServerFailure wrapped the DeadReferenceError, preventing the errback code from identifying it as a DeadReferenceError). This got fixed with the overhaul of the add-lease code.
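For readers unfamiliar with this kind of artifact, here is a minimal sketch of how a wrapper exception can hide the error type that a handler is checking for. It is not the Tahoe-LAFS or foolscap code: the two exception classes are local stand-ins that only borrow the names used above, and the `original` attribute, the two `severity_*` functions, and the "low"/"high" labels are all hypothetical.

```python
class DeadReferenceError(Exception):
    """Stand-in for a 'connection to the server was lost' error (hypothetical)."""


class ServerFailure(Exception):
    """Stand-in wrapper that carries the exception it wraps (hypothetical)."""

    def __init__(self, original):
        super().__init__(repr(original))
        self.original = original


def severity_naive(exc):
    # Looks only at the outermost type, so a wrapped DeadReferenceError
    # is not recognized and is reported at the higher severity.
    if isinstance(exc, DeadReferenceError):
        return "low"
    return "high"


def severity_unwrapping(exc):
    # Peels off any ServerFailure layers before checking the type.
    while isinstance(exc, ServerFailure):
        exc = exc.original
    if isinstance(exc, DeadReferenceError):
        return "low"
    return "high"


wrapped = ServerFailure(DeadReferenceError("connection lost"))
print(severity_naive(wrapped))       # the wrapper hides the real cause
print(severity_unwrapping(wrapped))  # the real cause is found
```

In this sketch the naive check treats a routine lost connection as a serious event, which is the sort of misclassification that would produce an Incident for something harmless.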
comment:10 Changed at 2010-02-15T19:38:43Z by davidsarah
- Resolution set to fixed
- Status changed from new to closed
UncoordinatedWriteError log