#1223 closed defect

got 'WrongSegmentError' during repair — at Version 5

Reported by: francois
Owned by: somebody
Priority: major
Milestone: 1.8.1
Component: code-encoding
Version: 1.8.0
Keywords: regression repair performance news-needed
Cc: francois@…
Launchpad Bug:

Description (last modified by francois)

While working to improve the logging of 'tahoe deep-check' and 'tahoe check' (another ticket coming soon), I deleted shares by hand from 22 different Tahoe nodes to trigger a repair.

Encoding parameters of this file were M=66 and K=22.

The complete debug log as extracted by 'flogtool' is attached to this ticket.

$ tahoe check --repair URI:CHK:XXXXX
ERROR: 500 Internal Server Error
Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.6/foolscap/eventual.py", line 26, in _turn
    cb(*args, **kwargs)
  File "/home/francois/dev/tahoe-upstream/src/allmydata/immutable/downloader/node.py", line 472, in _deliver
    d.callback(result) # might actually be an errback
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 280, in callback
    self._startRunCallbacks(result)
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 354, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 371, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/home/francois/dev/tahoe-upstream/src/allmydata/immutable/downloader/segmentation.py", line 116, in _got_segment
    raise WrongSegmentError("I was given the wrong data.")
allmydata.immutable.downloader.common.WrongSegmentError: I was given the wrong data.

Change History (6)

Changed at 2010-10-07T21:20:02Z by francois

comment:1 Changed at 2010-10-07T21:43:54Z by warner

  • Description modified (diff)

(reformatted the 'tahoe check' output a bit for easier display)

comment:2 Changed at 2010-10-07T21:44:37Z by warner

Francois notes that the filesize was 135 bytes.

comment:3 Changed at 2010-10-07T22:10:22Z by warner

Gleaned so far: the file has one segment. The repairer starts with a get_segsize(), which is currently implemented lazily as get_segment(0). Log messages up through 2864211 are the get_segment(0), at which point the upload process starts and spends through 2864212 performing upload-share-placement.

The weird bit starts on message 2864212, where the repairer performs a 7-byte read. It's as if the repairer is confused about the segment size (or the repairer's uploader is confused about what a good chunksize should be), and does a bunch of tiny reads instead of one whole segment. That's the first problem, but it's merely a performance issue, not fatal.
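One observation, offered as arithmetic rather than a diagnosis: with K=22 from the description and a 135-byte file, splitting the data evenly across K pieces and rounding up gives 7 bytes per piece, which matches the size of the tiny reads. Whether the repairer's uploader actually derives its chunk size this way is an assumption, not something this ticket confirms. The variable names below are illustrative only.

```python
# Illustrative arithmetic only; these names are not taken from the Tahoe source.
file_size = 135   # bytes, from comment:2
k = 22            # shares needed to reconstruct, from the description

# Ceiling division: bytes per piece if the data is split evenly across k pieces.
bytes_per_piece = (file_size + k - 1) // k

# Total size after padding up to a whole multiple of k pieces.
padded_size = bytes_per_piece * k

print(bytes_per_piece, padded_size)
```

If that guess is right, the reads are being issued one piece at a time rather than one segment at a time, which would be consistent with the performance problem described above.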

The fatal problem is some sort of fencepost error. Grepping for "Segmentation got data" shows a series of 7-byte reads that ends badly (remembering that this is a 135-byte file):

22:47:17.975 L20 []#2864526 Segmentation got data: want [0-7), given [0-135), for segnum=0
22:47:18.088 L20 []#2864841 Segmentation got data: want [7-14), given [0-135), for segnum=0
...
22:47:30.757 L20 []#2869881 Segmentation got data: want [119-126), given [0-135), for segnum=0
22:47:31.694 L20 []#2870196 Segmentation got data: want [126-133), given [0-135), for segnum=0
22:47:32.807 L20 []#2870511 Segmentation got data: want [133-135), given [0-135), for segnum=0
22:47:32.953 L20 []#2870826 Segmentation got data: want [140-135), given [0-135), for segnum=0

The [133-135) should have been the last read, but for some reason it went further and did that bogus [140-135) read. The "140" offset is beyond the end of the file, and of course having a negative size is also a problem.
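As a minimal sketch of what the read sequence should look like, the helper below tiles a 135-byte segment with 7-byte half-open ranges and checks that a range such as [140-135) is rejected. The function names are hypothetical and are not taken from segmentation.py. One way to arrive at 140 would be to advance the offset by the full chunk size after the short final read (133 + 7) and then issue one more read, but the ticket does not establish that this is the actual cause.

```python
def expected_reads(total_size, chunk_size):
    """Yield half-open (start, end) ranges that tile [0, total_size)."""
    offset = 0
    while offset < total_size:
        end = min(offset + chunk_size, total_size)
        yield (offset, end)
        # Advance by the bytes actually covered, so a short final read
        # leaves offset == total_size and the loop stops.
        offset = end


def is_valid_read(start, end, total_size):
    """A read is sane if it is non-empty and lies within [0, total_size)."""
    return 0 <= start < end <= total_size


reads = list(expected_reads(135, 7))
print(reads[0], reads[-1], len(reads))
print(is_valid_read(140, 135, 135))
```

With these inputs the last range produced is (133, 135), and the check on (140, 135) fails on both counts noted above: the start lies past the end of the data and the length is negative.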

comment:4 Changed at 2010-10-07T22:17:04Z by francois

  • Keywords regression repair added
  • Milestone changed from undecided to 1.8.1

The same file repair worked perfectly well with 1.7.1.

comment:5 Changed at 2010-10-07T22:20:50Z by francois

  • Description modified (diff)