[tahoe-dev] [tahoe-lafs] #1154: mplayer triggers two bugs in Tahoe's new downloader

tahoe-lafs trac at tahoe-lafs.org
Thu Aug 5 17:40:53 UTC 2010


#1154: mplayer triggers two bugs in Tahoe's new downloader
------------------------------+---------------------------------------------
     Reporter:  francois      |       Owner:  warner                                 
         Type:  defect        |      Status:  assigned                               
     Priority:  critical      |   Milestone:  1.8.0                                  
    Component:  code-network  |     Version:  1.8β                                   
   Resolution:                |    Keywords:  download regression random-access error
Launchpad Bug:                |  
------------------------------+---------------------------------------------

Comment (by warner):

 Oh, never mind, I think I figured it out. There are actually three bugs
 overlapping here:

  1. the {{{Spans/DataSpans}}} classes used {{{__len__}}} methods that
 returned {{{long}}}s instead of {{{int}}}s, causing an exception during
 download. (my [4664] fix was incorrect: it turns out that
 {{{__nonzero__}}} is not allowed to return a {{{long}}} either).
  1. there is a lost-progress bug in {{{DownloadNode}}}, where a failure in
 one segment-fetch will cause all other pending segment-fetches to hang
 forever
  1. a {{{stopProducing}}} that occurs during this hang-forever period
 causes an exception, because there is no active segment-fetch in place

 The bug1 fix is easy: replace {{{self.__len__}}} with {{{self.len}}} and
 make {{{__nonzero__}}} always return a {{{bool}}}. The bug3 fix is also
 easy: {{{DownloadNode._cancel_request}}} should tolerate
 {{{self._active_segment}}} being {{{None}}}.
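
 The ticket predates Python 3, but the same pitfall and both easy fixes can
 be sketched in modern Python, where {{{__bool__}}} plays the role of
 {{{__nonzero__}}} and must likewise return a real {{{bool}}}. The class and
 attribute names below are illustrative stand-ins, not Tahoe's actual code:

 {{{
class Spans:
    """Toy stand-in for Tahoe's Spans; only the truth-value logic is shown."""
    def __init__(self, total_length):
        self._len = total_length  # may be a very large integer

    def len(self):
        # bug-1 fix: expose the size via a plain method instead of
        # __len__, which CPython restricts to machine-sized ints
        return self._len

    def __bool__(self):
        # bug-1 fix: always coerce the truth value to a genuine bool
        return bool(self._len)


class DownloadNode:
    def __init__(self):
        self._active_segment = None  # the fetch currently in flight

    def _cancel_request(self):
        # bug-3 fix: tolerate there being no active fetch (e.g. after
        # a segment-fetch failure has already cleared it)
        if self._active_segment is not None:
            self._active_segment = None
 }}}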

 The bug2 fix is not trivial but not hard. The start-next-fetch code in
 {{{DownloadNode}}} should be factored out, and
 {{{DownloadNode.fetch_failed}}} code should invoke it after sending
 errbacks to the requests which failed. This will add a nice property: if
 you get unrecoverable bit errors in one segment, you might still be able
 to get valid data from other segments (as opposed to giving up on the
 whole file because of a single error). I think there are some other
 changes that must be made to really get this property, though. When we
 get to the point where we sort shares by "goodness", we'll probably clean
 this up. The basic idea will be that shares with errors go to the bottom
 of the list but are not removed from it entirely: if we really can't find
 the data we need somewhere else, we'll give the known-corrupted share a
 try, in the hopes that there are some uncorrupted parts of the share.
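
 A minimal sketch of that refactoring (hypothetical names and simplified
 state; the real {{{DownloadNode}}} is more involved): the start-next-fetch
 logic lives in one helper, and the failure path errbacks the failed
 requests and then calls the same helper, so the remaining queued fetches
 keep making progress:

 {{{
class DownloadNode:
    def __init__(self, segments):
        self._pending = list(segments)   # queued segment requests
        self._active_segment = None
        self.failures = []               # (segment, reason) pairs

    def _start_new_segment(self):
        # factored-out starter: begin the next queued fetch, if any
        if self._active_segment is None and self._pending:
            self._active_segment = self._pending.pop(0)

    def fetch_failed(self, why):
        # errback the requests attached to the failed segment ...
        self.failures.append((self._active_segment, why))
        self._active_segment = None
        # ... then (the bug-2 fix) immediately start the next fetch,
        # instead of leaving every other pending fetch hung forever
        self._start_new_segment()

    def fetch_succeeded(self):
        done = self._active_segment
        self._active_segment = None
        self._start_new_segment()
        return done
 }}}

 With this shape, an unrecoverable bit error in one segment errbacks only
 that segment's requests; the node then moves on, so intact segments can
 still be delivered.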

 I've got a series of test cases to exercise these three bugs; I just have
 to build them in the right order to make sure that I'm not fixing the
 wrong one first (and thus hiding one of the others from my test).

-- 
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1154#comment:10>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage


More information about the tahoe-dev mailing list