Opened at 2011-09-07T04:35:33Z
Last modified at 2012-03-12T19:08:45Z
#1529 new defect
corrupted filesize in CHK filecap causes unexpected "bad hash error"
| Reported by: | warner | Owned by: | |
|---|---|---|---|
| Priority: | major | Milestone: | soon |
| Component: | code-encoding | Version: | 1.9.0a1 |
| Keywords: | error download immutable | Cc: | |
| Launchpad Bug: | | | |
Description
IRC user "joepie91" noticed that changing a filecap like:
URI:CHK:wfnqcotc4ceokosmfobofjzoya:qcof5dae7usfth4zvti7fmievpsofu25lgcmdq7q2cnebg3txyfq:3:10:29680
to
URI:CHK:wfnqcotc4ceokosmfobofjzoya:qcof5dae7usfth4zvti7fmievpsofu25lgcmdq7q2cnebg3txyfq:3:10:29681
(by changing the last character, raising the filesize by one) causes the following exception to be raised:
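The mangling can be reproduced mechanically, since the size is simply the last colon-separated field of the cap (layout `URI:CHK:<readkey>:<ueb_hash>:<k>:<N>:<size>`, as seen above). A minimal sketch; the helper name is mine, not part of Tahoe:

```python
def bump_size_field(filecap: str, delta: int = 1) -> str:
    """Return a copy of a CHK filecap with its trailing size field changed.

    CHK caps have the colon-separated layout
    URI:CHK:<readkey>:<ueb_hash>:<k>:<N>:<size>; only the last field
    is touched, so the keys and hashes remain valid.
    """
    parts = filecap.split(":")
    parts[-1] = str(int(parts[-1]) + delta)
    return ":".join(parts)

cap = ("URI:CHK:wfnqcotc4ceokosmfobofjzoya:"
       "qcof5dae7usfth4zvti7fmievpsofu25lgcmdq7q2cnebg3txyfq:3:10:29680")
print(bump_size_field(cap))  # same cap, but the size field is now 29681
```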
2011-09-06 21:24:27-0700 [-] Unhandled Error
Traceback (most recent call last):
  File "/Users/warner2/stuff/tahoe/tahoe-git-hanford/src/allmydata/immutable/downloader/node.py", line 492, in _check_ciphertext_hash
    raise BadCiphertextHashError(msg)
allmydata.immutable.downloader.common.BadCiphertextHashError: hash failure in ciphertext_hash_tree: segnum=0, SI=lfkjg4jkgzee
I was able to duplicate it locally against current trunk.
It looks like 1: we're relying upon something in the filecap that should merely be a hint, and 2: we're throwing a particularly weird error message. A hash mismatch in the ciphertext_hash_tree either indicates a failure in zfec decoding, or a deliberately malformed file (in which the original ciphertext_hash_tree was computed over the wrong data, then included in the UEB). So my guess is that we're computing the predicted segment size from the filecap hint, but then not updating it when the shares' UEBs tell us that the hint was wrong, and trying to decode with the wrong data.
The test case for this should be straightforward: upload a file with a couple of segments, produce a mangled filecap, and attempt a download. The download ought to succeed. If we want to be picky about the filecap being the source of truth, then we're allowed to fail, but we need a better error message than BadHashError (maybe something about a "corrupt filecap" or "file size mismatch"). But I think success is arguably more correct. (Note: the same argument may apply to mismatches in k and N.)
Change History (6)
comment:1 Changed at 2011-09-07T04:56:35Z by zooko
comment:2 Changed at 2011-09-07T05:49:18Z by zooko
I reproduced this by uploading a file to the Test Grid, downloading it back successfully, then bumping the file size in the cap and re-downloading. Then I ran the verifier on it and got an informative error message:
$ tahoe put /tmp/Hu-2010-Are_refined_carbohydrates_worse_than_saturated_fat\?.pdf
URI:CHK:wrfacn3miglizt4zpleoxwe2de:qo52uzx5vvfq3p5ftenbwuy6h7ek2cgpf5afvrurg33ooug2qlka:3:5:38424
$ tahoe get URI:CHK:wrfacn3miglizt4zpleoxwe2de:qo52uzx5vvfq3p5ftenbwuy6h7ek2cgpf5afvrurg33ooug2qlka:3:5:38424 > testdl
$ sha256sum testdl /tmp/Hu-2010-Are_refined_carbohydrates_worse_than_saturated_fat\?.pdf
4ec083f719a6c77fa46e7e668d39a74357e295d54120560ce44963d4dd8512be  testdl
4ec083f719a6c77fa46e7e668d39a74357e295d54120560ce44963d4dd8512be  /tmp/Hu-2010-Are_refined_carbohydrates_worse_than_saturated_fat?.pdf
$ tahoe get URI:CHK:wrfacn3miglizt4zpleoxwe2de:qo52uzx5vvfq3p5ftenbwuy6h7ek2cgpf5afvrurg33ooug2qlka:3:5:38425 > testdl
Error during GET: 410 Gone "NotEnoughSharesError: This indicates that some servers were unavailable, or that shares have been lost to server departure, hard drive failure, or disk corruption. You should perform a filecheck on this object to learn more.

The full error message is:
ran out of shares: complete= pending=Share(sh4-on-sroojqcx) overdue= unused= need 3. Last failure: None"
$ tahoe check URI:CHK:wrfacn3miglizt4zpleoxwe2de:qo52uzx5vvfq3p5ftenbwuy6h7ek2cgpf5afvrurg33ooug2qlka:3:5:38425
Summary: Healthy
 storage index: krwpuniahfvg3oivybopc3h2eu
 good-shares: 5 (encoding is 3-of-5)
 wrong-shares: 0
$ tahoe check --verify URI:CHK:wrfacn3miglizt4zpleoxwe2de:qo52uzx5vvfq3p5ftenbwuy6h7ek2cgpf5afvrurg33ooug2qlka:3:5:38425
ERROR: 500 Internal Server Error
"Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/foolscap-0.6.1-py2.7.egg/foolscap/call.py", line 674, in _done
    self.request.complete(res)
  File "/usr/local/lib/python2.7/dist-packages/foolscap-0.6.1-py2.7.egg/foolscap/call.py", line 60, in complete
    self.deferred.callback(res)
  File "/usr/local/lib/python2.7/dist-packages/Twisted-11.0.0-py2.7-linux-x86_64.egg/twisted/internet/defer.py", line 361, in callback
    self._startRunCallbacks(result)
  File "/usr/local/lib/python2.7/dist-packages/Twisted-11.0.0-py2.7-linux-x86_64.egg/twisted/internet/defer.py", line 455, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/local/lib/python2.7/dist-packages/Twisted-11.0.0-py2.7-linux-x86_64.egg/twisted/internet/defer.py", line 542, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/allmydata/immutable/checker.py", line 154, in _parse_and_validate
    self.segment_size, self._verifycap.needed_shares))
allmydata.immutable.checker.BadURIExtension: inconsistent erasure code params: utcpss: 38424 != self.tail_segment_size: 3, self._verifycap.size: 38425, self.segment_size: 38424, self._verifycap.needed_shares: 3"
The informative part on the end there says:
allmydata.immutable.checker.BadURIExtension: inconsistent erasure code params: utcpss: 38424 != self.tail_segment_size: 3, self._verifycap.size: 38425, self.segment_size: 38424, self._verifycap.needed_shares: 3
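Those numbers are consistent with deriving the tail segment size from the (corrupted) cap's filesize. A rough reconstruction of the arithmetic, not the checker's actual code (function names are mine):

```python
def next_multiple(n: int, k: int) -> int:
    """Smallest multiple of k that is >= n."""
    return ((n + k - 1) // k) * k

def expected_tail_segment_size(file_size: int, segment_size: int, k: int) -> int:
    # The tail segment holds whatever remains after the full segments;
    # zfec needs its input divisible by k, so the tail is padded up.
    tail = file_size % segment_size or segment_size
    return next_multiple(tail, k)

# With the true size (38424, a single full segment), the expected
# tail matches the UEB's recorded 38424:
print(expected_tail_segment_size(38424, 38424, 3))  # 38424
# With the off-by-one size from the mangled cap, the 1 leftover byte
# pads up to 3, which the checker reports against utcpss=38424:
print(expected_tail_segment_size(38425, 38424, 3))  # 3
```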
comment:3 (follow-up: comment:4) Changed at 2011-09-07T06:38:21Z by warner
After discussing it with Zooko, I agree with his description: the filecap is authoritative/canonical. If we fetch a share that contains a UEB with a filesize that disagrees with the filecap, we should treat that share as invalid, and throw a useful error (FilecapFilesizeMismatchError?, although BadURIExtension is even better). The filecap's filesize is not just a hint.
The "new" immutable downloader currently ignores all the share fields it doesn't need, including the UEB's filesize field. In this case, I think it's better that the downloader explicitly (one might say gratuitously) compare the UEB's filesize against the filecap's filesize and throw FilecapFilesizeMismatchError, rather than pretending that the UEB's filesize is a mistake and attempting to decode the file with the filecap's filesize (which will result in the unhelpful ciphertext_hash_tree error in this ticket). Think of it as bringing the error forward to a better place.
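The proposed check is tiny. A hedged sketch of its shape; the function signature and the exception class are illustrative (the ticket only floats the name FilecapFilesizeMismatchError), not actual downloader code:

```python
class FilecapFilesizeMismatchError(Exception):
    """A validated share's UEB size field disagrees with the filecap's."""

def check_ueb_size(filecap_size, ueb_size, share_id):
    # Reject the share as soon as its UEB is validated, before any
    # segment is fetched or decoded, instead of letting the mismatch
    # surface later as a confusing ciphertext_hash_tree failure.
    if ueb_size != filecap_size:
        raise FilecapFilesizeMismatchError(
            "share %s: UEB size %d != filecap size %d"
            % (share_id, ueb_size, filecap_size))
```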
Note that the root issue here is that the filesize is redundant: the file can be completely specified by just the readkey and the UEB hash. But it's awfully handy to include the filesize, k, and N in there, both to speed up downloads (parallelizing an appropriate number of DYHB queries) and to let deep-size run without actually fetching file data. Ideally, filecaps would only contain the minimal set of data necessary to retrieve the file. Having extra data there means we have to specify what happens when the extra data in the filecap disagrees with the data in a validated share, which means doing more work during download that can only possibly cause a failure.
comment:4 (in reply to: comment:3) Changed at 2012-03-12T19:08:08Z by davidsarah
- Keywords error added
Replying to warner:
The "new" immutable downloader currently ignores all the share fields it doesn't need, including the UEB's filesize field. In this case, I think it's better that the downloader explicitly (one might say gratuitously) compare the UEB's filesize against the filecap's filesize and throw FilecapFilesizeMismatchError, rather than pretending that the UEB's filesize is a mistake and attempting to decode the file with the filecap's filesize (which will result in the unhelpful ciphertext_hash_tree error in this ticket). Think of it as bringing the error forward to a better place.
+1.
comment:5 Changed at 2012-03-12T19:08:19Z by davidsarah
- Keywords download added
comment:6 Changed at 2012-03-12T19:08:45Z by davidsarah
- Keywords immutable added
Brian: I think we *are* supposed to rely on that number in the cap; it is canonical and not just a hint.
Although actually, now that I think about it, that was one of those things that I "tightened" and that you, I think, changed back to some degree when you wrote Brian's New Downloader.
IIRC my "tightened" interpretation of all those fields is still intact in the verifier but not in your new downloader. Might be interesting to see what the verifier says about that cap.