Opened at 2008-06-03T04:49:39Z
Closed at 2012-05-13T07:50:58Z
#443 closed enhancement (fixed)
set ETag on immutable directories, short-circuit on cache hit
Reported by: | warner | Owned by: | warner |
---|---|---|---|
Priority: | major | Milestone: | 1.10.0 |
Component: | code-frontend-web | Version: | 1.0.0 |
Keywords: | etag performance news-needed | Cc: | jeremy@…, zooko |
Launchpad Bug: |
Description
When retrieving an immutable file, the ETag header should be set to the file's URI. When retrieving a mutable file, it should be set to the current roothash value. When retrieving a directory, it should also be set to the roothash value.
The server should also use the right sort of If-Modified-Since dance, using the ETag to let browser caches do the right thing.
Attachments (2)
Change History (29)
comment:1 Changed at 2008-11-06T19:43:22Z by warner
comment:2 Changed at 2010-01-16T00:22:01Z by davidsarah
- Keywords etag performance test added
#844 references http://allmydata.org/pipermail/tahoe-dev/2009-November/003221.html which claims that the immutable-file ETag support wasn't doing the right thing for (at least) Squid to recognize immutable files as unchanged. Let's make it the goal of this ticket to make sure that we have this support works properly for a representative sample of web proxies and browsers, and open another ticket to add ETag support for mutable files.
comment:3 Changed at 2010-01-16T00:22:51Z by davidsarah
- Summary changed from webapi should provide an ETag to Check that webapi support for ETags on immutable files works properly
comment:4 follow-up: ↓ 5 Changed at 2010-02-27T06:34:22Z by zooko
Can we add into this file to also check that it does the right thing for immutable directories? Or is that sufficiently different that it should be a separate ticket?
comment:5 in reply to: ↑ 4 Changed at 2010-02-27T08:31:08Z by davidsarah
- Milestone changed from undecided to 1.7.0
- Summary changed from Check that webapi support for ETags on immutable files works properly to Check that webapi support for ETags on immutable files/directories works properly
Replying to zooko:
Can we add into this file to also check that it does the right thing for immutable directories?
We can indeed.
comment:6 Changed at 2010-03-11T18:16:46Z by jsgf
- Cc jeremy@… added
- Owner set to jsgf
I'll have a look at this, since I'm mucking around in that area anyway (at least for immutable; I'll need someone to explain the situation with mutable files to me).
comment:7 Changed at 2010-03-11T23:12:11Z by jsgf
For immutable files:
Basic If-Match appears to be completely ignored; it always returns 200 rather than 412 (precondition failed), regardless of the provided etag.
If-None-Match will return "304 not modified" and no body, as it is supposed.
This is consistent with twisted web's server, which only checks for if-none-match. I'm not sure quite how the body is getting suppressed. We should be checking the result of setETag() to see if we have a etag hit or not; it may be its doing too much work.
Will check immutable dirs next.
comment:8 Changed at 2010-06-16T04:01:19Z by davidsarah
- Keywords review-needed added
- Milestone changed from 1.7.0 to 1.7.1
comment:9 Changed at 2010-07-08T05:16:13Z by zooko
comment:10 Changed at 2010-07-08T08:42:51Z by jsgf
I think if-match is for PUT/POST operations to make sure the entity at the specified URI is still in the expected state.
I'm not sure why squid might have problems with Tahoe. The posting referred to in comment:2 is pretty vague about what might be going wrong; it would be nice to see some actual analysis (or even just an HTTP protocol dump).
comment:11 Changed at 2010-07-08T13:40:00Z by zooko
- Owner changed from jsgf to zooko
- Status changed from new to assigned
- Summary changed from Check that webapi support for ETags on immutable files/directories works properly to set ETag on immutable directories, short-circuit on cache hit
comment:12 Changed at 2010-07-08T14:24:53Z by zooko
- Cc zooko added
- Keywords review-needed removed
- Owner changed from zooko to jsgf
- Status changed from assigned to new
Well, in handling a filenode these patches say:
+ if si and req.setETag(base32.b2a(si)):
but in handling a dirnode these patches say:
+ if si and req.setETag('DIR:%s-%s' % (base32.b2a(si), t or "")):
Shouldn't the filenode also have t appended to its ETag so that people who request somefile?t=json don't get the same thing as the previous person who requested somefile? jsgf: please reply to this and, if appropriate, re-add the "review-needed" flag on this ticket.
comment:13 Changed at 2010-07-08T18:41:50Z by jsgf
Yes, that looks pretty dubious. The ETag should be computed by a single function everywhere, rather than redone in a bespoke fashion at each point. But let me double-check that there isn't something else going on there...
comment:14 Changed at 2010-07-08T18:43:48Z by jsgf
No, I see. The first one is for plain immutable files which only have one representation - their actual binary contents. The second is for directories, which are interpreted by Tahoe itself and can have multiple representations. If you access the directory as a just the raw immutable file then you'll end up with an ETag of the first form.
comment:15 Changed at 2010-07-08T23:54:52Z by zooko
But won't the current patch make the following thing go wrong, causing the second request to cache-hit when instead the second request needs to returning something entirely different.
(If I'm right, then this probably goes to show that our docs/frontends/webapi.txt needs to be reorganized so that people like jsgf can easily find this stuff.)
$ wget http://127.0.0.1:3456/uri/URI:CHK:u7xfz46l7lpwaxw6yjeqnotqk4:i55qdey7cwaqbfurafx73f4flxmyj6lhq7lynbkfcx5us4g7l5za:3:10:124 --2010-07-08 17:39:41-- http://127.0.0.1:3456/uri/URI:CHK:u7xfz46l7lpwaxw6yjeqnotqk4:i55qdey7cwaqbfurafx73f4flxmyj6lhq7lynbkfcx5us4g7l5za:3:10:124 Connecting to 127.0.0.1:3456... connected. HTTP request sent, awaiting response... 200 OK Length: 124 [text/plain] Saving to: `URI:CHK:u7xfz46l7lpwaxw6yjeqnotqk4:i55qdey7cwaqbfurafx73f4flxmyj6lhq7lynbkfcx5us4g7l5za:3:10:124.1' 100%[================================================================================================================================================================>] 124 --.-K/s in 0s 2010-07-08 17:39:42 (7.88 MB/s) - `URI:CHK:u7xfz46l7lpwaxw6yjeqnotqk4:i55qdey7cwaqbfurafx73f4flxmyj6lhq7lynbkfcx5us4g7l5za:3:10:124.1' saved [124/124] $ hexdump -C URI\:CHK\:u7xfz46l7lpwaxw6yjeqnotqk4\:i55qdey7cwaqbfurafx73f4flxmyj6lhq7lynbkfcx5us4g7l5za\:3\:10\:124 00000000 47 49 46 38 39 61 0a 00 3c 00 b3 ff 00 c0 c0 c0 |GIF89a..<.......| 00000010 cb cb e5 b1 b1 d8 f8 f8 fb 85 85 c2 e9 e9 f4 b5 |................| 00000020 b5 da dd dd ee ff ff ff 00 00 00 00 00 00 00 00 |................| 00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 21 f9 04 |.............!..| 00000040 01 00 00 00 00 2c 00 00 00 00 0a 00 3c 00 00 04 |.....,......<...| 00000050 29 10 c9 59 0c 00 13 0d 72 ef 3c 5d 27 09 a1 18 |)..Y....r.<]'...| 00000060 94 22 da 9d 6a eb be 70 2c cf 74 6d df 78 ae ef |."..j..p,.tm.x..| 00000070 7c ef ff c0 a0 70 48 2c 12 23 00 3b ||....pH,.#.;| 0000007c $ wget http://127.0.0.1:3456/uri/URI:CHK:u7xfz46l7lpwaxw6yjeqnotqk4:i55qdey7cwaqbfurafx73f4flxmyj6lhq7lynbkfcx5us4g7l5za:3:10:124?t=json --2010-07-08 17:39:51-- http://127.0.0.1:3456/uri/URI:CHK:u7xfz46l7lpwaxw6yjeqnotqk4:i55qdey7cwaqbfurafx73f4flxmyj6lhq7lynbkfcx5us4g7l5za:3:10:124?t=json Connecting to 127.0.0.1:3456... connected. HTTP request sent, awaiting response... 200 OK Length: 298 [text/plain] Saving to: `URI:CHK:u7xfz46l7lpwaxw6yjeqnotqk4:i55qdey7cwaqbfurafx73f4flxmyj6lhq7lynbkfcx5us4g7l5za:3:10:124?t=json.1' 100%[================================================================================================================================================================>] 298 --.-K/s in 0s 2010-07-08 17:39:51 (18.9 MB/s) - `URI:CHK:u7xfz46l7lpwaxw6yjeqnotqk4:i55qdey7cwaqbfurafx73f4flxmyj6lhq7lynbkfcx5us4g7l5za:3:10:124?t=json.1' saved [298/298] $ hexdump -C URI\:CHK\:u7xfz46l7lpwaxw6yjeqnotqk4\:i55qdey7cwaqbfurafx73f4flxmyj6lhq7lynbkfcx5us4g7l5za\:3\:10\:124\?t\=json 00000000 5b 0a 20 22 66 69 6c 65 6e 6f 64 65 22 2c 20 0a |[. "filenode", .| 00000010 20 7b 0a 20 20 22 6d 75 74 61 62 6c 65 22 3a 20 | {. "mutable": | 00000020 66 61 6c 73 65 2c 20 0a 20 20 22 76 65 72 69 66 |false, . "verif| 00000030 79 5f 75 72 69 22 3a 20 22 55 52 49 3a 43 48 4b |y_uri": "URI:CHK| 00000040 2d 56 65 72 69 66 69 65 72 3a 79 62 70 6f 64 6e |-Verifier:ybpodn| 00000050 62 35 69 62 67 71 62 6b 61 74 6b 6d 68 64 65 6f |b5ibgqbkatkmhdeo| 00000060 76 33 36 6d 3a 69 35 35 71 64 65 79 37 63 77 61 |v36m:i55qdey7cwa| 00000070 71 62 66 75 72 61 66 78 37 33 66 34 66 6c 78 6d |qbfurafx73f4flxm| 00000080 79 6a 36 6c 68 71 37 6c 79 6e 62 6b 66 63 78 35 |yj6lhq7lynbkfcx5| 00000090 75 73 34 67 37 6c 35 7a 61 3a 33 3a 31 30 3a 31 |us4g7l5za:3:10:1| 000000a0 32 34 22 2c 20 0a 20 20 22 72 6f 5f 75 72 69 22 |24", . "ro_uri"| 000000b0 3a 20 22 55 52 49 3a 43 48 4b 3a 75 37 78 66 7a |: "URI:CHK:u7xfz| 000000c0 34 36 6c 37 6c 70 77 61 78 77 36 79 6a 65 71 6e |46l7lpwaxw6yjeqn| 000000d0 6f 74 71 6b 34 3a 69 35 35 71 64 65 79 37 63 77 |otqk4:i55qdey7cw| 000000e0 61 71 62 66 75 72 61 66 78 37 33 66 34 66 6c 78 |aqbfurafx73f4flx| 000000f0 6d 79 6a 36 6c 68 71 37 6c 79 6e 62 6b 66 63 78 |myj6lhq7lynbkfcx| 00000100 35 75 73 34 67 37 6c 35 7a 61 3a 33 3a 31 30 3a |5us4g7l5za:3:10:| 00000110 31 32 34 22 2c 20 0a 20 20 22 73 69 7a 65 22 3a |124", . "size":| 00000120 20 31 32 34 0a 20 7d 0a 5d 0a | 124. }.].| 0000012a
comment:16 Changed at 2010-07-10T23:38:03Z by zooko
- Keywords test-needed added
I asked jsgf about this in email. He replied:
Yes the etag should distinguish different representations.
Which I believe means that comment:15 is right that the current patch is wrong. I guess this shows that we need a unit test which would be red with the current patch--a unit test that loads a file with one representation (GET /uri/$FILEREADCAP) and then loads it again with a different representation (GET /uri/$FILEREADCAP?t=json) and flunks the code under test if it gave back the same thing on the second query.
comment:17 Changed at 2010-07-17T07:33:35Z by zooko
- Milestone changed from 1.7.1 to 1.8β
This probably just needs one more unit test (to demonstrate the problem mentioned in comment:16) and one small tweak and it will be ready.
comment:18 Changed at 2010-07-20T04:57:58Z by zooko
- Keywords news-needed added; test removed
Oh and a NEWS snippet.
comment:19 Changed at 2010-09-11T01:02:12Z by davidsarah
- Milestone changed from 1.8β to 1.9.0
Out of time for 1.8.
comment:20 Changed at 2011-07-16T21:23:50Z by davidsarah
If anyone wants to add a unit test and fix the issue in comment:15, they should do so in the next week in order for this improvement to get into 1.9.
comment:21 Changed at 2011-07-27T18:23:10Z by zooko
- Milestone changed from 1.9.0 to soon
comment:22 Changed at 2012-03-31T20:53:23Z by amiller
I added unit tests and a minor tweak.
In jsgf's original patches, the short-circuit behavior was added to the FileDownloader? class and would therefore only apply to the "t=None" case. So the problem in comment:15 does not occur.
However, we might as well short-circuit when an etag matches for "t=json" etc., in which case the tags do need to be distinguished.
comment:23 Changed at 2012-03-31T20:53:58Z by amiller
- Keywords review-needed added; test-needed removed
comment:24 Changed at 2012-04-01T01:12:30Z by davidsarah
- Milestone changed from soon to 1.10.0
- Owner changed from jsgf to davidsarah
- Status changed from new to assigned
comment:25 Changed at 2012-05-13T07:44:53Z by warner
- Keywords review-needed removed
- Owner changed from davidsarah to warner
- Status changed from assigned to new
I've merged together jsgf's changes with amiller's fixes, and added a couple of new tests. I also fixed the implementation to not use an etag for t=info or t=rename-form, since both actually provide variable output. "", "json", "uri", "readonly-uri" are all eligible.
I'll push the changes in a moment.
comment:26 Changed at 2012-05-13T07:47:23Z by Brian Warner <warner@…>
In 5d404db898e1e6dc:
comment:27 Changed at 2012-05-13T07:50:58Z by warner
- Resolution set to fixed
- Status changed from new to closed
This landed in a series of patches ending with 5d404db89.
I added code recently to set an ETag on immutable files (using the storage index.. I'm not sure whether to use this or the URI). It needs review, however, because the twisted.web req.setETag method needs to be called at a specific point in the response process (to make it interact with the if-etag-equals HTTP header), and I'm not sure we're doing it quite right.
We still don't have etags on mutable files.