Opened at 2021-08-16T18:51:56Z
Closed at 2021-08-19T19:03:04Z
#3763 closed task (fixed)
Potential issues with `PUT /v1/lease/:storage_index` in GBS protocol
Reported by: | itamarst | Owned by: | exarkun |
---|---|---|---|
Priority: | normal | Milestone: | HTTP Storage Protocol |
Component: | unknown | Version: | n/a |
Keywords: | Cc: | ||
Launchpad Bug: |
Description (last modified by exarkun)
- It says "If there are no shares for the given storage_index then do nothing and return NO CONTENT." 204 is a success code... this seems like maybe more like an error, and should therefore have 4xx code?
- What should be returned if there are shares? Anything need to be in the response body? Just 200?
- "If the renew-secret value matches an existing lease then that lease will be renewed instead." Why is this, given there exists a separate renew API endpoint? This seems like it could mask certain categories of client bugs.
Change History (9)
comment:1 Changed at 2021-08-16T18:55:58Z by itamarst
comment:2 Changed at 2021-08-16T20:30:37Z by itamarst
Another question: Why does this endpoint exist at all?
Creating an immutable storage index also creates a lease, per the docs: "A lease is also created for the shares." So maybe this endpoint doesn't need to exist at all?
comment:3 Changed at 2021-08-17T10:48:48Z by exarkun
- Description modified (diff)
comment:4 follow-up: ↓ 6 Changed at 2021-08-17T14:21:41Z by exarkun
- It says "If there are no shares for the given storage_index then do nothing and return NO CONTENT." 204 is a success code... this seems like maybe more like an error, and should therefore have 4xx code?
In e9563ebc02a67314060f38e8533e2e1e551ce9a6 warner changes the Foolscap-based implementation to return None instead of raising an exception. The commit message is:
change RIStorageServer.remote_add_lease to exit silently in case of no-such-bucket, instead of raising IndexError?, because that makes the upcoming --add-lease feature faster and less noisy
I failed to find much more detail about this. Is it only about the fact that exceptions are slow? Maybe transferring exceptions via Foolscap is even slower? If that is all then it seems like we don't have to mirror this choice in GBS. It seems like a good idea to tell the client that they tried to do something and the server did not do it, rather than let them believe the server did do it.
I think 410 is meant for cases where the server specifically knows the resource used to exist. I don't think the server will necessarily know this. A 404 might make the most sense?
- What should be returned if there are shares? Anything need to be in the response body? Just 200?
The Foolscap-based implementation returns None. That makes me think an empty body makes sense. For the status code, either 200 or 201. 201 seems like the right fit to me, since we *did* just "create" a lease. Except https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/201 says its for *resource* creation and we should return Location in the response header ... so I guess not. Just a 200.
- "If the renew-secret value matches an existing lease then that lease will be renewed instead." Why is this, given there exists a separate renew API endpoint? This seems like it could mask certain categories of client bugs.
I looked around the codebase a bit. I tried to find code that uses add_lease and see what the consequences of it being mistakenly used instead of renew_lease might be. I also looked for users of renew_lease and tried to understand why they were correct (as opposed to a use of add_lease). I didn't come up with much.
One part of the server code has this:
def add_or_renew_lease(self, lease_info): try: self.renew_lease(lease_info.renew_secret, lease_info.expiration_time) except IndexError: self.add_lease(lease_info)
where the difference between the two is flagrantly ignored...
I think the high-level view here might be that leases are a very lightly used Tahoe feature and garbage collection may be even more lightly used - so the consequences of getting lease logic wrong are minimal or non-existent.
I *feel* like the overlap between add_lease and renew_lease is a mistake. It would be convenient if the renew_lease functionality were *100%* encompassed by that overlap (because then renew_lease could be deleted) but there is the annoying cancel_secret...
Trying to enumerate options...
- Leave it as-is (and/or fix it later)
- Remove renew_lease from the network and Python APIs; adjust all callers of the Python renew_lease API to use the add_lease Python API instead.
- Remove renew_lease from the network API; (re-)implement the Python API to use the add_lease network API instead; invent a cancel_secret along the way (maybe some kind of null value that indicates an uncancellable lease).
- Remove renew_lease from the network API; add cancel_secret to the Python API; (re-)implement the Python API to use add_lease instead and to provide a good cancel_secret.
- Remove the renew semantics from the add_lease Python and network API; update callers to provide appropriate error handling on a case-by-case basis
- Remove the renew semantics from the add_lease network API; (re-)implement the Python API to provide error handling that automatically falls back to using the renew_lease network API.
Probably should stop typing at this point and see if anyone else finds any of this coherent.
comment:5 Changed at 2021-08-17T14:39:07Z by exarkun
Another question: Why does this endpoint exist at all?
Creating an immutable storage index also creates a lease, per the docs: "A lease is also created for the shares." So maybe this endpoint doesn't need to exist at all?
There does need to be a "create a new lease" endpoint. The strongest case is made by this scenario, I think:
- Alice uploads some immutable shares. A lease is implicitly created and keeps the data alive.
- Alice shares the cap with Bob.
- Bob creates his own lease.
- Alice abandons the content and her lease expires.
- Bob's lease keeps the data alive.
whether is pattern is actually supported by Tahoe as currently implemented, I'm not sure (I can't remember if leases with distinct renew secrets actually get tracked separately from each other ... but I think they might and if they don't, they should).
comment:6 in reply to: ↑ 4 Changed at 2021-08-17T19:35:31Z by itamarst
Replying to exarkun:
I *feel* like the overlap between add_lease and renew_lease is a mistake. It would be convenient if the renew_lease functionality were *100%* encompassed by that overlap (because then renew_lease could be deleted) but there is the annoying cancel_secret...
Well, but the cancellation secret isn't actually used by anything. So for implementation purposes we can just ignore it for now, have a single API call, and worry about cancellation later?
On the other hand, for audit purposes having a cancellation secret spec in place would be useful, seems like a thing where you really want to get the security right...
comment:7 Changed at 2021-08-18T14:39:42Z by exarkun
Some more background related to lease cancellation:
commit 5476f67dc1177a26b69fd67fbe589842f065d482 Author: Zooko O'Whielacronx <zooko@zooko.com> Date: Mon Sep 12 15:23:31 2011 -0700 storage: remove the storage server's "remote_cancel_lease" function We're removing this function because it is currently unused, because it is dangerous, and because the bug described in #1528 leaks the cancellation secret, which allows anyone who knows a file's storage index to abuse this function to delete shares of that file. fixes #1528 (there are two patches that are each a sufficient fix to #1528 and this is one of them) commit 65de17245da26a4ce00aa7c441d6bf81464a6e65 Author: Zooko O'Whielacronx <zooko@zooko.com> Date: Mon Sep 12 15:23:24 2011 -0700 storage: test that the storage server does *not* have a "remote_cancel_lease" function We're removing this function because it is currently unused, because it is dangerous, and because the bug described in #1528 leaks the cancellation secret, which allows anyone who knows a file's storage index to abuse this function to delete shares of that file. ref. #1528 commit cffc98780414760c8d5f751c5841856b3207cce3 Author: Zooko O'Whielacronx <zooko@zooko.com> Date: Mon Sep 12 15:12:01 2011 -0700 immutable: test whether the server allows clients to read past the end of share data, which would allow them to learn the cancellation secret Also test whether the server explicitly declares that it prevents this problem. ref #1528
comment:8 Changed at 2021-08-18T15:55:50Z by itamarst
#3773 covers some API changes to address some of the above.
comment:9 Changed at 2021-08-19T19:03:04Z by GitHub <noreply@…>
- Resolution set to fixed
- Status changed from new to closed
In 9a5c417/trunk:
Possibly "410 Gone" makes more sense than "204 No content"? Or maybe I'm misunderstanding the order of operations.