Opened at 2011-08-08T17:18:54Z
Closed at 2012-12-05T20:33:21Z
#1471 closed enhancement (fixed)
Make Crawlers Compatible With Pluggable Backends
| Reported by: | Zancas | Owned by: | |
| --- | --- | --- | --- |
| Priority: | major | Milestone: | undecided |
| Component: | code-storage | Version: | 1.8.2 |
| Keywords: | s3-backend crawler | Cc: | warner, zancas |
| Launchpad Bug: | | | |
Description
The ShareCrawler class (and children) were designed under the assumption of a single "Disk"-type backend. In the future Crawlers will support multiple possible backends. We'll (probably) use the composition idiom where the Crawlers learn about the backend by being passed the relevant backend object in their constructor.
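As a rough illustration of that composition idiom, the crawler would receive the backend object in its constructor and ask it for shares, rather than reaching into the disk layout itself. The class and method names below are hypothetical placeholders, not the actual Tahoe-LAFS interfaces:

```python
# Illustrative sketch only -- DiskBackend, get_shares(), and this ShareCrawler
# constructor signature are assumptions, not the real Tahoe-LAFS API.

class DiskBackend:
    """Backend that enumerates shares from a local storage directory."""
    def __init__(self, storedir):
        self.storedir = storedir

    def get_shares(self, prefix):
        # Yield (storage_index, share) pairs found under this bucket prefix.
        ...

class ShareCrawler:
    """Crawler that learns about share storage via an injected backend."""
    def __init__(self, backend, statefile):
        self.backend = backend      # composition: backend passed in, not assumed
        self.statefile = statefile

    def process_prefix(self, prefix):
        for si, share in self.backend.get_shares(prefix):
            self.process_share(si, share)

    def process_share(self, si, share):
        raise NotImplementedError   # subclasses (e.g. a lease-checking crawler) override

# Usage: crawler = ShareCrawler(DiskBackend("/var/tahoe/storage"), "crawler.state")
```

Swapping in an S3 (or other) backend would then only require another object implementing the same `get_shares()`-style interface.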
Change History (4)
comment:1 Changed at 2011-08-08T18:26:02Z by warner
- Summary changed from Make Crawler's Compatible With Pluggable Backends to Make Crawlers Compatible With Pluggable Backends
comment:2 Changed at 2011-08-11T04:42:09Z by Zancas
- Owner changed from Zancas to zancas
comment:3 Changed at 2011-12-16T16:25:12Z by davidsarah
- Cc warner zancas added
- Keywords s3-backend crawler added; backend S3 removed
- Owner zancas deleted
comment:4 Changed at 2012-12-05T20:33:21Z by zooko
- Resolution set to fixed
- Status changed from new to closed
(fixed title: http://www.angryflower.com/bobsqu.gif)
I'd like to point out that the use of a Crawler at all is deeply intertwined with the way the shares are being stored. We decided early on that we'd prefer a storage scheme in which the share files are the primary source of truth, and that anything else is merely a volatile performance-enhancing cache that could be deleted at any time without long-term information loss. The idea was to keep the storage model simple for server-admins, letting them correctly assume that shares could be migrated by merely copying sharefiles from one box to another. (write-enablers violate this assumption, but we're working on that).
Those Crawlers exist to manage things like lease-expiration and stats-gathering from a bunch of independent sharefiles, handling both the initial bootstrap case (e.g. you've just upgraded your storage server to a version that knows how to expire leases) and later recovery cases (e.g. you've migrated some shares into your server, or you manually deleted shares for some reason). This design assumes that share metadata can be retrieved quickly (i.e. fast local disk).
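For concreteness, the local-disk pattern being described is roughly the following sketch; the directory layout and the use of file mtime in place of real lease metadata are simplifying assumptions, not the actual storage-server code:

```python
# Much-simplified sketch of "crawl every local sharefile and check its lease".
import os
import time

def crawl_for_expired_leases(storedir, expiry_time, now=None):
    """Walk every sharefile under storedir and report shares whose lease
    (here: the file's mtime, standing in for real lease metadata) is older
    than expiry_time seconds."""
    now = now or time.time()
    expired = []
    for prefix in sorted(os.listdir(storedir)):        # one pass per prefix directory
        prefixdir = os.path.join(storedir, prefix)
        if not os.path.isdir(prefixdir):
            continue
        for si in os.listdir(prefixdir):               # storage-index buckets
            bucketdir = os.path.join(prefixdir, si)
            if not os.path.isdir(bucketdir):
                continue
            for shnum in os.listdir(bucketdir):        # individual sharefiles
                sharefile = os.path.join(bucketdir, shnum)
                if now - os.stat(sharefile).st_mtime > expiry_time:
                    expired.append(sharefile)
    return expired
```

The key point is that each pass touches only cheap local metadata (a `stat()` per sharefile), which is exactly the assumption that breaks down for remote backends.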
If a server is using a different backend, these rules and goals might not apply. For example, if shares are being stored in S3, is each share a single S3 object? How important is it that you be able to add or remove objects without going through the storage server? It may be much easier and faster to use a different approach.
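To make the contrast concrete, here is a hedged sketch of what such a different approach could look like if each share were stored as a single S3 object: a periodic bucket listing supplies share counts, sizes, and last-modified times without any local crawl. The bucket name, key layout, and use of boto3 are illustrative assumptions only, not anything this ticket or Tahoe-LAFS specifies:

```python
# Hypothetical alternative to a Crawler for an S3 backend: ask S3 to list the
# objects and use the metadata it already returns for each key.
import boto3

def gather_share_stats(bucket, prefix="shares/"):
    """Return (share_count, total_bytes) by listing S3 keys -- S3 itself
    supplies size and last-modified metadata, so no per-share read is needed."""
    s3 = boto3.client("s3")
    count = 0
    total = 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            count += 1
            total += obj["Size"]
    return count, total
```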
Anyway, my point is that you shouldn't assume a Crawler is the best way to do things, or that you must therefore find a way to port the Crawler code to a new backend. It fit a specific use-case for local disk, but it's pretty slow and resource-intensive, and for new uses (e.g. Accounting) I'm seriously considering finding a different approach. Don't be constrained by that particular design choice for new backends.