Opened at 2008-02-07T19:12:06Z
Closed at 2008-09-18T05:20:36Z
#301 closed enhancement (fixed)
t=deep-check with JSON output, for automated checking
Reported by: | zooko | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | 1.3.0 |
Component: | code-encoding | Version: | 0.7.0 |
Keywords: | | Cc: | |
Launchpad Bug: | | | |
Description
Run "check" on files and directories in an automated, regular way.
It's not clear how the checker process should get the verifier caps that it needs. See bigger, more general ticket #119 -- "lease expiration / deletion / filechecking / quotas".
Change History (7)
comment:1 Changed at 2008-02-07T19:32:00Z by zooko
comment:2 Changed at 2008-06-01T21:05:14Z by warner
- Milestone changed from eventually to undecided
comment:3 Changed at 2008-09-03T01:35:36Z by warner
- Milestone changed from undecided to 1.3.0
comment:4 Changed at 2008-09-04T18:58:06Z by warner
- Summary changed from automate checking to t=deep-check with JSON output, for automated checking
The plan for this is:
- provide a deep-check webapi with machine-readable (JSON) output
- give responsibility for running deep-check to users or grid admins. They should periodically run deep-check (possibly with verify=true, probably with repair=true) on their root-caps.
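A periodic job along these lines could drive that plan. This is a minimal sketch, not part of the ticket: the helper names `deep_check_url` and `run_deep_check` are hypothetical, the query arguments are the ones named in this ticket (`t=deep-check`, `output=JSON`, `verify=true`, `repair=true`), and the object URL passed in is assumed to be whatever webapi URL the user's node exposes for their root-cap.

```python
import json
import urllib.parse
import urllib.request


def deep_check_url(object_url, verify=False, repair=False):
    """Append the t=deep-check query arguments to a webapi object URL."""
    params = [("t", "deep-check"), ("output", "JSON")]
    if verify:
        params.append(("verify", "true"))
    if repair:
        params.append(("repair", "true"))
    # Keep any query string the caller already put on the URL.
    separator = "&" if "?" in object_url else "?"
    return object_url + separator + urllib.parse.urlencode(params)


def run_deep_check(object_url, verify=False, repair=True, timeout=None):
    """POST the deep-check request and return the decoded JSON dictionary.

    The default timeout of None means "wait indefinitely", since a
    deep-check emits nothing until the whole tree has been traversed.
    """
    request = urllib.request.Request(
        deep_check_url(object_url, verify, repair), data=b"", method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A cron job or similar scheduler could call `run_deep_check()` on each root-cap URL and alert on the returned counts.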
comment:5 Changed at 2008-09-04T20:12:25Z by warner
Here are my docs/webapi.txt additions describing the JSON output for t=check and t=deep-check . I'm working on implementing this now.
POST $URL?t=check

  This triggers the FileChecker to determine the current "health" of the given file or directory, by counting how many shares are available. The page that is returned will display the results. This can be used as a "show me detailed information about this file" page.

  If a when_done=url argument is provided, the return value will be a redirect to that URL instead of the checker results.

  If a return_to=url argument is provided, the returned page will include a link to the given URL entitled "Return to the parent directory".

  If a verify=true argument is provided, the node will perform a more intensive check, downloading and verifying every single bit of every share.

  If an output=JSON argument is provided, the response will be machine-readable JSON instead of human-oriented HTML. The data is a dictionary with the following keys:

   storage-index: a base32-encoded string with the object's storage index, or an empty string for LIT files
   repair-attempted: (bool) True if repair was attempted
   repair-successful: (bool) True if repair was attempted and the file was fully healthy afterwards.
   pre-repair-results: a dictionary that describes the state of the file before any repair was performed.
   post-repair-results: a dictionary (with the same keys as pre-repair-results) that describes the state of the file after any repair was performed. If no repair was requested or required, 'pre-repair-results' and 'post-repair-results' will be identical. Note that since immutable shares cannot be modified by clients, any corrupt immutable shares in pre-repair-results will remain in post-repair-results.

  For LIT files, the pre-repair-results dictionary has only the 'healthy' key, which will always be True. For distributed files, this dictionary has the following keys:

   count-shares-good: the number of good shares that were found
   count-shares-needed: 'k', the number of shares required for recovery
   count-shares-expected: 'N', the number of total shares generated
   count-good-share-hosts: the number of distinct storage servers with good shares. If this number is less than count-shares-good, then some shares are doubled up, increasing the correlation of failures. This indicates that one or more shares should be moved to an otherwise unused server, if one is available.
   count-corrupt-shares: the number of shares with integrity failures
   list-corrupt-shares: a list of "share identifiers", one for each share that was found to be corrupt. Each share identifier is a list of (serverid, storage_index, sharenum).
   needs-rebalancing: (bool) True if there are multiple shares on a single storage server, indicating a reduction in reliability that could be resolved by moving shares to new servers.
   servers-responding: list of base32-encoded storage server identifiers, one for each server which responded to the share query.
   healthy: (bool) True if the file is completely healthy, False otherwise. Healthy files have at least N good shares. Overlapping shares (indicated by count-good-share-hosts < count-shares-good) do not currently cause a file to be marked unhealthy. If there are at least N good shares, then corrupt shares do not cause the file to be marked unhealthy, although the corrupt shares will be listed in the results (list-corrupt-shares) and should be manually removed to avoid wasting time in subsequent downloads (as the downloader rediscovers the corruption and uses alternate shares).

POST $URL?t=deep-check

  This triggers a recursive walk of all files and directories reachable from the target, performing a check on each one just like t=check. The result page will contain a summary of the results, including details on any file/directory that was not fully healthy.

  t=deep-check is most useful to invoke on a directory. If invoked on a file, it will just check that single object. The recursive walker will deal with loops safely.

  This accepts the same verify=, when_done=, and return_to= arguments as t=check.

  Be aware that this can take a long time: perhaps a second per object. No progress information is currently provided: the server will be silent until the full tree has been traversed, then will emit the complete response.

  If an output=JSON argument is provided, the response will be machine-readable JSON instead of human-oriented HTML. The data is a dictionary with the following keys:

   count-objects-checked: count of how many objects were checked
   count-objects-healthy: how many of those objects were completely healthy
   count-objects-unhealthy: how many were damaged in some way
   count-repairs-attempted: repairs were attempted on this many objects. The count-repairs- keys will always be provided; however, unless repair=true is present, they will all be zero.
   count-repairs-successful: how many repairs resulted in healthy objects
   count-repairs-unsuccessful: how many repairs did not result in completely healthy objects
   count-corrupt-shares: how many shares were found to have corruption, summed over all objects examined
   list-corrupt-shares: a list of "share identifiers", one for each share that was found to be corrupt. Each share identifier is a list of (serverid, storage_index, sharenum).
   list-remaining-corrupt-shares: like list-corrupt-shares, but mutable shares that were successfully repaired are not included. These are shares that need manual processing. Since immutable shares cannot be modified by clients, all corruption in immutable shares will be listed here.
   list-unhealthy-files: a list of (pathname, check-results) tuples, for each file that was not fully healthy. 'pathname' is relative to the directory on which deep-check was invoked. The 'check-results' field is the same as that returned by t=check&output=JSON, described above.
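An automated checker might digest the JSON described above roughly as follows. This is a sketch under the assumption that the response has exactly the keys listed in these proposed docs; the function names `needs_attention` and `summarize_deep_check` are hypothetical, and the wording of the report lines is arbitrary.

```python
def needs_attention(check_results):
    """Return reasons why one t=check result deserves a closer look."""
    reasons = []
    post = check_results["post-repair-results"]
    if not post["healthy"]:
        reasons.append("not healthy")
    if check_results["repair-attempted"] and not check_results["repair-successful"]:
        reasons.append("repair attempted but unsuccessful")
    # LIT files carry only the 'healthy' key, so use .get() for the rest.
    if post.get("count-corrupt-shares", 0) > 0:
        reasons.append("corrupt shares remain")
    if post.get("needs-rebalancing", False):
        reasons.append("shares should be rebalanced")
    return reasons


def summarize_deep_check(data):
    """Turn a t=deep-check JSON dictionary into a list of report lines."""
    lines = [
        "checked %d objects: %d healthy, %d unhealthy"
        % (
            data["count-objects-checked"],
            data["count-objects-healthy"],
            data["count-objects-unhealthy"],
        )
    ]
    # Each entry is a (pathname, check-results) pair.
    for pathname, check_results in data["list-unhealthy-files"]:
        reasons = needs_attention(check_results) or ["reported unhealthy"]
        lines.append("%s: %s" % (pathname, ", ".join(reasons)))
    remaining = data["list-remaining-corrupt-shares"]
    if remaining:
        lines.append("%d corrupt share(s) need manual processing" % len(remaining))
    return lines
```

The post-repair results are used for the per-file verdict because, when no repair was requested, they are documented to be identical to the pre-repair results.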
comment:6 Changed at 2008-09-06T05:44:40Z by warner
Ok, I just split check() and check_and_repair() into separate methods, because they return significantly different results. check() returns a single ICheckerResults instance, whereas the return value of check_and_repair() needs to have two such instances (pre-repair and post-repair), as well as indicating whether repair was attempted or not, and whether it was successful or not. This means there is also a deep_check() and deep_check_and_repair().
I left the webapi alone, but internally the POST t=check (and t=deep-check) implementation calls different methods depending upon the value of the repair= argument.
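The shape of that split can be illustrated with a toy model. This is only a sketch and not the actual Tahoe code: `CheckerResults`, `CheckAndRepairResults`, `ExampleNode`, and `handle_check_request` are hypothetical stand-ins, and the "repair" here just flips a flag instead of talking to storage servers.

```python
from dataclasses import dataclass


@dataclass
class CheckerResults:
    """Stand-in for a single ICheckerResults instance."""
    healthy: bool


@dataclass
class CheckAndRepairResults:
    """Pre- and post-repair results plus what happened in between."""
    pre_repair: CheckerResults
    post_repair: CheckerResults
    repair_attempted: bool
    repair_successful: bool


class ExampleNode:
    """Hypothetical file node; a real check would query storage servers."""

    def __init__(self, healthy, repairable=True):
        self.healthy = healthy
        self.repairable = repairable

    def check(self, verify=False):
        return CheckerResults(healthy=self.healthy)

    def check_and_repair(self, verify=False):
        pre = self.check(verify)
        if pre.healthy:
            # Nothing to repair: both results describe the same state.
            return CheckAndRepairResults(pre, pre, False, False)
        if self.repairable:
            self.healthy = True  # pretend the repair worked
        post = self.check(verify)
        return CheckAndRepairResults(pre, post, True, post.healthy)


def handle_check_request(node, args):
    """Choose the method to call from the repair= query argument."""
    verify = args.get("verify") == "true"
    if args.get("repair") == "true":
        return node.check_and_repair(verify)
    return node.check(verify)
```

Keeping two return types means callers that never repair get a single result object, while repairing callers get both snapshots and the repair outcome together.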
comment:7 Changed at 2008-09-18T05:20:36Z by warner
- Resolution set to fixed
- Status changed from new to closed
The webapi now does the right thing, and both mutable and immutable checkers provide the right sort of output. (zooko is still working on the immutable verifier).
I think it might be good to go ahead and implement the auto-checker without knowing how it is going to get its verifier caps. The interface to it can include that the client provides verifier caps.