Opened at 2007-08-15T19:33:53Z
Closed at 2013-08-28T16:47:41Z
#104 closed task (invalid)
does cp -r work as expected?
Reported by: | zooko | Owned by: | warner |
---|---|---|---|
Priority: | major | Milestone: | soon |
Component: | code-frontend-cli | Version: | 0.7.0 |
Keywords: | usability tahoe-cp docs | Cc: | |
Launchpad Bug: |
Description (last modified by daira)
It would be good if the command-lines
allmydata-tahoe get
and
allmydata-tahoe put
supported the --recursive or -r option so that you could upload or download an entire collection of files with one command-line.
There are actually a host of issues that arise in implementing this, such as those mentioned in the "names versus identifiers" section of webapi.txt, and quoted here:
For example, suppose you are writing code which recursively downloads the contents of a directory. The first thing your code does is fetch the listing of the contents of the directory. For each child that it fetched, if that child is a file then it downloads the file, and if that child is a directory then it recurses into that directory. Now, if the download and the recurse actions are performed using the child's name, then the results might be wrong, because for example a child name that pointed to a sub-directory when you listed the directory might have been changed to point to a file, in which case your attempt to recurse into it would result in an error and the file would be skipped, or a child name that pointed to a file when you listed the directory might now point to a sub-directory, in which case your attempt to download the child would result in a file containing HTML text describing the sub-directory!
These problems can be avoided by traversing identifiers instead of names, but the next problems can't. The next problems are that dirnodes can recurse (a dirnode can contain an entry pointing to another dirnode which contains an entry pointing to the first), or can converge (two entries in the same or different dirnodes can point to the same object). We could implement a recursive download of such things by (perhaps arbitrarily) choosing one path to be a real link and the other to be a symlink. But Windows doesn't have symlinks. Another option would be to abort and print an error message if such a pattern is encountered.
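To make the two classes of problem concrete, here is a minimal sketch of a traversal that follows identifiers rather than names and reports cycles and convergent links instead of looping or copying twice. It is not Tahoe code: the `nodes` dictionary is a hypothetical in-memory stand-in for a grid, the identifiers such as "D1" are made-up placeholders for caps, and `walk` is an illustrative name.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical stand-in for a grid: identifier -> node.
# A directory is a dict of child name -> child identifier; a file is bytes.
Node = Union[bytes, Dict[str, str]]


def walk(root_id: str, nodes: Dict[str, Node]) -> Tuple[List[str], List[str], List[str]]:
    """Traverse by identifier, returning (file paths, cyclic paths, convergent paths)."""
    files: List[str] = []
    cycles: List[str] = []
    converged: List[str] = []
    seen = set()  # identifiers already visited via some other path

    def visit(node_id: str, path: str, ancestors: frozenset) -> None:
        # The node type is decided by the identifier we were given when we
        # listed the parent, so a later rename of the child cannot change it.
        node = nodes[node_id]
        if isinstance(node, bytes):
            if node_id in seen:
                converged.append(path)  # second link to the same file
            else:
                seen.add(node_id)
                files.append(path)
            return
        if node_id in ancestors:
            cycles.append(path)  # directory contains one of its own ancestors
            return
        if node_id in seen:
            converged.append(path)  # second link to the same directory
            return
        seen.add(node_id)
        for name in sorted(node):
            visit(node[name], path + "/" + name, ancestors | {node_id})

    visit(root_id, "", frozenset())
    return files, cycles, converged


if __name__ == "__main__":
    example = {
        "D1": {"a": "F1", "alias": "F1", "sub": "D2"},
        "D2": {"b": "F2", "back": "D1"},
        "F1": b"x",
        "F2": b"y",
    }
    print(walk("D1", example))
```

What to do with the reported cyclic and convergent paths (emit a symlink, skip, or abort with an error) is exactly the policy question raised above; the sketch only shows that they can be detected.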
Change History (25)
comment:1 Changed at 2007-08-15T21:34:26Z by zooko
- Milestone changed from undecided to 0.6.0
- Status changed from new to assigned
comment:2 Changed at 2007-09-19T22:55:45Z by zooko
- Milestone changed from 0.6.0 to 0.7.0
comment:3 Changed at 2007-10-01T18:17:13Z by zooko
- Summary changed from recursive get and recursive put to command-line: recursive get and recursive put
comment:4 Changed at 2007-10-01T19:25:42Z by zooko
- Milestone changed from 0.7.0 to 0.6.1
- Version changed from 0.4.0 to 0.6.0
comment:5 Changed at 2007-10-13T06:50:48Z by zooko
- Milestone changed from 0.6.1 to 0.7.0
bumping this to v0.7
comment:6 Changed at 2007-11-01T18:13:48Z by zooko
- Milestone changed from 0.7.0 to 0.7.1
We're focussing on an imminent v0.7.0 (see the roadmap) which hopefully has #197 -- Small Distributed Mutable Files and also a fix for #199 -- bad SHA-256. So I'm bumping less urgent tickets to v0.7.1.
comment:7 Changed at 2007-11-01T18:14:13Z by zooko
- Version changed from 0.6.0 to 0.6.1
comment:8 Changed at 2007-11-13T18:22:08Z by zooko
- Milestone changed from 0.7.1 to 0.7.2
- Version changed from 0.6.1 to 0.7.0
comment:9 Changed at 2008-01-15T21:36:41Z by zooko
- Component changed from code-frontend to code-frontend-cli
comment:10 Changed at 2008-01-23T04:19:03Z by zooko
- Milestone changed from 0.7.2 to undecided
comment:11 Changed at 2008-03-10T01:31:01Z by zooko
- Owner changed from zooko to nobody
- Status changed from assigned to new
comment:12 Changed at 2008-06-01T21:02:33Z by warner
- Milestone changed from eventually to 1.2.0
this is being replaced by "cp -r", and might be sufficiently done by now (although we may wish to put off closing this until "cp -r" works a bit better). Moving this to 1.2.0 with the idea that it might be closed by the 1.1.0 release.
comment:13 Changed at 2008-06-07T19:34:48Z by zooko
- Milestone changed from 1.2.0 to 1.1.0
I don't understand why you put it into Milestone 1.2.0 if you think it is ready to be closed as a feature added to 1.1.0.
Also, what did you do about convergent links (as mentioned in the initial note on this ticket), and what did you do about link cycles? And did you avoid the weirdness of race conditions, as described in the initial note of this ticket, by using caps instead of names as the "next links"?
Thanks!
comment:14 Changed at 2008-06-07T19:35:02Z by zooko
- Owner changed from nobody to warner
comment:15 Changed at 2008-06-09T18:30:16Z by zooko
- Milestone changed from 1.1.0 to 1.2.0
Okay, there is a complete implementation of cp -r, but we haven't analyzed some of the potential issues mentioned in this ticket, or whether this UI is sufficient, or whether it is not actually completely complete. So, later we'll consider these questions, and we're leaving this ticket open to remind us to do that.
comment:16 Changed at 2009-06-30T12:39:27Z by zooko
- Milestone changed from 1.5.0 to eventually
comment:17 Changed at 2009-12-13T03:55:23Z by davidsarah
- Keywords usability added
- Summary changed from command-line: recursive get and recursive put to does cp -r work as expected?
comment:18 Changed at 2009-12-13T03:55:52Z by davidsarah
- Keywords cp added
comment:19 Changed at 2009-12-13T03:56:16Z by davidsarah
- Type changed from enhancement to task
comment:20 Changed at 2010-02-02T03:17:39Z by davidsarah
- Milestone changed from eventually to 1.7.0
comment:21 Changed at 2010-02-12T05:11:11Z by davidsarah
- Keywords tahoe-cp added; cp removed
comment:22 Changed at 2010-02-12T05:11:20Z by davidsarah
- Keywords docs added
comment:23 Changed at 2010-06-16T03:59:31Z by davidsarah
- Milestone changed from 1.7.0 to soon
comment:24 Changed at 2012-11-26T00:36:58Z by davidsarah
This ticket is way too vague.
TahoeDirectorySource and TahoeDirectoryTarget in git/src/allmydata/scripts/tahoe_cp.py have cache dictionaries that seem as though they might have the effect of copying cycles correctly between two Tahoe directories, but I don't see a unit test for that in allmydata.test.test_cli.Cp.
#712 is one way in which tahoe cp -r does not do the right thing. I also don't think it will do the right thing when copying a cyclic Tahoe directory to a local disk, although perhaps #712 obscures that.
I filed #1878 to add tests for both cyclic cases.
OTOH, TahoeDirectorySource does not have the following bug:
Now, if the download and the recurse actions are performed using the child's name, then the results might be wrong, because for example a child name that pointed to a sub-directory when you listed the directory might have been changed to point to a file, [...] or a child name that pointed to a file when you listed the directory might now point to a sub-directory...
Is there anything more to do on this ticket, or is it covered by #712 and #1878?
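For reference, the general technique by which a cache dictionary can preserve cycles and convergence during a copy looks roughly like the sketch below. This is an assumption about the idea, not a description of what tahoe_cp.py actually does: `copy_tree`, the in-memory `src`/`dst` dictionaries, and the "copy-of-" identifier scheme are all hypothetical.

```python
from typing import Dict, Optional, Union

# Hypothetical stand-in for a grid: identifier -> node.
# A directory is a dict of child name -> child identifier; a file is bytes.
Node = Union[bytes, Dict[str, str]]


def copy_tree(src_id: str, src: Dict[str, Node], dst: Dict[str, Node],
              cache: Optional[Dict[str, str]] = None) -> str:
    """Copy the graph rooted at src_id from src into dst; return the new identifier."""
    if cache is None:
        cache = {}
    if src_id in cache:
        # Already copied (or being copied): reuse the same target node, so
        # convergent links stay convergent and cycles stay cycles.
        return cache[src_id]
    new_id = "copy-of-" + src_id  # hypothetical identifier scheme
    cache[src_id] = new_id  # record before recursing so that cycles terminate
    node = src[src_id]
    if isinstance(node, bytes):
        dst[new_id] = node
    else:
        children: Dict[str, str] = {}
        dst[new_id] = children
        for name, child_id in node.items():
            children[name] = copy_tree(child_id, src, dst, cache)
    return new_id
```

The key detail is that the cache entry is written before the children are visited; without that, a cyclic source would recurse forever. A unit test along the lines of #1878 would assert that the copied structure has the same shape as the source.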
comment:25 Changed at 2013-08-28T16:47:41Z by daira
- Description modified (diff)
- Resolution set to invalid
- Status changed from new to closed
Closed for vagueness.
Promoting this to Milestone 0.6.1 because my favorite customer, Peter, wants it.