#2329 closed defect (fixed)

cp -r stops with an exception

Reported by: zooko
Owned by: warner
Priority: major
Milestone: 1.10.1
Component: code-frontend-cli
Version: 1.10.0
Keywords: regression tahoe-cp release-blocker review-needed
Cc:
Launchpad Bug:

Description

$ tahoe cp --verbose -r $CAP .
Traceback (most recent call last):
  File "/home/zooko/playground/tahoe/tahoe-lafs/src/allmydata/scripts/runner.py", line 156, in run
    rc = runner(sys.argv[1:], install_node_control=install_node_control)
  File "/home/zooko/playground/tahoe/tahoe-lafs/src/allmydata/scripts/runner.py", line 141, in runner
    rc = cli.dispatch[command](so)
  File "/home/zooko/playground/tahoe/tahoe-lafs/src/allmydata/scripts/cli.py", line 551, in cp
    rc = tahoe_cp.copy(options)
  File "/home/zooko/playground/tahoe/tahoe-lafs/src/allmydata/scripts/tahoe_cp.py", line 774, in copy
    return Copier().do_copy(options)
  File "/home/zooko/playground/tahoe/tahoe-lafs/src/allmydata/scripts/tahoe_cp.py", line 451, in do_copy
    status = self.try_copy()
  File "/home/zooko/playground/tahoe/tahoe-lafs/src/allmydata/scripts/tahoe_cp.py", line 513, in try_copy
    return self.copy_to_directory(sources, target)
  File "/home/zooko/playground/tahoe/tahoe-lafs/src/allmydata/scripts/tahoe_cp.py", line 617, in copy_to_directory
    source_dirs = self.build_graphs(source_infos)
  File "/home/zooko/playground/tahoe/tahoe-lafs/src/allmydata/scripts/tahoe_cp.py", line 768, in build_graphs
    name = os.path.basename(os.path.normpath(name))
  File "/usr/lib/python2.7/posixpath.py", line 342, in normpath
    initial_slashes = path.startswith('/')
AttributeError: 'NoneType' object has no attribute 'startswith'
$ tahoe --version
allmydata-tahoe: 1.10.0.post167 [1382-rewrite-4: 102d5846b53a715bd9a51aac20f325dd6f6830be]
foolscap: 0.6.4 
pycryptopp: 0.6.0.1206569328141510525648634803928199668821045408958
zfec: 1.4.24
Twisted: 13.0.0 
Nevow: 0.11.1   
zope.interface: unknown
python: 2.7.6   
platform: Linux-Ubuntu_14.04-x86_64-64bit_ELF
pyOpenSSL: 0.13 
simplejson: 3.3.1
pycrypto: 2.6.1 
pyasn1: 0.1.7   
mock: 1.0.1
service-identity: 14.0.0
setuptools: 0.6c16dev4

Change History (57)

comment:1 Changed at 2014-11-07T14:29:45Z by zooko

  • Keywords regression added

When I tried the same thing with the Tahoe-LAFS v1.10.0 release (from Ubuntu), it worked! So this is a regression.

comment:2 Changed at 2014-11-07T14:33:27Z by zooko

  • Milestone changed from undecided to 1.11.0

comment:3 Changed at 2014-11-19T07:15:05Z by daira

  • Keywords tahoe-cp added
  • Owner set to daira
  • Priority changed from normal to major
  • Status changed from new to assigned

comment:4 Changed at 2014-11-25T18:11:57Z by daira

Possibly caused by the fix to #712 (which was the last patch to touch the code where the error occurs: trunk/src/allmydata/scripts/tahoe_cp.py?annotate=blame#L767 )

comment:6 Changed at 2014-11-26T03:38:20Z by daira

  • Keywords review-needed added
  • Owner changed from daira to zooko
  • Status changed from assigned to new

Review needed for the test.

comment:7 Changed at 2015-01-20T17:54:48Z by warner

  • Owner changed from zooko to warner
  • Status changed from new to assigned

warner will review the test, commit it with a TODO flag, then try to solve the original problem

comment:8 Changed at 2015-01-20T17:55:14Z by daira

  • Keywords easy added

comment:9 Changed at 2015-01-20T20:32:16Z by warner

So it looks like the intention of #712 was to make a command like cp -r A LOCAL or cp -r A LOCAL/ create LOCAL/A. This depends upon "A" including a file or directory name (eg cp -r DIRCAP/foo.txt LOCAL/). But if A is a pure cap, then it has no human-meaningful name. What should get created in this case?

I think the #2329 bug results from a code path that assumes it will be given a name, which the pure-cap source argument does not provide.
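A minimal sketch of the failure mode, for illustration only. `source_basename` is a hypothetical helper and not the actual tahoe_cp.py code or its eventual fix; it assumes a pure cap or bare alias arrives with its name set to None, as the traceback in the description suggests.

```python
import os.path


def source_basename(name):
    """Return the last path component of a source name, or None.

    Hypothetical helper: a pure cap (or bare alias) is assumed to carry
    name=None, which os.path.normpath() cannot handle, so the "no name"
    case is reported explicitly instead of raising.
    """
    if name is None:
        return None
    return os.path.basename(os.path.normpath(name))
```

A caller would then have to decide what an unnamed source means (copy the contents, or report an error), which is the question the rest of this ticket discusses.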

comment:10 Changed at 2015-01-20T21:18:14Z by daira

  • Keywords release-blocker added

comment:11 Changed at 2015-01-20T21:24:00Z by daira

Well, what did it do in Tahoe-LAFS v1.10.0? (There are two cases: $CAP is a file cap, or a directory cap.)

comment:12 Changed at 2015-01-21T00:16:28Z by warner

From a tahoe-side tree with PARENTCAP, DIRCAP=PARENTCAP/dir, and FILECAP=PARENTCAP/dir/file.txt, and a local (real) target directory "local":

  • 1.10
    • (A) cp -r PARENTCAP/dir local/ -> local/file.txt
    • (B) cp -r DIRCAP local/ -> local/file.txt
    • (C) cp -r DIRCAP_ALIAS: local/ -> local/file.txt
    • (D) cp -r DIRCAP/file.txt local/ -> local/file.txt
    • (E) cp -r FILECAP local/ -> "error, you must specify a destination filename"
    • note the target's trailing slash is optional: local/ and local behave the same way
  • trunk (e73d76e)
    • (F) cp -r PARENTCAP/dir local/ -> local/dir/file.txt
    • (G) cp -r DIRCAP local/ -> (exception)
    • (H) cp -r DIRCAP_ALIAS: local/ -> (exception)
    • (I) cp -r DIRCAP/file.txt local/ -> local/file.txt
    • (J) cp -r FILECAP local/ -> "error, you must specify a destination filename"

#712 fixed F to behave more like regular POSIX /bin/cp. I can think of three ways we might go with case G and H:

  • 1: cp -r DIRCAP local/ should behave like an imaginary cp -r DIRCAP/* local/ would do (this is imaginary because we have no tahoe-side globs). In this example, it'd create local/file.txt. This would match what it did in 1.10 (case B/C), but wouldn't match F. That'd be a shame, because we generally claim that "PARENTCAP/dir" and "DIRCAP" and "ALIAS:" are all interchangeable.
  • 2: cp -r DIRCAP local/ would pretend that the source directory was named after the actual DIRCAP string (the ugly base32 representation), and create local/BASE32DIRCAP/file.txt . Ick. For H we could use the alias name as a target name, so it'd create local/ALIASNAME/file.txt, which is still lame but at least human-readable.
  • 3: cp -r DIRCAP local/ would complain "you must specify a destination directory name", like cases E and J. This is somewhat plausible for G, and looks slightly weird for H.

We don't have enough information to make case G behave like F.

Last edited at 2015-01-22T17:22:55Z by daira

comment:13 Changed at 2015-01-21T18:47:22Z by warner

Well, I guess I'm going to implement option 1: make cp -r DIRCAP local/ behave like it used to (putting the contents of DIRCAP into local/) instead of like how cp -r PARENTCAP/dir local/ does now (making a new subdirectory under local/). It's not super consistent, but it's better than an exception.

comment:14 Changed at 2015-01-21T20:14:54Z by daira

+1 for option 1.

comment:15 Changed at 2015-01-22T17:56:03Z by daira

When I said +1 for option 1, I was thinking that the previous behaviour of cases (B) and (C) --i.e. copying the contents of a dircap into a directory-- would become inexpressible. However I now understand the error in option 3 to be saying that "you must specify a previously nonexisting destination directory name". In that case you could still write:

  • (K) cp -r DIRCAP local/missing -> local/missing

which would be consistent with cases (F) and (I), because local/missing becomes a copy of DIRCAP.

Version 1, edited at 2015-01-22T17:56:27Z by daira

comment:16 Changed at 2015-01-22T18:06:16Z by nejucomo

If there were a tahoe-side globbing feature, the error messages in option 3 could say:

""" To copy the contents of this dircap into an existing destination directory append /*; eg:

tahoe cp -r "$DIRCAP/*" /path/to/existing/dir

To create a new directory with the contents of this dircap, specify a new local directory name; eg:

tahoe cp -r "$DIRCAP" /path/to/nonexisting

"""

A glob feature would allow the explicit indication of whether a bag, or the contents of the bag, should be copied.

comment:17 Changed at 2015-01-22T18:13:55Z by daira

Splitting by whether the destination exists or not:

  • 1.10
    • (A1) cp -r PARENTCAP/dir local/ -> local/file.txt
    • (A2) cp -r PARENTCAP/dir local/missing -> ?
    • (B1) cp -r DIRCAP local/ -> local/file.txt
    • (B2) cp -r DIRCAP local/missing -> ?
    • (C1) cp -r DIRCAP_ALIAS: local/ -> local/file.txt
    • (C2) cp -r DIRCAP_ALIAS: local/missing -> ?
    • (D1) cp -r DIRCAP/file.txt local/ -> local/file.txt
    • (D2) cp -r DIRCAP/file.txt local/missing -> local/missing
    • (E1) cp -r FILECAP local/ -> "error, you must specify a destination filename"
    • (E2) cp -r FILECAP local/missing -> local/missing
    • note the target's trailing slash is optional: local/ and local behave the same way
  • trunk (e73d76e)
    • (F1) cp -r PARENTCAP/dir local/ -> local/dir/file.txt
    • (F2) cp -r PARENTCAP/dir local/missing -> ?
    • (G1) cp -r DIRCAP local/ -> (exception)
    • (G2) cp -r DIRCAP local/missing -> (exception?)
    • (H1) cp -r DIRCAP_ALIAS: local/ -> (exception)
    • (H2) cp -r DIRCAP_ALIAS: local/missing -> (exception?)
    • (I1) cp -r DIRCAP/file.txt local/ -> local/file.txt
    • (I2) cp -r DIRCAP/file.txt local/missing -> local/missing
    • (J1) cp -r FILECAP local/ -> "error, you must specify a destination filename"
    • (J2) cp -r FILECAP local/missing -> local/missing
Last edited at 2015-01-23T00:52:10Z by daira

comment:18 Changed at 2015-01-22T18:20:05Z by zooko

Here's a proposal:

We determine whether the source means the bag, or the contents of the bag, according to this rule:

  • If the source comes with a name, then it means the bag. If it comes with no name, then it means the contents of the bag.

Note that the question of whether the source means the bag or the contents of the bag is not influenced by whether the target exists or doesn't exist.

This is another way of expressing proposal 1 from comment:12.

Last edited at 2015-01-22T18:21:02Z by zooko

comment:19 Changed at 2015-01-22T18:34:08Z by nejucomo

I'm convinced globbing is too complicated if it evolves towards emulating bash. The key feature I was after in comment:16 was an explicit, consistent way to disambiguate between container and contents.

So this could also be a --contents flag, or another non-globby-looking symbol, such as the presence of a trailing slash (which I believe warner suggested elsewhere).

Examples:

  • tahoe cp -r "$X" dest/ # Copy the bag $X to dest/$NAME if $X has a name, otherwise fail with an error.
  • tahoe cp -r "$X/" dest/ # Copy the contents of $X into ./dest (the name of $X is irrelevant).

Note that this can be orthogonal to whether or not dest/ exists.

Some semiconcrete examples:

  • tahoe cp -r myalias: foo/ # Error: "myalias:" has no name. To copy the contents into foo/ append a /
  • tahoe cp -r myalias:/ foo/ # foo will contain the contents of myalias after this.
  • tahoe cp -r URI:DIR2... foo/ # Error: a dircap has no name, to copy the contents, append a /
  • tahoe cp -r URI:DIR2.../ foo # Copy the contents of the cap into foo.
  • tahoe cp -r URI:DIR2.../blah foo # ./foo/blah/ will contain the contents.
  • tahoe cp -r URI:DIR2.../blah/ foo # ./foo will contain the contents.
Last edited at 2015-01-22T18:38:30Z by nejucomo
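The trailing-slash proposal in the examples above could be sketched roughly as follows. This is a hypothetical illustration, not existing tahoe code: `interpret_source` is an invented name, and the parsing is an assumption (anything starting with `URI:` is treated as a cap followed by an optional `/path`, anything else as `alias:path`).

```python
def interpret_source(arg):
    """Classify a source argument under the trailing-slash proposal.

    Returns ("contents", None) when the argument ends with "/", or
    ("bag", name) when it names something. Raises ValueError when the
    argument has neither a trailing slash nor a name.
    """
    if arg.endswith("/"):
        return ("contents", None)
    if arg.startswith("URI:"):
        # Assumed form: the cap itself, then an optional "/path".
        _root, _sep, path = arg.partition("/")
    else:
        # Assumed form: "alias:" followed by an optional path.
        _root, _sep, path = arg.partition(":")
    name = path.rsplit("/", 1)[-1]
    if not name:
        raise ValueError(
            "%r has no name; to copy the contents, append a /" % (arg,))
    return ("bag", name)
```

For example, `interpret_source("myalias:/")` selects the contents, while `interpret_source("myalias:")` raises the "has no name" error. The cap strings used to exercise this are placeholders, not real caps.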

comment:20 follow-up: Changed at 2015-01-22T18:47:33Z by zooko

Here's my attempt to fill out the table from comment:17 for a certain rule that I have in my mind right now (written below).

  • "Rule comment:20"
    • (F1) cp -r PARENTCAP/dir local/ -> local/dir/file.txt
    • (F2) cp -r PARENTCAP/dir local/missing -> local/missing/dir/file.txt
    • (G1) cp -r DIRCAP local/ -> local/file.txt
    • (G2) cp -r DIRCAP local/missing -> local/missing/file.txt
    • (H1) cp -r DIRCAP_ALIAS: local/ -> local/file.txt
    • (H2) cp -r DIRCAP_ALIAS: local/missing -> local/missing/file.txt
    • (I1) cp -r DIRCAP/file.txt local/ -> local/file.txt
    • (I2) cp -r DIRCAP/file.txt local/missing -> local/missing
    • (J1) cp -r FILECAP local/ -> "error, you must specify a destination filename"
    • (J2) cp -r FILECAP local/missing -> local/missing

"Rule comment:20" is:

  1. If the source is a directory:
  1. If the target is the name of a locally existing file, then "error: there is already a local file present under the name $TARGET".
  1. If the target is the name of something not locally existing, then mkdir it and then use it as "target directory".
  1. If the target is the name of a locally existing directory, then proceed to use it as "target directory".
  1. Check whether the source directory has a name (as in F1, F2) or has no name (as in G1, G2, H1, H2). If it has a name then we say that the source means the bag itself — the directory, and if it has no name then we say that the source means the contents of the bag — the contents of the directory.
  1. Now if the source is the bag itself, then mkdir a new directory inside "target directory", named by the name of the source directory, and copy the contents of the bag into "target directory"/"source directory name"/ (which are cases F1 and F2).
  1. Else (the source was the contents of the bag instead of the bag itself) copy the contents of the bag into "target directory"/ (which are cases G1, G2, H1, and H2).
  1. If the source is a file, then check if the target is an existing directory.
  1. If the source is a nameless file (as in J1, J2) *and* target is an existing directory (as in J1), then "error, you must specify a destination filename".
  1. Else, if the source is a nameless file and the target is not an existing directory, then use the target as the local filename (which is case J2).
  1. Else, if the source is a named file and the target is an existing directory, then use the source filename within the target existing directory (which is case I1).
  1. Else, if the source is a named file and the target is not an existing directory, then use the target as the local filename (which is case I2).
Last edited at 2015-01-22T18:47:49Z by zooko

comment:21 in reply to: ↑ 20 Changed at 2015-01-22T19:00:45Z by zooko

"Rule comment:20" could in the future be extended by additional syntax to indicate that the user wants to copy the contents-of-the-bag instead of the bag, in those cases where the bag had a name. comment:19 suggests such additional syntax. (Side note: I have been confused in the past by the presence-or-absence-of-trailing-slash, such as rsync uses, to indicate this sort of thing. I'd prefer a very long and explicit thing like a command-line switch --contents-of.)

comment:22 Changed at 2015-01-22T21:26:34Z by daira

  • Keywords easy removed

comment:23 Changed at 2015-01-22T21:28:49Z by daira

Another argument against globbing is that the argument would have to be quoted -- and if it were not, strange things would happen on Unix (I think the argument would appear to be omitted assuming there is no local file that matches the glob).

comment:24 Changed at 2015-01-27T08:21:34Z by warner

Ok, here's the current state of affairs:

  • cp -r X local/, where local already exists:
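    X             | 1.10            | trunk(e73d76e)      | new?
    --------------+-----------------+---------------------+-----
    PARENTCAP/dir | (A1) local/file | (F1) local/dir/file | (F3)
    DIRCAP        | (B1) local/file | (G1) EXCEPTION-1    | (G3)
    DIRCAP_ALIAS: | (C1) local/file | (H1) EXCEPTION-1    | (H3)
    --------------+-----------------+---------------------+-----
    DIRCAP/file   | (D1) local/file | (I1) local/file     | (I3)
    FILECAP       | (E1) ERROR-2    | (J1) ERROR-2        | (J3)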
  • cp -r X local/missing, where local exists but missing does not:
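    X             | 1.10                    | trunk(e73d76e)              | /bin/cp -r         | new?
    --------------+-------------------------+-----------------------------+--------------------+-----
    PARENTCAP/dir | (A2) local/missing/file | (F2) local/missing/dir/file | local/missing/file | (F4)
    DIRCAP        | (B2) local/missing/file | (G2) EXCEPTION-1            |                    | (G4)
    DIRCAP_ALIAS: | (C2) local/missing/file | (H2) EXCEPTION-1            |                    | (H4)
    --------------+-------------------------+-----------------------------+--------------------+-----
    DIRCAP/file   | (D2) local/missing/file | (I2) local/missing/file     | local/missing      | (I4)
    FILECAP       | (E2) EXCEPTION-3        | (J2) EXCEPTION-3            |                    | (J4)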
  • EXCEPTION-1: build_graphs(), NoneType has no attribute startswith
  • ERROR-2: error: you must specify a destination filename
  • EXCEPTION-3: put_file() line 156, name is None but precondition requires isinstance(unicode)
Last edited at 2015-02-10T17:53:35Z by warner

comment:25 follow-up: Changed at 2015-01-27T08:34:53Z by warner

I think zooko's rule in comment:20 / comment:27 gives us:

  • cp -r X local/, where local already exists:
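    X             | 1.10            | trunk(e73d76e)      | rule comment:20
    --------------+-----------------+---------------------+--------------------
    PARENTCAP/dir | (A1) local/file | (F1) local/dir/file | (F3) local/dir/file
    DIRCAP        | (B1) local/file | (G1) EXCEPTION-1    | (G3) local/file
    DIRCAP_ALIAS: | (C1) local/file | (H1) EXCEPTION-1    | (H3) local/file
    --------------+-----------------+---------------------+--------------------
    DIRCAP/file   | (D1) local/file | (I1) local/file     | (I3) local/file
    FILECAP       | (E1) ERROR-2    | (J1) ERROR-2        | (J3) ERROR-2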
  • cp -r X local/missing, where local exists but missing does not:
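    X             | 1.10                    | trunk(e73d76e)              | rule comment:20
    --------------+-------------------------+-----------------------------+----------------------------
    PARENTCAP/dir | (A2) local/missing/file | (F2) local/missing/dir/file | (F4) local/missing/dir/file
    DIRCAP        | (B2) local/missing/file | (G2) EXCEPTION-1            | (G4) local/missing/file
    DIRCAP_ALIAS: | (C2) local/missing/file | (H2) EXCEPTION-1            | (H4) local/missing/file
    --------------+-------------------------+-----------------------------+----------------------------
    DIRCAP/file   | (D2) local/missing/file | (I2) local/missing/file     | (I4) local/missing
    FILECAP       | (E2) EXCEPTION-3        | (J2) EXCEPTION-3            | (J4) local/missing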

Although:

  • this doesn't cover his rule 1.a, where the source is a directory but the target is a pre-existing file
  • we could probably get away with saying that "cp -r" on a file source is an error, if it made things cleaner (in J3/J4)

Feel free to edit the table in this comment if I interpreted your proposal incorrectly.

Last edited at 2015-01-29T12:34:44Z by zooko

comment:26 follow-up: Changed at 2015-01-27T18:35:57Z by daira

warner: are you sure the table is right for cp -R DIRCAP/file local/missing, columns 1.10 and trunk? I would have expected the resulting file to be at local/missing.

Last edited at 2015-02-09T01:47:46Z by daira

comment:27 Changed at 2015-01-29T12:12:52Z by zooko

My original comment:20 was badly misformatted which hid some of the structure of my proposed rules. Here it is again, unchanged except for proper formatting.

Here's my attempt to fill out the table from comment:17 for a certain rule that I have in my mind right now (written below).

  • "Rule comment:20"
    • (F1) cp -r PARENTCAP/dir local/ -> local/dir/file.txt
    • (F2) cp -r PARENTCAP/dir local/missing -> local/missing/dir/file.txt
    • (G1) cp -r DIRCAP local/ -> local/file.txt
    • (G2) cp -r DIRCAP local/missing -> local/missing/file.txt
    • (H1) cp -r DIRCAP_ALIAS: local/ -> local/file.txt
    • (H2) cp -r DIRCAP_ALIAS: local/missing -> local/missing/file.txt
    • (I1) cp -r DIRCAP/file.txt local/ -> local/file.txt
    • (I2) cp -r DIRCAP/file.txt local/missing -> local/missing
    • (J1) cp -r FILECAP local/ -> "error, you must specify a destination filename"
    • (J2) cp -r FILECAP local/missing -> local/missing

"Rule comment:20" is:

  1. If the source is a directory:
     a. If the target is the name of a locally existing file, then "error: there is already a local file present under the name $TARGET".
     b. If the target is the name of something not locally existing, then mkdir it and then use it as "target directory".
     c. If the target is the name of a locally existing directory, then proceed to use it as "target directory".
     d. Check whether the source directory has a name (as in F1, F2) or has no name (as in G1, G2, H1, H2). If it has a name then we say that the source means the bag itself (the directory), and if it has no name then we say that the source means the contents of the bag (the contents of the directory).
        i. Now if the source is the bag itself, then mkdir a new directory inside "target directory", named by the name of the source directory, and copy the contents of the bag into "target directory"/"source directory name"/ (which are cases F1 and F2).
        ii. Else (the source was the contents of the bag instead of the bag itself) copy the contents of the bag into "target directory"/ (which are cases G1, G2, H1, and H2).
  2. If the source is a file, then check if the target is an existing directory.
     a. If the source is a nameless file (as in J1, J2) *and* target is an existing directory (as in J1), then "error, you must specify a destination filename".
     b. Else, if the source is a nameless file and the target is not an existing directory, then use the target as the local filename (which is case J2).
     c. Else, if the source is a named file and the target is an existing directory, then use the source filename within the target existing directory (which is case I1).
     d. Else, if the source is a named file and the target is not an existing directory, then use the target as the local filename (which is case I2).
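
"Rule comment:20" as laid out above can be sketched as a small pure function. This is only an illustration under stated assumptions, not the code in tahoe_cp.py: `copy_destination` and its parameters are hypothetical names, the caller is assumed to have already classified the target as "dir", "file", or "missing", and paths are joined POSIX-style.

```python
import posixpath


def copy_destination(source_is_dir, source_name, target, target_state):
    """Return the local path that receives the copy, per "Rule comment:20".

    source_name is None for a nameless source (bare cap or bare alias).
    target_state is "dir", "file", or "missing" (assumed precomputed).
    For a directory source the result is the directory that receives the
    contents; for a file source it is the destination filename.
    """
    if source_is_dir:
        if target_state == "file":
            # Rule 1.a
            raise ValueError(
                "there is already a local file present under the name %s"
                % target)
        # Rules 1.b and 1.c: a missing target would be created by the
        # caller, an existing directory is used as-is.
        if source_name is not None:
            # Rule 1.d.i: the bag itself (cases F1, F2)
            return posixpath.join(target, source_name)
        # Rule 1.d.ii: the contents of the bag (cases G1, G2, H1, H2)
        return target
    if target_state == "dir":
        if source_name is None:
            # Rule 2.a (case J1)
            raise ValueError("you must specify a destination filename")
        # Rule 2.c (case I1)
        return posixpath.join(target, source_name)
    # Rules 2.b and 2.d (cases J2, I2)
    return target
```

Because the directory branch never looks at whether the target already exists (beyond rejecting an existing file), F1 and F2 both end in `dir/` under the target.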

comment:28 in reply to: ↑ 25 Changed at 2015-01-29T12:35:53Z by zooko

Replying to warner:

  • we could probably get away with saying that "cp -r" on a file source is an error, if it made things cleaner (in J3/J4)

That sounds okay to me.

Feel free to edit the table in this comment if I interpreted your proposal incorrectly.

I think the table is an accurate reflection of the comment:20 / comment:27 proposal.

comment:29 Changed at 2015-01-29T19:43:07Z by daira

Note that this could potentially interact with #2027.

comment:30 in reply to: ↑ 26 Changed at 2015-02-03T19:49:00Z by daira

Replying to daira:

warner: are you sure the table (in comment:24) is right for cp -R DIRCAP/file local/missing, columns 1.10 and trunk? I would have expected the resulting file to be at local/missing.

Ping in case you missed this question.

Last edited at 2015-02-09T01:47:21Z by daira

comment:31 Changed at 2015-02-10T17:22:16Z by warner

D2 and I2, right? Yeah, those give local/missing/file.txt in both cases.

comment:32 Changed at 2015-02-10T17:50:11Z by warner

Ok, I agree that that's a bit weird. /bin/cp doesn't do that: /bin/cp parent/dir/file.txt local/missing creates local/missing, and /bin/cp -r does the same thing.

I guess I need to add table entries for what unix does. If possible/applicable, we should match unix behavior.

comment:33 Changed at 2015-02-10T18:25:46Z by warner

I will try to implement zooko's algorithm from comment:27, and enhance the table to mention what happens with /bin/cp in the no-flag, -r, and -R cases.

comment:34 Changed at 2015-02-11T16:06:13Z by daira

We discussed this during Nuts and Bolts, and I was persuaded that we should adopt Zooko's comment:27 algorithm despite the difference from /bin/cp (as tested on OS X). The clinching arguments were:

  • /bin/cp doesn't behave consistently across platforms anyway. Its different treatment of a trailing slash on some platforms is confusing and I don't think we should emulate that.
  • Zooko's proposed algorithm has a symmetry between the target-present and target-not-present cases that I hadn't previously noticed, and that may help to prevent race conditions.

comment:35 Changed at 2015-02-18T08:36:03Z by warner

Updated table for cp [-r] X local/, where local already exists:

command       | FILECAP (J1) | DIRCAP/file (I1) | PARENTCAP/dir (F1) | DIRCAP (G1) | DIRCAP_ALIAS: (H1)
--------------+--------------+------------------+--------------------+-------------+-------------------
/bin/cp       | -            | local/file       | ERROR-4b           | -           | -
/bin/cp -r    | -            | local/file       | local/dir/file     | -           | -
/bin/cp -R    | -            | local/file       | local/dir/file     | -           | -
1.10          | ERROR-2      | local/file       | local/file         | ERROR-4     | ERROR-4
1.10 -r       | ERROR-2      | local/file       | local/file         | local/file  | local/file
trunk         | ERROR-2      | local/file       | local/file         | ERROR-4     | ERROR-4
trunk -r      | ERROR-2      | local/file       | local/dir/file     | EXCEPTION-1 | EXCEPTION-1
comment:27    | ERROR-2      | local/file       | ERROR-4            | ERROR-4     | ERROR-4
comment:27 -r | ERROR-2      | local/file       | local/dir/file     | local/file  | local/file

And for cp [-r] X local/missing, where local exists but missing does not:

command       | FILECAP (J2)  | DIRCAP/file (I2)   | PARENTCAP/dir (F2)     | DIRCAP (G2)        | DIRCAP_ALIAS: (H2)
--------------+---------------+--------------------+------------------------+--------------------+-------------------
/bin/cp       | -             | local/missing      | ERROR-4b               | -                  | -
/bin/cp -r    | -             | local/missing      | local/missing/file     | -                  | -
/bin/cp -R    | -             | local/missing      | local/missing/file     | -                  | -
1.10          | local/missing | local/missing      | ERROR-4                | ERROR-4            | ERROR-4
1.10 -r       | EXCEPTION-3   | local/missing/file | local/missing/file     | local/missing/file | local/missing/file
trunk         | local/missing | local/missing      | ERROR-4                | ERROR-4            | ERROR-4
trunk -r      | EXCEPTION-3   | local/missing/file | local/missing/dir/file | EXCEPTION-1        | EXCEPTION-1
comment:27    | local/missing | local/missing      | ERROR-4                | ERROR-4            | ERROR-4
comment:27 -r | local/missing | local/missing      | local/missing/dir/file | local/missing/file | local/missing/file
  • EXCEPTION-1: build_graphs(), NoneType has no attribute startswith
  • ERROR-2: you must specify a destination filename
  • EXCEPTION-3: put_file() line 156, name is None but precondition requires isinstance(unicode)
  • ERROR-4: cannot copy directories without --recursive
  • ERROR-4b: X is a directory (not copied)
Last edited at 2015-02-20T00:04:52Z by warner

comment:36 Changed at 2015-02-19T17:02:37Z by daira

I updated comment:35 with the proposed behaviour from comment:27. There is one case without -r that I didn't know how to fill in: tahoe cp PARENTCAP/dir local/. (Either local/file or ERROR-4[b] are reasonable possibilities.)

Last edited at 2015-02-19T17:28:29Z by daira

comment:37 Changed at 2015-02-19T17:54:01Z by daira

In TC&C, we agreed that tahoe cp PARENTCAP/dir local/ should also give ERROR-4.

comment:38 follow-up: Changed at 2015-02-20T00:11:25Z by warner

I've updated comment:35 to reflect that (the rule is that you must give -r to copy a directory, so a directory-like source without -r gives an error). I think our table is now up to date.

Looking at the table, with an eye towards writing docs that explain what has changed, the new "cp -r PARENTCAP/dir local/missing" case (F2, bottom row) stands out. It's consistent with current trunk, but not with 1.10, or the proposed behavior for other directory-like sources, or with /bin/cp. It'd be easier to explain if it were local/missing/file instead. I'm sure we've discussed this to death.. I've been too busy figuring out how to format that table. I'll reread the ticket and re-understand the rationale for that one.

Next steps:

  • try to make the code implement the table
  • put an edited form of the table into NEWS, to explain the user-visible change
  • put a smaller form (just /bin/cp and current behavior) into cli.rst
  • put an even smaller form (just current behavior, maybe as prose) into the tahoe-cp --help docstring

comment:39 in reply to: ↑ 38 Changed at 2015-02-20T02:52:17Z by zooko

Replying to warner:

Looking at the table, with an eye towards writing docs that explain what has changed, the new "cp -r PARENTCAP/dir local/missing" case (F2, bottom row) stands out. It's consistent with current trunk, but not with 1.10, or the proposed behavior for other directory-like sources, or with /bin/cp. It'd be easier to explain if it were local/missing/file instead. I'm sure we've discussed this to death..

Let's see…

That is intentional in the comment:27 algorithm. The reasoning is twofold:

First of all, the question of whether you want the bag vs. the contents of the bag is determined by whether the directory-like source has a name (the F column) vs. doesn't have a name (the G and H columns). That's a nice simple rule, and according to that, the result in the "comment:27 -r" row has to be local/missing/dir/file instead of local/missing/file, because the latter would be just the contents of the bag (file) instead of the bag (dir/file).

So looking at the table, the cells of column F need to result in dir/file (if they aren't instead an error) and the cells of columns G and H need to result in file (if they aren't instead an error).

Second, the comment:27 algorithm behaves the same way whether the target exists or doesn't exist at the beginning of the algorithm. This is (in my intuition) a nice simple rule, and it also means there isn't a race condition in which the behavior is unpredictable because the existence of the target is unpredictable.

So looking at the table, that means the result of F1 and F2 both need to result in the same behavior as each other.

I'm not saying this rationale is better than other rationales that would justify other designs (such as “This is as much like /bin/cp as we could make it.”, or “The behavior is the same for all directory-like sources.”), but that's the rationale for this design.

  • try to make the code implement the table

Yay!

  • put an edited form of the table into NEWS, to explain the user-visible change

Yay!

  • put a smaller form (just /bin/cp and current behavior) into cli.rst

Yay!

  • put an even smaller form (just current behavior, maybe as prose) into the tahoe-cp --help docstring

Yay!

Thank you for your good work on this.

Last edited at 2015-02-20T03:05:09Z by zooko

comment:40 Changed at 2015-02-23T19:03:01Z by warner

I've got a branch which adds a test that exercises the full table. While studying how to change the code to let that test pass, I came across another wrinkle: cp accepts multiple source arguments.

In general, if you have multiple source arguments, then the target must be a directory. If the target is a directory, then you can't use unnamed files as sources (one or multiple). The only case that accepts an unnamed file as a source is when you're copying exactly one of them to a target that is (or will be) a file.

Here's a list of what I think should happen (I tried to compress some of the cases.. let me know if it doesn't seem to cover everything):

  • sources are NAMEDDIR, UNNAMEDDIR, NAMEDFILE, UNNAMEDFILE
  • targets are DIR, FILE, or MISSING
  • single-source cases:
    • cp FILE TO-FILE: replace the contents
    • cp FILE TO-MISSING: create the target file
    • cp NAMEDFILE TO-DIR: create/replace TO-DIR/filename
    • cp UNNAMEDFILE TO-DIR: error: need a name
    • (cp -r FILE X: behave same as without -r)
    • cp DIR X: error: must use -r if any source is a directory
    • cp -r NAMEDDIR TO-DIR: create TO-DIR/NAME/ and fill with contents
    • cp -r NAMEDDIR TO-MISSING: same: TO-MISSING/NAME/ filled with contents
    • cp -r UNNAMEDDIR TO-DIR: copy source/* into TO/*
    • cp -r UNNAMEDDIR TO-MISSING: same: mkdir TO-MISSING, fill with contents
  • multiple-source cases:
    • cp X.. TO-FILE: error: many-to-one requires target is a directory
    • cp NAMEDFILES.. TO-DIR: create/replace TO-DIR/filenames
    • cp NAMEDFILES.. TO-MISSING: mkdir, then treat like TO-DIR
    • cp UNNAMEDFILES.. X: error: need a name (1 source is ok, but not >1)
    • cp FILESDIRS.. X: error: must use -r if any source is a directory
    • cp -r X.. TO-MISSING: mkdir target, then treat as TO-DIR
    • cp -r X.. TO-DIR:
      • if X is UNNAMEDFILE: error, need a name
      • if X is NAMEDFILE: create/replace TO-DIR/name
      • if X is UNNAMEDDIR: copy source/* into TO/*, like with single-source
      • if X is NAMEDDIR: copy source/* into TO/name/*

Next step is to figure out how to turn this into a flowchart for tahoe_cp.Copier.try_copy.. I've started on the internal refactorings to make this easier (I was wrong before when I thought the basename should be tracked from outside of TahoeFileSource/etc.. treating it as a possibly-empty property of the source instance is totally the right way to do it).
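
As a rough illustration of treating the basename as a possibly-empty property of the source instance, the four source kinds listed above could be derived like this. `SourceInfo` and `kind` are hypothetical names for this sketch; the real TahoeFileSource and related classes are not shown in this ticket and may look quite different.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceInfo:
    """Hypothetical stand-in for a cp source (not the real tahoe_cp class)."""
    is_dir: bool
    basename: Optional[str] = None  # None or "" for a bare cap or bare alias

    def kind(self):
        # A source counts as "named" only if it carries a non-empty basename.
        prefix = "NAMED" if self.basename else "UNNAMED"
        return prefix + ("DIR" if self.is_dir else "FILE")
```

With this shape, `SourceInfo(is_dir=True, basename="dir").kind()` is `"NAMEDDIR"` and `SourceInfo(is_dir=False).kind()` is `"UNNAMEDFILE"`, so the rest of the copy logic can branch on the kind without tracking names separately.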

comment:41 Changed at 2015-02-24T01:15:06Z by daira

You're missing a case:

  • cp NAMEDANDUNNAMEDFILES.. X: error: need a name (if there is >1 source of which >=1 is unnamed)

comment:42 Changed at 2015-02-24T01:21:39Z by daira

Similarly for cp -R X.. TO-DIR:

  • if X has >=1 UNNAMEDFILE: error, need a name
  • otherwise for each X,
    • if X is NAMEDFILE: create/replace TO-DIR/name
    • if X is UNNAMEDDIR: copy source/* into TO-DIR/*, like with single-source
    • if X is NAMEDDIR: copy source/* into TO-DIR/name/*

comment:43 Changed at 2015-02-24T10:35:53Z by warner

Ok, updates:

  • sources are NAMEDDIR, UNNAMEDDIR, NAMEDFILE, UNNAMEDFILE
  • targets are DIR, FILE, or MISSING
  • single-source cases:
    • cp FILE TO-FILE: replace the contents
    • cp FILE TO-MISSING: create the target file
    • cp NAMEDFILE TO-DIR: create/replace TO-DIR/filename
    • cp UNNAMEDFILE TO-DIR: error: need a name
    • (cp -r FILE X: behave same as without -r)
    • cp DIR X: error: must use -r if any source is a directory
    • cp -r DIR TO-FILE: error, directories must be copied into other directories
    • cp -r NAMEDDIR TO-DIR: create TO-DIR/NAME/ and fill with contents
    • cp -r NAMEDDIR TO-MISSING: same: TO-MISSING/NAME/ filled with contents
    • cp -r UNNAMEDDIR TO-DIR: copy source/* into TO/*
    • cp -r UNNAMEDDIR TO-MISSING: same: mkdir TO-MISSING, fill with contents
  • multiple-source cases:
    • cp X.. TO-FILE: error: many-to-one requires target is a directory
    • cp NAMEDFILES.. TO-DIR: create/replace TO-DIR/filenames
    • cp NAMEDFILES.. TO-MISSING: mkdir, then treat like TO-DIR
    • cp SOMEUNNAMEDFILES.. X: error: need a name
      • (cp UNNAMEDFILE X with 1 source is ok, but not if there are multiple sources)
    • cp FILESDIRS.. X: error: must use -r if any source is a directory
    • cp -r X.. TO-MISSING: mkdir target, then treat as TO-DIR
    • cp -r X.. TO-DIR: for each X:
      • if X is UNNAMEDFILE: error, need a name, whole command fails
      • if X is NAMEDFILE: create/replace TO-DIR/name
      • if X is UNNAMEDDIR: copy source/* into TO/*, like with single-source
      • if X is NAMEDDIR: copy source/* into TO/name/*

comment:44 Changed at 2015-02-24T11:23:10Z by warner

https://github.com/warner/tahoe-lafs/tree/2329 has a branch that adds the tests (and some useful refactoring). It does not yet make any behavior changes. Take a look at the new test_cli_cp.CopyOut and see if it covers all those cases.

comment:45 Changed at 2015-03-03T10:30:48Z by warner

Ok, the patch is ready for review: https://github.com/tahoe-lafs/tahoe-lafs/pull/143 . I rewrote the target-assignment code, cleaning up an awful lot in the process. I tried to make the diff as minimal as possible, but it may still look a bit ugly. The core 100-line cluster of functions was replaced by a different 100-line cluster of functions, but there's enough overlap that 'git diff' tries too hard to show you line-by-line changes, and does it badly. My best advice for reviewing it is to print out those 100 lines before, and those 100 lines after, and compare those two printouts, instead of looking at a detailed diff.

comment:46 Changed at 2015-03-03T20:29:28Z by warner

Two unexpected cases discovered during today's review:

1: cp FILE1 TARGETFILE/

2: cp -r DIR1 DIR2 TARGETDIR

We decided the first should signal an error: the trailing slash will only be accepted if the target is directory-like (either a pre-existing directory, or something missing that will be created as a directory, either because the source is a directory, or because there are multiple sources). If the target is directory-like, the trailing slash will not affect the copying behavior.
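
As a rough illustration of the trailing-slash rule just described (this is not the code in tahoe_cp.py; the function names and the "dir"/"file"/"missing" labels are hypothetical), the check could look something like this:

```python
def target_is_directory_like(target_kind, num_sources, any_source_is_dir):
    """Return True if the target is, or will be created as, a directory.

    target_kind is one of "dir", "file", "missing" (hypothetical labels).
    """
    if target_kind == "dir":
        return True
    if target_kind == "missing":
        # A missing target becomes a directory when a directory is being
        # copied or when there is more than one source.
        return any_source_is_dir or num_sources > 1
    return False


def check_trailing_slash(target_arg, target_kind, num_sources, any_source_is_dir):
    """Raise ValueError if a trailing slash is used on a file-like target."""
    if target_arg.endswith("/") and not target_is_directory_like(
            target_kind, num_sources, any_source_is_dir):
        raise ValueError("trailing slash requires a directory-like target")
```

Under these assumptions, `cp FILE1 TARGETFILE/` is rejected, while a trailing slash on a directory-like target passes the check and has no further effect.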

For the second, the realization was that this causes the contents of DIR1 and DIR2 to be merged together in the TARGETDIR, and thus the order of the source arguments matters (presumably DIR2's contents will overwrite those of DIR1). The same thing happens if you copy multiple files of the same name into a directory (cp foo/file.txt bar/file.txt outdir).

We need to ensure the implementation preserves the order of the source arguments (I think my current patch does, but if I'd used a set instead of a dict, it would behave differently). We should also double-check that /bin/cp does the same thing.
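
A minimal sketch of the ordering concern, assuming each source is modelled as a hypothetical (label, children) pair rather than the real source objects in tahoe_cp.py. Keeping the sources in a list preserves argv order, so a later source overwrites an earlier one on a name collision:

```python
def merge_sources(sources):
    """Merge children of each source into one target mapping, in argv order.

    sources is a list of (label, {child_name: contents}) pairs; a list keeps
    the command-line order, whereas a set would not guarantee any order.
    """
    target = {}
    for _label, children in sources:
        for name, contents in children.items():
            # A later source overwrites an earlier one on a name collision.
            target[name] = contents
    return target


merged = merge_sources([
    ("DIR1", {"file.txt": "from DIR1", "a.txt": "only in DIR1"}),
    ("DIR2", {"file.txt": "from DIR2"}),
])
print(merged["file.txt"])  # prints "from DIR2"
```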

Last edited at 2015-03-03T20:33:50Z by warner

comment:47 Changed at 2015-03-03T20:41:57Z by warner

I guess there are a couple of different forms we should examine.

  • cp -r foo/dir bar/dir TARGETDIR
  • cp -r DIRCAP1 DIRCAP2 TARGETDIR

The first is using two named directories that both happen to have the same name. This will cause TARGETDIR/dir to be created, and the contents of the sources merged into it.

The second is using unnamed directories, triggering our "unnamed directories refer to the contents, not the bag itself" rule. Their contents will be merged into TARGETDIR.

/bin/cp doesn't have unnamed directories, so the only way to simulate the contents-not-bag rule is to use /bin/cp -r DIR1/* DIR2/* TARGETDIR, in which case any name collisions are obvious on the post-glob-expansion argv that's passed from the shell to cp. I think (but I'm not sure) that the only way to wind up with collisions in /bin/cp is if the basenames of two SOURCE arguments are the same, since /bin/cp always uses the basename of the source to create a child of the target.
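
To make the basename-collision idea concrete, here is a small sketch that groups local source arguments by basename. The helper name `colliding_basenames` is hypothetical, and it only models plain filesystem paths, not tahoe aliases or caps:

```python
import os.path
from collections import defaultdict


def colliding_basenames(source_args):
    """Group source arguments by basename and return the groups with >1 entry."""
    by_name = defaultdict(list)
    for arg in source_args:
        # normpath drops any trailing slash so "foo/dir/" yields "dir".
        by_name[os.path.basename(os.path.normpath(arg))].append(arg)
    return {name: args for name, args in by_name.items() if len(args) > 1}
```

For example, `colliding_basenames(["foo/dir", "bar/dir"])` reports both arguments under the name "dir", which is the case where both sources would land in TARGETDIR/dir.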

(in other words, /bin/cp may not actually give us precedent to follow)

comment:48 Changed at 2015-03-03T20:54:31Z by warner

I think I was able to compress the rules above into the following set of assertions. These are the ones implemented by the code:

  • if any source is a directory, must use -r
  • if target is missing:
    • if source is a single file, target will be a file
    • else target will be a directory, so mkdir it
  • if there are multiple sources, target must be a dir
  • if target is a file, source must be a single file
  • if target is a directory, each source must be named or a dir

Copying files into files is easy. Copying things into directories requires looking at the type of each source object:

  • target is a directory, so each source must be one of:
    • a named file (copied to a new file under the target)
    • a named directory (causes a new directory of the same name to be created under the target, then the contents of the source are copied into that directory)
    • an unnamed directory (the contents of the source are copied into the target, without a new directory being made)
  • If any source is an unnamed file, throw an error, since we have no way to name the output file.
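
The assertions above can be sketched as a small decision function. This is only an illustration under stated assumptions, not the implementation that landed: `Source`, `CopyError`, `plan_copy` and `destinations` are hypothetical names, an unnamed source is modelled as `name=None`, and at least one source is assumed.

```python
from collections import namedtuple

# Hypothetical stand-in for a source: name is None for an unnamed source
# (for example a bare cap), is_dir says whether it is a directory.
Source = namedtuple("Source", ["name", "is_dir"])


class CopyError(Exception):
    pass


def plan_copy(sources, target_kind, recursive):
    """Apply the assertions above; return "file" or "dir" for the target.

    target_kind is "file", "dir" or "missing" (hypothetical labels).
    """
    if any(s.is_dir for s in sources) and not recursive:
        raise CopyError("must use -r if any source is a directory")
    if target_kind == "missing":
        single_file = len(sources) == 1 and not sources[0].is_dir
        target_kind = "file" if single_file else "dir"  # "dir" implies mkdir
    if len(sources) > 1 and target_kind != "dir":
        raise CopyError("multiple sources require a directory target")
    if target_kind == "file":
        if sources[0].is_dir:
            raise CopyError("directories must be copied into directories")
        return "file"
    # Target is a directory: every source must be named or be a directory.
    for s in sources:
        if s.name is None and not s.is_dir:
            raise CopyError("need a name for unnamed file source")
    return "dir"


def destinations(sources, target):
    """For a directory target, map each source to where its contents land."""
    out = []
    for s in sources:
        if s.name is None:
            out.append(target)                 # unnamed dir: contents go into target
        else:
            out.append(target + "/" + s.name)  # named file or named dir
    return out
```

With these definitions, a named directory copied into target `T` lands in `T/name`, while an unnamed directory has its contents copied directly into `T`.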

comment:49 Changed at 2015-03-04T02:38:42Z by warner

New pull request is up: https://github.com/tahoe-lafs/tahoe-lafs/pull/144

This incorporates all the review feedback so far. It also rejects trailing slashes on file-like targets, ensures in-order copies of colliding sources (last source wins), fixes a bug I uncovered that prevented colliding sources from working anyway, and adds a bunch more test cases.

comment:50 Changed at 2015-03-05T00:23:25Z by daira

  • Owner changed from warner to daira
  • Status changed from assigned to new

Reviewing.

comment:51 Changed at 2015-03-05T00:23:33Z by daira

  • Status changed from new to assigned

comment:52 Changed at 2015-03-17T19:13:02Z by warner

  • Keywords review-needed removed
  • Owner changed from daira to warner
  • Status changed from assigned to new

Ok, that PR (plus review feedback) has been landed, in e60392a. Remaining work: docs updates, and some notes for future cleanups.

comment:53 Changed at 2015-03-18T07:12:02Z by warner

Some notes from our review discussion, which didn't suggest changes to the current behavior, but which should be recorded for future analysis:

  • cp SOURCE1 SOURCE2 missing is not so clear-cut. Current behavior is to mkdir, but maybe it should throw an error instead.
  • cp SOURCE1 SOURCE2 missing/ is obviously referring to a directory, so mkdir is more correct. (current behavior is to mkdir). (/bin/cp emits an error).
  • cp -r SOURCE1 SOURCEDIR2 missing is not clear-cut.

Cleanups to do:

  • add a test that uses a source directory which contains multiple files; the current tests use multiple source directories (with one file each)
  • find a way to reduce the with/without-trailing-slash redundancy in the test table

comment:54 Changed at 2015-05-02T00:21:49Z by warner

  • Keywords review-needed added

comment:55 Changed at 2015-05-04T04:50:31Z by Brian Warner <warner@…>

In 97fd19407d610de298c45500f7e1ad3e62b8a263/trunk:

Improve docs on 'cp -r', noting the recent 2329 changes

refs ticket:2329

comment:56 Changed at 2015-05-04T05:06:06Z by warner

  • Resolution set to fixed
  • Status changed from new to closed

ok, I think that's a wrap.

comment:57 Changed at 2015-05-04T05:15:24Z by Brian Warner <warner@…>

In ca23c4fa23a77b1fa557c734a2ad1f2abe4e7688/trunk:

tahoe cp: ignore trailing slash on source arguments

This avoids an error case where an empty child name resulted in a
duplicate mkdir. It adds a precondition check to guard against empty
child names, and some test cases. It also cleans up a funny redundancy
noticed earlier (refs ticket:2329).
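
As a hedged illustration of the idea in this commit message (not the actual change), a naive split of a path with a trailing slash yields an empty last component, which is one way an empty child name could arise. The helper below, with the hypothetical name `child_name_of`, strips trailing slashes first and refuses an empty result. It only models plain slash-separated paths, not aliases or caps.

```python
def child_name_of(source_path):
    """Return the last path component, ignoring trailing slashes."""
    name = source_path.rstrip("/").rpartition("/")[2]
    # Precondition in the spirit of the commit message: never hand an empty
    # child name to the code that creates directories.
    if name == "":
        raise ValueError("empty child name for %r" % (source_path,))
    return name


# A naive split keeps the empty component after the trailing slash.
naive = "foo/dir/".split("/")[-1]
print(repr(naive))                # prints ''
print(child_name_of("foo/dir/"))  # prints dir
```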
