#1310 reopened defect

separate "gateway state directory" from "client state directory" — at Version 11

Reported by: zooko Owned by: warner
Priority: major Milestone: undecided
Component: code-frontend-cli Version: 1.8.1
Keywords: usability Cc: zooko
Launchpad Bug:

Description (last modified by zooko)

updating the Description for clarity:

This ticket is about the proposal to keep the state/configuration of the LAFS gateway in a separate directory from the state/configuration of the LAFS client. In the current (Tahoe-LAFS v1.10) code, both are maintained in one shared directory, called the "node directory" or "base directory". The state therein is actually non-overlapping:

  • things a client (i.e. the "tahoe" command-line tool) uses out of the base directory:
    • the node.url file to find out how to connect to the gateway
    • the private/backupdb.sqlite file to find out what files have already been uploaded
  • things a gateway uses out of the base directory:
    • everything else stored in there, i.e. everything except the node.url and private/backupdb.sqlite files

So the client never uses any of the files that are kept in that directory for the gateway's purposes, and the gateway never uses any of the files that are kept in that directory for the client's purposes.
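The split described above can be sketched as a small classifier. This is only an illustration, not code from Tahoe-LAFS: the names CLIENT_FILES, owner_of and split_state are hypothetical, and the set of client files follows the two entries listed in this description (comment 6 below additionally mentions private/aliases as a CLI file).

```python
from pathlib import PurePosixPath

# Files the description says the CLI client reads from the base directory.
# Assumption: this set mirrors the description only; it is not taken from the code.
CLIENT_FILES = {"node.url", "private/backupdb.sqlite"}


def owner_of(relative_path: str) -> str:
    """Return "client" or "gateway" for a path relative to the base directory."""
    normalized = PurePosixPath(relative_path).as_posix()
    return "client" if normalized in CLIENT_FILES else "gateway"


def split_state(relative_paths):
    """Partition base-directory paths into client-owned and gateway-owned lists."""
    result = {"client": [], "gateway": []}
    for path in relative_paths:
        result[owner_of(path)].append(path)
    return result
```

For example, split_state(["node.url", "tahoe.cfg", "private/node.pem"]) would place only node.url on the client side.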


I use multiple grids (pub grid, volunteergrid, and a private family grid), and I just now had a confusing error where I ran tahoe backup and it completed quickly but produced a backup directory full of links to files with 0 shares each.

What happened, of course, was that I had previously run tahoe backup --node-url=http://127.0.0.1:3458/ to back up these files to my family grid, and now I was running tahoe backup --node-url=http://127.0.0.1:3457/ to back up these files to the volunteergrid, but I was unwittingly using the same backupdb.sqlite.
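One way to picture the kind of scoping that would have avoided this is a backupdb file name derived from the gateway URL. The sketch below is hypothetical: backupdb_path_for and the backupdb-<digest>.sqlite naming are invented for illustration and are not what tahoe backup does. Note also that scoping by URL is weaker than scoping by grid, since one URL can be pointed at a different grid later.

```python
import hashlib
from pathlib import Path


def backupdb_path_for(cli_dir: str, node_url: str) -> Path:
    """Return a backupdb path that differs per gateway URL (hypothetical layout)."""
    # Treat "http://host:port" and "http://host:port/" as the same gateway.
    normalized = node_url.rstrip("/")
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
    return Path(cli_dir) / "private" / f"backupdb-{digest}.sqlite"
```

With this layout, the runs against ports 3458 and 3457 above would each have consulted a different database file.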

I wonder whether, when the --node-url option is present, the CLI should look into ~/.tahoe at all. Most of the configuration and state in ~/.tahoe is specific to the gateway that the --node-url points to, and the CLI will ignore it anyway; instead, whatever configuration is in the tahoe-base-dir used by the gateway will take effect.

The only exception that I can think of right away is the private/backupdb.sqlite. Is that the only thing that affects the CLI when --node-url is present? Maybe it should be kept in a different directory.

I think I'm a bit confused about this. I'm not sure what all it means that there exists a ~/.tahoe when I'm actually using a gateway which runs as a separate user process, is specified by the --node-url option, and it has its own ~/.tahoe in its own user account. As a work-around and a way to gain clarity, I'll probably start specifying --node-directory in addition to --node-url, but this really feels wrong as it isn't a node directory at all! It is a CLI directory. :-)

Change History (11)

comment:1 in reply to: ↑ description Changed at 2011-01-15T10:18:24Z by davidsarah

  • Resolution set to duplicate
  • Status changed from new to closed

Duplicate of #977.

I think I'm a bit confused about this. I'm not sure what all it means that there exists a ~/.tahoe when I'm actually using a gateway which runs as a separate user process, is specified by the --node-url option, and it has its own ~/.tahoe in its own user account.

I also found that confusing.

comment:2 Changed at 2011-01-15T17:46:24Z by zooko

  • Resolution duplicate deleted
  • Status changed from closed to reopened
  • Summary changed from backupdb.sqlite (and all other state in ~/.tahoe?) should be scoped to the --node-url option if it is present to If --node-url is present then --node-directory is mostly but not entirely ignored.

I'm not sure that I agree that this is just a duplicate of #977 (backupdb should store which grid it is scoped to). What about the parts of ~/.tahoe which don't have to do with backup at all? It is still confusing that the command

tahoe put --node-url=http://127.0.0.1:3456 --node-directory=~/.tahoe-volunteergrid MYFILE

will not use any of your configuration from your ~/.tahoe-volunteergrid directory and indeed may not touch the grid you call "volunteergrid" at all.

comment:3 Changed at 2011-01-15T19:37:14Z by zooko

Hm, I'm not sure about this, but one potential solution to this issue would be for there to be a --cli-directory= option which points to a directory that contains nothing but an optional backupdb and a node URL. Hm, in fact that's all that the cli ever uses the so-called --node-directory for, right? It is really not true that you ever tell the CLI what the node directory is. You really tell it where to find the node URL and then it contacts a node with that URL and that node uses whatever node directory it is configured to use. So the option to the CLI named --node-directory is a misnomer.
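A minimal sketch of how a CLI might resolve the gateway URL under this proposal, assuming a directory that holds only node.url (and optionally a backupdb). The function name resolve_node_url and its argument order are hypothetical; --cli-directory is only a proposed option and does not exist in the tahoe CLI.

```python
from pathlib import Path
from typing import Optional


def resolve_node_url(cli_dir: str, node_url_option: Optional[str] = None) -> str:
    """Pick the gateway URL: an explicit option wins, else read <cli_dir>/node.url."""
    if node_url_option:
        return node_url_option
    # node.url is a one-line text file; strip the trailing newline.
    return (Path(cli_dir) / "node.url").read_text().strip()
```

The point of the sketch is that nothing else in the directory is consulted: the CLI learns a URL, and the node behind that URL uses whatever node directory it was configured with.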

comment:4 Changed at 2011-01-21T07:22:41Z by mlakewood

In addition to this, I've experienced the following problem. If I create a client node in the default directory (i.e. .tahoe) with the tahoe create-client command, and then change web.port = tcp:3456:interface=127.0.0.1 to, say, web.port = tcp:3456:interface=10.0.10.89, then running a command throws an exception about trying to connect to the client. This is because node.url is still pointing at http://127.0.0.1:3456. If you change it to point at 10.0.10.89:3456 then all is good. However, if you stop and start the tahoe client, node.url is reset to http://127.0.0.1:3456, which is a bit of a usability problem. node.url seems like a configuration file, but it gets overwritten with a potentially invalid default. It seems to me node.url shouldn't be overwritten on tahoe start.

[edited to make all the examples use the same port number, which I believe is what mlakewood intended --Zooko]
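To illustrate the mismatch reported in this comment, here is a rough sketch of deriving a URL from a web.port value so that it follows the configured interface. It is not how Tahoe-LAFS writes node.url; url_from_web_port is a hypothetical helper, and it handles only the simple forms quoted above, not the full endpoint-string syntax (for example, escaped colons).

```python
def url_from_web_port(web_port: str) -> str:
    """Derive an HTTP URL from a value such as "tcp:3456:interface=10.0.10.89"."""
    parts = web_port.split(":")
    if parts[0] == "tcp":
        parts = parts[1:]  # drop the optional "tcp" prefix
    port = int(parts[0])
    host = "127.0.0.1"  # assumed default when no interface is given
    for part in parts[1:]:
        key, _, value = part.partition("=")
        if key == "interface" and value:
            host = value
    return f"http://{host}:{port}"
```

With the values from this comment, url_from_web_port("tcp:3456:interface=10.0.10.89") gives a URL on 10.0.10.89 rather than on 127.0.0.1.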

Last edited at 2011-06-10T14:17:39Z by zooko

comment:5 Changed at 2011-01-21T07:56:20Z by zooko

Straw man proposal (I'm sleepy!):

Remove --node-directory from the CLI and document the fact that users can write tahoe --node-url=`cat ~/.tahoe/node.url` ls $DIRCAP if they wish. This syntax will make it clear to the user who inspects it that the tahoe command is not reading the contents of ~/.tahoe directory aside from the contents of that one file.

comment:6 follow-up: Changed at 2011-01-21T17:41:41Z by warner

CLI tools look in --node-directory for things that are generated by:

  • the tahoe client node, to help the CLI tools find the node
  • the CLI tools themselves, that they want to retrieve later

At present, the only from-node-to-CLI thing is node.url, used to find the webapi port. In the future, I'm considering adding a few more files, starting with a $NODEDIR/private/control.key which would help a tahoe webopen --control-panel derive a secret URL that points to node-control functions, outside the scope of any particular filecap. Another possibility is an accounting.secret, which would let the CLI tools create webapi URLs that include authority to use space on a grid (rather than the ambient authority that currently grants storage rights to anyone who can talk to the webapi port). Also, $NODEDIR/private/control.furl could be used by CLI commands that wanted foolscap-based access (which would be easier for some purposes than HTTP-based access).

The set of from-CLI-to-CLI things in there currently includes:

  • private/aliases: allows "tahoe ls home:" instead of "tahoe ls URI:DIR2:blahblahblahblahblah"
  • private/backupdb.sqlite: used by 'tahoe backup', potentially 'tahoe cp'

I named it --node-directory because you can set it equal to a tahoe client node's base directory (which is what I usually mean by "$NODEDIR") and then the CLI tool will correctly find everything it needs to work without any extra effort on your part (and the secret things that it writes will go into a well-named directory that's already chmod go-rwx). I also named it that because --node-directory is consistent across all tahoe commands, including tahoe create-node and tahoe start. So you could make a one-line shell script named othertahoe which did tahoe --node-directory=~/.other $* and use that for everything, and it would Just Work (well, if we fixed the option parser to look for --node-directory before the command instead of after).

I've gotta run, I'll come back to this ticket later to chime in about the proposal. At first glance, --cli-directory does sound more accurate, but I think the parallelism of --node-directory everywhere is valuable.

comment:7 in reply to: ↑ 6 Changed at 2011-03-11T21:25:17Z by zooko

I still have the feeling that tahoe-lafs gateways and tahoe-lafs clients are separate objects (for example, I prefer to run my gateways and my clients under separate user accounts on the same machine sometimes), and they should not share state in this way. I find it confusing.

Replying to warner:

I've gotta run, I'll come back to this ticket later to chime in about the proposal. At first glance, --cli-directory does sound more accurate, but I think the parallelism of --node-directory everywhere is valuable.

I have the feeling that this is really wrong -- that you shouldn't say --node-directory to a gateway to mean the directory where it stores its persistent state such as the node.pem and also say --node-directory to a client to mean the directory where it stores its persistent state such as the backupdb.sqlite. It is the "bad" kind of parallelism, when the underlying thing is different, so the interface to it ought to be different too. :-)

(Also, I think it would help us all think and communicate more precisely if we stopped using the ambiguous word "node"... Except in those cases where we actually mean a Tahoe-LAFS process which runs more than one of (server, gateway, introducer) or which runs an unspecified service.)

comment:8 Changed at 2011-03-11T21:39:38Z by gdt

I broadly concur with zooko in comment 7. Sort of related, as I've been setting up storage nodes in a private grid, I find myself wishing for the grid parameters to be in a separate file from the local parameters, so that I could copy the grid-params.conf file around and just drop it in, and not have that overwrite the node id.
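The wish for a drop-in grid-params.conf can be sketched with the standard configparser module: load the shared grid file first, then the node-local file, so that copying the grid file around never touches local values. Only the grid-params.conf name comes from this comment; load_merged_config, the section names and the keys shown in the usage note are illustrative assumptions, and strings stand in for files to keep the sketch self-contained.

```python
import configparser


def load_merged_config(grid_params_text: str, local_text: str) -> configparser.ConfigParser:
    """Merge grid-wide parameters with node-local ones; local values win on conflict."""
    config = configparser.ConfigParser()
    config.read_string(grid_params_text)  # shared across the grid
    config.read_string(local_text)        # specific to this node
    return config
```

For instance, a grid file containing a [client] section with introducer.furl and a local file containing a [node] section with nickname would yield one configuration holding both sections.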

Beyond this, I think that probably one should have a directory to represent client access, and that would then have the aliases file, the backupdb, and a pointer to the WAPI. The gateway node directory would then be separate. Once this is done, the CLI probably should lose the ability to specify the WAPI URL directly, and only point to client directories.

comment:9 Changed at 2011-09-26T20:22:47Z by zooko

  • Cc zooko added

I'm thinking about configuration right now, so I'm interested in this ticket. Commenting here to remind myself about it and to let everyone know that I might get around to working on it sometime.

comment:10 Changed at 2013-11-27T20:07:40Z by zooko

  • Description modified (diff)
  • Summary changed from If --node-url is present then --node-directory is mostly but not entirely ignored. to separate "gateway state directory" from "client state directory"

comment:11 Changed at 2013-11-27T20:17:51Z by zooko

  • Description modified (diff)