[tahoe-dev] [tahoe-lafs] #683: So-called URIs aren't

tahoe-lafs trac at allmydata.org
Sun Apr 19 11:02:39 PDT 2009


#683: So-called URIs aren't
---------------------+------------------------------------------------------
 Reporter:  kpreid   |           Owner:  nobody   
     Type:  defect   |          Status:  new      
 Priority:  major    |       Milestone:  undecided
Component:  unknown  |         Version:  1.3.0    
 Keywords:           |   Launchpad_bug:           
---------------------+------------------------------------------------------
 Tahoe has things it calls URIs which identify files. For example:
 URI:CHK:twpnflhnjeubo2tluuglxrbvdu:oan4set42mwkwxonqmq4xlull6ggnl2f2zggjmp6fgji7uv7py2a:3:10:34295

 However, they are not URIs as that term is defined by RFC 3986; in
 particular, URIs have the syntax <scheme>:<scheme-specific-part>, where
 the possible values for <scheme> are registered with IANA under IETF
 procedures:

 http://www.iana.org/assignments/uri-schemes.html

 Since Tahoe "URIs" do have the properties a URI should, I believe the
 appropriate fix for this is to register a {{{tahoe:}}} URI scheme. As far
 as I know, the "URI:" part of a Tahoe URI is always the same, so it
 conveys no information and can be replaced with {{{tahoe:}}} at a cost of
 only two additional characters:
 tahoe:CHK:twpnflhnjeubo2tluuglxrbvdu:oan4set42mwkwxonqmq4xlull6ggnl2f2zggjmp6fgji7uv7py2a:3:10:34295
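
 The prefix swap described above can be sketched in a few lines of Python.
 This is only an illustration: the function names are hypothetical and not
 part of Tahoe, and {{{tahoe:}}} is the scheme proposed in this ticket, not
 a registered one.

```python
LEGACY_PREFIX = "URI:"      # constant prefix on current Tahoe "URIs"
PROPOSED_SCHEME = "tahoe:"  # scheme proposed in this ticket (assumption)


def to_tahoe_uri(cap: str) -> str:
    """Rewrite 'URI:<rest>' as 'tahoe:<rest>' (hypothetical helper)."""
    if not cap.startswith(LEGACY_PREFIX):
        raise ValueError("not a legacy Tahoe URI: %r" % (cap,))
    return PROPOSED_SCHEME + cap[len(LEGACY_PREFIX):]


def from_tahoe_uri(uri: str) -> str:
    """Inverse of to_tahoe_uri, for the plain form without location hints."""
    if not uri.startswith(PROPOSED_SCHEME):
        raise ValueError("not a tahoe: URI: %r" % (uri,))
    return LEGACY_PREFIX + uri[len(PROPOSED_SCHEME):]


legacy = ("URI:CHK:twpnflhnjeubo2tluuglxrbvdu:"
          "oan4set42mwkwxonqmq4xlull6ggnl2f2zggjmp6fgji7uv7py2a:3:10:34295")
proposed = to_tahoe_uri(legacy)
```

 Because only the constant prefix changes, the mapping is reversible and
 the rest of the string is carried over untouched.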

 --- The remainder of this text is not a matter of correctness but
 additional functionality ---

 Furthermore, so that these URIs are also URLs (readily usable for
 contacting the resource with no local context), I would recommend
 including in the syntax of the scheme-specific-part a provision for an
 OPTIONAL location hint for the grid, i.e. some host that can be contacted
 by some protocol that can put the client in communication with appropriate
 storage servers. This is essentially the same provision as in CapTP URIs;
 borrowing their syntax, it would be like:

 tahoe://example.net:1234,192.168.33.91:1234/CHK:twpnflhnjeubo2tluuglxrbvdu:oan4set42mwkwxonqmq4xlull6ggnl2f2zggjmp6fgji7uv7py2a:3:10:34295

 That is, {{{tahoe://}}}, then a comma-separated list of host:port pairs,
 then {{{/}}}, then the current Tahoe-URI components.
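
 A parser for that form might look roughly like the following Python
 sketch. Everything here is an assumption layered on the proposal: the
 names {{{TahoeRef}}} and {{{parse_tahoe_uri}}} are hypothetical, each hint
 is assumed to be host:port with a numeric port, and IPv6 literals are not
 handled.

```python
from typing import List, NamedTuple, Tuple


class TahoeRef(NamedTuple):
    hints: List[Tuple[str, int]]  # optional (host, port) location hints
    cap: str                      # current Tahoe-URI components, e.g. "CHK:..."


def parse_tahoe_uri(uri: str) -> TahoeRef:
    """Parse 'tahoe:<cap>' or 'tahoe://h1:p1,h2:p2/<cap>' (proposed syntax)."""
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "tahoe":
        raise ValueError("not a tahoe: URI: %r" % (uri,))
    if not rest.startswith("//"):
        return TahoeRef([], rest)  # plain form, no location hints

    authority, slash, cap = rest[2:].partition("/")
    if not slash or not cap:
        raise ValueError("missing capability part: %r" % (uri,))

    hints = []
    for item in (authority.split(",") if authority else []):
        host, colon, port = item.rpartition(":")
        if not colon or not host or not port.isdigit():
            raise ValueError("bad location hint: %r" % (item,))
        hints.append((host, int(port)))
    return TahoeRef(hints, cap)


ref = parse_tahoe_uri(
    "tahoe://example.net:1234,192.168.33.91:1234/"
    "CHK:twpnflhnjeubo2tluuglxrbvdu:"
    "oan4set42mwkwxonqmq4xlull6ggnl2f2zggjmp6fgji7uv7py2a:3:10:34295")
```

 Since the hints are optional, a client with local grid configuration can
 ignore them and use only the capability part.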


 ----

 Besides correctness in terminology, another advantage of having registered
 Tahoe URI syntax is that Tahoe files can participate as first-class
 entities in URI-based systems, and vice-versa.

 For example, if Tahoe directories could store arbitrary URIs, of which
 Tahoe URIs were a special case, then they could include references not
 just to other things in the same Tahoe grid, but any other URLs-as-
 capabilities system, including other Tahoe grids or Waterken servers or
 ...whatever. You could use the {{{data:}}} URI scheme to store
 sufficiently small files directly in directories. (I vaguely recall that
 Tahoe might already have that capability.)

 If there is a registered Tahoe scheme, then systems which work exclusively
 with URLs, but are extensible to handle additional URL schemes, can be
 extended to support Tahoe, rather than necessarily going through a Tahoe
 web gateway, thus providing useful information (e.g. 'this is immutable'),
 perhaps more efficient downloading, etc.
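
 To make the extensibility point concrete, here is a rough Python sketch
 of a system that dispatches on URL scheme and gains a Tahoe handler. The
 registry and handler names are hypothetical, and treating CHK as the
 marker for an immutable file reflects my understanding of Tahoe rather
 than anything stated in this ticket.

```python
from typing import Callable, Dict, Optional

# Hypothetical registry: scheme name -> function describing the resource.
HANDLERS: Dict[str, Callable[[str], dict]] = {}


def describe_tahoe(rest: str) -> dict:
    """Describe a tahoe: reference without going through a web gateway."""
    if rest.startswith("//"):
        # Skip the optional location-hint authority (proposed syntax).
        rest = rest[2:].partition("/")[2]
    kind = rest.split(":", 1)[0]
    # Assumption: CHK caps name immutable files; other kinds are left unknown.
    immutable: Optional[bool] = True if kind == "CHK" else None
    return {"scheme": "tahoe", "kind": kind, "immutable": immutable}


def describe(url: str) -> dict:
    """Dispatch on the scheme; unknown schemes get a generic description."""
    scheme, sep, rest = url.partition(":")
    if not sep:
        raise ValueError("no scheme in %r" % (url,))
    handler = HANDLERS.get(scheme.lower())
    if handler is None:
        return {"scheme": scheme.lower()}
    return handler(rest)


# Extending the system is one registration, not a gateway round trip.
HANDLERS["tahoe"] = describe_tahoe

info = describe("tahoe:CHK:twpnflhnjeubo2tluuglxrbvdu:"
                "oan4set42mwkwxonqmq4xlull6ggnl2f2zggjmp6fgji7uv7py2a:3:10:34295")
```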

-- 
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/683>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid

