source: trunk/docs/specifications/url.rst

Last change on this file was 4d34c775, checked in by meejah <meejah@…>, at 2024-08-08T22:40:52Z

someone missed moving this link

URLs
====

The goal of this document is to completely specify the construction and use of the URLs by Tahoe-LAFS for service location.
This includes, but is not limited to, the original Foolscap-based URLs.
These are not to be confused with the URI-like capabilities Tahoe-LAFS uses to refer to stored data.
An attempt is also made to outline the rationale for certain choices about these URLs.
The intended audience for this document is Tahoe-LAFS maintainers and other developers interested in interoperating with Tahoe-LAFS or these URLs.

.. _furls:

Background
----------

Tahoe-LAFS first used Foolscap_ for network communication.
Foolscap connection setup takes as input a Foolscap URL, or *fURL*.
A fURL includes three components:

* the base32-encoded SHA1 hash of the DER form of an x509v3 certificate
* zero or more network addresses [1]_
* an object identifier

A Foolscap client tries to connect to each network address in turn.
If a connection is established then TLS is negotiated.
The client authenticates the server by matching the server's certificate against the hash in the fURL.
A matching certificate serves as proof that the handshaking peer is the correct server.

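
The certificate check described above can be sketched as follows.
This is a minimal illustration, not Foolscap's implementation.
It assumes the hash uses the standard base32 alphabet in lowercase without padding, as the example fURLs in this document suggest.
The function names ``furl_cert_hash`` and ``certificate_matches`` are hypothetical, and the certificate bytes are a stand-in for the DER certificate a real client would receive during the TLS handshake.

.. code-block:: python

   import base64
   import hashlib
   import hmac


   def furl_cert_hash(cert_der: bytes) -> str:
       """Return the lowercase, unpadded base32 SHA1 hash of a DER certificate."""
       digest = hashlib.sha1(cert_der).digest()
       return base64.b32encode(digest).decode("ascii").rstrip("=").lower()


   def certificate_matches(cert_der: bytes, expected_hash: str) -> bool:
       """Compare the presented certificate's hash with the hash from the fURL."""
       return hmac.compare_digest(furl_cert_hash(cert_der), expected_hash.lower())


   # Stand-in bytes only; this is not a real certificate.
   example_cert = b"not a real certificate"
   expected = furl_cert_hash(example_cert)
   print(certificate_matches(example_cert, expected))
   print(certificate_matches(b"some other certificate", expected))
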
The client can then exercise further Foolscap functionality using the fURL's object identifier.
If the object identifier is an unguessable, secret string then it serves as a capability.
This unguessable identifier is sometimes called a `swiss number`_ (or swissnum).
The client's use of the swissnum is what allows the server to authorize the client.

.. _`swiss number`: http://wiki.erights.org/wiki/Swiss_number

.. _NURLs:

NURLs
-----

The authentication and authorization properties of fURLs are a good fit for Tahoe-LAFS' requirements.
These properties are not inherently tied to the Foolscap protocol itself.
In particular they are beneficial to :doc:`http-storage-node-protocol` which uses HTTP instead of Foolscap.
It is conceivable they will also be used with WebSockets at some point.

Continuing to refer to these URLs as fURLs when they are being used for other protocols may cause confusion.
Therefore,
this document coins the name **NURL** for these URLs.
This can be considered to expand to "**N**\ ew URLs" or "Authe\ **N**\ ticating URLs" or "Authorizi\ **N**\ g URLs" as the reader prefers.

The anticipated use for a **NURL** will still be to establish a TLS connection to a peer.
The protocol run over that TLS connection could be Foolscap though it is more likely to be an HTTP-based protocol (such as GBS).

Unlike fURLs, only a single net-loc is included, for consistency with other forms of URLs.
As a result, multiple NURLs may be available for a single server.

Syntax
------

The EBNF for a NURL is as follows::

  nurl         = tcp-nurl | tor-nurl | i2p-nurl
  tcp-nurl     = "pb://", hash, "@", tcp-loc, "/", swiss-number, [ version1 ]
  tor-nurl     = "pb+tor://", hash, "@", tcp-loc, "/", swiss-number, [ version1 ]
  i2p-nurl     = "pb+i2p://", hash, "@", i2p-loc, "/", swiss-number, [ version1 ]

  hash         = unreserved

  tcp-loc      = hostname, [ ":" port ]
  hostname     = domain | IPv4address | IPv6address

  i2p-loc      = i2p-addr, [ ":" port ]
  i2p-addr     = { unreserved }, ".i2p"

  swiss-number = segment

  version1     = "#v=1"

See https://tools.ietf.org/html/rfc3986#section-3.3 for the definition of ``segment``.
See https://tools.ietf.org/html/rfc2396#appendix-A for the definition of ``unreserved``.
See https://tools.ietf.org/html/draft-main-ipaddr-text-rep-02#section-3.1 for the definition of ``IPv4address``.
See https://tools.ietf.org/html/draft-main-ipaddr-text-rep-02#section-3.2 for the definition of ``IPv6address``.
See https://tools.ietf.org/html/rfc1035#section-2.3.1 for the definition of ``domain``.

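
To illustrate how the top-level structure of the grammar might be applied, the following sketch splits a NURL into its components with a regular expression.
It is deliberately looser than the EBNF: it does not validate ``hostname``, ``port``, or the characters allowed in each component.
The names ``NURL``, ``parse_nurl``, and the tuple fields are hypothetical and not part of Tahoe-LAFS.

.. code-block:: python

   import re
   from typing import NamedTuple, Optional


   class NURL(NamedTuple):
       scheme: str
       hash: str
       location: str
       swiss_number: str
       version: int


   # Only the overall shape is checked; the location is taken as an opaque string.
   _NURL_RE = re.compile(
       r"^(?P<scheme>pb|pb\+tor|pb\+i2p)://"
       r"(?P<hash>[^@/]+)@"
       r"(?P<location>[^/]+)/"
       r"(?P<swiss>[^/#?]+)"
       r"(?P<fragment>#v=1)?$"
   )


   def parse_nurl(text: str) -> Optional[NURL]:
       """Split a NURL into components, or return None if the shape is wrong."""
       match = _NURL_RE.match(text)
       if match is None:
           return None
       version = 1 if match.group("fragment") else 0
       return NURL(
           scheme=match.group("scheme"),
           hash=match.group("hash"),
           location=match.group("location"),
           swiss_number=match.group("swiss"),
           version=version,
       )


   print(parse_nurl("pb://2uxmzoqqimpdwowxr24q6w5ekmxcymby@localhost:47877/riqhpojvzwxujhna5szkn"))
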
Versions
--------

Though all NURLs are syntactically compatible, some semantic differences are allowed.
These differences are separated into distinct versions.

Version 0
---------

In theory, a Foolscap fURL with a single netloc is considered the canonical definition of a version 0 NURL.
Notably,
the hash component is defined as the base32-encoded SHA1 hash of the DER form of an x509v3 certificate.
A version 0 NURL is identified by the absence of the ``v=1`` fragment.

In practice, real-world fURLs may have more than one netloc, so the lack of a version fragment will likely just mean dispatching the fURL to a different parser.

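
A rough sketch of that dispatch is shown below.
It assumes that a fURL separates multiple location hints with commas, which is an assumption about Foolscap's format rather than something this document specifies.
The function names ``choose_parser`` and ``split_location_hints`` are hypothetical.

.. code-block:: python

   from typing import List


   def choose_parser(url: str) -> str:
       """Pick a parser name based only on the presence of the v=1 fragment."""
       if url.endswith("#v=1"):
           return "nurl-v1"
       return "furl"


   def split_location_hints(furl: str) -> List[str]:
       """Return the location hints of a fURL, assuming comma separation."""
       rest = furl.split("://", 1)[1]
       hints = rest.split("@", 1)[1].split("/", 1)[0]
       # An empty hints section yields an empty list.
       return [hint for hint in hints.split(",") if hint]


   example = "pb://2uxmzoqqimpdwowxr24q6w5ekmxcymby@localhost:47877/riqhpojvzwxujhna5szkn"
   print(choose_parser(example))
   print(split_location_hints(example))
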
Examples
~~~~~~~~

* ``pb://sisi4zenj7cxncgvdog7szg3yxbrnamy@tcp:127.1:34399/xphmwz6lx24rh2nxlinni``
* ``pb://2uxmzoqqimpdwowxr24q6w5ekmxcymby@localhost:47877/riqhpojvzwxujhna5szkn``

Version 1
---------

The hash component of a version 1 NURL differs in three ways from the prior version.

1. The hash function used is SHA-256, to match RFC 7469.
   The security of SHA1 `continues to be eroded`_; Latacora recommends `SHA-2`_.
2. The hash is computed over the certificate's SPKI (SubjectPublicKeyInfo) instead of the whole certificate.
   This allows certificate re-generation so long as the public key remains the same.
   This is useful to allow contact information to be updated or the validity period to be extended.
   Use of an SPKI hash has also been `explored by the web community`_ during its flirtation with using it for HTTPS certificate pinning
   (though this is now largely abandoned).

   .. note::
      *Only* the certificate's keypair is pinned by the SPKI hash.
      The freedom to change every other part of the certificate is coupled with the fact that all other parts of the certificate contain arbitrary information set by the private key holder.
      It is neither guaranteed nor expected that a certificate-issuing authority has validated this information.
      Therefore,
      *all* certificate fields should be considered within the context of the relationship identified by the SPKI hash.

3. The hash is encoded using urlsafe-base64 (without padding) instead of base32.
   This provides a more compact representation and minimizes the usability impacts of switching from a 160-bit hash to a 256-bit hash.

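
The three differences above can be combined into a short sketch.
It assumes the caller already has the DER encoding of the certificate's SubjectPublicKeyInfo.
With the third-party ``cryptography`` package this could be obtained from ``cert.public_key().public_bytes(Encoding.DER, PublicFormat.SubjectPublicKeyInfo)``, but that step is left out here so the example depends only on the standard library.
The function name ``nurl_v1_hash`` is hypothetical and the input bytes are a stand-in, not a real key.

.. code-block:: python

   import base64
   import hashlib


   def nurl_v1_hash(spki_der: bytes) -> str:
       """Return the unpadded urlsafe-base64 SHA-256 hash of DER-encoded SPKI bytes."""
       digest = hashlib.sha256(spki_der).digest()
       return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


   # Stand-in bytes only; a real caller would pass the certificate's SPKI in DER form.
   print(nurl_v1_hash(b"not a real SubjectPublicKeyInfo"))

Because only the SPKI bytes are hashed, two certificates that share a public key produce the same value.
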
A version 1 NURL is identified by the presence of the ``v=1`` fragment.
Though the length of the hash string (38 bytes) could also be used to differentiate it from a version 0 NURL,
there is no guarantee that this will be effective in differentiating it from future versions so this approach should not be used.

It is possible for a client to unilaterally upgrade a version 0 NURL to a version 1 NURL.
After establishing and authenticating a connection the client will have received a copy of the server's certificate.
This is sufficient to compute the new hash and rewrite the NURL to upgrade it to version 1.
This provides stronger authentication assurances for future uses but it is not required.

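
The rewrite step might look like the following sketch.
It assumes the connection has already been authenticated against the version 0 hash and that the SPKI bytes were extracted from the certificate the server presented.
The function name ``upgrade_nurl`` is hypothetical, and the SPKI bytes used in the example are a stand-in.

.. code-block:: python

   import base64
   import hashlib


   def upgrade_nurl(nurl_v0: str, spki_der: bytes) -> str:
       """Rewrite a version 0 NURL as version 1 using the server's SPKI bytes."""
       if "#" in nurl_v0:
           raise ValueError("expected a version 0 NURL without a fragment")
       scheme, rest = nurl_v0.split("://", 1)
       # Discard the old SHA1-based hash; keep the location and swissnum unchanged.
       _old_hash, location_and_swissnum = rest.split("@", 1)
       digest = hashlib.sha256(spki_der).digest()
       new_hash = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
       return f"{scheme}://{new_hash}@{location_and_swissnum}#v=1"


   old = "pb://2uxmzoqqimpdwowxr24q6w5ekmxcymby@localhost:47877/riqhpojvzwxujhna5szkn"
   print(upgrade_nurl(old, b"not a real SubjectPublicKeyInfo"))
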
Examples
~~~~~~~~

* ``pb://1WUX44xKjKdpGLohmFcBNuIRN-8rlv1Iij_7rQ@tcp:127.1:34399/jhjbc3bjbhk#v=1``
* ``pb://azEu8vlRpnEeYm0DySQDeNY3Z2iJXHC_bsbaAw@localhost:47877/64i4aokv4ej#v=1``

.. _`continues to be eroded`: https://en.wikipedia.org/wiki/SHA-1#Cryptanalysis_and_validation
.. _`SHA-2`: https://latacora.micro.blog/2018/04/03/cryptographic-right-answers.html
.. _`explored by the web community`: https://www.rfc-editor.org/rfc/rfc7469
.. _Foolscap: https://github.com/warner/foolscap

.. [1] ``foolscap.furl.decode_furl`` is taken as the canonical definition of the syntax of a fURL.
       The **location hints** part of the fURL,
       as it is referred to in Foolscap,
       is matched by the regular expression fragment ``([^/]*)``.
       Since this matches the empty string,
       no network addresses are required to form a fURL.
       The supporting code around the regular expression also takes extra steps to allow an empty string to match here.

Open Questions
--------------

1. Should we make a hard recommendation that all certificate fields be ignored?
   The system makes no guarantees about validation of these fields.
   Is it just an unnecessary risk to let a user see them?

2. Should the version specifier be a query-arg-alike or a fragment-alike?
   The value is only necessary on the client side which makes it similar to an HTTP URL fragment.
   The current Tahoe-LAFS configuration parsing code has special handling of the fragment character (``#``) which makes it unusable.
   However,
   the configuration parsing code is easily changed.