[tahoe-dev] String encoding in tahoe
Francois Deppierraz
francois at ctrlaltdel.ch
Mon Dec 22 08:03:34 PST 2008
Hi Zooko,
I finally found enough time today to investigate this issue further.
Basically, test_unicode_filename exposes strings which are not being
converted as expected.
As Brian pointed out in [1], the current codebase is calling
simplejson.dumps with bytestrings coming from the command line. This
might sometimes work but is definitely not recommended. The same kind of
issue appears with UTF-8 filenames in the FTP and SFTP servers.
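
As an illustration of the point above, here is a minimal sketch of decoding
a UTF-8 bytestring before serialising it. It is written for Python 3 and
the standard json module rather than simplejson (both are assumptions made
so the example is self-contained), and the filename is made up.

```python
import json

# Hypothetical filename as it would arrive from the command line:
# raw UTF-8 bytes, not a unicode string.
raw_name = "caf\u00e9.txt".encode("utf-8")

# Decode explicitly before handing the value to the JSON encoder,
# instead of relying on the encoder to guess the encoding.
name = raw_name.decode("utf-8")
encoded = json.dumps({"name": name})

# Round trip: the decoded JSON carries the same unicode string.
decoded_name = json.loads(encoded)["name"]
```

With the explicit decode, the serialised value no longer depends on how the
JSON library treats undecoded bytes.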
We usually have UTF-8 bytestrings as input (sys.argv, filenames,
aliases, etc.) and need UTF-8 bytestrings as output (urls, filenames,
etc.). However, it is usually simpler and safer to use unicode strings
internally.
Kumar McMillan gives the following advice in his talk [2]:
1. Decode early
2. Unicode everywhere
3. Encode late
and to create wrappers for libraries which are not unicode compliant
(urllib for example).
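
The three steps and the wrapper idea could look roughly like the sketch
below. It assumes Python 3 and urllib.parse.quote; decode_arg and
quote_unicode are hypothetical helper names used only for illustration and
are not part of tahoe.

```python
from urllib.parse import quote


def decode_arg(raw, encoding="utf-8"):
    """Decode early: turn a raw bytestring argument into a unicode string."""
    return raw.decode(encoding)


def quote_unicode(text, encoding="utf-8"):
    """Encode late: percent-encode a unicode string as UTF-8 for a URL.

    This is the kind of thin wrapper one would put around a library call
    that should only ever see explicitly encoded bytes.
    """
    return quote(text.encode(encoding), safe="")


raw = "caf\u00e9/menu.txt".encode("utf-8")  # bytes, as received on input
name = decode_arg(raw)                      # unicode everywhere internally
url_part = quote_unicode(name)              # bytes only at the output boundary
```

The internal code only ever sees name, a unicode string; the encoding is
chosen in exactly two places, at the input and at the output.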
Does this sound coherent in the context of tahoe? If so, the question is
where the best places to handle these conversions are.
Should we (1) automatically convert sys.argv[] from bytestring to
unicode in runner.runner(), or (2) do it selectively for each command
(put, cp, etc.)?
I gave (1) a try, see patch [3], which indeed fixed the test failure
on slave3 (dapper box). However, it broke many other tests at the same
time, mostly assertions in util/base32.py, which seem to require
bytestrings instead of unicode strings.
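
The kind of breakage described above can be reproduced with a small
stand-in. The sketch below assumes Python 3 and uses the standard base64
module; b32_of_bytes is a hypothetical helper that only mimics a
bytes-only type assertion and is not the actual code in util/base32.py.

```python
import base64


def b32_of_bytes(data):
    # Hypothetical bytes-only helper: reject anything that is not a
    # bytestring, then return lowercase base32 without padding.
    assert isinstance(data, bytes), "expected a bytestring, got %r" % type(data)
    return base64.b32encode(data).decode("ascii").lower().rstrip("=")


arg = "abc"  # a command-line argument after blanket decoding to unicode

# Passing the unicode string straight through trips the assertion.
try:
    b32_of_bytes(arg)
    failed = False
except AssertionError:
    failed = True

# Encoding back to bytes at the call site satisfies the helper again.
encoded = b32_of_bytes(arg.encode("utf-8"))
```

This suggests that a blanket conversion of argv needs a matching encode
step wherever an argument reaches code that expects bytestrings.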
François
[1] http://allmydata.org/trac/tahoe/ticket/534#comment:31
[2] http://farmdev.com/talks/unicode/
[3]
--- old-tahoe/src/allmydata/scripts/runner.py 2008-12-22 07:33:51.000000000 -0800
+++ new-tahoe/src/allmydata/scripts/runner.py 2008-12-22 07:33:52.000000000 -0800
@@ -33,6 +33,12 @@
            stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr,
            install_node_control=True, additional_commands=None):
 
+    # Convert arguments to unicode
+    new_argv = []
+    for arg in argv:
+        new_argv.append(arg.decode('utf-8'))
+    argv = new_argv
+
     config = Options()
     if install_node_control:
         config.subCommands.extend(startstop_node.subCommands)