[tahoe-dev] [tahoe-lafs] #534: "tahoe cp" command encoding issue
tahoe-lafs
trac at allmydata.org
Wed Apr 8 16:26:31 PDT 2009
#534: "tahoe cp" command encoding issue
-----------------------------------+----------------------------------------
Reporter: francois | Owner: francois
Type: defect | Status: assigned
Priority: minor | Milestone: 1.5.0
Component: code-frontend-cli | Version: 1.2.0
Resolution: | Keywords: cp encoding unicode filename utf-8
Launchpad_bug: |
-----------------------------------+----------------------------------------
Comment (by zooko):
I'm reviewing your most recent patch, François.
I'll be posting my observations in separate comments as I understand more
of the patch.
Here's the first observation:
The patch seems to assume that the terminal handles either {{{ascii}}} or
{{{utf-8}}} on stdout, but what about terminals that handle a different
encoding, such as Windows {{{cmd.exe}}} (which presumably handles whatever
the current Windows codepage is, or else {{{utf-16le}}})? Apparently
{{{sys.stdout.encoding}}} will tell us what python thinks it should use if
you pass a unicode string to it with {{{print myunicstr}}} or
{{{sys.stdout.write(myunicstr)}}}.
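For concreteness, here is a minimal Python 2 sketch (python 2 being what
tahoe currently runs on; the filename in it is made up) of the behavior
in question:
{{{
import sys

# The codec python expects to use when a unicode object goes to stdout.
# On a terminal it is usually derived from the locale (e.g. 'UTF-8');
# when stdout is redirected to a pipe or file it is typically None.
print sys.stdout.encoding

# A hypothetical non-ascii filename, like one tahoe might read back from
# a directory listing.
myunicstr = u'caf\xe9.txt'

# Printing a unicode object makes python encode it with the codec above.
# If that codec is ascii (or unknown), the non-ascii character raises
# UnicodeEncodeError instead of being displayed.
print myunicstr
}}}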
In any case the documentation should explain this -- that what you see
when you run {{{tahoe ls}}} will depend on the configuration of your
terminal. Hm, this also suggests that it isn't correct for tahoe to have
a {{{unicode_to_stdout()}}} function, and that we should instead just
rely on python's {{{sys.stdout}}} encoding behavior. What do you think?
I guess one place where I would be willing to second-guess python on
this is the following: if {{{sys.stdout.encoding}}} says the encoding is
{{{ascii}}}, or says that it doesn't know what the encoding is, then
pre-encode your unicode strings with {{{utf-8}}} (or, on Windows, with
{{{utf-16le}}}) before printing them or {{{sys.stdout.write()}}}'ing
them; there is a sketch of this after the list below. The reasons are:
1. A misconfigured environment will result in python defaulting to
{{{ascii}}} when {{{utf-8}}} would actually work better. (I just now
discovered that the Mac laptop on which I am writing this was
misconfigured in exactly that way, and when I tried to fix it I
misconfigured it in a different way with the same result! The first
problem was that {{{LANG}}} and {{{LC_ALL}}} were being cleared out in
my {{{.bash_profile}}}; the second was that I set {{{LANG}}} and
{{{LC_ALL}}} to {{{en_DK.UTF-8}}}, which this version of Mac OS doesn't
support, so I had to change them to {{{en_US.UTF-8}}}.)
2. Terminals that actually can't handle {{{utf-8}}} and can only handle
{{{ascii}}} are increasingly rare.
3. If there __is__ something that can only handle {{{ascii}}} and you
give it {{{utf-8}}}, you'll be emitting garbage instead of raising an
exception, which might be better in some cases. On the other hand, I
suppose it could be worse in others, especially when the garbage happens
to include control characters and scrambles your terminal emulator...
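To make that concrete, here is a minimal sketch of the fallback I have
in mind (the helper name is made up for illustration -- it is not the
patch's {{{unicode_to_stdout()}}} -- and the exact set of encoding names
to treat as "ascii" is a guess):
{{{
import sys

def write_unicode_to_stdout(u):
    # Ask python what it thinks the terminal wants.
    enc = sys.stdout.encoding
    if enc is None or enc.lower() == 'ascii':
        # python either doesn't know the terminal's encoding or claims
        # it can only do ascii -- second-guess it and assume utf-8 will
        # work (or utf-16le on Windows).
        if sys.platform == 'win32':
            enc = 'utf-16le'
        else:
            enc = 'utf-8'
    sys.stdout.write(u.encode(enc))

write_unicode_to_stdout(u'caf\xe9.txt\n')
}}}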
I'm not entirely sure that this second-guessing of python is really going
to yield better results more often than it yields worse results, and it is
certainly more code, so I would also be happy with just emitting unicode
objects to stdout and letting python and the local system config do the
work from there.
Small details of English spelling and editing:
s/Tahoe v1.3.1/Tahoe v1.5.0/
s/aliase/alias/
s/commande/command/
s/moderns/modern/
--
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/534#comment:51>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid