Opened 16 years ago
Closed 16 years ago
#1 closed defect (fixed)
non-ASCII characters in darcs output cause a crash
| Reported by: | warner | Owned by: | somebody |
|---|---|---|---|
| Priority: | major | Milestone: | |
| Component: | component1 | Version: | |
| Keywords: | | Cc: | |
| Launchpad Bug: | | | |
Description
'ndurner' noticed darcsver crashing due to a non-ASCII character in the
output of 'darcs changes --xml-format'. It looks like the German Windows
machine emitted a 'local_date' attribute with a long timezone name, something
like "Westeuropaische Normalzeit", except using an a-with-umlaut in the first
word. It looks like the name was encoded with Latin-1.
darcsver has a hack to discard funny-looking characters before it passes the
string to the XML parser, because apparently it's awfully hard to get darcs
to declare a character encoding for its XML output, or to get darcs to stick
to that encoding (the local_date string is probably coming from some Windows
time/date library, and who knows how to control the encoding *that* uses).
But the hack doesn't discard enough: the Latin-1 a-with-umlaut still gets
through to the parser.
My suggestion is to replace everything that isn't printable ASCII with "?":
    import string
    allbadchars = "".join([chr(i) for i in range(0x20) + range(0x7f,0x100)])
    tt = string.maketrans(allbadchars, "?"*len(allbadchars))
(really, we could probably discard everything that isn't an angle bracket or
the word "patch", since all darcsver really cares about is how many
<patch??? tokens appear in the file)
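The parenthetical idea above, counting patch tags without parsing the XML at all, could be sketched roughly like this in Python 3. This is only an illustration, not darcsver's actual code: the `count_patches` helper, the `PATCH_TAG` pattern, and the sample bytes are all made up for the example, and it assumes every patch in the output starts with a literal `<patch` followed by whitespace or `>`.

```python
import re

# Hypothetical helper, not taken from darcsver.
# Matches an opening "<patch" tag followed by whitespace or ">", so that
# closing "</patch>" tags and longer names such as "<patches>" are not counted.
PATCH_TAG = re.compile(rb"<patch[\s>]")


def count_patches(xml_bytes: bytes) -> int:
    """Count opening <patch ...> tags in raw bytes, with no XML parsing."""
    return len(PATCH_TAG.findall(xml_bytes))


# Made-up sample; the first local_date contains a Latin-1 a-with-umlaut (0xE4).
sample = (
    b"<changelog>\n"
    b"<patch author='a' local_date='Westeurop\xe4ische Normalzeit'>\n"
    b"<name>first</name>\n</patch>\n"
    b"<patch author='b' local_date='x'>\n<name>second</name>\n</patch>\n"
    b"</changelog>\n"
)

print(count_patches(sample))  # prints 2
```

Because the input is handled as bytes and never decoded, the encoding of the timezone name does not matter for this approach.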
Change History (4)
comment:1 Changed 16 years ago by warner
oops, it might be nice to preserve newlines. how about
also, since "?" in an XML file is special (as in <?xml>), how about translating those characters to something else, like "-"
I think ndurner reports that this seemed to work.
comment:2 Changed 16 years ago by ndurner
Yes, it works.
comment:3 Changed 16 years ago by zooko
Fixed by [20090211201316-92b7f-e014f2023111b36590bd5ac8336aebe4ad8f491c]. Thanks folks! See also #2, which will make parsing XML output unnecessary.
comment:4 Changed 16 years ago by zooko
- Resolution set to fixed
- Status changed from new to closed
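The two refinements raised in this ticket (keep newlines, and substitute "-" instead of "?") could be combined along these lines. This is a Python 3 sketch under stated assumptions, not the code from the referenced patch: it assumes the darcs output is held as `bytes`, and the names `BAD_BYTES`, `TABLE`, and `sanitize` are invented for the example.

```python
# Bytes to replace: ASCII control characters, DEL, and everything above 0x7E,
# except newline (0x0A), which is kept so the output stays line-oriented.
# Note that tab and carriage return are still replaced, as in the original
# suggestion.
BAD_BYTES = bytes(
    i for i in list(range(0x20)) + list(range(0x7F, 0x100)) if i != 0x0A
)

# Map every bad byte to "-" rather than "?", since "?" is special in XML.
TABLE = bytes.maketrans(BAD_BYTES, b"-" * len(BAD_BYTES))


def sanitize(raw: bytes) -> bytes:
    """Return raw with every byte in BAD_BYTES replaced by b'-'."""
    return raw.translate(TABLE)


# Made-up sample containing a Latin-1 a-with-umlaut (0xE4).
sample = b"local_date='Westeurop\xe4ische Normalzeit'\n"
print(sanitize(sample))  # prints b"local_date='Westeurop-ische Normalzeit'\n"
```

The replacement is one byte for one byte, so the sanitized output has the same length as the input.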