Opened at 2021-04-06T14:15:56Z
Closed at 2021-04-28T19:24:21Z
#3672 closed defect (fixed)
UnicodeDecodeError in Eliot messages
Reported by: | itamarst | Owned by: | GitHub <noreply@…> |
---|---|---|---|
Priority: | normal | Milestone: | Support Python 3 |
Component: | unknown | Version: | n/a |
Keywords: | Cc: | ||
Launchpad Bug: |
Description
Eliot in Tahoe-LAFS currently assumes bytes are always UTF-8 encoded. This is not always the case.
Running trial allmydata:
{"exception": "exceptions.UnicodeDecodeError", "timestamp": 1617718191.704726, "task_uuid": "a8ecfc19-947b-46e2-a1dd-bffcb4be5fc5", "reason": "'utf8' codec can't decode byte 0x9d in position 6: invalid start byte", "message": "{u\"u'message_type'\": u\"u'immutable:upload:get-share-placements'\", u\"u'timestamp'\": u'1617718191.704558', u\"'happiness_mappings'\": u\"{0: 'b3llgpwwqwozijzje7ydgossrdyqig5e'}\", u\"'happiness'\": u'1', u\"'existing_shares'\": u'{\"\\\\x0e\\\\xd6\\\\xb3>\\\\xd6\\\\x85\\\\x9d\\\\x94\\')\\'\\\\xf03:R\\\\x88\\\\xf1\\\\x04\\\\x1b\\\\xa4\": [0]}', u\"u'task_level'\": u'[2, 3, 2]', u\"u'task_uuid'\": u\"u'a8ecfc19-947b-46e2-a1dd-bffcb4be5fc5'\", u\"'total_shares'\": u'1', u\"'peers'\": u\"['6jdspiha6nw2az6fqglwfzbu2c2uvnfg', 'b3llgpwwqwozijzje7ydgossrdyqig5e']\", u\"'readonly_peers'\": u'[]'}", "message_type": "eliot:destination_failure", "task_level": [2, 3, 3]} {"timestamp": 1617718191.715011, "task_uuid": "a8ecfc19-947b-46e2-a1dd-bffcb4be5fc5", "message_type": "immutable:upload:get-shareholders:converged-happiness", "effective_happiness": 1, "task_level": [2, 3, 4]} {"exception": "exceptions.UnicodeDecodeError", "timestamp": 1617718191.715412, "task_uuid": "a8ecfc19-947b-46e2-a1dd-bffcb4be5fc5", "reason": "'utf8' codec can't decode byte 0x9d in position 6: invalid start byte", "message": "{u\"u'timestamp'\": u'1617718191.715314', u\"u'action_status'\": u\"u'succeeded'\", u\"'upload_trackers'\": u'[]', u\"u'task_level'\": u'[2, 3, 5]', u\"u'task_uuid'\": u\"u'a8ecfc19-947b-46e2-a1dd-bffcb4be5fc5'\", u\"'already_serverids'\": u'{0: set([\"\\\\x0e\\\\xd6\\\\xb3>\\\\xd6\\\\x85\\\\x9d\\\\x94\\')\\'\\\\xf03:R\\\\x88\\\\xf1\\\\x04\\\\x1b\\\\xa4\"])}', u\"u'action_type'\": u\"u'immutable:upload:locate-all-shareholders'\"}", "message_type": "eliot:destination_failure", "task_level": [2, 4]}
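The failure can be reproduced outside Eliot. The sketch below assumes Python 3 and assumes the key of `existing_shares` has been transcribed correctly from the doubly escaped log output above; the name `server_id` is only illustrative.

```python
# Bytes transcribed from the 'existing_shares' key in the log above
# (assumption: the nested escaping was unwound correctly).
server_id = b"\x0e\xd6\xb3>\xd6\x85\x9d\x94')'\xf03:R\x88\xf1\x04\x1b\xa4"

error = None
try:
    # Strict UTF-8 decoding, which is what the logging path effectively assumed.
    server_id.decode("utf-8")
except UnicodeDecodeError as e:
    error = e
    # Report the offending offset, the byte at that offset, and the codec's reason.
    print(e.start, hex(server_id[e.start]), e.reason)
```

The first five bytes happen to form valid UTF-8 sequences; byte 0x9d at offset 6 is a continuation byte with no lead byte before it, which is why the codec reports an invalid start byte there.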
Change History (2)
comment:1 Changed at 2021-04-21T13:58:02Z by itamarst
comment:2 Changed at 2021-04-28T19:24:21Z by GitHub <noreply@…>
- Owner set to GitHub <noreply@…>
- Resolution set to fixed
- Status changed from new to closed
In 1c2ba6b/trunk:
More context:
On Python 2, where bytestrings were the default string type, Eliot followed Python's lead for JSON serialization: if bytes looked like a UTF-8-encoded unicode string, they were serialized as a JSON string.
With Python 3, bytestrings are no longer the default string type, which means bytes are more likely to be ... bytes. On Python 3, Eliot therefore does not handle bytes in log messages by default, since it's not clear what the correct thing to do is. How to handle them is left up to individual applications.
As a result, Tahoe-LAFS on Python 3 needs a policy decision on how to handle byte serialization. The initial policy decision was "handle bytes that look like UTF-8-encoded unicode strings".
However, it turns out Tahoe actually logs arbitrary byte strings, some of which are not valid UTF-8 at all. This PR allows Tahoe to continue doing so by using hex quoting when necessary.
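A minimal sketch of what such a policy could look like on Python 3 follows. It is not the code merged for this ticket: `quote_bytes` and `to_jsonable` are hypothetical names, and the use of the standard `backslashreplace` error handler as the hex-quoting mechanism is an assumption made for illustration. Values are converted before serialization rather than through a `json.JSONEncoder.default` hook, because `default` is not consulted for dictionary keys and the failing messages above use bytes as keys.

```python
import json


def quote_bytes(data: bytes) -> str:
    # Valid UTF-8 decodes normally; any undecodable byte becomes a
    # literal \xNN escape instead of raising UnicodeDecodeError.
    return data.decode("utf-8", errors="backslashreplace")


def to_jsonable(obj):
    # Recursively replace bytes (including dict keys) with text so that
    # the standard json module can serialize the result.
    if isinstance(obj, bytes):
        return quote_bytes(obj)
    if isinstance(obj, dict):
        return {to_jsonable(k): to_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set, frozenset)):
        return [to_jsonable(item) for item in obj]
    return obj


# Hypothetical message shaped like the failing ones: bytes used as a dict key.
message = {"existing_shares": {b"\x9d": [0]}, "happiness": 1}
print(json.dumps(to_jsonable(message)))
```

One consequence of this design is that the output is lossy: a byte string that already contains a literal backslash escape is indistinguishable from a quoted one. That is usually acceptable for log messages, which are meant to be read rather than parsed back into the original bytes.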