#3672 closed defect (fixed)

UnicodeDecodeError in Eliot messages

Reported by: itamarst Owned by: GitHub <noreply@…>
Priority: normal Milestone: Support Python 3
Component: unknown Version: n/a
Keywords: Cc:
Launchpad Bug:

Description

Eliot in Tahoe-LAFS currently assumes bytes are always UTF-8 encoded. This is not always the case.

Running trial allmydata:

{"exception": "exceptions.UnicodeDecodeError", "timestamp": 1617718191.704726, "task_uuid": "a8ecfc19-947b-46e2-a1dd-bffcb4be5fc5", "reason": "'utf8' codec can't decode byte 0x9d in position 6: invalid start byte", "message": "{u\"u'message_type'\": u\"u'immutable:upload:get-share-placements'\", u\"u'timestamp'\": u'1617718191.704558', u\"'happiness_mappings'\": u\"{0: 'b3llgpwwqwozijzje7ydgossrdyqig5e'}\", u\"'happiness'\": u'1', u\"'existing_shares'\": u'{\"\\\\x0e\\\\xd6\\\\xb3>\\\\xd6\\\\x85\\\\x9d\\\\x94\\')\\'\\\\xf03:R\\\\x88\\\\xf1\\\\x04\\\\x1b\\\\xa4\": [0]}', u\"u'task_level'\": u'[2, 3, 2]', u\"u'task_uuid'\": u\"u'a8ecfc19-947b-46e2-a1dd-bffcb4be5fc5'\", u\"'total_shares'\": u'1', u\"'peers'\": u\"['6jdspiha6nw2az6fqglwfzbu2c2uvnfg', 'b3llgpwwqwozijzje7ydgossrdyqig5e']\", u\"'readonly_peers'\": u'[]'}", "message_type": "eliot:destination_failure", "task_level": [2, 3, 3]}
{"timestamp": 1617718191.715011, "task_uuid": "a8ecfc19-947b-46e2-a1dd-bffcb4be5fc5", "message_type": "immutable:upload:get-shareholders:converged-happiness", "effective_happiness": 1, "task_level": [2, 3, 4]}
{"exception": "exceptions.UnicodeDecodeError", "timestamp": 1617718191.715412, "task_uuid": "a8ecfc19-947b-46e2-a1dd-bffcb4be5fc5", "reason": "'utf8' codec can't decode byte 0x9d in position 6: invalid start byte", "message": "{u\"u'timestamp'\": u'1617718191.715314', u\"u'action_status'\": u\"u'succeeded'\", u\"'upload_trackers'\": u'[]', u\"u'task_level'\": u'[2, 3, 5]', u\"u'task_uuid'\": u\"u'a8ecfc19-947b-46e2-a1dd-bffcb4be5fc5'\", u\"'already_serverids'\": u'{0: set([\"\\\\x0e\\\\xd6\\\\xb3>\\\\xd6\\\\x85\\\\x9d\\\\x94\\')\\'\\\\xf03:R\\\\x88\\\\xf1\\\\x04\\\\x1b\\\\xa4\"])}', u\"u'action_type'\": u\"u'immutable:upload:locate-all-shareholders'\"}", "message_type": "eliot:destination_failure", "task_level": [2, 4]}
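The raw 20-byte value that appears (escaped) in the `existing_shares` and `already_serverids` fields above is enough to reproduce the decode failure outside of trial. Its first six bytes decode fine as UTF-8: ASCII plus two valid two-byte sequences (0xd6 0xb3, 0xd6 0x85). Byte 0x9d at index 6 is a continuation byte with no lead byte, which is exactly the reported error. The following is a minimal Python 3 sketch that assumes the escaped bytes in the log were transcribed faithfully. `SERVER_ID` and `describe_utf8_failure` are hypothetical names, not Tahoe-LAFS code.

```python
# Bytes copied from the escaped value in the log above (assumption: the
# repr in the log was transcribed faithfully).
SERVER_ID = b"\x0e\xd6\xb3>\xd6\x85\x9d\x94')'\xf03:R\x88\xf1\x04\x1b\xa4"


def describe_utf8_failure(data: bytes) -> str:
    """Report where strict UTF-8 decoding fails, if it does."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded.
        return f"byte 0x{data[exc.start]:02x} at position {exc.start}: {exc.reason}"
    return "decoded cleanly"


print(describe_utf8_failure(SERVER_ID))
# byte 0x9d at position 6: invalid start byte
```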

Change History (2)

comment:1 Changed at 2021-04-21T13:58:02Z by itamarst

More context:

  1. Eliot was originally developed on Python 2, where bytestrings were the norm.
  2. JSON doesn't know about bytes.
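On Python 3, the standard library's `json` module makes point 2 concrete. It refuses `bytes` both as values and as dict keys, and raises `TypeError` instead of guessing an encoding. The sketch below is illustrative only, and `try_dumps` is a hypothetical helper.

```python
import json


def try_dumps(value):
    """Return the JSON text, or the TypeError message if json refuses it."""
    try:
        return json.dumps(value)
    except TypeError as exc:
        return f"TypeError: {exc}"


print(try_dumps({"peer": "b3llgpwwqwozijzje7ydgossrdyqig5e"}))  # str value: serialized
print(try_dumps({"peer": b"b3llgpwwqwozijzje7ydgossrdyqig5e"}))  # bytes value: rejected
print(try_dumps({b"peer": 1}))  # bytes key: rejected
```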

For JSON serialization, Eliot therefore followed Python 2's lead: if bytes looked like a UTF-8-encoded unicode string, they were serialized as a JSON string.

With Python 3, bytestrings are no longer the default, which means bytes are more likely to be ... bytes, i.e. arbitrary binary data rather than encoded text. So on Python 3, Eliot decided not to handle bytes in log messages by default, since it's not clear what the correct behavior is; how to handle them is left up to individual applications.

As a result, Tahoe-LAFS on Python 3 needs a policy decision on how to handle byte serialization. The initial policy decision was "handle bytes that look like UTF-8-encoded unicode strings".

However, it turns out Tahoe actually logs random byte strings, some of which are very much not UTF-8 decodable. This PR allows Tahoe to continue doing so by using hex quoting when necessary.

comment:2 Changed at 2021-04-28T19:24:21Z by GitHub <noreply@…>

  • Owner set to GitHub <noreply@…>
  • Resolution set to fixed
  • Status changed from new to closed

In 1c2ba6b/trunk:

Merge pull request #1043 from tahoe-lafs/3672.non-utf-8-bytes-in-logs

Support logging non-UTF-8 bytes in logs

Fixes ticket:3672
