Version 83 (modified by itamarst, at 2021-03-15T13:56:55Z)
Porting to Python 3
Motivation
- Make code behave the same on Python 2 and Python 3, insofar as one can, so e.g. map() is the same on Python 2 and Python 3 (i.e. lazy).
- Reduce errors by relying on Python 2 behavior and tests as well as manual review.
- Try to reduce grunt work.
How to set up your development environment
We use tox to standardize environments across developers and CI.
- Install tox (globally, probably; consider pipx).
- In your Tahoe-LAFS working copy, run tox -e py36 --notest to bootstrap the py36 virtualenv.
- Activate the environment with source .tox/py36/bin/activate or equivalent.
- Wire up for local dev with pip install -e .
- Run trial allmydata.test.test_python3 as a smoke test.
- Options for exercising the whole suite of ported tests (NB: test_python3 != python3_tests):
- trial allmydata.test.python3_tests
- python -m allmydata.test.python3_tests
- deactivate the virtualenv (or switch shells) and run tox -e py36
Worklist
Submodule† | Status | Assignee | Notes |
---|---|---|---|
__init__ | todo | | |
_version | todo | | |
windows | todo | | |
test | doing | | itamarst is doing misc orphans, see below for list |
test_system | doing | | Two more tests are skipped, blocked on web and cli being ported |
util | doing | | |
client | doing | itamarst | |
frontends | doing | itamarst | |
scripts | doing | jaraco | |
testing | doing | chad | |
† of allmydata
‡ Expect spaghetti (see below).
These are test modules that haven't been ported. Unclear if we can do all of them in one PR, but worst case remaining ones can be copied to follow-up tickets. This does not include utility modules in allmydata.test, which also need to be ported.
Misc:
- test_auth
- test_backupdb
- test_hungserver
Utility modules:
- check_*
- cli_node_api
- strategies
- status
- storage_plugin
- eliotutil
- common*
- matchers
- _twisted_9607
- _win_subprocess
Related to tor/i2p:
- test_connections
- test_i2p_provider
- test_tor_provider
Related to CLI, will be done as part of that port:
- test_multi_introducers
- test_runner
- tests/cli/ directory
- extra unported tests in test_system and test_deepcheck
The porting process, big picture
For a module M, there is also a corresponding module T, the unittests for M. If the tests for M are embedded into a module that tests multiple modules, step one is to split off the tests so there's T that only tests M.
Then:
- Update T to run on both 2+3 (see below for what that looks like).
- Run T's tests on Python 2. They should still pass! If they don’t, something broke.
- Port the code module M.
- Now run T's tests on Python 3.
- Fix any problems caught by the tests.
- Add both M and T to allmydata/util/_python3.py.
- Run tox -e py36 (or equivalent) and verify that the module you ported is included and passing.
- Submit for code review.
- Check coverage report. If there are uncovered lines, see if you can add tests, or at least file a separate ticket for adding coverage.
When ports get harder due to spaghetti dependencies
As the port progresses, the simple "port module + its test module" gets difficult, since everything ends up depending on everything else. Here's one way to approach this:
- Port only the test module. This involves many Python 3 fixes to lots of other modules, but those modules are not officially ported; they are inched along just far enough to make the tests pass. Since the test module is officially ported, regressions in the Python 3 port are still prevented.
- Then, port the corresponding module.
When doing the incidental fixes to other modules, try to change as little as possible: no __future__ imports, no from future.builtins import all the things, just enough changes to make the tests you care about pass. This reduces the chance of unintentional breakage and scope creep. You might even do temporary things like from past.builtins import unicode. Later on, when specifically porting a module, you can Do All The Things the right way.
Porting a specific Python file
Zeroth, file a new ticket in milestone "Python 3", assign it to yourself.
First, add explicit byte or unicode annotations for strings where needed.
Second, run futurize --write --both-stages --all-imports path/to/file.py.
Third, fix the imports (automation below).
Delete this bit:
    from future import standard_library
    standard_library.install_aliases()
    from builtins import str
And replace the from builtins import * variant, if any, with:
    from future.utils import PY2
    if PY2:
        from future.builtins import filter, map, zip, ascii, chr, hex, input, next, oct, open, pow, round, super, bytes, dict, list, object, range, str, max, min  # noqa: F401
This adds builtins that match Python 3's semantics. The # noqa: F401 comment keeps flake8/pyflakes from complaining about unused imports. The unused imports are deliberate: people changing the code later don't have to check manually whether map() is old style or new style.
Then, delete any instances of from builtins import <name>.
Consider using this sed command to execute the above (as written here it assumes a POSIX shell such as bash and GNU sed, where \n in the replacement inserts a newline):
$ sed -ie '/from future import standard_library/d;/standard_library.install_aliases()/d;s/from builtins import \*/from future.utils import PY2\nif PY2:\n    from future.builtins import filter, map, zip, ascii, chr, hex, input, next, oct, open, pow, round, super, bytes, dict, list, object, range, str, max, min  # noqa: F401/;/from builtins import .*/d' path/to/file.py
Fourth, manually review the code. Futurize is nice, but it doesn't catch everything, and it sometimes makes wrong decisions.
In particular:
- map(), filter(), etc. are now lazy.
- dict.keys() and friends now return a view of the underlying data, rather than a list with a copy.
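The two review points above can be illustrated with a short sketch. It shows plain Python 3 behavior (which is also what the future.builtins imports aim to provide on Python 2); the variable names are made up for illustration.

```python
# Python 3 semantics: map() returns a one-shot iterator, not a list.
squares = map(lambda x: x * x, [1, 2, 3])

# Materialize it if the code needs len(), indexing, or a second pass.
squares_list = list(squares)

# The iterator is now exhausted, so iterating again yields nothing.
second_pass = list(squares)

# dict.keys() is a live view: it reflects later changes to the dict.
d = {"a": 1}
keys_view = d.keys()
keys_snapshot = list(d.keys())  # an explicit copy, like Python 2's keys()
d["b"] = 2

print(squares_list)       # [1, 4, 9]
print(second_pass)        # []
print(sorted(keys_view))  # ['a', 'b']
print(keys_snapshot)      # ['a']
```

When reviewing futurized code, look for places that index into, take the length of, or iterate twice over the result of map(), filter(), or zip(), and for code that mutates a dict while looping over its keys.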
Fifth, add a note to the module docstring saying it was ported to Python 3.
Sixth, open a PR with the Python 3 Port label.
Known issues with future
The from builtins import <every builtin ever> thing gives a decent Python 3 layer for Python 2. For example it'll automatically create __nonzero__ to wrap a __bool__.
But there are caveats.
One of them is the bytes objects:
- builtins.bytes.translate and builtins.bytes.maketrans are buggy on PyPy. One way to fix this is to pick the native implementation explicitly: if PY2: translate = string.translate, else: translate = bytes.translate.
- b"%s" % some_bytes_object works fine if both objects are Future builtins.bytes, or both objects are native Python 2 strings/bytes, but not if you combine them. This has caused bugs. One way to fix this is by exposing only native byte strings for now, see e.g. allmydata.util.base32.
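A minimal sketch of the translate workaround from the first bullet might look like this. To stay self-contained it computes PY2 from sys.version_info instead of importing future.utils.PY2, and it applies the same native-function choice to maketrans, which is an assumption that goes beyond what the bullet states. TABLE and upper_abc are hypothetical names.

```python
import sys

# Same meaning as future.utils.PY2, computed locally for this sketch.
PY2 = sys.version_info[0] == 2

if PY2:
    # Native Python 2 functions operating on native str/bytes.
    import string
    maketrans = string.maketrans
    translate = string.translate
else:
    # Native Python 3 bytes methods, called as plain functions.
    maketrans = bytes.maketrans
    translate = bytes.translate

# Hypothetical translation table: uppercase the letters a, b and c.
TABLE = maketrans(b"abc", b"ABC")


def upper_abc(data):
    """Return native bytes with a, b and c uppercased."""
    return translate(data, TABLE)


print(upper_abc(b"aabbccd"))
```

Because both branches use the interpreter's own functions, the result is a native byte string on either version, which also fits the advice in the second bullet.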
Don't leak Future objects
Leaking Future objects (new ints, new dicts, new bytes) through a module's API can break existing code on Python 2, so be careful not to do that. For that reason int isn't in the suggested from builtins import ... list above.
Dealing with utility modules
Often you will have some utility module with lots of random code, some of which doesn't work on Python 3, or which even imports non-Python-3-compatible code (Nevow, in this case).
Options:
- Create new util_py3.py module, move just the things you need, have util.py import code from there.
- Add conditional imports/declarations to util.py so it imports on Python 3 and at least some of the code can be made to work.
Originally we went with the first approach, but the second approach is plausibly better.
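The second option could be sketched roughly as follows. The module name legacy_py2_only_lib stands in for a Python-2-only dependency such as Nevow, and both helper functions are hypothetical; none of this is taken from the Tahoe-LAFS code base.

```python
# util.py (sketch): keep the module importable on Python 3.
try:
    # Hypothetical Python-2-only dependency, standing in for Nevow.
    import legacy_py2_only_lib
except ImportError:
    legacy_py2_only_lib = None


def abbreviate_size(num_bytes):
    """Pure-Python helper that works on both Python 2 and 3."""
    if num_bytes < 1000:
        return "%dB" % (num_bytes,)
    return "%.1fkB" % (num_bytes / 1000.0,)


def render_status_page(status):
    """Needs the legacy dependency, so fail loudly when it is missing."""
    if legacy_py2_only_lib is None:
        raise NotImplementedError("render_status_page() is not ported yet")
    return legacy_py2_only_lib.render(status)
```

With this shape, already-ported callers can import the helpers they need, while anything that still relies on the old dependency fails with a clear error rather than at import time.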
Serializing bytes with JSON
In Python 2 you can serialize bytes with json. In Python 3 you can't. Real Soon Now there will be a utility module, allmydata.util.jsonbytes, that allows encoding bytes on Python 3, to minimize changes.
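To give a feel for what such a helper could do, here is a minimal sketch built on the standard json.JSONEncoder hook. It is not the allmydata.util.jsonbytes implementation; the class name, the dumps wrapper, and the choice to decode bytes as UTF-8 are all assumptions made for illustration.

```python
import json


class BytesTolerantEncoder(json.JSONEncoder):
    """JSON encoder that accepts bytes values (hypothetical sketch)."""

    def default(self, o):
        # default() is only called for objects json cannot serialize itself.
        if isinstance(o, bytes):
            # Assumption: the bytes hold UTF-8 text.
            return o.decode("utf-8")
        return json.JSONEncoder.default(self, o)


def dumps(obj, **kwargs):
    """Like json.dumps(), but tolerates bytes values on Python 3."""
    return json.dumps(obj, cls=BytesTolerantEncoder, **kwargs)


print(dumps({"name": b"hello"}))
```

Note that the default() hook only covers values. Bytes used as dictionary keys are still rejected by json on Python 3, and bytes that are not valid UTF-8 raise UnicodeDecodeError in this sketch, so a real helper needs to decide how to handle both cases.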
Dictionaries with bytes/unicode keys
In Python 2 a key can be bytes or unicode, and a bytes key and a unicode key with the same (ASCII) content are treated as the same key, so one replaces the other. So the key b"foo" is the same as u"foo" from dict's perspective. In Python 3 they are different keys.
This can lead to bugs when porting, where you end up with two keys instead of one as some strings become Unicode strings.
The interim solution will likely be dicts that enforce key type to be only bytes or only Unicode (https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3476#ticket).
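As a rough illustration of the idea (not the design discussed in the ticket), a dict that enforces a single key type could look like the sketch below. The class names are hypothetical, and only a few dict methods are guarded to keep the example short.

```python
class _TypedKeyDict(dict):
    """Dict that rejects keys of the wrong type (simplified sketch)."""

    KEY_TYPE = object

    def __init__(self, *args, **kwargs):
        dict.__init__(self)
        self.update(*args, **kwargs)

    def _check(self, key):
        if not isinstance(key, self.KEY_TYPE):
            raise TypeError(
                "%r is not a %s key" % (key, self.KEY_TYPE.__name__)
            )

    def __setitem__(self, key, value):
        self._check(key)
        dict.__setitem__(self, key, value)

    def __getitem__(self, key):
        self._check(key)
        return dict.__getitem__(self, key)

    def __contains__(self, key):
        self._check(key)
        return dict.__contains__(self, key)

    def update(self, *args, **kwargs):
        # Route every insertion through __setitem__ so keys are checked.
        for key, value in dict(*args, **kwargs).items():
            self[key] = value


class BytesKeyDict(_TypedKeyDict):
    KEY_TYPE = bytes


class UnicodeKeyDict(_TypedKeyDict):
    KEY_TYPE = type(u"")


d = BytesKeyDict({b"foo": 1})
d[b"bar"] = 2
try:
    d[u"foo"] = 3
except TypeError as e:
    print("rejected: %s" % (e,))
```

Failing on the wrong key type turns the silent "two keys instead of one" bug into an immediate error at the place where the mismatched string enters the dict. A complete version would also need to cover get(), pop(), setdefault() and similar methods.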
Avoid massive changes
Sometimes it's easier to be a little more lenient in input (support both unicode and bytes), or to change some type to unicode, instead of having to change hundreds of lines of code from unicode to bytes when porting. When to do so is a judgement call, but if you are changing massive amounts of code to have b"" prefix _and_ byteness isn't important, consider alternative approaches.
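One way to be lenient about input is a small normalizing helper at the boundary, so callers can keep passing either type. This is only a sketch: to_bytes and make_label are hypothetical names, and UTF-8 is an assumed encoding.

```python
def to_bytes(value, encoding="utf-8"):
    """Accept bytes or unicode, always return bytes (hypothetical helper)."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, type(u"")):
        return value.encode(encoding)
    raise TypeError("expected bytes or unicode, got %r" % (type(value),))


def make_label(name):
    """Example API that tolerates both input types."""
    return b"label:" + to_bytes(name)


print(make_label(u"foo") == make_label(b"foo"))
```

The function body deals with a single type, while existing callers do not need to be touched one by one.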
Catching string/bytes issues
The following can cause bugs and aren't always caught in tests:
- str(some_bytes) (used to return "hello", now returns "b'hello'")
- "%s" % (some_bytes,)
- some_bytes == some_unicode
Python 3 has a -b command line flag for python that turns these into warnings. In the test runner you can then set up a warnings filter that turns those warnings into exceptions, but only for the package being ported (third-party packages might trigger them too, and you don't want to fail tests because of them). This helps catch bugs that wouldn't be caught otherwise. The cost is that you also need to fix all the logging messages that do this, but that's probably worth it.
The tox.ini setup for Python 3 now does this for Tahoe-LAFS, so if you get BytesWarning as an exception, that's what's going on. For debugging purposes it can be useful to disable the exceptions and just look at the warnings; warnings don't fail the tests, so you can grep for them in the test output, sort and deduplicate them, and then fix them in one go.