Python 3: Single codebase vs. 2to3

August 22, 2013

In my previous post about switching to Python 3 as my default Python, I praised the use of a single codebase for supporting both Python 2 and Python 3. I even chastised the Python core developers for creating 2to3, writing, “I think that the core Python folks made a mistake by presenting Python 3 as a new language. It has made people antagonistic against Python 3 (well, that and the print function, which was another stupid mistake, because even if it was a good idea, it alone has kept too many people from switching). 2to3 was a mistake too, because it perpetuated this idea.”

Well, this isn’t entirely fair, because I myself used to be one of the biggest advocates of using 2to3 over a single codebase. Take this GitHub comment from when the IPython guys were considering this issue, where I wrote, “maintaining a common code base is going to be a bit annoying from the developer side.…The main benefit of using 2to3 is that 99% of the time, you can just write your code as you would for Python 2, and when it gets to Python 3, it just works (maybe that percent is a bit smaller if you use strings a lot, but it’s still quite high). To write for Python 2 and 3 at the same time, you have to remember a lot of little rules, which no one will remember (and new contributors will not even know about). And given that IPython’s test coverage is still poor (unless I am mistaken, in which case, please correct me), little mistakes will slip through, and no one will notice until they try the certain behavior in Python 3.”

So I just want to clarify a few things.

  1. I was wrong. When I chastised the Python core developers for making people believe that Python 3 is a different language from Python 2, I too fell into that trap. It took a month of me working on a codebase that had to be directly Python 3 compatible to see the fallacy of this. And seeing just how small the SymPy compatibility file is sealed the deal. I now believe that I was completely wrong in saying that maintaining a common codebase is annoying. As I wrote in the previous post, it is no different from supporting 2.4-2.7, for instance (actually, by my memory, supporting 2.4-2.7 was much worse than supporting 2.6-3.3, because so many language features were introduced in Python 2.5).
  2. If you have to support 2.5 or earlier and Python 3, then 2to3 might actually be better. The reason is simple: Python 2.6 was the first version of Python to “know” about Python 3. So, for instance, from __future__ import print_function was introduced in Python 2.6. This means that to support a single codebase for 2.5-3.x you have to write print('\n') to print an empty line and to print something without a newline at the end, you have to use sys.stdout.write. Also, except Exception as e, using the as keyword, which is the only syntax allowed in Python 3, was introduced in Python 2.6, so if you want to catch an exception you have to use sys.exc_info()[1]. Now that really is annoying. But in Python 2.6, most differences can be fixed with simple definitions, most of which boil down to try, except ImportError, import x as y type workarounds. The worst are the print function, which can be imported from __future__, division, which can also be imported from __future__ (or worked around), and unicode literals (if it’s a big deal, drop support for Python 3.2). Most other things are just simple renames, like xrange -> range, or making sure that you wrap functions that are iterators in Python 3 in list if you want to access items from them.
  3. I was right about test coverage. Supporting Python 2 and Python 3 in a single codebase if you have bad test coverage is not going to work. You can get around the worst things by making sure that __future__ imports are at the top of each file, but you are bound to miss things, because, as I said, you will forget that map(f, s)[0] doesn’t work in Python 3 or that the StringIO module has been renamed to io, or that you can’t pass around data as strings—they have to be bytes.

    Of course, you also need good test coverage to support Python 3 well using 2to3, but you can get away with more because 2to3 will take care of things like the above for you. Perhaps instead of 2to3 what really should have been made is a pyflakes-like tool that uses the same knowledge as 2to3 to check for cross-compatibility for Python 2 and Python 3.

  4. In the end, you have to be actually using Python 3. I feel like people haven’t been, even today, taking Python 3 seriously. They aren’t actually using it. There’s a feeling that someday in the future they will, but for now, Python 2 is the way to go. 2to3 exacerbates this feeling, because to use it, you have to develop in Python 2. You shouldn’t touch the code generated by 2to3. As it is, then, if you develop with 2to3, you only ever use Python 3 to test that things are working in Python 3. You don’t prototype your code in Python 3, because then you will write code that doesn’t work in Python 2.

    With the single codebase, your view should change. You should start prototyping in Python 3. You should only use Python 2 to test that things work in Python 2 (and since you’ve been using Python 2 for so long before switching to Python 3, or at least if you’re like me you have, this is not that bad). Just yesterday, I found a bug in SymPy in Python 3 that went unnoticed. It relates to what I said above about using bytes instead of strings for data. I just checked, and 2to3 wouldn’t have fixed it (and indeed, the bug is present in SymPy 0.7.3, which used 2to3), because there’s no way for 2to3 to have known that the data was bytes and not a string. The code was obviously untested, but it would have been obvious that it didn’t work if anyone was using Python 3 to use SymPy interactively. As it turns out, some of our users are doing this, and they pointed it out on the mailing list, but it remained unfixed until I found it myself independently.
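The pitfalls from items 2 and 3 can be sketched in a few lines. This is only an illustration, not code from SymPy or IPython; the helper names (first_result, caught_exception, as_text) are made up for the example, and as_text assumes Python 2.6 or newer, where the bytes name exists.

```python
import sys


def first_result(func, items):
    # map() returns a lazy iterator on Python 3, so map(func, items)[0]
    # raises TypeError there. Wrapping in list() works on both versions.
    return list(map(func, items))[0]


def caught_exception(func):
    # "except Exception as e" only parses on Python 2.6+, so a codebase that
    # also supports 2.5 has to fetch the exception from sys.exc_info().
    try:
        func()
    except Exception:
        return sys.exc_info()[1]
    return None


def as_text(data, encoding="utf-8"):
    # Data read from files or sockets arrives as bytes on Python 3.
    # Decode explicitly instead of assuming it is already a string.
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data
```

None of these would be flagged by the interpreter at import time, which is why they only show up if a test (or a person) actually exercises the code under Python 3.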

So old mistakes aside, the lessons to take away from this and the previous blog post are:

  1. Use a single codebase instead of 2to3 to support both Python 2 and Python 3.
  2. Use Python 3 as your default Python.
  3. Keep Python 2 around, though, because not everything supports Python 3 yet.
  4. Expect to find some bugs, because, until everyone starts doing this, people aren’t going to test their software in Python 3.

Using Python 3 as my default Python

August 9, 2013

So I just finished my internship with Continuum. For the internship, I primarily worked on Anaconda, their free Python distribution, and conda, its free (BSD open source) package manager. I might write a blog post about conda later, but suffice it to say that I’m convinced that it is doing package management the right way. One of the major developments this summer that I helped out with was the ability for anybody to build a conda package, and a site called Binstar where people can upload them (the beta code is “binstar in beta” with no quotes). 

Another thing that happened over the summer is that Almar Klein made conda Python 3 compatible, so that it can be used with the Pyzo project, which is Python 3 only. This was done by using a single code base for Python 2 and Python 3. Thus, this became the first time I have done any heavy development on Python source that had to be Python 3 compatible from a single codebase (as opposed to using the 2to3 tool).

Another development this summer was the release of SymPy 0.7.3. This marked the last release to support Python 2.5. Around the same time, we discussed our Python 3 situation, and how annoying it is to run use2to3 all the time. The result was this pull request, which made SymPy use a single code base for Python 2 and Python 3. Now, that pull request is hard to wade through, but the important part to look at is the compatibility file. Everything in that file has to be imported and used, because it represents things that are different between Python 2 and Python 3. Ondřej has written more about this on his blog.
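To give a feel for what a compatibility file looks like, here is a minimal sketch. It is not SymPy's actual file; the names (PY3, string_types, integer_types) are assumptions chosen for illustration, though the pattern of version checks and try/except ImportError renames is the general idea.

```python
import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    string_types = (str,)
    integer_types = (int,)
    xrange = range  # Python 3's range is already lazy
else:
    # These names only exist on Python 2; this branch never runs on Python 3.
    string_types = (basestring,)  # noqa: F821
    integer_types = (int, long)  # noqa: F821
    xrange = xrange  # noqa: F821

try:
    from StringIO import StringIO  # Python 2 location
except ImportError:
    from io import StringIO  # Python 3 location

try:
    from itertools import izip_longest as zip_longest  # Python 2 name
except ImportError:
    from itertools import zip_longest  # Python 3 name
```

The rest of the codebase then imports these names from the compatibility module instead of using the version-specific spellings directly.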

In all, I think that supporting Python 2.6-3.3 (not including 3.0 or 3.1) is not that bad. The compatibility file has a few things, but thinking back, it was just as bad or worse supporting Python 2.4-2.7 (heck, back then, we couldn’t even use the all function without importing it). The situation is much better today now that we use Travis too, since any mistake is caught before the pull request is merged. The worst of course is the print function, but since that can be imported from __future__, I will be warned about it pretty fast, since print as a statement is a SyntaxError in that case. It also doesn’t take that long to get into the habit of typing () after print.
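As a small sketch of what the __future__ imports buy you (the function names mean and show are made up for this example), the following runs the same way on Python 2.6+ and Python 3:

```python
from __future__ import division, print_function

import sys


def mean(values):
    # With "division" imported, / is true division on Python 2 as well,
    # so mean([1, 2]) is 1.5 rather than 1.
    return sum(values) / len(values)


def show(label, value, stream=None):
    # print is a function here, so keyword arguments such as sep= and
    # file= are available on both versions.
    print(label, value, sep=": ", file=stream or sys.stdout)


show("mean", mean([1, 2]))
```

With print_function imported, a stray statement-style print elsewhere in the same file fails with a SyntaxError as soon as the file is imported, on Python 2 as well.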

Of course, there are a lot of nice Python 3 only features that we cannot use, but this was the case for supporting Python 2.4-2.7 too (e.g., the with statement and the ternary conditional expression were both introduced in Python 2.5). So this is really nothing new. There is always an incentive to drop the oldest Python version we support, and a lag on what features we can use. Now that we have dropped Python 2.5 support in SymPy, we can finally start using new-style string formatting, abstract base classes, relative imports, and keyword arguments after *args.
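Here is a rough sketch of three of those 2.6+ features in one place (relative imports need a package, so they are left out). The Shape and Rect classes are invented for the example. Note that the metaclass syntax itself differs between Python 2 and 3, so the sketch calls ABCMeta directly to build a base class, which works on both.

```python
from abc import ABCMeta, abstractmethod

# Calling the metaclass directly sidesteps the Python 2 vs. 3 syntax
# difference (__metaclass__ attribute vs. metaclass= keyword).
Base = ABCMeta("Base", (object,), {})


class Shape(Base):
    @abstractmethod
    def area(self):
        """Return the area of the shape."""


class Rect(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


def describe(*shapes, **options):
    unit = options.get("unit", "cm")
    # New-style formatting; explicit indices are needed on Python 2.6.
    return ", ".join("{0} {1}^2".format(s.area(), unit) for s in shapes)


shapes = [Rect(2, 3), Rect(1, 1)]
# Passing a keyword argument after *args in a call requires Python 2.6+.
summary = describe(*shapes, unit="m")
```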

So as a result of this, I’ve come to the conclusion that Python 3 is not another language. It’s just another version of the same language. Supporting Python 2.6-3.3 is no different from supporting Python 2.4-2.7. You have to have some compatibility imports, you can’t use new language features, and you have to have good test coverage. I think that the core Python folks made a mistake by presenting Python 3 as a new language. It has made people antagonistic against Python 3 (well, that and the print function, which was another stupid mistake, because even if it was a good idea, it alone has kept too many people from switching). 2to3 was a mistake too, because it perpetuated this idea.

In the past, I have always developed against the latest version of Python: 2.6 was the best when I learned Python, and then 2.7. Even though I have had to support back to 2.4, I only used 2.4 explicitly when testing.

Well, given what I said above, the only logical thing to do is to use Python 3.3 as my main development Python. If you use Anaconda, there are basically two ways you can do this. The first is to just create a Python 3 environment (conda create -n python3 python=3), and put that first in your PATH (you also will need to add source activate python3 to your bash profile if you go this route, so that conda install will install into that environment by default). For me, though, I plan to use a Python 3 version of Anaconda, which has Python 3 as the default. The main difference here is that conda itself runs under Python 3. Aside from purity, and the fact that I plan to fix any occasional conda bugs that I come across, the other difference here is that conda itself will default to Python 3 in this case (i.e., when creating a new environment with Python like conda create -n envname python, the Python will be Python 3, not Python 2, and also it will build against Python 3 by default with conda build). Continuum does not yet make Python 3 versions of Anaconda, but there are Python 3 versions of Miniconda (Miniconda3), which is a stripped down version of Anaconda with just Python, the conda package manager, and its dependencies. You can easily install Anaconda into it though with conda install anaconda. I personally prefer to install only what I need to keep the disk usage low (on an SSD, disk space is scarce), so this is perfect for me anyway.

My recommendation is to put a Python 2 installation second in your PATH, so that you can easily call python2 if you want to use Python 2. The easiest way to do this is to create a conda environment for it (conda create -n python2 python=2) and add ~/anaconda/envs/python2/bin to your PATH.

So far, I have run into a few issues:

  • Some packages aren’t built for Python 3 yet in Anaconda, or they don’t support it at all. The biggest blocker in Anaconda is PySide (at least on Mac OS X), though it should be coming soon.
  • Some packages only install entry points with a “3” suffix, which is annoying. The biggest offender here is IPython. I brought up this issue on their mailing list, so hopefully they will see the light and fix this before the next release, but it hasn’t been implemented yet. I also plan to make sure that the Anaconda package for IPython installs an ipython entry point into Python 3 environments. Even so, one has to remember this when installing old versions of IPython in environments.
  • There are some bugs in conda in Python 3. Actually, I suspect that there are bugs in a lot of packages in Python 3, because people don’t develop against it, unless they have excellent test coverage. Even SymPy missed a few print statements.
  • You can’t setup.py develop against anything that uses 2to3 (like IPython).
  • It’s a little annoying working against old versions of SymPy (e.g., when digging through the git history to track something down), because I have to explicitly use Python 2. Conda makes this easier because I can just create a Python 2 environment and do source activate python2 when I am using Python 2. Or, for a one-off, I can just use python2, and keep a Python 2 environment second in my PATH. But this issue is not really new. For example, really old versions of SymPy only work with Python 2.5, because they used “as” as a variable name.
  • Everyone else isn’t using Python 3 yet, so if I write a script that only needs to support “the latest version of Python,” it probably needs to support Python 2.7, or else I should explicitly put /usr/bin/env python3 in the shebang line. But for SymPy, I have to be aware of how to support 2.6-3.3, so I have to know all the features that are only in some versions anyway. On the other side of things, if I run some random Python script with a shebang line, it probably is going to expect Python 2 and not Python 3, so I either have to explicitly add python2 to the command or activate a Python 2 environment.
  • Some packages just don’t support Python 3 yet. Fabric (and its main dependency, Paramiko) is the one example I have come across so far in my own work. So I have to fall back to Python 2 if I want to use them. The best thing to do here is to pitch in and help these libraries port themselves.
  • People always give code examples with print as a statement instead of a function, so I either have to fix it manually before pasting it or use Python 2. I had tried at one point to make a %print magic for IPython that would let print work like a statement in Python 3, but I never finished it. I guess I should revisit it.

I’ll update this list as I come across more issues.
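For the shebang point in the list above, a one-off script that is only meant for Python 3 could look something like this. It is just a sketch: the MINIMUM value and the check_version helper are my own choices for the example, not anything standard.

```python
#!/usr/bin/env python3
"""Hypothetical one-off script that is only meant to run on Python 3."""
import sys

MINIMUM = (3, 3)


def check_version(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if the given interpreter version is new enough."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not check_version():
        # Fail with a clear message when someone runs "python script.py"
        # and python turns out to be Python 2.
        sys.exit("This script needs Python %d.%d or newer" % MINIMUM)
    print("Running on Python %d.%d" % sys.version_info[:2])
```

The shebang only helps when the script is executed directly; the explicit check covers the case where it is passed to whatever python happens to be first in the PATH.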

In all, so far, it’s nothing too bad. Conda makes switching back to Python 2 easy enough, and dealing with these issues is hardly the worst thing I have to deal with when developing with Python. And if anything, seeing Python 2-3 bugs and issues makes me more aware of the differences between the two versions of the language, which is a good thing since I have to develop against code that has to support both.