Automating the SymPy release process

July 7, 2013

So I have just published SymPy 0.7.3.rc1. I’ll write a blog post about the release itself when we release 0.7.3 final, but for now, I wanted to write about how we managed to automate our release process.

Our story begins back in October of 2012, when I wrote a long-winded rant to the mailing list about how long it was taking to get the 0.7.2 release out (it took over a month from the time the release branch was created).

The rant is fun, and I recommend reading it. Here are some quotes.

The intro:

Now here’s a timeline: 0.7.1 was released July 29, 2011, more than a year and two months ago. 0.7.0 was released just over a month before that, on June 28. 0.6.7 was released March 18, 2010, again over a year before 0.7.0. In almost two years’ time, we’ve had three releases, and are struggling to get out a fourth. And it’s not like there were no changes; quite the opposite in fact. If you look at SymPy 0.6.6 compared to the current master, it’s unbelievable the amount of changes that have gone forward in that time. We’ve had since then the new polys, at least four completely new submodules (combinatorics, sets, differential geometry, and stats), massive improvements to integration and special functions, a ton of new stuff in the physics module, literally thousands of bug fixes, and the list goes on. Each of these changes on its own was enough to warrant a release.

So in case I didn’t make my point, let me state it explicitly: we need to release more often. We need to release *way* more often.

My views on some of the fundamental (non-technical) issues:

I think that one other thing that has held back many releases is the feeling of “wait, we should put this in the release”. The use of a release branch has helped keep master moving along independently, but there still seems to be the feeling with many branches of, “this is a nice feature, it ought to go in the release.” My hope is that by making the release process smoother, we can release more often, and this feeling will go away, because it won’t be a big deal if something waits until the next release. As far as deprecations go, the real issue with them is time, not release numbers. So if we deprecate a feature today vs. one month from today, it’s not a big deal (as opposed to today vs. a year from today), regardless of how many versions are in between.

I read about what GitHub does for their Windows product regarding releasing often on their blog: https://github.com/blog/1271-how-we-ship-github-for-windows (they actually have this philosophy for all their products). One thing that they said is, “And by shipping updates so often, there is less anxiety about getting a particular feature ready for a particular release. If your pull request isn’t ready to be merged in time for today’s release, relax. There will be another one soon, so make that code shine!” I think that is exactly the point here. Another thing that they noted is that automation is the key to doing this, which is what I am aiming for with the above point.

My vision:

Once we start releasing very often (and believe me, this is way down the road, but I’m trying to be forward looking here), we can do away with release candidates. A release candidate lives in the wild for a week before the full release. But if we are capable of releasing literally every week, then having release candidates is pointless. If a bug slips into a release, we just fix it and it will be in the next release.

We should release *at least* once a month. I think that if the process is automated enough, that this will be very possible (as opposed to the current situation, where the release branch lasts longer than a month). In times of high activity, we can release more often than that (e.g., after a big pull request is merged, we can release).

That was October. Today is July. Basically, our release process was way too long. Half of it was testing stuff, half of it was tedious releasing stuff (like making tarballs and so on), and half of it was updating websites.

We have moved all our testing to Travis CI. So now every pull request is tested, and we can be pretty much assured that master is always passing the tests. There is still some work to do here (currently Travis CI doesn’t test with external dependencies), but it’s mostly a solved problem.

For updating websites, we conceded that we are not going to update anything that we don’t own. That means no attempting to make Debian or Sage packages, or updating Wikipedia or Freshmeat. Someone else will do that (and does anyone even use Freshmeat any more?).

That leaves the releasing itself. It’s still a pain, because we have to make a source tarball, Windows installer, html docs, and pdf docs, and do them all for both Python 2 and Python 3.

So Ondrej suggested moving to Fabric/Vagrant. At the SciPy 2013 sprints, he started working on a fabfile that automates the whole process. Basically, Vagrant gives us a predefined Linux virtual machine, which makes it easy to make everything completely reproducible. Fabric is a tool that makes it easy to write commands (in Python) that are run on that machine.

Building the basic stuff was easy, but I want to automate everything. Not everything is done yet, but we’re getting close. For example, in addition to building the tarballs, the Fabric script checks the contents of the tarball against git ls-files to make sure that nothing is included that shouldn’t be or left out accidentally (and, indeed, we caught some missing files that weren’t included in the tarball, including the README).
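To give a feel for that check, here is a minimal sketch of the idea, not the actual code in the fabfile. The function names, the whitelist parameters, and the example file names are all made up for illustration; the only real pieces are Python's `tarfile` module and the `git ls-files` command.

```python
import subprocess
import tarfile


def tarball_files(tarball_path):
    """Return the regular files in a tarball, without the top-level directory."""
    names = set()
    with tarfile.open(tarball_path) as tar:
        for member in tar.getmembers():
            if member.isfile():
                # Source tarballs usually wrap everything in one directory
                # (something like "sympy-0.7.3.rc1/"), so strip it.
                _, _, relative = member.name.partition("/")
                if relative:
                    names.add(relative)
    return names


def git_tracked_files(repo_path="."):
    """Return the files that git tracks in the given repository."""
    out = subprocess.check_output(["git", "ls-files"], cwd=repo_path)
    return set(out.decode("utf-8").splitlines())


def compare_file_sets(in_tarball, in_git, tarball_only_ok=(), git_only_ok=()):
    """Return (unexpected, missing) file lists.

    unexpected: in the tarball but not tracked by git (and not whitelisted)
    missing:    tracked by git but not in the tarball (and not whitelisted)
    """
    unexpected = in_tarball - in_git - set(tarball_only_ok)
    missing = in_git - in_tarball - set(git_only_ok)
    return sorted(unexpected), sorted(missing)


# Hypothetical file sets, standing in for real tarball and git output.
in_git = {"README.rst", "setup.py", "sympy/__init__.py", ".gitignore"}
in_tarball = {"setup.py", "sympy/__init__.py", "PKG-INFO"}

unexpected, missing = compare_file_sets(
    in_tarball,
    in_git,
    tarball_only_ok={"PKG-INFO"},   # generated at build time, not in git
    git_only_ok={".gitignore"},     # in git, but not meant to be shipped
)
print(unexpected)  # []
print(missing)     # ['README.rst']
```

In a real run you would pass `tarball_files(...)` and `git_tracked_files(...)` to `compare_file_sets` and fail the release if either list is nonempty. The two whitelists are the important design choice: some differences are expected, and writing them down explicitly is what lets everything else be treated as an error.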

You can run all this yourself. Check out the 0.7.3 branch from SymPy, then cd into the release directory, and read the README. Basically, you just install Fabric and Vagrant if you don’t have them already, then run

vagrant up
fab vagrant prepare
fab vagrant release

Note that this downloads a 280 MB virtual machine, so it will take some time to run for the first time. When you do this, the releases are in the `release` directory.

Finally, I uploaded 0.7.3.rc1 to GitHub using the new releases feature. This is what the release looks like on GitHub, from the user’s point of view:

[Screenshot: SymPy 0.7.3.rc1]

This is what it looks like to me:

[Screenshot: SymPy 0.7.3.rc1 Edit]

GitHub has (obviously) the best interface I’ve ever seen for this. Of course, even better would be if there were an API, so that I could automate this too. But since Google’s announcement that they are discontinuing downloads, we can no longer upload to Google Code. Our plan was to just use PyPI, but I am glad that we can have at least one other location, especially since PyPI is so buggy and unreliable (I can’t even log in, I get a 502).

So please download this release candidate and test it. We especially need people to test the Windows installer, since we haven’t automated that part yet (actually, we are considering not making them any more, especially given the existence of people like Christoph Gohlke who make them for SymPy anyway, but we’ll see). The only thing that remains to be done is to finish writing the release notes. If you made any contributions to SymPy since the last release, please add them there. Or if you want to help out, you can go through our pull requests and make sure that nothing is missing.


SciPy 2013

July 2, 2013

This past week was the 2013 SciPy conference. It was an exciting time, and a lot of interesting things happened. 

First, a background. This summer, I have been doing an internship with Continuum Analytics. There I have been working mainly on Anaconda and conda. Anaconda is Continuum’s free (to everyone) Python distribution, which makes it really easy to get bootstrapped with all the scientific software (including SymPy). Conda is Anaconda’s package manager, which, I think, solves many if not all of the main issues with the Python packaging tools like pip, easy_install, PyPI, and virtualenv. 

I may write more about that later, but for now, I want to write about my experiences at the conference. The main point there is that I have already been in Austin for about a month, so getting to the conference this year was pretty easy.

On the first day of the conference, on Monday morning, Ondrej Certik and I had our tutorial for SymPy. For the past couple of months, I have been rewriting the official SymPy tutorial from scratch. The official tutorial for SymPy was very old, and had many issues. It only went over features that were good at the time of its writing, so while nothing in the tutorial was wrong, it didn’t really represent the latest and greatest of the library. Also, it was written just like a list of examples, which is not much more than the API docs. In my new tutorial, I aimed to give a narrative style documentation, which starts from the very beginning of what symbolics are and works its way up to the basic functionality of things like solving and simplifying expressions. My goal was also to lead by example, and in particular, to avoid teaching things that I think either are antipatterns, or lead to antipatterns. In Python, there is one– and preferably only one –way to do it. In SymPy, by the nature of the library, there are about seven different ways to create a Symbol, for example (see https://github.com/sympy/sympy/wiki/Idioms-and-Antipatterns, the section, “Creating Symbols”). But there is one best way to do it: by using symbols(). So all throughout the tutorial, I just use symbols(), even if I am creating a single Symbol. I avoid messy things like var. 
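As a small illustration of that one recommended way (assuming SymPy is installed; the variable names are just for the example):

```python
from sympy import Symbol, symbols

# A single symbol: symbols() returns one Symbol when given one name.
x = symbols('x')

# Several symbols at once, unpacked from the returned tuple.
a, b, c = symbols('a b c')

# The results are ordinary Symbol objects, the same as ones built directly.
print(x == Symbol('x'))
print(isinstance(a, Symbol))
print((a + b + c).free_symbols == {a, b, c})
```

Each of the three lines prints True. The point is that one function covers both the single-symbol and the many-symbol case, so there is no need to switch idioms as an example grows.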

The final tutorial is at http://docs.sympy.org/tutorial/tutorial/. This was the basis for the tutorial that Ondrej and I gave at SciPy. The site for our tutorial is at http://certik.github.io/scipy-2013-tutorial/html/index.html. There are links to videos, slides, and exercise notebooks there. 

I think our tutorial was a great success. People liked (I think) the introduction from nothing to SymPy. For our exercises, we used the IPython Doctester. I think that people really liked this way of doing exercises, but there were some issues getting it to work on everyone’s machine. 

In addition to my stuff, Ondrej presented some notebooks with examples from his work at LANL. I think this worked well. There were several physicists in the audience, who understood most of the content, but even for those who weren’t physicists (including me!), it really showed that SymPy is a useful tool. In a beginner tutorial, it is easy to get lost in the easy details, and forget that in the end, you can actually use SymPy to compute some powerful things. SymPy has in the past year or two really passed the barrier of toy to tool.

After our tutorial, I attended the IPython tutorial, and the two-part Scikit-Learn tutorial. The most awesome part of this was just getting to meet people. Fernando Perez, Thomas Kluyver, and Brian Granger of IPython were at the conference. Brian is also a SymPy developer, who has spearheaded the quantum module. From SymPy, in addition to Ondrej (who created SymPy), I met Matthew Rocklin, one of the top contributors, Jason Moore, one of the developers of PyDy, which uses SymPy’s mechanics module, and David Li, who works on SymPy Gamma and SymPy Live (more on these people later).

After the tutorials, Wednesday and Thursday were the talks. There were a lot of good ones. Here are the ones that I remember the most:

  • Fernando’s keynote. If you’ve ever seen one of Fernando’s talks, you know that he is a great speaker. 
  • Matthew’s talk. His talk was about his work on using SymPy’s matrix expressions to compile expressions for BLAS/LAPACK. This talk excited many people in the audience. I think this is great, because it shows people some of the real power of things you can only do with symbolics.
  • Jason Moore’s talk about PyDy and the mechanics module. He ran out of time, but there is a nice example of using SymPy to generate a controller for an inverted triple pendulum, which seems impossible, but then he shows a video of an actual thing that can do it.
  • William Schroeder’s keynote. The message was that the academic model is broken, and doesn’t lead to reproducible research. While they are fixing things, the message is that we are the new publishers. There was also mention at the end that we should stop using noncommercial licenses, and stop using viral licenses like the GPL and LGPL. I was a little surprised to hear such a controversial statement, but it’s actually very true, and I agree with him that if people don’t stop using the GPL, then we will never achieve openness in science. 
  • David Li’s talk. David Li is a high school student (starting his senior year in the fall), who started with SymPy two years ago with Google Code-In. He has continued working on SymPy Live, and SymPy Gamma since. He is the reason that we have SymPy Live in our docs. His talk was also well received.  David is a good speaker, and SymPy Gamma and SymPy Live are pretty cool (for those of you who don’t know, SymPy Live is an online shell where you can run a Python session with SymPy in the browser, and SymPy Gamma is the SymPy version of WolframAlpha).
  • Brian Granger’s talk. His talk is entitled “Why you should write buggy software with as few features as possible”. I think he had some good messages in there. You have to reduce the scope of your project, or it will get out of hand. As for bugs, getting bug reports is a good thing, because it shows that people are using the software, and what parts of it they are using.
  • The lightning talks. Especially Matthew Rocklin’s lightning talk. His talk was about splitting things up into very small packages, so that you don’t have to get a huge package just for one function. He went a little far with it, and I think his ideas aren’t really usable in the current Python packaging ecosystem, but, taken in moderation, I agree with him. At any rate, it was very entertaining (I don’t have any video links here because they aren’t posted yet, but I encourage you to watch the lightning talks once they are posted). 
  • I heard the matplotlib talk was good, but I haven’t seen it because it was at the same time as Matthew’s talk. I plan to watch it when the videos come out. If you saw it, I encourage you to watch Matthew’s talk, especially if you’ve ever used BLAS/LAPACK.

Topping off the week were the sprints on Friday and Saturday. My goal was to get out a release of SymPy. We didn’t quite get that far, but we got close. We are only blocking on a few small things to get out a release candidate, so expect one before the end of the week. We did introduce a lot of people to SymPy at the sprints, though, and got some first time contributions. Definitely I think we made a lot more people aware of SymPy at this conference than we ever have before. 

Another interesting thing at the sprints: before the conference, I was telling David Li that we should switch to Dill for SymPy Live (the way SymPy Live works on the App Engine, it has to pickle the session between runs, because there is a 60 second time limit on each execution). Dill is a library that extends Python’s pickle so that it can pickle just about anything. At the end of David’s talk, the guy who wrote Dill, Mike McKerns, raised his hand and asked him about it! At the sprints, David and he worked together to get it working in SymPy Live (and coincidentally, he also uses SymPy in another package, mystic). There were some fixes needed for Dill. He also moved Dill out of a larger project (in the spirit of Matthew’s lightning talk), and over to GitHub. Now all they need is a logo (Paul Ivanov suggested a variation on “we can pickle that!”).
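To see why plain pickle is a problem for something like SymPy Live, here is a minimal sketch using only the standard library. It is not SymPy Live's code; `save_session`, `load_session`, and the example session are hypothetical names for illustration. It saves whatever it can from a session namespace and reports what pickle refused.

```python
import pickle


def save_session(namespace):
    """Pickle each session variable separately; report the ones that fail."""
    saved, skipped = {}, []
    for name, value in namespace.items():
        try:
            saved[name] = pickle.dumps(value)
        except (pickle.PicklingError, AttributeError, TypeError):
            # Standard pickle cannot serialize things like lambdas.
            skipped.append(name)
    return saved, skipped


def load_session(saved):
    """Rebuild the session namespace from the pickled values."""
    return {name: pickle.loads(blob) for name, blob in saved.items()}


# A made-up session: two plain values and one function defined on the fly.
session = {"n": 42, "values": [1, 2, 3], "square": lambda t: t * t}

saved, skipped = save_session(session)
restored = load_session(saved)

print(restored)  # {'n': 42, 'values': [1, 2, 3]}
print(skipped)   # ['square']
```

The lambda is lost between runs, which is exactly the kind of thing a user of an interactive shell would define. Dill exposes `dumps` and `loads` functions with the same names as pickle's, so in a sketch like this it could be swapped in for `pickle` to keep objects such as `square` as well.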

In all, it was a fun conference. The best part, as always, was meeting people in person, and talking to them. To conclude, I want to mention two other interesting things that happened.

The first is that Matthew and I talked seriously about how to go about fixing the assumptions in SymPy. I will write to the list about this soon, but the basic idea is to just get in there and hack things together, so that we can get something that works. The work there is started at https://github.com/sympy/sympy/pull/2210, where I am seeing if we can merge the old and new assumptions, so that something assumed in one can be asked in the other.

The second thing is that Ondrej got a new hat: [Image: Ondrej's Hat]