PuDB, a better Python debugger

June 4, 2010

So Christian Muise unwittingly just reminded me on IRC that I forgot to mention the main method that I used to learn how the heurisch function works in my last blog post. I usually only use a debugger when I have a really hard bug I need to figure out, when the print statements aren’t enough. The reason for this is that the debugger that I had been using, winpdb, is, well, a pain to use. There are so many little bugs, at least in Mac OS X, that it is almost not worthwhile to use it unless I need to. For example, restarting a script from the debugger doesn’t work. If I pass a point that I wanted to see, I have to completely close the winpdb window and restart it from the command line, which takes about half a minute. Also, winpdb uses its own variant of pdb, which seems to cause more problems than it solves (like bugging me about sympy importing pdb somewhere every time I start debugging).

But I really wanted to be able to step through the heurisch code to see exactly how it works, because many of the implementation details, such as gathering the components of an expression, will be similar if not exactly the same in the full algorithm. So I started my quest for a better debugger. For me, the ideal debugger is the C debugger in XCode. That debugger has saved me in most of my programming assignments in C. But it is only for C based languages (C, Objective-C, probably C++, …), not Python. So I did a Google search, and it turns out that there is a list of Python debuggers here. So I went through them, and I didn’t have to go far. The very first one, pudb, turns out to be awesome!

You can watch this screencast to get a better idea of the features, or even better install it and check them out. The debugger runs in the console, not in some half-hacked GUI (half-hacked is what any non-Cocoa GUI looks like in Mac OS X). The only downside to this is that you have to use the keyboard to do everything, but it ends up not being too bad. And you can press ‘?’ at any time to see the possible commands.

To install it, just do easy_install pudb. To run it, just create a script of what you want to debug, and do python -m pudb.run my-script.py and it just works! I have a line that says alias pudb='python -m pudb.run' in my .profile, which makes it even easier to run. If you want to set a break point in the code, you can either navigate there from within pudb by pressing ‘m’, or you add a line that says from pudb import set_trace; set_trace() to the code (if you add the line to your code, you don’t even need to create a script. Just execute the code in IPython and when it hits that line, it will load the debugger).
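Here is a minimal sketch of the set_trace() approach. The USE_PUDB environment variable and the maybe_break() and running_total() helpers are made-up names for illustration, not part of pudb; the only pudb call used is the set_trace function mentioned above. The guard is there so the script still runs normally when you don't want the debugger.

```python
import os


def maybe_break():
    """Drop into pudb only if USE_PUDB=1 is set (hypothetical switch)."""
    if os.environ.get("USE_PUDB") == "1":
        # Imported lazily so the script runs even where pudb isn't installed.
        from pudb import set_trace
        set_trace()


def running_total(values):
    """Toy function to step through in the debugger."""
    total = 0
    for value in values:
        maybe_break()  # the debugger opens here when enabled
        total += value
    return total


if __name__ == "__main__":
    print(running_total([1, 2, 3]))
```

Running it with USE_PUDB=1 set in the environment should open the debugger at the marked line on each pass through the loop; without it the script just prints the total.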

Some cool features:

– IPython console. Just press ‘!’ to go to a console, where you can manipulate variables from the executed namespace, and you can choose an IPython console.

– Very easy to navigate. You just need to know the keys ‘s’, ‘n’, and ‘t’.

– View code other than what is currently being run. Pressing ‘m’ lets you view all imported modules. You can easily view points on the stack by choosing them.

– If an exception is thrown, it catches it! This may sound obvious for a debugger, but it is one of the things that didn’t work very well in winpdb. You can view the traceback of the exception, and choose to restart without having to close and reopen the debugger. Actually, it asks you if you want to restart every time the script finishes too, which is also a great improvement over winpdb.

This is what it looks like:

This is where the heurisch algorithm hangs.

Some annoyances (in case Andreas Kloeckner reads this):

– The default display for variables is type, which is completely useless. I have to manually go through and change each to str so I can see what the variable is. Is there a way to change this default?

– It asks me every time if I want to use IPython. I always want to use IPython.

– This might be a Mac OS X Terminal bug, but when I execute a statement that takes a while to run, it doesn’t redraw the pudb window until it finishes. This means that stepping through a program “flashes” black from what is above pudb in the window, and if I run a statement that takes forever, I lose the ability to see where it is unless I keyboard interrupt. Fortunately, it catches keyboard interrupts, so I can still see the traceback.

– There is no way to resize the variables window, or to scroll sideways in it. If I want to see what a long variable expression is, I have to go to the IPython console and type it there.

Some of these might be fixable and I just don’t know it yet. But even with them, this is still an order of magnitude improvement over winpdb. Now I can actually use the debugger all the time in my coding, instead of just when I have a really tough bug and no other choice.

UPDATE:

The first two were trivial to fix in a fork of the repository (isn’t open source awesome?). So if those are bothering you too, check out my branches at http://github.com/asmeurer/PuDB. Maybe if I have some time I will make them global options using environment variables or something and see if Andreas wants to merge them back into the main repo.

As for the third one (the redrawing), I realized that it might be a good thing, because you can see anything that is printed. Still, I would prefer seeing both, if possible (and the black flashes are annoying).

UPDATE 2:

You can resize the side view by pushing +/-, though there doesn’t seem to be a way to, say, make the variables view bigger and the breakpoints view smaller.

UPDATE 3:

A while back Ondrej modified the code to have a different color theme, and I followed suit. See this conversation at GitHub. So now, instead of looking like a DOS terminal, PuDB for me looks like this:

PuDB XCode Midnight Theme Colors

These are exactly the same colors as my code in XCode, the editor I use, with the Midnight Theme. It’s pretty easy to change the colors to whatever you want. Right now, you have to edit the source, but Ondrej or I might someday make it so you can have themes.

Also, having used this all summer (it was a life-saver on multiple occasions, and I am sure it at least doubled my development speed on others), I have one additional gripe. It is too difficult to arrow up to the variable that you want to access in the variables view. It would be nice to have a page up/page down feature there.

UPDATE 4: PuDB has since improved a lot, including many fixes of my own. It now supports themes, saved settings, variable name wrapping, and more. See this followup post.


Update for this week

June 4, 2010

So I started writing up a blog post on how rational function integration works, but Ondrej wants a blog post every week by the end of I don’t think I would do it justice by rushing to finish it now (read: I’m too lazy to do it). So instead, I’ll just give a short post (if that’s possible for me) on what I have been doing this week.

I finished up writing doctests for the polynomials module for now (see issue 1949), so now this week I started looking at the integrator. In particular, I went through each of the 40 issues with the Integration label and added them to a test file that I can monitor throughout the summer to see my progress. It is the test_failing_integrals.py file in my Integration branch, where all my work will be going for the foreseeable future. So if you want to follow my work, follow that branch. Here are some observations from those issues:

– integrate() fails on almost all algebraic integrals (functions with square roots, etc.). It can handle the derivative of arcsin and arcsinh because of special code in heurisch.py, but that’s about it. Before I can do any work on the Algebraic Risch Algorithm, I will need to implement the transcendental algorithm, so I think my temporary solution for this may be to add pattern matching heuristics for some of the more common algebraic integrals (anyone know a good integral table?).

– I figured out why integrate hangs forever with some integrals, such as the one in issue 1441. Here is, in a nutshell, how the Heuristic Risch algorithm works: Take the integrand and split it into components. For example, the components of x*cos(x)*sin(x)**2 are [x, cos(x), sin(x)]. Replace each of these components with a dummy variable, so if x = x0, cos(x) = x1, and sin(x) = x2, then the integrand is x0*x1*x2**2. Also, compute the derivative of each component in terms of the dummy variables. So the derivatives of [x0, x1, x2] are [1, -x2, x1]. Then, using these, perform some magic to create some rational functions out of the component dummy variables. Then, create a candidate integral with a bunch of unknowns [A1, A2, …], which will be rational numbers, and a multinomial of the An’s and the xn’s that should equal 0 if the candidate integral is correct. Then, because the xn’s are not 0, and there is also some algebraic independence, the coefficient of each term, which is an expression in the An’s, must equal 0. So you get a system of linear equations in the An’s. You then solve these equations, and plug the values of the An’s into the candidate integral to give you the solution, or, if the system is inconsistent, then it cannot find a solution, possibly because there is no elementary one.

Well, that oversimplifies a lot of things, but the point I want to make is that the integral from issue 1441 creates a system of ~600 linear equations in ~450 variables, and solving that system is what causes the integration to hang. Also, as Mateusz, my mentor and the one who wrote the current integration implementation, pointed out, quite a bit of time is spent in the heurisch algorithm doing expansion on large Basic polynomials. When I say Basic polynomials, I mean that they are SymPy expressions, instead of Poly instances. Using Poly should speed things up quite a bit, so my next move will be to convert heurisch() into using Poly wherever applicable.
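To make the “unknown coefficients, then a linear system” step concrete, here is a toy sketch in plain Python. It is not heurisch() and does not use SymPy. It assumes a much simpler case that I picked for illustration: integrands of the form q(x)*exp(x) with q a polynomial, where the candidate integral is p(x)*exp(x) with unknown coefficients A0, A1, …. Differentiating the candidate gives (p + p')*exp(x), so matching the coefficient of x**k gives the linear equation A_k + (k+1)*A_(k+1) = q_k. The function names are made up for this example.

```python
from fractions import Fraction


def build_system(q):
    """Augmented matrix for p + p' = q, with q given as [q0, q1, ..., qn]."""
    n = len(q)
    rows = []
    for k in range(n):
        row = [Fraction(0)] * n
        row[k] = Fraction(1)              # A_k comes from p
        if k + 1 < n:
            row[k + 1] = Fraction(k + 1)  # (k+1)*A_(k+1) comes from p'
        rows.append(row + [Fraction(q[k])])
    return rows


def solve(aug):
    """Gauss-Jordan elimination over the rationals; None if singular."""
    n = len(aug)
    aug = [row[:] for row in aug]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def integrate_poly_times_exp(q):
    """Coefficients [A0, ..., An] of p, where the integral is p(x)*exp(x)."""
    return solve(build_system(q))


if __name__ == "__main__":
    # x*exp(x): prints the coefficients of p, lowest degree first
    print(integrate_poly_times_exp([0, 1]))
```

For x*exp(x) the system has two equations in two unknowns and gives p(x) = x - 1. The real algorithm builds its system from many more monomials in the dummy variables, which is how you end up with hundreds of equations.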

– There were a few bugs in the rational integration, which I fixed in my branch. The problem was in rational integrals with symbolic coefficients. Because the new polys are able to create polynomials using any expression as a generator, not just symbols, things like Poly(sin(y)*x, x) create Poly(sin(y)*x, x, domain='ZZ[sin(y)]'). But using the polynomial ring or fraction field creates problems with some things like division, whereas we really only want the domain to be EX (expression domain) in this case. So this was not too difficult to fix, and you can see the fix in my integration branch.
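A small sketch of the domain difference, assuming a SymPy version where Poly accepts a domain keyword and EX can be imported from the top-level package. This only shows how to ask for the EX domain explicitly; it is not the actual fix from the branch, and what the default domain prints as may differ between SymPy versions.

```python
from sympy import EX, Poly, sin, symbols

x, y = symbols("x y")
expr = sin(y) * x

# Let Poly pick the coefficient domain itself.
p_default = Poly(expr, x)

# Force the coefficients into the expression domain EX.
p_ex = Poly(expr, x, domain="EX")

print(p_default.domain)
print(p_ex.domain)
```

Both objects represent the same expression; only the coefficient domain differs.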

– Some integrals will require some good implementation of special functions such as the hypergeometric function to work. Sometimes, you don’t want to know what the non-elementary integral looks like, but you just want to calculate a definite integral. The solution here is to use Meijer-G functions, which are on the list of things to possibly do at the end of the summer if I have time.

– Another bug that I plan on fixing (I haven’t done it yet, but I know how to do it and it will be trivial), is this (issue 1888):

In [18]: print integrate(f(x).diff(x)**2, x)

2*D(f(x), x)*f(x)/3 - 2*x*D(f(x), x, x)*f(x)/3 + x*D(f(x), x)**2/3

The problem is in the step where it computes the derivative of the components, it tries to compute the derivative of f(x).diff(x) in terms of a dummy variable, but it reduces to 0 because diff(x2, x) == 0. Thus, it treats f(x).diff(x) like something that has a 0 third derivative, i.e., x**2.

Well that’s it. I knew I couldn’t make a short blog post :). If you want to help, I have three branches that need review (1, 2, 3), and except for the last one, my work is based on top of the other two, so none of my integration work can be pushed in until those two are reviewed positively.