The Risch Algorithm: Part 3, Liouville’s Theorem

August 14, 2010

So this is the last official week of the Summer of Code program, and my work has mostly consisted of removing NotImplementedErrors (i.e., implementing things) and fixing bugs. None of this is particularly interesting, so instead of talking about that, I figured I would write another one of my Risch Algorithm blog posts. I recommend reading parts 1 and 2 first, as well as my post on rational function integration, which could be considered part 0.

Liouville’s Theorem
Anyone who’s taken calculus intuitively knows that integration is hard, while differentiation is easy. For differentiation, we can produce the derivative of any elementary function, and we can do so easily, using a simple algorithm consisting of the sum and product rules, the chain rule, and the rules for the derivatives of the various elementary functions. But for integration, we have to try to work backwards.

There are two things that make integration difficult. First is the existence of functions that simply do not have any elementary antiderivative. e^{-x^2} is perhaps the most famous example of such a function, since it arises from the normal distribution in statistics. But there are many others. \sin{(x^2)}, \frac{1}{\log{(x)}}, and x^x are other well-known examples of functions with no elementary antiderivative.

The second problem is that no single simple rule for working backwards is always applicable. We know that u-substitution and integration by parts are the reverse of the chain rule and the product rule, respectively. But those methods only work if those rules were the ones originally applied, and then only if you choose the right u and dv.

But there is a much simpler example that gets right down to the point of Liouville’s theorem. The power rule, \frac{d}{dx}x^n=nx^{n-1}, is easily reversed for integration. Given the power rule for differentiation, it’s easy to see that the reverse rule should be \int{x^ndx}=\frac{x^{n+1}}{n+1}. This works fine, except that we are dividing by something, namely n+1. In mathematics, whenever we do that, we have to ensure that whatever we divide by is not 0. In this case, it means that we must require n\neq -1. This excludes \int{\frac{1}{x}dx}. We know from calculus that this integral requires us to introduce a special function, the natural logarithm.
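
As a quick sanity check, SymPy’s ordinary integrate() already knows about this exception (a minimal illustration; nothing here is specific to the Risch code):

from sympy import symbols, integrate

x = symbols('x')

integrate(x**2, x)   # x**3/3, the power rule with n = 2
integrate(1/x, x)    # log(x), the n = -1 exception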

But we see that n=-1 is the only exception to the power rule, so the integral of any (Laurent) polynomial is again a (Laurent) polynomial, plus a logarithm. Recall from part 0 (Rational Function Integration) that the same thing is true for any rational function: the integral is again a rational function, plus a logarithm (we can combine multiple logarithms into one using the logarithmic identities, so assume for simplicity that there is just one). The argument is very similar, too. Assume that we have split the denominator of the rational function into linear factors in the algebraic splitting field (such as the complex numbers). Then perform a partial fractions decomposition on the rational function. Each term in the decomposition will be either a polynomial, or of the form \frac{a}{(x - b)^n}. The integration of these terms is the same as with the power rule, making the substitution u = x - b. When n\geq 2, the integral will be \frac{-1}{n - 1}\frac{a}{(x - b)^{n - 1}}; when n = 1, the integral will be a\log{(x - b)}. Now, computationally, we don’t want to work with the algebraic splitting field, but it turns out that we don’t need to actually compute it to find the integral. But theory is what we are dealing with here, so don’t worry about that.
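
Here is the argument in miniature, using SymPy’s apart() for the partial fractions decomposition (a small illustration of the theory, not the algorithm the Risch code actually uses):

from sympy import symbols, apart, integrate

x = symbols('x')

f = (3*x + 1)/(x - 1)**2
apart(f, x)        # 3/(x - 1) + 4/(x - 1)**2
integrate(f, x)    # 3*log(x - 1) - 4/(x - 1), a rational part plus a logarithm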

Now the key observation about differentiation, as I have pointed out in the earlier parts of this blog post series, is that the derivative of an elementary function can be expressed in terms of itself, in particular, as a polynomial in itself. To put it another way, functions like e^x, \tan{(x)}, and \log{(x)} all satisfy first order differential equations whose right hand sides are polynomials in the function, with rational function coefficients (for these, y'=y, y'=1 + y^2, and y'=\frac{1}{x}, respectively).
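
SymPy makes this easy to see; each derivative comes out as a polynomial in the function itself:

from sympy import symbols, exp, tan, log, diff

x = symbols('x')

diff(exp(x), x)   # exp(x)          (y' = y)
diff(tan(x), x)   # tan(x)**2 + 1   (y' = 1 + y**2)
diff(log(x), x)   # 1/x             (y' = 1/x)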

Now, the theory gets more complicated, but it turns out that, using a careful analysis of this fact, we can prove a result similar to the one about rational functions for any elementary function. In a nutshell, Liouville’s Theorem says this: if an elementary function has an elementary integral, then that integral is composed only of functions from the original integrand, plus a finite number of logarithms of functions from the integrand, which can be considered one logarithm, as mentioned above (“functions from” more specifically means a rational function in the terms from our elementary extension). Here is the formal statement of the theorem.

Theorem (Liouville’s Theorem – Strong version)
Let K be a differential field, C=\mathrm{Const}(K), and f\in K. If there exist an elementary extension E of K and g \in E such that Dg = f, then there are v \in K, c_1, \dots, c_n\in \bar{C}, and u_1, \dots, u_n\in K(c_1,\dots,c_n)^* such that

f = Dv + \sum_{i=1}^n c_i\frac{Du_i}{u_i}.


Looking closely at the formal statement of the theorem, we can see that it says the same thing as my “in a nutshell” statement. K is the differential extension, say of \mathbb{Q}(x), that contains all of our elementary functions (see part 2). E is an extension of K. The whole statement of the theorem is that E need not extend K by anything more than some logarithms. f is our original function and g=\int f. Recall from part 1 that Dg = \frac{Du}{u} is just another way of saying that g = \log{(u)}. The rest of the formal statement is some specifics dealing with the constant field, which assure us that we do not need to introduce any new constants in the integration. This fact is actually important to the decidability of the Risch Algorithm, because many problems about constants are either unsolved or undecidable (for example, the transcendence degree of \mathbb{Q}(e, \pi) is unknown). But it assures us that as long as we start with a constant field that is computable, the constant field for our antiderivative will also be computable, and will in fact be the same field, except for some possible algebraic extensions (the c_i).
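
To make the form in the theorem concrete, here is a tiny check in SymPy for f = e^x + \frac{1}{x}, with v = e^x, n = 1, c_1 = 1, and u_1 = x (a toy example of my own, not from any reference):

from sympy import symbols, exp, log, diff, cancel

x = symbols('x')

f = exp(x) + 1/x
v, c1, u1 = exp(x), 1, x

# f = Dv + c1*Du1/u1, so the integral is v + c1*log(u1) = exp(x) + log(x)
assert cancel(f - (diff(v, x) + c1*diff(u1, x)/u1)) == 0
assert diff(v + c1*log(u1), x) == f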

At this point, I want to point out that even though my work this summer has been only on the purely transcendental case of the Risch Algorithm, Liouville’s Theorem is true for all elementary functions, which includes algebraic functions. However, if you review the proof of the theorem, the proof of the algebraic part is completely different from the proof of the transcendental part, which is the first clue that the algebraic part of the algorithm is completely different from the transcendental part (and also a clue that it is harder).

Liouville’s Theorem is what allows us to prove that a given function does not have an elementary antiderivative, by giving us the form that any antiderivative must have. We first perform the same Hermite Reduction as in the rational function case. Then a generalization of the Lazard-Rioboo-Trager Algorithm, due to Rothstein, allows us to find the logarithmic part of any integral (the \sum_{i=1}^n c_i\frac{Du_i}{u_i} from Liouville’s Theorem).
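
For rational functions, SymPy’s integrate() already goes through these steps; with a repeated root in the denominator, you can see both pieces in the output (an illustrative example of my own):

from sympy import symbols, integrate

x = symbols('x')

# the double root at x = 0 contributes the rational part 1/x;
# the simple roots contribute the logarithmic part
f = (x + 1)/(x**2*(x - 1))
integrate(f, x)   # 1/x - 2*log(x) + 2*log(x - 1)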

Now, a difference here is that for general functions, the part of the integrand that corresponds to the \frac{a}{x - b} terms doesn’t always have an elementary integral (such integrands are called simple functions; I think I will talk about them in more detail in a future post in this series). An example of this is \frac{1}{\log{(x)}}. Suffice it to say that any elementary integral of \frac{1}{\log{(x)}} would have to lie in some logarithmic extension of \mathbb{Q}(x, \log{(x)}), and we can prove that no such logarithmic extension exists in the course of trying to compute it with the Lazard-Rioboo-Rothstein-Trager Algorithm.
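
For instance, with the prototype risch_integrate() from the post below (assuming the integration3 branch is pulled so that the function is in your namespace, and that it handles this case as it does the similar x/\log{(x)} example there), the unevaluated Integral that comes back is exactly such a proof:

from sympy import symbols, log

x = symbols('x')

# an unevaluated Integral returned by risch_integrate() is a proof of
# non-elementarity, not a failure (see the post below)
risch_integrate(1/log(x), x)   # expected: Integral(1/log(x), x), unevaluated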

In the rational function case, after we found the rational part and the logarithmic part, we were practically done, because the only remaining part was a polynomial. In the general transcendental case, we are left with an analogue, called reduced functions, and we are far from done. This is the hardest part of the integration algorithm, and it will also be the topic of a future post in this series. Suffice it to say that this is where most of the proofs of non-integrability come from, including those for the non-integrable examples I gave above other than \frac{1}{\log{(x)}}.

Conclusion
That’s it for now. Originally, I was going to include a bit on the structure theorems too, but I think I will save that for part 4 instead. I may or may not have another post ready before the official end of coding date for Google Summer of Code, which is Monday (three days from now). I want to make a post with some nice graphs comparing the timings of the new risch_integrate() and the old heurisch() (what is currently behind SymPy’s integrate()). But as I have said before, I plan to continue coding the integration algorithm beyond the program until I finish it, and even beyond that (there are lots of cool ways the algorithm can be extended to work with special functions, there’s definite integration with Meijer G-functions, and there’s of course the algebraic part of the algorithm, which is a much larger challenge). Along with it, I plan to keep you updated with blog posts, including at least all the Risch Algorithm series posts that I have promised (I count at least three topics that I have explicitly promised but haven’t written yet). And of course, there will be the mandatory GSoC wrap-up blog post, detailing my work for the summer.

Please continue to test my prototype risch_integrate() function in my integration3 branch, and tell me what you think (or if you find a bug).


Prototype risch_integrate() function ready for testing!

August 5, 2010

So today I finally finished the prototype function I talked about last week. The function is called risch_integrate() and is available in my integration3 branch. Unlike the inner-level functions I have showcased in previous blog posts, this function does not require you to substitute dummy variables, manually create a list of derivatives, etc. All you have to do is pass it a function and the integration variable, and it will return the result, just like the normal integrate(). I have spent the past few days working on a monster of a function called build_extension() that does this preprocessing work for you. The reason the function was so hard to write is that the transcendental Risch Algorithm is very picky: every differential extension has to be transcendental over the previous extensions. This means that if you have a function like e^x + e^{\frac{x}{2}}, you cannot write it as t_0 + t_1 with t_0=e^x and t_1=e^{\frac{x}{2}}, because t_0 and t_1 are each algebraic over the other (t_0=t_1^2). You also cannot let t_0=e^{x} and rewrite the whole integral in terms of t_0, because you would get t_0 + \sqrt{t_0}, which is algebraic in t_0. The only way to do it is to let t_0=e^{\frac{x}{2}}, so that your function becomes t_0^2 + t_0.
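
Here is the rewriting in SymPy terms (just the idea; this is not what build_extension() literally does):

from sympy import symbols, exp

x, t = symbols('x t')

f = exp(x) + exp(x/2)

# the only workable monomial is t = exp(x/2); then exp(x) = t**2 and
# the function becomes a polynomial in t
g = t**2 + t

# substituting back recovers f, since exp(x/2)**2 == exp(x)
assert g.subs(t, exp(x/2)) == f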

Now, fortunately, there is an algorithm that provides necessary and sufficient conditions for determining if an extension is algebraic over the previous ones. It’s called the Risch Structure Theorems. My first order of business this week was to finish implementing these. This is actually the reason that we had to wait until now for this prototype function: the Structure Theorems are at the very end of Bronstein’s book, and the integration algorithm is not correct without them (namely, it is not correct if you add an algebraic extension). I just recently got to them in my reading. Actually, I skipped some work on tangent integration so I could get to them first. I hope to talk a little about them in a future “Risch Integration” blog post, though be aware that they require some extremely intense algebraic machinery to prove, so I won’t be giving any proofs.

These algorithms can tell me, for example, that I shouldn’t have added t_0=e^x above because it makes e^{\frac{x}{2}}=\sqrt{t_0}, but then I have to go back and restart my search for an extension so that I can try t_0=e^{\frac{x}{2}} instead. So I wrote a simple function that takes the arguments of the exponentials and determines their lowest common factor, as sketched below. This heuristic saves a lot of time.
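
Here is a minimal sketch of that heuristic (my own simplification; the real code in the branch does more):

from sympy import symbols, cancel

def common_exponent(args):
    # Find g such that every argument in args is an integer multiple of
    # g, assuming all the ratios are rational (e.g., [x, x/2] -> x/2).
    g = args[0]
    for a in args[1:]:
        _, q = cancel(a/g).as_numer_denom()   # a/g assumed rational
        g = g/q                               # shrink g so it divides a
    return g

x = symbols('x')
print(common_exponent([x, x/2]))   # x/2, so use t0 = exp(x/2)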

I also noticed (actually, Chris Smith inadvertently pointed it out to me; super thanks to him) that the Structure Theorem algorithms only tell you whether two terms are the same as monomials. They would tell you that e^x = e^{x + 1}, because both satisfy Dt=t. Therefore, I also had to modify the structure theorem algorithms to pull out any constant term.

It can still be necessary to restart building the extension even with the above heuristic. For example, if you have e^x + e^{x^2} + e^{\frac{x}{2} + x^2} and start with t_0=e^x and t_1=e^{x^2}, then the structure theorems will tell you that e^{\frac{x}{2} + x^2} = \sqrt{t_0}t_1, which we cannot use because of the radical. The solution is to split it up as e^x + e^{x^2} + e^{\frac{x}{2}}e^{x^2} (the structure theorems tell you exactly how to do this, so you are splitting in terms of the other exponentials) and then restart the extension building entirely. This can be an expensive operation, because you have to rebuild t_0 and t_1, but this time the heuristic function from above handles the e^{\frac{x}{2}} correctly, making t_0=e^{\frac{x}{2}}, with the final answer t_0^2 + t_1 + t_0t_1. I could probably have made it smarter by only going back to before the conflicting extensions, but that was quite a bit more work and adds more difficulties, such as non-trivial relationships, so I just took the lazy way and restarted completely. It doesn’t take that much time.
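
The splitting itself is just the usual exponent rule, which SymPy’s expand() applies:

from sympy import symbols, exp, expand

x = symbols('x')

expand(exp(x/2 + x**2))   # exp(x/2)*exp(x**2)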

Of course, sometimes you cannot add a new exponential, no matter how you build the extensions. The classic example is e^{\frac{\log{(x)}}{2}}, which you can see is actually equal to \sqrt{x}, an algebraic function. Therefore, I had to implement some tricky logic to keep the build_extension() function from retrying forever. I hope I did it right, so that it never loops infinitely and never fails when the extension really can be built. Only time and testing will tell.

It is exactly the same for logarithms, except that in that case, when a new logarithm is algebraic over the old ones, it can be written as a linear combination of them. This means that there are never any radicals to worry about, though you do also have to worry about constants. For example, \log{(x)} looks the same as \log{(2x)}, because they both satisfy Dt=\frac{1}{x}. An example of a logarithm that is algebraic over old ones is \log{(x^2 - 1)} over \log{(x + 1)} and \log{(x - 1)}, because \log{(x^2 - 1)}=\log{((x + 1)(x - 1))}=\log{(x + 1)} + \log{(x - 1)}.
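
SymPy can verify this identity (force=True is needed because x + 1 and x - 1 are not known to be positive):

from sympy import symbols, log, expand_log

x = symbols('x')

expand_log(log((x + 1)*(x - 1)), force=True)   # log(x - 1) + log(x + 1)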

The parallels between exponentials and logarithms are amazing. For the structure theorems, the exponential case is exactly the same as the logarithmic case, except replacing addition with multiplication and multiplication with exponentiation. For the exponential case, you need the arguments of the already added logarithms to find the algebraic dependence, and the arguments of the already added exponentials to find the constant term. For the logarithmic case, you need the arguments of the already added exponentials to find the algebraic dependence, and the arguments of the already added logarithms to find the constant term. Everything else is exactly the same, except for the shift in operators. Of course, I realize why these things are so, mathematically, but the symmetry is still amazing to me. I will hopefully explain in more detail in my future Structure Theorems post.

So, on to the risch_integrate() function. Here is the text that I have basically put in my commit message, the aptly numbered issue that I have created for it, and the post to the mailing list (it’s not so much that I am lazy as that I was really excited to get this out there).

I have ready in my integration3 branch a prototype risch_integrate() function that is a user-level function for the full Risch Algorithm I have been implementing this summer. Pull from http://github.com/asmeurer/sympy/tree/integration3.

This is NOT ready to go in. It is a prototype function that I am making available so people can try out the new algorithm and hopefully help me find the bugs in it. Please pass it your favorite non-elementary integrals and see if it can determine that they are not elementary. If you pass it a very crazy function at random, the chances are pretty high that it will not be elementary. So a better way to test it is to come up with a crazy function, differentiate it, then pass the derivative and see if it can give you your original function back. Note that the result will probably not look exactly the same as your original function, and may differ by a constant. You should verify by differentiating the result you get and calling cancel() (or simplify(), but usually cancel() is enough) on the difference between the derivative and the original integrand.

So you can review the code too, if you like, but just know that things are not stable yet, and this isn’t strictly a branch for review.

So far, this function only supports exponentials and logarithms.
Support for trigonometric functions is planned. Algebraic functions are
not supported. If the function returns an unevaluated Integral, it means
that it has proven the integral to be non-elementary. Note that several
cases are still not implemented, so you may get NotImplementedError
instead. Eventually, these will all be eliminated, and the only
NotImplementedError you should see from this function is
NotImplementedError("Algebraic extensions are not supported.")

This function has not been integrated in any way with the already
existing integrate() yet, so you can use the two to compare.

Examples:

In [1]: risch_integrate(exp(x**2), x)
Out[1]:
⌠
⎮  ⎛ 2⎞
⎮  ⎝x ⎠
⎮ ℯ     dx
⌡

In [2]: risch_integrate(x**100*exp(x), x).diff(x)
Out[2]:
 100  x
x   ⋅ℯ

In [3]: %timeit risch_integrate(x**100*exp(x), x).diff(x)
1 loops, best of 3: 270 ms per loop

In [4]: integrate(x**100*exp(x), x)
... hangs ...

In [5]: risch_integrate(x/log(x), x)
Out[5]:
⌠
⎮   x
⎮ ────── dx
⎮ log(x)
⌡

In [6]: risch_integrate(log(x)**10, x).diff(x)
Out[6]:
   10
log  (x)

In [7]: integrate(log(x)**10, x).diff(x)
Out[7]:
   10
log  (x)

In [8]: %timeit risch_integrate(log(x)**10, x).diff(x)
10 loops, best of 3: 159 ms per loop

In [9]: %timeit integrate(log(x)**10, x).diff(x)
1 loops, best of 3: 2.35 s per loop

Be warned that things are still very buggy and you should always verify
results by differentiating. Usually, cancel(diff(result, x) - f), where
f is the original integrand, should be enough. The difference should go
to 0.
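
Putting that testing recipe into a concrete sketch (F is just an arbitrary function I made up here; risch_integrate is assumed to be in your namespace after pulling the branch):

from sympy import symbols, exp, log, diff, cancel

x = symbols('x')

# invent an elementary antiderivative and differentiate it...
F = x*exp(x**2)*log(x)
f = diff(F, x)

# ...then see if risch_integrate() can recover it (the result may
# differ from F by a constant)
result = risch_integrate(f, x)
assert cancel(diff(result, x) - f) == 0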

So please, please, PLEASE, try out this function and report any bugs that you find. It is not necessary to report NotImplementedError bugs, because I already know about those (I put them in there), and as I mentioned above, they are all planned to disappear. Also, I am continually updating my branch with fixes, so you should do a “git pull” and try again before you report anything.

Also, I am aware that there are test failures. This is because I had to hack exp._eval_subs() to only do exact substitution (no algebraic substitution). It’s just a quick hack workaround, and I should eventually get a real fix.

Finally, I’m thinking there needs to be a way to differentiate between an unevaluated Integral returned because the integrator failed and one returned because the integral has been proven non-elementary. Any ideas?

Also, looking at the integral from the previous blog post, you can get different results by using the handle_first argument to risch_integrate():

If handle_first='log' (the default right now), it will gather all logarithms first, and then exponentials (insofar as it can do it in that order). If handle_first='exp', it gathers exponentials first. The difference is that the Risch Algorithm integrates recursively, one extension at a time, starting with the outer-most one. So if you have an expression with both logarithms and exponentials, such that they do not depend on each other, handle_first='log' will integrate the exponentials first, because they will be gathered last (be at the top of the tower of extensions), and handle_first='exp' will integrate the logarithms first. Right now, I have defaulted to ‘log’ because the exponential integration algorithm is slightly more complete. If you get NotImplementedError with one, it is possible (though I don’t know for sure yet) that you might get an answer with the other.

Also, the two settings can give different-looking results, at different speeds. For example:


In [1]: f = (x*(x + 1)*((x**2*exp(2*x**2) - log(x + 1)**2)**2 +
   ...: 2*x*exp(3*x**2)*(x - (2*x**3 + 2*x**2 + x + 1)*log(x + 1))))/((x +
   ...: 1)*log(x + 1)**2 - (x**3 + x**2)*exp(2*x**2))**2

In [2]: f
Out[2]: 
          ⎛                          2                                                   ⎞
          ⎜⎛                       2⎞                                                   2⎟
          ⎜⎜     2           2  2⋅x ⎟        ⎛    ⎛           2      3⎞           ⎞  3⋅x ⎟
x⋅(1 + x)⋅⎝⎝- log (1 + x) + x ⋅ℯ    ⎠  + 2⋅x⋅⎝x - ⎝1 + x + 2⋅x  + 2⋅x ⎠⋅log(1 + x)⎠⋅ℯ    ⎠
──────────────────────────────────────────────────────────────────────────────────────────
                                                                2                         
                         ⎛                                    2⎞                          
                         ⎜   2                  ⎛ 2    3⎞  2⋅x ⎟                          
                         ⎝log (1 + x)⋅(1 + x) - ⎝x  + x ⎠⋅ℯ    ⎠                          

In [3]: risch_integrate(f, x, handle_first='log')
Out[3]: 
       ⎛              ⎛ 2⎞⎞                   ⎛                ⎛ 2⎞⎞                             
       ⎜log(1 + x)    ⎝x ⎠⎟                   ⎜  log(1 + x)    ⎝x ⎠⎟          ⎛ 2⎞               
    log⎜────────── + ℯ    ⎟                log⎜- ────────── + ℯ    ⎟       2  ⎝x ⎠               
       ⎝    x             ⎠                   ⎝      x             ⎠      x ⋅ℯ    ⋅log(1 + x)    
x + ─────────────────────── - log(1 + x) - ───────────────────────── + ──────────────────────────
               2                                       2                                        2
                                                                              2           3  2⋅x 
                                                                       - x⋅log (1 + x) + x ⋅ℯ    

In [4]: risch_integrate(f, x, handle_first='exp')
Out[4]: 
       ⎛                ⎛ 2⎞⎞                   ⎛                ⎛ 2⎞⎞        ⎛ 2⎞             
       ⎜                ⎝x ⎠⎟                   ⎜                ⎝x ⎠⎟        ⎝x ⎠             
    log⎝log(1 + x) + x⋅ℯ    ⎠                log⎝log(1 + x) - x⋅ℯ    ⎠     x⋅ℯ    ⋅log(1 + x)  
x + ───────────────────────── - log(1 + x) - ───────────────────────── - ──────────────────────
                2                                        2                                    2
                                                                            2           2  2⋅x 
                                                                         log (1 + x) - x ⋅ℯ    

In [5]: %timeit risch_integrate(f, x, handle_first='log')
1 loops, best of 3: 1.49 s per loop

In [6]: %timeit risch_integrate(f, x, handle_first='exp')
1 loops, best of 3: 1.21 s per loop

In [7]: cancel(risch_integrate(f, x, handle_first='log').diff(x) - f)
Out[7]: 0

In [8]: cancel(risch_integrate(f, x, handle_first='exp').diff(x) - f)
Out[8]: 0

So go now, and pull my branch, and try this function out. And report any problems that you have back to me, either through the mailing list, IRC, issue 2010, or as a comment to this blog post (I don’t really care how).