Prototype risch_integrate() function ready for testing!

So today I finally finished up the prototype function I talked about last week. The function is called risch_integrate() and is available at my integration3 branch. Unlike the inner level functions I have showcased in previous blog posts, this function does not require you to do substitution for dummy variables and manually create a list of derivatives, etc. All you have to do is pass it a function and the integration variable, and it will return the result, just like normal integrate(). I have spent the past few days working on a monster of a function called build_extension() that does this preparsing work for you. The reason that the function was so hard to write is that the transcendental Risch Algorithm is very picky. Every differential extension has to be transcendental over the previous extensions. This means that if you have a function like $e^x + e^{\frac{x}{2}}$, you cannot write this as $t_0 + t_1$ with $t_0=e^x$ and $t_1=e^{\frac{x}{2}}$ because $t_0$ and $t_1$ will each be algebraic over the other ($t_0=t_1^2$). You also cannot let $t_0=e^{x}$ and rewrite the whole integral in terms of $t_0$ because you will get $t_0 + \sqrt{t_0}$, which is an algebraic function. The only way that you can do it is to let $t_0=e^{\frac{x}{2}}$, and then your function will be $t_0^2 + t_0$.

Now, fortunately, there is an algorithm that provides necessary and sufficient conditions for determining if an extension is algebraic over the previous ones. It’s called the Risch Structure Theorems. My first order of business this week was to finish implementing these. This is actually the reason that I we had to wait until now to get this prototype function. The Structure Theorems are at the very end of Bronstein’s book, and the integration algorithm is not correct without them (namely, it is not correct if you add an algebraic extension). I just recently got to them in my reading. Actually, I skipped some work on tangent integration so I could get to them first. I hope to talk a little about them in a future “Risch Integration” blog post, though be aware that they require some extremely intense algebraic machinery to prove, so I won’t be giving any proofs.

Even though these algorithms can tell me, for example, that I shouldn’t have added $t_0=e^x$ above because it makes $e^{\frac{x}{2}}=\sqrt{t_0}$, that means that I have to go back and restart my search for an extension so that I can try to get $t_0=e^{\frac{x}{2}}$ instead. So I wrote a simple function that takes the arguments of the exponentials and determines the lowest common factor. This heuristic saves a lot of time.

I also noticed (actually, Chris Smith inadvertently pointed it out to me; super thanks to him), that the Structure Theorem algorithms only tell you if the terms are the same as monomials. It would tell you that $e^x = e^{x + 1}$ because both satisfy $Dt=t$. Therefore, I had to also modify the structure theorem algorithms to pull out any constant term.

It can still be necessary to restart building the extension even with the above heuristic. For example, if you have $e^x + e^{x^2} + e^{\frac{x}{2} + x^2}$, and start with $t_0=e^x$ and $t_1=e^{x^2}$, then the structure theorems will tell you that $e^{x/2 + x^2} = \sqrt{t_0}t_1$, which we cannot use because of the radical. The solution it uses is to split it up as $e^x + e^{x^2} + e^{\frac{x}{2}}e^{x^2}$ (the structure theorems tell you exactly how to do this so you are splitting in terms of the other exponentials) and then restart the extension building entirely. This can be an expensive operation, because you have to rebuild $t_0$ and $t_1$, but this time, the heuristic function I wrote from above handles the $e^{\frac{x}{2}}$ correctly, making $t_0=e^{\frac{x}{2}}$, with the final answer $t_0^2 + t_1 + t_0t_1$. I could have probably made it smarter by only going back to before the conflicting extensions, but this was quite a bit more work, and adds more difficulties such as non-trivial relationships, so I just took the lazy way and restarted completely. It doesn’t take that much time.

Of course, sometimes, you cannot add a new exponential, no matter how you add the extensions. The classic example is $e^{\frac{\log{(x)}}{2}}$, which you can see is actually equal to $\sqrt{x}$, an algebraic function. Therefore, I had to implement some tricky logic to keep the build_extension() function from trying again infinitely. I hope I did it right, so that it never infinite loops, and never fails when it really can be done. Only time and testing will tell.

It is exactly the same for logarithms, except in that case, when a new logarithm is algebraic in terms of old ones, it can be written as a linear combination of them. This means that there are never any radicals to worry about, though you do also have to worry about constants. For example, $\log{(x)}$ looks the same as $\log{(2x)}$ because they both satisfy $Dt=\frac{1}{x}$. An example of a logarithm that is algebraic over old ones is $\log{(x^2 - 1)}$ over $\log{(x + 1)}$ and $\log{(x - 1)}$, because $\log{(x^2 - 1)}=\log{((x + 1)(x - 1))}=\log{(x + 1)} + \log{(x - 1)}$.

The parallels between exponentials and logarithms are amazing. For the structure theorems, the exponential case is exactly the same as the logarithmic case except replacing addition with multiplication and multiplication with exponentiation. For the exponential case, you need the arguments of the already added logarithms to find the algebraic dependence, and the arguments of the already added exponentials to find the constant term. For the logarithmic case, you need the arguments of the already added exponentials to find the algebraic dependence, and the arguments of the already added logarithms to find the content term. Everything else is exactly the same, except for the shift in operators. Of course, I realize why these things are, mathematically, but the symmetry still amazing to me. I will hopefully explain in more detail in my future Structure Theorems post.

So onto the risch_integrate() function. Here is the text that I have basically put in my commit message, the aptly numbered issue that I have created for it, and the post to the mailing list (it’s not so much that I am lazy as that I was really excited to get this out there).

I have ready in my integration3 branch a prototype risch_integrate() function that is a user-level function for the full Risch Algorithm I have been implementing this summer. Pull from http://github.com/asmeurer/sympy/tree/integration3.

This is NOT ready to go in. It is a prototype function that I am making available so people can try out the new algorithm and hopefully help me to find the bugs in it. Please pass it your favorite non-elementary integrals and see if it can determine that they are not elementary. If you try to pass it a very crazy function at random, the chances are pretty high that it will not be elementary. So a better way to test it is to come up with a crazy function, then differentiate it. Then pass the derivative and see if it can give you your original function back. Note that it will probably not look exactly the same as your original function, and may differ by a constant. You should verify by differentiating the result you get and calling cancel() (or simplify(), but usually cancel() is enough) on the difference.

So you can review the code too, if you like, but just know that things are not stable yet, and this isn’t strictly a branch for review.

So far, this function only supports exponentials and logarithms.
Support for trigonometric functions is planned. Algebraic functions are
not supported. If the function returns an unevaluated Integral, it means
that it has proven the integral to be non-elementary. Note that several
cases are still not implemented, so you may get NotImplementedError
instead. Eventually, these will all be eliminated, and the only
NotImplementedError you should see from this function is
NotImplementedError(“Algebraic extensions are not supported.”)

This function has not been integrated in any way with the already
existing integrate() yet, and you can use it to compare.

Examples:

In [1]: risch_integrate(exp(x**2), x)
Out[1]:
⌠
⎮  ⎛ 2⎞
⎮  ⎝x ⎠
⎮ ℯ     dx
⌡

In [2]: risch_integrate(x**100*exp(x), x).diff(x)
Out[2]:
100  x
x   ⋅ℯ

In [3]: %timeit risch_integrate(x**100*exp(x), x).diff(x)
1 loops, best of 3: 270 ms per loop

In [4]: integrate(x**100*exp(x), x)
... hangs ...

In [5]: risch_integrate(x/log(x), x)
Out[5]:
⌠
⎮   x
⎮ ────── dx
⎮ log(x)
⌡

In [6]: risch_integrate(log(x)**10, x).diff(x)
Out[6]:
10
log  (x)

In [7]: integrate(log(x)**10, x).diff(x)
Out[7]:
10
log  (x)

In [8]: %timeit risch_integrate(log(x)**10, x).diff(x)
10 loops, best of 3: 159 ms per loop

In [9]: %timeit integrate(log(x)**10, x).diff(x)
1 loops, best of 3: 2.35 s per loop


Be warned that things are still very buggy and you should always verify
results by differentiating. Usually, cancel(diff(result, x) – result)
should be enough. This should go to 0.

So please, please, PLEASE, try out this function and report any bugs that you find. It is not necessary to report NotImplementedError bugs, because I already know about those (I put them in there), and as I mentioned above, they are all planned to disappear. Also, I am continually updating my branch with fixes, so you should do a “git pull” and try again before you report anything.

Also, I am aware that there are test failures. This is because I had to hack exp._eval_subs() to only do exact substitution (no algebraic substitution). It’s just a quick hack workaround, and I should eventually get a real fix.

Finally, I’m thinking there needs to be a way to differentiate between an unevaluated Integral because the integrator failed and an unevaluated Integral because it has proven the integral to be non-elementary. Any ideas?

Also, looking at the integral from the previous blog post, you can get the different results by using the handle_log argument to risch_integrate():

If handle_first == 'log' (the default right now), then it will gather all logarithms first, and then exponentials (insomuch as it can do it in that order). If handle_first='exp', it gathers exponentials first. The difference is that the Risch Algorithm integrates recursively, one extension at a time, starting with the outer-most one. So if you have an expression with both logarithms and exponentials, such that they do not depend on each other, handle_first == 'log' will integrate the exponentials first, because they will be gathered last (be at the top of the tower of extensions), and handle_first == 'exp' will integrate the logarithms first. Right now, I have defaulted to ‘log’ because the exponential integration algorithm is slightly more complete. If you get NotImplementedError with one, it is possible (though I don’t know for sure yet) that you might get an answer with the other.

Also, they can give different looking results, and at different speeds. For example:

Hover over the code and click on the left-most, “view source” icon (a paper icon with < > over it) to view without breaks. Opens in a new window.

In [1]: f = (x*(x + 1)*((x**2*exp(2*x**2) - log(x + 1)**2)**2 +
...: 2*x*exp(3*x**2)*(x - (2*x**3 + 2*x**2 + x + 1)*log(x + 1))))/((x +
...: 1)*log(x + 1)**2 - (x**3 + x**2)*exp(2*x**2))**2

In [2]: f
Out[2]:
⎛                          2                                                   ⎞
⎜⎛                       2⎞                                                   2⎟
⎜⎜     2           2  2⋅x ⎟        ⎛    ⎛           2      3⎞           ⎞  3⋅x ⎟
x⋅(1 + x)⋅⎝⎝- log (1 + x) + x ⋅ℯ    ⎠  + 2⋅x⋅⎝x - ⎝1 + x + 2⋅x  + 2⋅x ⎠⋅log(1 + x)⎠⋅ℯ    ⎠
──────────────────────────────────────────────────────────────────────────────────────────
2
⎛                                    2⎞
⎜   2                  ⎛ 2    3⎞  2⋅x ⎟
⎝log (1 + x)⋅(1 + x) - ⎝x  + x ⎠⋅ℯ    ⎠

In [3]: risch_integrate(f, x, handle_first='log')
Out[3]:
⎛              ⎛ 2⎞⎞                   ⎛                ⎛ 2⎞⎞
⎜log(1 + x)    ⎝x ⎠⎟                   ⎜  log(1 + x)    ⎝x ⎠⎟          ⎛ 2⎞
log⎜────────── + ℯ    ⎟                log⎜- ────────── + ℯ    ⎟       2  ⎝x ⎠
⎝    x             ⎠                   ⎝      x             ⎠      x ⋅ℯ    ⋅log(1 + x)
x + ─────────────────────── - log(1 + x) - ───────────────────────── + ──────────────────────────
2                                       2                                        2
2           3  2⋅x
- x⋅log (1 + x) + x ⋅ℯ

In [4]: risch_integrate(f, x, handle_first='exp')
Out[4]:
⎛                ⎛ 2⎞⎞                   ⎛                ⎛ 2⎞⎞        ⎛ 2⎞
⎜                ⎝x ⎠⎟                   ⎜                ⎝x ⎠⎟        ⎝x ⎠
log⎝log(1 + x) + x⋅ℯ    ⎠                log⎝log(1 + x) - x⋅ℯ    ⎠     x⋅ℯ    ⋅log(1 + x)
x + ───────────────────────── - log(1 + x) - ───────────────────────── - ──────────────────────
2                                        2                                    2
2           2  2⋅x
log (1 + x) - x ⋅ℯ

In [5]: %timeit risch_integrate(f, x, handle_first='log')
1 loops, best of 3: 1.49 s per loop

In [6]: %timeit risch_integrate(f, x, handle_first='exp')
1 loops, best of 3: 1.21 s per loop

In [7]: cancel(risch_integrate(f, x, handle_first='log').diff(x) - f)
Out[7]: 0

In [8]: cancel(risch_integrate(f, x, handle_first='exp').diff(x) - f)
Out[8]: 0


So go now, and pull my branch, and try this function out. And report any problems that you have back to me, either through the mailing list, IRC, issue 2010, or as a comment to this blog post (I don’t really care how).

19 Responses to Prototype risch_integrate() function ready for testing!

1. xavier says:

Hi,

Hum I’m not a git expert…how should I get your branch?
(I’m a linux user…just tell me how to get your git branch :))

• asmeurer says:

Hi.

Don’t worry, git is super easy if you are just using it to download someone’s branch. So first (obviously), you need to install git. It should be available in your Linux package manager. Then, just cd into whatever directory you want SymPy, and run the following commands:

git clone git://github.com/sympy/sympy.git
cd sympy
git fetch asmeurer
git checkout -b integration3 asmeurer/integration3


And you are done. Then you can just run ./bin/isympy, and there should be a risch_integrate() function there for you to work with. If you want to update the branch (which you should do often, because I’m always pushing up fixes/improvements, just do

git pull


and it will do it.

You can run git help command_name to see what each command does. I also created a page on the SymPy wiki about how to do this.

• Miha says:

Fifth command should be

git checkout -b integration3 asmeurer/integration3,

not

git checkout -b asmeurer/integration3 integration3,

right?

• Aaron Meurer says:

Oh, sorry. You are right. I always use the

git checkout asmeurer/integration3
git checkout -b integration3


variant, so I get that confused sometimes. I will fix the original reply.

2. xavier says:

Ok so far I haven’t found a bug… :)

Ho maybe one :
In [44]: risch_integrate((1/3)*x, x)
Out[44]:
2
0.166666666666667⋅x

Why do we get a float?

I can imagine that there is no symbolic fraction in sympy…is that right?

• asmeurer says:

This is one of the disadvantages of using Python. Python evaluates 1/3 to 0.333333… before SymPy gets a chance to even see it. The solution is to never do number/number, always do number*Symbol/number, or you can do

>>> S(1)/3
1/3


By the way, if you just pass it a rational function in x (no exp or log), it is just using the already existing ratint(), which works great, but it’s nothing new.

3. xavier says:

Ok. It looks like it works just fine even in some corners cases
(Except the expected NotImplementedErrors).

I’m gonna test it with more complex inputs…

• asmeurer says:

Great! I am currently in the process of removing those NotImplementedErrors, so don’t forget to do git pull every once in a while. Also, I have fixed a few bugs that I’ve found on my own, so that’s another good reason to do it.

4. schilly says:

About differentiating between “failed” and “proven to be impossible” to integrate: I suggest in both cases you should return a custom exception, returning the unevaluated integral as an included object part of the exception, they are called something like “RischFailure” and “RischNonElementary”…

• asmeurer says:

Well, eventually, risch_integrate() will just be an internal function, and all integration will be done through integrate(). In that case, if risch_integrate() fails, it will just pass it on to the next method, or else return an unevaluated Integral(). I think I am going to create a NonElementaryIntegral class, which would be similar to the already existing unevaluated Integral class, except that it would be smart enough to not try to evaluate the integral, and it might pretty print differently, so that you can tell it’s unevaluated because it’s proven to be non-elementary and not just because it failed.

The question is, what should

>>> pprint(NonElementaryIntegral(exp(-x**2), x))

look like?

5. […] continue to test my prototype risch_integrate() function in my integration3 branch, and tell me what you think (or if you find a bug). Possibly […]

6. Benhuard says:

Integrating ln(x+c) doesn’t work when the constant is not an integer, for instance

risch_integrate(ln(x+1/2),x)

raises

CoercionFailed: expected an integer, got 0.500000000000000

while
risch_integrate(ln(x+S(1)/2),x)

produces
TypeError: Not an iterable container

However,
c=Symbol(‘c’)
risch_integrate(ln(x+c),x)

• asmeurer says:

It works on my computer:

In [1]: risch_integrate(ln(x+S(1)/2),x)
Out[1]:
log(2 + 4⋅x)
──────────── - x + x⋅log(1/2 + x)
2


Which means that one of my uncommitted changes fixes it. As for the 1/2 one, that’s just a general problem that you’re liable to encounter with floating point numbers, because of limitations in the Polys (which hopefully Mateusz will be fixing eventually).

7. Anton says:

It is a pity that Manuel Bronstein is dead before he managed to finish his second book on symbolic integration.

Can you do without the second book? I have read at Wolfram MathWorld that «The case of algebraic extensions is quite complicated and is therefore not completely implemented in any computer algebra system.» Is that true?

• Aaron Meurer says:

Well, a second Bronstein book would have been very nice, as his first is the most clear and well-written math textbook I have ever read. For the algebraic case, I am going to have to rely on the book by Davenport and Trager’s PhD thesis.

But before I even think of implementing the algebraic case, I need to do two things, first, finish the transcendental case, and second, learn some more algebra (I need to learn some more about the theory of algebraic curves to fully understand the algebraic algorithm).

And yes, as far as I know, it is true that no CAS has completely implemented the algebraic case. You see, there are actually three cases, the transcendental case, which deals with purely transcendental functions, the algebraic case, which deals with purely algebraic functions, and the mixed case, which deals with a mix of both (like $\sqrt{\sin{(x)} + 1}$). I believe it is this third case which is the hardest to implement, and is likely what not even Axiom has completely done (though I could be wrong, I haven’t looked into it too deeply yet).

• Anton says:

Can you provide references to the books that you think together are completely describing symbolic integration process?

• Aaron Meurer says:

Well, I haven’t studied the non-transcendental parts to know what really gives a full overview yet, but here’s a try:

If you just want a basic overview (not all the subalgorithms detailed like in Bronstein’s book) see Bronstein’s “Symbolic Integration Tutorial.” You can download it for free at http://www-sop.inria.fr/cafe/Manuel.Bronstein/publications/issac98.ps.gz.

Like I have said, for the transcendental part of the algorithm, all you need is Bronstein’s book.

If you download the front and back matter from Bronstein’s book here, you will see in the preface that he has cited [8, 9, 11, 14, 29, 73, 74, 76, 91] for the algebraic algorithm. 29 and 91 are the Davenport and Trager (respectively) that I mentioned above. I would go with the most recently published things first, as I think that there have been some advancements in the algebraic part of the algorithm since Risch’s original paper.

Beyond that, my only recommendation would be to follow the sources in Bronstein’s “Tutorial” that I linked to above. Sorry if I can’t be super specific, but I really have only studied in depth the transcendental algorithm so far. One thing I can tell you is that aside from Bronstein (and perhaps Davenport) there aren’t really any books that describe the algorithm in detail (books in computer algebra may detail an outline, like Bronstein’s “Tutorial”). So if you wanted to learn the whole thing in enough detail to implement it, you will have to use other literature, like journal articles and PhD theses. If you don’t care about enough detail to implement the algorithm, I think Bronstein’s “Tutorial” should be sufficient to give a basic overview (assuming you have the algebraic background to understand what he is talking about).

By the way, what is your reason for wanting such a list?

8. […] it is computed. To take an example from issue 2010, the issue about risch_integrate() (you may also recognize this example if you are a regular reader of this blog), the handle_first keyword argument to […]

9. homework help…

[…]Prototype risch_integrate() function ready for testing! « Aaron Meurer's SymPy Blog[…]…