Differences Between Methods for Solving Stiff ODEs


April 6 2024 in Differential Equations, Mathematics | Author: Christopher Rackauckas

I found these notes from August 2018 and thought they might be useful so I am posting them verbatim.

A stiff ordinary differential equation is a difficult problem to integrate. However, many ODE solver suites offer quite a few different choices for this kind of problem; DifferentialEquations.jl, for example, offers almost 200. In this article we will dig into what the differences between these integrators really are, so that you can more easily find which one will be most efficient for your problem.

Quick Overview (tl;dr)

  1. BDF, Rosenbrock, and ESDIRK methods are standard (see the usage sketch after this list)
  2. For small equations, Rosenbrock methods have performance advantages
  3. For very stiff systems, Rosenbrock and Rosenbrock-W methods do not require convergence of Newton’s method and thus can take larger steps, being more efficient
  4. BDF integrators are only L-stable (and A-stable) to order 2, so if the problem is more than semi-stiff these methods might need extra time steps, decreasing efficiency
  5. ESDIRK, Rosenbrock-W, FIRK, and BDF methods can reuse factorized Jacobians between steps, which benefits the first three more than BDF
  6. When the ODE system is too large to factorize the Jacobians, the BDF methods can be advantageous since they only require one implicit solve per step
  7. Factorizations and linear solves can be made cheaper by using approximate factorizations and IMEX formulations
  8. The multistep methods are more greatly affected by event handling than the one-step methods
  9. Exponential integrators may be more optimized on stiff ODEs without mass matrices than the implicit methods, and this has been shown in some cases, but more testing is required
  10. For semi-stiff equations with cheap derivative calls, Runge-Kutta Chebyshev methods can be the most efficient by avoiding factorizations altogether
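To make the list concrete, here is a minimal sketch (added for illustration, not from the original notes) of swapping between these solver classes in DifferentialEquations.jl on the classic stiff Robertson problem:

```julia
using DifferentialEquations

# Classic stiff Robertson chemical kinetics problem
function rober!(du, u, p, t)
    k1, k2, k3 = p
    du[1] = -k1 * u[1] + k3 * u[2] * u[3]
    du[2] =  k1 * u[1] - k2 * u[2]^2 - k3 * u[2] * u[3]
    du[3] =  k2 * u[2]^2
end

prob = ODEProblem(rober!, [1.0, 0.0, 0.0], (0.0, 1e5), (0.04, 3e7, 1e4))

sol_rosenbrock = solve(prob, Rodas5())    # Rosenbrock: strong for small systems
sol_esdirk     = solve(prob, KenCarp4())  # ESDIRK
sol_bdf        = solve(prob, QNDF())      # multistep BDF
```

Only the solver choice changes between the calls, which is what makes head-to-head comparisons of the classes discussed below easy to run.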

The “Standard”: Multistep BDF Methods

Let’s start with what has been adopted as the “standard” method for stiff integrators. I put that in quotation marks because it’s not necessarily a standard by any means. What I mean is that multistep BDF methods have been the go-to method for many people since the first incarnation of the FORTRAN GEAR code. Since then, updated versions of this same code went under the names LSODE, LSODA, EPISODE, VODE, and CVODE, each adding a new component. LSODE was an improvement upon the original GEAR code in terms of interface and had spin-offs like LSODR, which included root finding (for event handling). LSODA added the ability to automatically switch between Adams methods (for non-stiff equations) and BDF methods using a stiffness detection estimate (a feature that was not adopted in later versions).

The difference between EPISODE and the previous codes is an important change. Multistep BDF methods utilize history, i.e. points from the past. For example, the second order BDF method can be written as:

$$ y_{n+2} - \frac{4}{3} y_{n+1} + \frac{1}{3} y_n = \frac{2}{3} h f(t_{n+2}, y_{n+2}) $$

(the Lawrence Livermore line of solvers, i.e. the set being discussed, doesn’t explicitly use the history but instead encodes it in a Nordsieck vector, but the same idea applies). This formula is written in a form that requires that the previous $$\Delta t$$ equal the current $$\Delta t$$, so one might ask how adaptive time stepping can be done with such a formula. The first idea was to keep using the same fixed-timestep formula but interpolate the history to new points. Using a few values of your history, you can write an interpolating polynomial through the historical values and evaluate it at shifted time points so that they line up with the new $$\Delta t$$. This is the breakthrough of GEAR that’s in the LS line of codes. The implementations of BDF integrators by Shampine, i.e. DDEBDF and ode15s, along with some codes for differential-algebraic equations like DASSL and ode15i, use this same trick. The QNDF and QBDF methods of DifferentialEquations.jl are of this form.
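As an illustration of the interpolation trick, here is a small sketch (the function name and three-point setup are mine, not GEAR's): given history at $$t - 2h_{old}$$, $$t - h_{old}$$, and $$t$$, fit the unique quadratic through those points and sample it at the locations the fixed-step formula expects for the new step size:

```julia
# Rescale a three-point BDF history from step size h_old to h_new by
# evaluating the Lagrange quadratic through (t - 2h_old, t - h_old, t)
# at the shifted points (t - 2h_new, t - h_new, t).
function rescale_history(y, h_old, h_new)
    # y = (y(t - 2h_old), y(t - h_old), y(t)); s is the offset from t
    p(s) = y[3] * (s + h_old) * (s + 2h_old) / (2h_old^2) -
           y[2] * s * (s + 2h_old) / h_old^2 +
           y[1] * s * (s + h_old) / (2h_old^2)
    return (p(-2h_new), p(-h_new), y[3])
end
```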

(Note: Because multistep methods utilize the history, they must reset to order 1 when events occur. This greatly increases the number of time steps if events are common and thus these methods should only be used if events are rare!)

While this methodology makes it simple to support adaptive time stepping in any multistep integrator, the loss of precision due to the interpolation decreases the real stability of the solver. Thus people investigated alternative ways to handle adaptive time stepping in multistep methods. One can instead re-derive these formulas so that they do not assume a fixed time step. This is the basis of the EPISODE and VODE integrators. But there’s an added step. Note that BDF methods require solving an implicit system

$$(I - \gamma J)x = b$$

at every step, where $$\gamma$$ comes from the coefficient on the implicit $$y_{n+1}$$ term. If this term stays constant between steps, then factorizations can be re-used between different steps, a fact that can greatly increase efficiency since solving this linear system is the most costly part of the integration. Thus the variable time step formulas can be written in what’s known as Fixed-Leading-Coefficient (FLC) form. EPISODE and VODE are methods in FLC form, and CVODE is a rewrite of VODE in C as part of the SUNDIALS suite. The JVODE method under development in DifferentialEquations.jl is of this form as well.
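To see why keeping the leading coefficient fixed matters, here is a sketch of the caching pattern (illustrative values, not library code): the $$\mathcal{O}(n^3)$$ LU factorization is paid once, and every subsequent solve is a cheap $$\mathcal{O}(n^2)$$ back-substitution.

```julia
using LinearAlgebra

J = [-1000.0 0.0; 0.0 -0.5]   # stand-in Jacobian of a stiff system
γ = 0.01                      # stays fixed between steps in FLC form
W = lu(I - γ * J)             # expensive factorization, done once

# Each Newton iteration / time step now only needs triangular solves:
x1 = W \ [1.0, 2.0]
x2 = W \ [0.5, -1.0]
```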

Given the improved stability of the variable time step formulas over the interpolation-based forms, these later integrators are the recommended BDF methods to use on stiff ODEs when available.

Rosenbrock(-W) and (Explicit first step) Singly-Diagonally Implicit Runge-Kutta Methods ((E)SDIRK)

To start making comparisons, let’s introduce the second most used class of methods: the Rosenbrock methods and the (E)SDIRK methods. These methods are very closely related. An SDIRK method is an implicit Runge-Kutta method which is only implicit in the diagonal. In a more concrete sense, this means that if you write the method in the abstract Runge-Kutta form

$$ k_i = f\left(t_n + c_i \Delta t, \; y_n + \Delta t \sum_{j=1}^{S} a_{ij} k_j\right), \qquad y_{n+1} = y_n + \Delta t \sum_{i=1}^{S} b_i k_i $$

where the coefficients $$(a_{ij}, b_i, c_i)$$ are known as a Butcher tableau, the coefficient matrix $$a_{ij}$$ of an explicit Runge-Kutta method is strictly lower triangular, and for an SDIRK method it is lower triangular plus the diagonal. This means that each stage is implicit only in its own stage variable, so the implementation of an SDIRK method decomposes into the solution of $$S$$ sequential implicit equations for $$S$$ stages (and an ESDIRK method has $$S-1$$ implicit stages since its first stage is explicit, hence the E).

If, instead of solving the implicit equations in full, one only does a single iteration of Newton’s method, then the method can be written out as explicit parts plus a single $$(I-\gamma J)x = b$$ linear system to solve per stage. When this is done one has a Rosenbrock method. A standard Rosenbrock method then requires that the Jacobian be part of the error analysis for the convergence requirement. This can be a strict requirement since it means you need $$J$$ nearly exact for high order convergence (i.e. utilizing an analytical solution for the Jacobian or automatic differentiation to avoid numerical errors). However, one can develop Rosenbrock methods where the Jacobian doesn’t explicitly show up in the order calculations, and these are Rosenbrock-W or RosW methods.
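As a minimal sketch of the idea, consider the simplest one-stage member of the class, the linearly implicit (Rosenbrock) Euler method (this toy example is mine, not a production integrator): a single linear solve with $$(I - \Delta t J)$$ replaces the Newton iteration that implicit Euler would need.

```julia
using LinearAlgebra, ForwardDiff

# One step of linearly implicit (Rosenbrock) Euler:
#   (I - dt*J) k = f(u),  u_next = u + dt*k
function rosenbrock_euler_step(f, u, dt)
    J = ForwardDiff.jacobian(f, u)  # accurate J via automatic differentiation
    k = (I - dt * J) \ f(u)         # one linear solve, no Newton iteration
    return u + dt * k
end

f(u) = [-50.0 * (u[1] - cos(u[2])), -u[2]]  # a mildly stiff toy system
u = rosenbrock_euler_step(f, [0.0, 1.0], 0.01)
```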

In MATLAB, there exists the ode23tb method, which is an ESDIRK TRBDF2 integrator, and the ode23t method, which is an SDIRK trapezoidal integrator. However, both are only second order and thus do not exploit the full efficiency of this class. The ode23s method is a Rosenbrock-W method, also of order 2. In DifferentialEquations.jl there are plenty of these methods, but ones to note are the KenCarp4 and Kvaerno4 4th order ESDIRK methods, Rosenbrock23, which is a second order Rosenbrock-W method, and Rodas4 and Rodas5, which are 4th and 5th order Rosenbrock methods respectively.

Cost/Benefit Analysis Between Multistep BDF, Rosenbrock, and (E)SDIRK

Multistep BDF methods only have to solve a single implicit equation per step while Rosenbrock and (E)SDIRK methods have to solve a few, so it may seem like BDF methods are clearly better. However, there are multiple factors to consider. First of all, the BDF methods of order greater than 2 are not L-stable (not even A-stable), so if your problem is more than semi-stiff then the BDF methods cannot effectively utilize orders above 2. With the order constrained, the time steps of the BDF integrator must shrink to still hit the error tolerances, and so it will take more steps than the other schemes, offsetting its decreased cost per step. Additionally, the Rosenbrock and ESDIRK methods can optimize their leading order coefficients. The leading order coefficients are the coefficients of the highest truncation error terms, and by decreasing these one gets a method with less error per step. The result of these method optimizations is that, once again, the Rosenbrock and ESDIRK methods require fewer steps.

But the most significant effect is factorization caching. Although the Rosenbrock and ESDIRK methods need to solve multiple implicit systems per time step, all of them involve the same $$I - \gamma J$$ linear system. Thus this can be handled by performing only a single factorization. Since the dominating cost of solving a linear system is overwhelmingly this factorization, this puts the total computational cost of the Rosenbrock and ESDIRK methods roughly on par with the BDF methods.

Therefore, when the implicit and linear systems can be solved with matrix factorizations, Rosenbrock and ESDIRK methods have a big advantage over the traditional BDF methods since they can utilize higher order and more error-optimized methods and thus take larger time steps at a similar cost per step. However, notice the “when the implicit and linear systems can be solved with matrix factorizations”. These factorizations do not scale well with system size, and thus when the Jacobian is large enough factorization methods cannot be used. By declaring the sparsity structure of the Jacobian, one can use a sparse factorization (or, if the matrix has a specific structure, something like a banded QR) to raise this ceiling, but this requires some user input. Sooner or later, one will need to utilize iterative Krylov subspace methods like GMRES to solve the linear system. In that case there are still tricks to save the Krylov subspace between steps, but requiring only a single implicit solve per step is an advantage. Therefore BDF methods tend to do well for the largest of problems, while below that size the ESDIRK and Rosenbrock methods do better.
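For concreteness, here is a hedged sketch of both scaling strategies in DifferentialEquations.jl (the right-hand side `my_rhs!` and state `u0` are hypothetical placeholders, and keyword details may vary between versions):

```julia
using DifferentialEquations, SparseArrays, Sundials

# Tridiagonal sparsity pattern, e.g. from a 1D PDE discretization
n = 1000
jac_sparsity = spdiagm(-1 => ones(n - 1), 0 => ones(n), 1 => ones(n - 1))

# my_rhs! and u0 are placeholders for your problem definition
f = ODEFunction(my_rhs!; jac_prototype = jac_sparsity)
prob = ODEProblem(f, u0, (0.0, 1.0))

sol_sparse = solve(prob, KenCarp4())  # sparse factorization of I - γJ
sol_krylov = solve(prob, CVODE_BDF(linear_solver = :GMRES))  # matrix-free BDF
```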

Note that the factorization of $$I - \gamma J$$ only needs to be approximate for the ESDIRK and BDF methods since it is only used inside the Newton iterations for the implicit solves. Therefore, even if $$J$$ changes, the same factorization can be reused between time steps in order to decrease the total number of required factorizations. This is not possible with standard Rosenbrock methods because of the requirement of an exact(ish) Jacobian in the order calculations. However, the Rosenbrock-W methods do not have this requirement, and thus they can reuse factorizations from previous time steps as well.

In the other direction, the Rosenbrock methods explicitly specialize on the form of the linear solve, so if the factorization is not a large portion of the computational cost these methods can be the most efficient. Additionally, implicit integrators require that the Newton method converges, which can fail to happen in cases which are very stiff or when bad predictors are given. Since Rosenbrock methods do not require a full implicit solve and only require the linear system part, these methods can be more stable on very stiff equations.

Quick Note: Implicit-Explicit (IMEX) Methods and Approximate Factorization

Since the factorization or linear solve with $$I - \gamma J$$ is the most expensive portion of the calculation, you might ask whether it is possible to decrease the cost of said factorization. There are two ways this is being done. The first is IMEX methods. These methods split the ODE into two functions:

$$u' = f_1 + f_2$$

and then are implicit only in the first part (hence implicit-explicit). If the user chooses the split so that $$f_1$$ has a simpler Jacobian, then the cost of the factorizations can be greatly decreased. This is thus used with solvers for partial differential equations (PDEs), since the terms with the derivative operators tend to have nice sparse structures (banded) which can be exploited to make the factorizations fast, while the remaining terms can be handled explicitly if they are non-stiff. The SBDF methods like SBDF2 from DifferentialEquations.jl are IMEX BDF integrators, while the KenCarp integrators like KenCarp4 from DifferentialEquations.jl are examples of ESDIRK IMEX integrators. The same analysis as before applies to the difference between the classes in this case.
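Here is a small sketch of the IMEX setup (my own toy split, not from the original notes): the stiff diffusion stencil goes in $$f_1$$, which is treated implicitly, while the non-stiff reaction goes in $$f_2$$ and is handled explicitly.

```julia
using DifferentialEquations

function f1!(du, u, p, t)  # stiff linear diffusion: banded Jacobian, implicit
    n = length(u)
    for i in 2:(n - 1)
        du[i] = 100.0 * (u[i-1] - 2u[i] + u[i+1])
    end
    du[1] = 100.0 * (u[2] - 2u[1])
    du[n] = 100.0 * (u[n-1] - 2u[n])
end
f2!(du, u, p, t) = (du .= 0.1 .* u .* (1 .- u))  # non-stiff reaction, explicit

prob = SplitODEProblem(f1!, f2!, rand(32), (0.0, 1.0))
sol = solve(prob, KenCarp4())  # the implicit solves only ever see f1
```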

Additionally, if one wanted to do an implicit solve on both $$f_1$$ and $$f_2$$, one can utilize the fact that:

$$(I - \gamma J)^{-1} \approx (I - \gamma J_1)^{-1}(I - \gamma J_2)^{-1}$$

i.e. the factorization of the full system can be approximated by factorizations of the two pieces (expanding the product gives $$(I - \gamma J_1)(I - \gamma J_2) = I - \gamma (J_1 + J_2) + \gamma^2 J_1 J_2$$, so the approximation error is $$\mathcal{O}(\gamma^2)$$). If both of the Jacobians are easy to specialize the factorization on, then this can increase performance as well. This can be applied to any of the implicit methods, even the FIRK methods to be discussed next.

Fully Implicit Runge-Kutta Methods (FIRK)

The ESDIRK methods were structured so that their Butcher tableau was lower triangular plus the diagonal. This gave a lot of advantages since each stage could be solved sequentially on its own, but it limited the order of the resulting methods to 5. When this limitation is lifted you get the fully implicit Runge-Kutta methods, which are abbreviated as the FIRK methods. The FIRK methods do not have this order limitation, and the classic class of FIRK integrators is the Radau class, which has the 5th, 9th, and 13th order formulas employed in Hairer’s classic adaptive order radau integrator.

While these methods can achieve much higher order, they also need to solve a much larger implicit system since all of the stages need to be solved together. Thus if the system is of size $$N$$ and there are $$S$$ stages, then the Jacobian for the implicit solve is of size $$NS$$. Since dense factorizations scale cubically, the factorization cost for this integrator is $$\mathcal{O}((NS)^3)$$, as opposed to the $$\mathcal{O}(N^3)$$ (possibly repeated) factorizations of the ESDIRK methods. However, since factorizations can be parallelized, there can be a tradeoff here. Still, for very large PDE systems this does not scale well. On the other hand, the high order is much more efficient when high accuracy is necessary, and these methods tend to do well by decreasing the required number of steps. Generally, the DiffEqBenchmarks have found these methods to only do well when high accuracy is necessary, but they tend to dominate that regime.
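As a quick sketch of where FIRK methods pay off, one can reuse the Robertson `prob` from the first example and tighten the tolerances (the tolerance values here are illustrative):

```julia
using DifferentialEquations

# FIRK methods tend to win when high accuracy is demanded
sol = solve(prob, RadauIIA5(), abstol = 1e-10, reltol = 1e-10)
```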

Exponential Integrators

Moving away from the implicit integrators we have the exponential integrators. These methods take a very different form since they are described for the semilinear system

$$u' = Au + f$$

and attempt to solve the linear part exactly with some version of $$\exp(\Delta t A)$$ (the solution operator of the linear ODE). For a general ODE, one can pull the Jacobian out of $$f$$, and so

$$u' = f(u) = Ju + (f(u) - Ju) = Ju + g(u)$$

is the exponential Rosenbrock trick for applying an exponential integrator to any first-order ODE by pulling out the Jacobian (the related EPIRK methods are similar). Since the linear part is solved exactly, this approach can handle stiff equations, because the remaining portion $$g$$ is non-stiff and can be integrated explicitly.

While this avoids having to solve implicit equations, it does require computing $$\exp(\Delta t A)$$ (or its action on a vector), which is a significant computation in itself. For these exponentials one can either utilize a dense computation or a Krylov method.
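Here is a from-scratch sketch of the simplest member of the family, the exponential Euler method, assuming $$A$$ is nonsingular so that $$\varphi_1(\Delta t A)$$ can be formed densely (production codes instead approximate the action $$\exp(\Delta t A)v$$ with Krylov methods):

```julia
using LinearAlgebra

# One step of exponential Euler for u' = A*u + g(u):
#   u(t+h) ≈ exp(h*A)*u + h*φ1(h*A)*g(u),  where φ1(z) = (e^z - 1)/z
function exponential_euler_step(A, g, u, h)
    E  = exp(h * A)          # dense matrix exponential: linear part is exact
    φ1 = (h * A) \ (E - I)   # requires A nonsingular in this dense sketch
    return E * u + h * (φ1 * g(u))
end

A = [-100.0 1.0; 0.0 -2.0]   # stiff linear part
g(u) = [sin(u[2]), u[1]^2]   # non-stiff remainder
u = exponential_euler_step(A, g, [1.0, 0.5], 0.1)
```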

The jury is still out as to whether these methods are more efficient than the more traditional implicit integrators, but there are lots of indicators that they are, or at least will be; a few different sources show promising results. Also, I think that at a more heuristic level it can be argued. In a previous blog post I described the idea that the more information an integrator encodes about a problem, the more efficient it can be. This principle applies to this case. Notice that the implicit integrators can in principle solve equations with mass matrices

$$Mu' = f$$

even when this mass matrix is singular, which gives a differential-algebraic equation, i.e. an ODE with constraint equations. Thus the implicit methods can handle a larger class of equations, and by this principle they should be less efficient than the most optimized exponential integrators. These methods have been developed in DifferentialEquations.jl; methods to note are EPIRK5P1 and EXPRB53s3, which are 5th order methods of this type.

Runge-Kutta Chebyshev Methods

Explicit Runge-Kutta methods have bounds on their stability regions, which is what makes them unusable on stiff ODEs. What if someone made explicit Runge-Kutta methods with really big stability regions? That’s the Runge-Kutta Chebyshev methods: explicit Runge-Kutta methods with optimized stability regions. A recurrence relation is used so that a stable method can be constructed by chaining together a bunch of explicit Runge-Kutta stages. The result is a method with an enlarged stability region, where the number of recurrences can be chosen so that the stability region is large enough to handle the time step. The ROCK methods are methods of this category, such as the ROCK2 method in DifferentialEquations.jl.

These methods thus do not require factorizations (or exponentials), but they instead require many more $$f$$ calls. Therefore, if the ODE is only semi-stiff and calculations of $$f$$ are cheap, these methods can be the most efficient. Otherwise, the many $$f$$ calls make them more costly.
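A sketch of the regime where these methods shine (a toy semidiscretized heat equation, where $$f$$ is cheap but the spectrum is moderately stiff; the spatial scaling is illustrative):

```julia
using DifferentialEquations

function heat!(du, u, p, t)  # 1D heat equation: eigenvalues scale like -n^2
    n = length(u)
    for i in 2:(n - 1)
        du[i] = (u[i-1] - 2u[i] + u[i+1]) * n^2
    end
    du[1] = (u[2] - 2u[1]) * n^2
    du[n] = (u[n-1] - 2u[n]) * n^2
end

prob = ODEProblem(heat!, sin.(range(0, π, length = 128)), (0.0, 0.1))
sol = solve(prob, ROCK2())  # no factorizations; stage count adapts to stiffness
```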
