- Written in
**Python** - Uses numpy and scipy to do
**both exact and floating-point computation**(depending on the nature of the input) - Today I won't discuss this related software: http://github.com/ketch/RK-Opt

- This talk is an
**IPython notebook**that you can download and run yourself: - Or you can run it directly on Wakari:

By this I don't mean software implementations of numerical algorithms, but rather *software for conducting numerical analysis*.

- Most of the code in NodePy is code I would write anyway for my research

- I spend a little more time to
**organize, document**, and make it**publicly available**, so that:- Code from different projects can be used together
- I can remember how to use my code one or two years later
- Others can use my code too
- Errors in my code are more likely to be detected

- In this talk I will try to show by example that such software is useful to those who
**study**numerical methods - Such software can also be useful to those who
**use**numerical methods, by helping them determine which methods to use

In [1]:

```
from nodepy import *
import sympy
one = sympy.Rational(1)
import numpy as np
from sympy import init_printing
%matplotlib inline
import matplotlib.pyplot as plt
```

NodePy knows about many common methods:

In [2]:

```
rk4 = rk.loadRKM('RK44')
print rk4
```

In [3]:

```
ab6=lm.Adams_Bashforth(6)
print ab6.alpha, ab6.beta
```

[0 0 0 0 0 -1 1] [-95/288 959/480 -3649/720 4991/720 -2641/480 4277/1440 0]

You can also make up your own methods -- just give the coefficients:

In [4]:

```
A = np.array([[0, 0,],[one/2, one/2]])
b = np.array([one/2,one/2])
itrap = rk.RungeKuttaMethod(A,b)
print itrap
```

Runge-Kutta Method 0 | 1 | 1/2 1/2 _____|__________ | 1/2 1/2

What do we know about these methods? We may be interested in their

- stability (linear, nonlinear)
- accuracy (order, leading coefficients)
- storage requirements: how much memory do we need when using them?
- potential for concurrency: can we compute multiple stages in parallel?
*symplecticity*- ...

In [5]:

```
S=rk4.plot_stability_region()
```

Here is something a little more difficult:

In [6]:

```
ab6.plot_boundary_locus()
```

In [7]:

```
ab6.plot_stability_region()
```

Notice that NodePy automatically finds the part of the stable region that is connected to the origin and sets the plot bounds appropriately.

Next let's check the order of each method.

In [8]:

```
print rk4.order()
print itrap.order()
print ab6.order()
```

4 2 6

This we knew already. But how badly does RK4 violate the 5th order conditions?

In [54]:

```
print rk4.order_conditions(5)
print rk4.principal_error_norm()
```

[1/576 -1/288 1/96 0 1/288 -1/96 0 1/96 1/120] 0.0145045823432

The first line gives the amounts by which RK4 violates the order condition corresponding to each of the order-five rooted trees. The second gives the \(L_2\) norm of those coefficients.

In [10]:

```
forest = rt.list_trees(5)
print forest
```

And here are those trees.

In [53]:

```
T=rt.plot_all_trees(5)
```

- Runge-Kutta (explicit and implicit)
- Runge-Kutta-Chebyshev methods
- Strong stability preserving methods
- Embedded pairs
- Low-storage RK methods and pairs

- Linear multistep (explicit and implicit)
- General linear (multistep, multistage) methods
- Two- and three-step Runge-Kutta

- Iterative methods
**Extrapolation****Deferred correction**

- ...

Having all these methods and the ability to query their properties makes it trivial to ask some interesting questions.

Let's take a look at extrapolation methods. We'll look at methods obtained by extrapolating explicit Euler and the explicit midpoint rule.

Note: this part of the talk comes from a paper that is joint work with student Umair bin Waheed (KAUST).

In [12]:

```
from IPython.display import HTML
HTML('<iframe src=http://arxiv.org/abs/1305.6165?format=mobile width=800 height=350></iframe>')
```

Out[12]:

We usually think of extrapolation as an iterative process, but if we know beforehand how many iterations we'll use, then it's just a kind of Runge-Kutta method.

In [13]:

```
ex4E = rk.extrap(4)
ex4M = rk.extrap(2,'midpoint')
print ex4E
print ex4M
```

In [14]:

```
S=ex4E.plot_stability_region()
S=ex4M.plot_stability_region()
```

Extrapolation methods have a lot of potential for parallelism (see this review paper):

In [15]:

```
rk_graph.plot_dependency_graph(ex4E)
```

In [16]:

```
print len(ex4E)
print ex4E.num_seq_dep_stages()
```

7 4

In [17]:

```
from IPython.display import Image
Image('./parallel_table.png')
```

Out[17]:

NodePy also understands embedded Runge-Kutta pairs. Let's look at some pairs based on extrapolation.

In [18]:

```
ex54 = rk.extrap_pair(5)
print ex54
```

Let's compare this pair with Fehlberg's pair.

In [19]:

```
fehlberg = rk.loadRKM('Fehlberg45')
print fehlberg.order()
print fehlberg.embedded_method.order()
```

5 4

In [20]:

```
problem = ivp.load_ivp('vdp')
methods = [fehlberg,ex54]
tol = [10.**(-m) for m in range(2,8)]
[work,err] = conv.ptest(methods,problem,tol,verbosity=0)
```

The relatively poor performance of the extrapolation method may seem puzzling if we just compare the size of the leading error term coefficients:

In [21]:

```
print ex54.principal_error_norm()
print fehlberg.principal_error_norm()
```

0.00184975286137 0.00335574469285

Instead, the difference in performance is related to the number of stages used by each method:

In [22]:

```
print len(ex54)
print len(fehlberg)
```

11 6

In [23]:

```
Image('./parallel_table.png')
```

Out[23]:

Notice that the number of stages grows quadratically with the order! Most interest in extrapolation methods is focused on methods of very high order, for which it is difficult to directly develop Runge-Kutta methods.

There is another well-known class of very high order integrators: the deferred correction methods.

In [24]:

```
dc109 = rk.DC_pair(9)
```

There are a few 10th-order Runge-Kutta methods out there; the one we've had the most success with is due to Curtis (1975):

In [37]:

```
from nodepy import loadmethod
curtis10 = loadmethod.load_rkpair_from_file('rk108curtis.txt')
```

Which of these 10th order methods is most efficient in general?

In [26]:

```
ex108 = rk.extrap_pair(5,'midpoint')
methods = [ex108,dc109,curtis10]
problem = ivp.detest('D1')
[work,err] = conv.ptest(methods,problem,tol,verbosity=0)
```

At 10th order, extrapolation is competitive. Why is deferred correction so inefficient?

In [27]:

```
print curtis10.principal_error_norm()
print ex108.principal_error_norm()
print dc109.principal_error_norm()
```

7.68312478367e-07 1.44835573043e-06 6.42205612914e-08

It's definitely not the error coefficients -- DC has the smallest! What about the number of stages?

In [28]:

```
print len(curtis10)
print len(ex108)
print len(dc109)
```

21 26 82

In [29]:

```
Image('./fulltable.png')
```

Out[29]:

Given that extrapolation is competitive in serial, it seems that if implemented in parallel, it should have a significant advantage over traditional RK methods.

One immediate question is, can the theoretical parallel speedup be achieved in practice?

We examined this by adding a single OpenMP directive to Ernst Hairer's well-known odex code. We tested the speedup using a gravitational \(N\)-body problem with 400 bodies.

Here's what you get with the most naive approach (dynamic scheduling):

In [30]:

```
Image(filename='odex_speedup.png')
```

Out[30]:

By doing slightly more work (another 10 lines of code), you can get (nearly) the theoretical maximum speedup with the expected number of processors.

In [31]:

```
Image('odex_speedup_load_balanced.png')
```

Out[31]:

Something curious happens when one uses very high order Euler extrapolation methods at tight tolerances.

\[\begin{align} x''(t) & = -x/r^3 & x(0) & = 0.7 & x'(0) & = 0,\\ y''(t) & = -y/r^3 & y(0) & = 0 & y'(0) & = \sqrt{13/7}, \\ r^2 & = x^2 + y^2. \end{align}\]

In [32]:

```
f45 = rk.loadRKM('Fehlberg45')
ex = rk.extrap_pair(12)
problem = ivp.detest('D2')
```

In [55]:

```
tolerances = [1.e-3,1.e-4,1.e-5,1.e-6,1.e-7,1.e-8,1.e-9,1.e-10,1.e-11,1.e-12,1.e-13,1.e-14]
errors = [[],[]]
work = [[],[]]
tols = [[],[]]
for imeth, method in enumerate((f45,ex)):
print method.name
for tol in tolerances:
if (imeth==1 and tol<1.e-10): continue
t,y = method(problem,errtol=tol,dt=0.01)
errors[imeth].append(np.abs(y[-1][0]+0.177702735714040))
work[imeth].append(len(t))
tols[imeth].append(tol)
```

In [41]:

```
plotstyles = ['-ok','--sb','-.dr']
for i in range(2):
plt.loglog(tols[i],errors[i],plotstyles[i],linewidth=3,markersize=10)
plt.hold(True)
plt.legend(['Fehlberg','Extrapolation'],loc='best')
plt.xlabel('Input tolerance')
plt.ylabel('Measured error')
plt.xticks(plt.xticks()[0][1::3])
plt.yticks(plt.yticks()[0][1::3])
plt.hold(False)
```

In [42]:

```
for i in range(2):
plt.loglog(tols[i],work[i],plotstyles[i],linewidth=3,markersize=10)
plt.hold(True)
plt.legend(['Fehlberg','Extrapolation'],loc='best')
plt.xlabel('Measured error')
plt.ylabel('Time steps taken')
plt.xticks(plt.xticks()[0][1::3])
plt.yticks(plt.yticks()[0][1::3])
plt.hold(False)
```

What's going on here?

An \(s\)-stage RK method can be written

\[\begin{align} y_i &= u_n + \Delta t \sum_{j=1}^s a_{ij} f(t_n+c_j \Delta t, y_j) & (1\leq i \leq s), \\ u_{n+1} &= u_n + \Delta t \sum_{j=1}^s b_j f(t_n+c_j \Delta t, y_j). \end{align}\]

But in reality there are (roundoff) errors in the stages: \[\begin{align} \tilde{y}_i &= u_n + \Delta t \sum_{j=1}^s a_{ij} f(t_n+c_j \Delta t, \tilde{y}_j) + \tilde{r}_i & (1\leq i \leq s), \\ \tilde{u}_{n+1} &= u_n + \Delta t \sum_{j=1}^s b_j f(t_n+c_j \Delta t, \tilde{y}_j) + \tilde{r}_{s+1}. \end{align}\]

Of course, nobody implements extrapolation in Butcher form; they use a certain Shu-Osher form. With roundoff errors, it reads:

\[\begin{align} \tilde{y}_i &= v_i u_n + \sum_{j=1}^s (\alpha_{ij} \tilde{y}_j + \Delta t \beta_{ij}f(t_n+c_j \Delta t, \tilde{y}_j) ) + \tilde{r}_i & (1 \leq i \leq s+1), \\ \tilde{u}_{n+1} &= \tilde{y}_{s+1} \end{align}\]

This implementation choice makes a big difference in the propagation of roundoff errors.

Internal stability can be understood by considering the linear, scalar (\(m=1\)) initial value problem

\[\begin{align} \begin{cases} u'(t) = \lambda u(t), & \lambda \in {\mathbb C} \\ u(t_0) = u_0. \end{cases} \end{align}\]

Then we find that

\[\begin{equation} \begin{split} \epsilon_{n+1} & = P(z) \epsilon_n + Q(z;\alpha,\beta) \tilde{r}_{1:s} + \tilde{r}_{s+1} \end{split} \end{equation}\]

where \(Q\) is a vector of *internal stability polynomials*.

**Definition**: The *maximum internal amplification factor* of a RK method is

\[\begin{align} {\mathcal M}(\alpha,\beta):= \max_{j} \sup_{z \in {\mathcal S}} |Q_{j}(z)|, \end{align}\]

We can consider the whole stability region \({\mathcal S}\), but when the step size is very small, the value at the origin is most relevant:

\[\begin{align} {\mathcal M_0} = \max_j |Q_j(0)|. \end{align}\]

In [46]:

```
print fehlberg.maximum_internal_amplification()
```

Out[46]:

(5.3895917900074943, 0.0)

In [47]:

```
print ex.maximum_internal_amplification()
```

(332603.42306967499, 137786.59611992945)

In [48]:

```
print ex.maximum_internal_amplification(use_butcher=True)
```

(169897.66258197918, 0.0)

In [52]:

```
Image(filename='mtable.png')
```

Out[52]:

In [51]:

```
Image(filename='internal_errors.png')
```

Out[51]:

The last portion of the talk is taken from joint work with Lajos Loczi (KAUST) and Matteo Parsani (NASA):

In [57]:

```
HTML('<iframe src=http://arxiv.org/abs/1309.1317?format=mobile width=800 height=350></iframe>')
```

Out[57]:

In []: