Solving Systems of Stochastic PDEs and using GPUs in Julia

December 11 2017 in CUDA, Differential Equations, Julia, Mathematics, Stochastics | Tags: | Author: Christopher Rackauckas

What I want to describe in this post is how to solve stochastic PDEs in Julia using GPU parallelism. I will go from start to finish, describing how to use the type-genericness of the DifferentialEquations.jl library in order to write code that uses within-method GPU-parallelism on the system of PDEs. This is mostly a proof of concept: the most efficient integrators for this problem are not compatible with GPU parallelism yet, and the GPU parallelism isn’t fully efficient yet. However, I thought it would be nice to give an early progress report showing that it works and what needs to be fixed in Base Julia and various libraries for us to get the full efficiency.

Edit May 2019

As of DifferentialEquations.jl v6.4.0, this is no longer a proof of concept. The whole library, including implicit solvers with GMRES, etc., and the same for SDEs, DAEs, DDEs, etc., is GPU-compatible with a fast form of broadcast. This has been optimized and made efficient. The only methods which are not GPU-compatible are the non-native Julia ones, like Sundials. Some of this blog post has been edited to reflect the newer version of the GPU code.

Our Problem: 2-dimensional Reaction-Diffusion Equations

The reaction-diffusion equation is a PDE commonly handled in systems biology which is a diffusion equation plus a nonlinear reaction term. The dynamics are defined as:

$$u_t = D \Delta u + f(t,u)$$

But this doesn’t need to only have a single “reactant” u: this can be a vector of reactants and the $$f$$ is then the nonlinear vector equations describing how these different pieces react together. Let’s settle on a specific equation to make this easier to explain. Let’s use a simple model of a 3-component system where A can diffuse through space to bind with the non-diffusive B to form the complex C (also non-diffusive, assume B is too big and gets stuck in a cell which causes C=A+B to be stuck as well). Other than the binding, we make each of these undergo a simple birth-death process, and we write down the equations which result from mass-action kinetics. If this all is meaningless to you, just understand that it gives the system of PDEs:

$$A_t = D \Delta A + \alpha_A(x) - \beta_A A - r_1 A B + r_2 C$$
$$B_t = \alpha_B - \beta_B B - r_1 A B + r_2 C$$
$$C_t = \alpha_C - \beta_C C + r_1 A B - r_2 C$$

One addition that was made to the model is that we let $$\alpha_A(x)$$ be the production of $$A$$, and we let that be a function of space so that it is only produced on one side of our domain. Let’s make it a constant when x≥80, and 0 otherwise, and let our spatial domain be $$x \in [0,100]$$ and $$y \in [0,100]$$.

This model is spatial: each reactant $$u(t,x,y)$$ is defined at each point in space, and all of the reactions are local, meaning that $$f$$ at spatial point $$(x,y)$$ only uses $$u_i(t,x,y)$$. This is an important fact which will come up later for parallelization.

Discretizing the PDE into ODEs

In order to solve this via a method of lines (MOL) approach, we need to discretize the PDE into a system of ODEs. Let’s do a simple uniformly-spaced grid finite difference discretization. Choose $$dx = 1$$ and $$dy = 1$$ so that we have 100*100=10000 points for each reactant. Notice how fast that grows! Put the reactants in a matrix such that A[i,j] = $$A(x_j,y_i)$$, i.e. the columns of the matrix correspond to the $$x$$ values and the rows to the $$y$$ values (this way looking at the matrix is essentially like looking at the discretized space).

So now we have 3 matrices (A, B, and C) for our reactants. How do we discretize the PDE? In this case, the diffusion term simply becomes a tridiagonal matrix $$M$$ where $$[1,-2,1]$$ is the central band. You can notice that $$MA$$ performs diffusion along the columns of $$A$$, and so this is diffusion along $$y$$. Similarly, $$AM$$ flips the indices and thus does diffusion along the rows of $$A$$, making this diffusion along $$x$$. Thus $$D(M_yA + AM_x)$$ is the discretized Laplacian. (We could have separate diffusion constants and $$dx \neq dy$$ if we want by using different constants on the $$M$$, but let’s not do that for this simple example. I’ll leave that as an exercise for the reader.) I enforced a Neumann boundary condition with zero derivative (also known as a no-flux boundary condition) by reflecting the changes over the boundary. Thus the derivative operator is generated as:

const Mx = Tridiagonal([1.0 for i in 1:N-1],[-2.0 for i in 1:N],[1.0 for i in 1:N-1])
const My = copy(Mx)
# Do the reflections, different for x and y operators
Mx[2,1] = 2.0
Mx[end-1,end] = 2.0
My[1,2] = 2.0
My[end,end-1] = 2.0

I also could have done this using the DiffEqOperators.jl library, but I wanted to show what it truly is at its core.
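As a quick sanity check of these operators, here is a minimal sketch on a small hypothetical grid (the names `n`, `Mx_small`, `My_small` and the test fields are illustrative and not part of the original script). With no-flux boundaries, a constant field should have a zero discrete Laplacian, and the y operator should leave a field that only varies along x untouched:

```julia
using LinearAlgebra

n = 5  # small illustrative grid size (the post uses N = 100)
Mx_small = Tridiagonal(ones(n-1), fill(-2.0, n), ones(n-1))
My_small = copy(Mx_small)
# Same reflections as above
Mx_small[2,1] = 2.0
Mx_small[end-1,end] = 2.0
My_small[1,2] = 2.0
My_small[end,end-1] = 2.0

# A constant field has no gradients, so its discrete Laplacian vanishes
A_const = fill(3.0, n, n)
lap_const = My_small*A_const + A_const*Mx_small

# A field that varies only along x (the column index) has constant columns,
# so the y operator My_small maps it to zero
A_x = [Float64(j) for i in 1:n, j in 1:n]
lap_y = My_small*A_x
```

Both `lap_const` and `lap_y` come out as all-zero matrices, which is what the reflected boundary rows and columns are there to guarantee.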

Since all of the reactions are local, each point in space reacts separately, so the reaction terms are element-wise equations on the reactants and can be written out quite simply. The ODE which then represents the PDE is, in pseudo Julia code:

DA = D*(My*A + A*Mx)
@. DA + α₁ - β₁*A - r₁*A*B + r₂*C
@. α₂ - β₂*B - r₁*A*B + r₂*C
@. α₃ - β₃*C + r₁*A*B - r₂*C

Note here that I am using α₁ as a matrix (or row-vector, since that will broadcast just fine) where every point in space with x<80 has this zero, and all of the others have it as a constant. The other coefficients are all scalars. How do we do this with the ODE solver?
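Before moving on, here is a small sketch of the broadcasting claim about α₁. It assumes a tiny hypothetical grid and a stand-in threshold (`n`, `cutoff`, `α_row` and `α_mat` are illustrative names, not from the original script), and shows that a 1×n row vector gives the same result as the full n×n matrix inside a dotted expression:

```julia
n = 4                                            # hypothetical small grid
xs = 1:n                                         # x coordinate of each column
cutoff = 3                                       # stand-in for the x = 80 threshold
α_row = reshape(Float64.(xs .>= cutoff), 1, n)   # 1×n row vector
α_mat = repeat(α_row, n, 1)                      # full n×n matrix with identical rows

A = rand(n, n)
out_row = @. α_row - 2.0*A                       # the row vector is broadcast down the rows
out_mat = @. α_mat - 2.0*A
```

The two results agree entry by entry, so storing α₁ as a row vector would save memory without changing the equations.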

Our Representation via Views of 3-Tensors

We can represent our problem with a 3-dimensional tensor, taking each 2-dimensional slice as our (A,B,C). This means that we can define:

u0 = zeros(N,N,3)

Now we can decompose it like:

   A = @view  u[:,:,1]
   B = @view  u[:,:,2]
   C = @view  u[:,:,3]
  dA = @view du[:,:,1]
  dB = @view du[:,:,2]
  dC = @view du[:,:,3]

These views will not construct new arrays and will instead just be pointers to the (contiguous) memory pieces, so this is a nice and efficient way to handle this. Together, our ODE using this tensor as its container can be written as follows:

function f(du,u,p,t)
   A = @view  u[:,:,1]
   B = @view  u[:,:,2]
   C = @view  u[:,:,3]
  dA = @view du[:,:,1]
  dB = @view du[:,:,2]
  dC = @view du[:,:,3]
  DA = D*(My*A + A*Mx)
  @. dA = DA + α₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end

where this is using @. to do inplace updates on our du to say how the full tensor should update in time. Note that we can make this more efficient by adding some cache variables to the diffusion matrix multiplications and using mul!, but let’s ignore that for now.
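To see why the views matter for these in-place updates, here is a minimal sketch on a small hypothetical tensor (the array sizes and the name `copied` are illustrative). Writing through a view changes the parent tensor, while plain slicing makes an independent copy:

```julia
u = zeros(3, 3, 2)
A = @view u[:, :, 1]      # no copy: A aliases the first slice of u
A .= 1.0                  # in-place broadcast writes through to u
copied = u[:, :, 2]       # plain indexing makes a copy
copied .= 5.0             # so this leaves u untouched
```

After this runs, the first slice of `u` holds ones and the second slice is still zero, which is the behavior the `@. dA = ...` lines rely on to fill `du`.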

Together, the ODE which defines our PDE is thus:

prob = ODEProblem(f,u0,(0.0,100.0))
sol = solve(prob,ROCK2())

if I want to solve it on $$t \in [0,100]$$. Done! The solution gives back our tensors (and interpolates to create new ones if you use sol(t)). We can plot it in Plots.jl and see the pretty gradients. Using this second-order stabilized explicit adaptive Runge-Kutta method we solve this equation in about 40 seconds. That’s okay.

Some Optimizations

There are some optimizations that can still be done. When we do A*B as matrix multiplication, we create another temporary matrix. These allocations can bog down the system. Instead we can pre-allocate the outputs and use the in-place function mul! to make better use of memory. The easiest way to store these cache arrays is as constant globals, but you can use closures (anonymous functions which capture data, i.e. (x)->f(x,y)) or call-overloaded types to do it without globals. The globals way (the easy way) is simply:

const MyA = zeros(N,N)
const AMx = zeros(N,N)
const DA = zeros(N,N)
function f(du,u,p,t)
   A = @view  u[:,:,1]
   B = @view  u[:,:,2]
   C = @view  u[:,:,3]
  dA = @view du[:,:,1]
  dB = @view du[:,:,2]
  dC = @view du[:,:,3]
  mul!(MyA,My,A)
  mul!(AMx,A,Mx)
  @. DA = D*(MyA + AMx)
  @. dA = DA + α₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end

For reference, a closure looks like:

MyA = zeros(N,N)
AMx = zeros(N,N)
DA = zeros(N,N)
function f_full(du,u,p,t,MyA,AMx,DA)
   A = @view  u[:,:,1]
   B = @view  u[:,:,2]
   C = @view  u[:,:,3]
  dA = @view du[:,:,1]
  dB = @view du[:,:,2]
  dC = @view du[:,:,3]
  mul!(MyA,My,A)
  mul!(AMx,A,Mx)
  @. DA = D*(MyA + AMx)
  @. dA = DA + α₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end
f = (du,u,p,t)-> f_full(du,u,p,t,MyA,AMx,DA)

and a call overloaded type looks like:

struct MyFunction{T} <: Function
  MyA::T
  AMx::T
  DA::T
end
 
# Now define the overload
function (ff::MyFunction)(du,u,p,t)
  # This is a function which references itself via ff
   A = @view  u[:,:,1]
   B = @view  u[:,:,2]
   C = @view  u[:,:,3]
  dA = @view du[:,:,1]
  dB = @view du[:,:,2]
  dC = @view du[:,:,3]
  mul!(ff.MyA,My,A)
  mul!(ff.AMx,A,Mx)
  @. ff.DA = D*(ff.MyA + ff.AMx)
  @. dA = ff.DA + α₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end
 
MyA = zeros(N,N)
AMx = zeros(N,N)
DA = zeros(N,N)
 
f = MyFunction(MyA,AMx,DA)
# Now f(du,u,p,t) is our function!

These last two ways enclose the pointer to our cache arrays locally but still present a function f(du,u,p,t) to the ODE solver.
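The pattern can be tried in isolation with a much smaller example. The following is a sketch under stated assumptions: `ScaledProduct` is a hypothetical type invented for illustration, and the factor of 2.0 simply stands in for the diffusion constant. It shows a callable struct that owns its cache and uses `mul!` to write the product into that cache:

```julia
using LinearAlgebra

# Hypothetical minimal example of a callable struct that owns its cache
struct ScaledProduct{T}
    M::T       # operator applied on the left
    cache::T   # preallocated storage for M*A
end

function (sp::ScaledProduct)(out, A)
    mul!(sp.cache, sp.M, A)   # writes M*A into the existing cache array
    @. out = 2.0 * sp.cache   # element-wise update of the output in place
    return out
end

M = [1.0 2.0; 3.0 4.0]
A = [1.0 0.0; 0.0 1.0]
op = ScaledProduct(M, zeros(2, 2))
out = zeros(2, 2)
op(out, A)
```

Here `op` behaves like an ordinary function of `(out, A)`, but the temporary it needs travels with it instead of living in a global.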

Now since PDEs are large, many times we don’t care about getting the whole timeseries. Using the output controls from DifferentialEquations.jl, we can make it only output the final timepoint.

sol = solve(prob,ROCK2(),progress=true,save_everystep=false,save_start=false)

Also, if you’re using Juno this’ll give you a nice progress bar so you can track how it’s going.

Quick Note About Performance

We are using the ROCK2 method here because it’s a method for stiff equations with eigenvalues that are real-dominated (as opposed to dominated by the imaginary parts). If we wanted to use a more conventional implicit ODE solver, we would need to make use of the sparsity pattern. This Gist shows how to use SparseDiffTools.jl to perform matrix coloring and specialize the ODE solver on the sparsity pattern. It turns out that ROCK2 is more efficient anyways (and doesn’t require sparsity handling), so we will keep this setup.
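To illustrate what "real-dominated" means here, the following sketch computes the spectrum of the one-dimensional no-flux diffusion matrix on a small hypothetical grid (`n = 10` and the variable names are illustrative). It only looks at the diffusion part, not the reaction terms:

```julia
using LinearAlgebra

n = 10                                   # small illustrative size
M = Array(Tridiagonal(ones(n-1), fill(-2.0, n), ones(n-1)))
M[1,2] = 2.0                             # no-flux reflections, as in My
M[end,end-1] = 2.0

λ = eigvals(M)                           # spectrum of the 1-D diffusion operator
largest_imag = maximum(abs.(imag.(λ)))   # expected to be numerically zero
most_negative = minimum(real.(λ))        # expected to be about -4
```

The eigenvalues are real and lie between -4 and 0. Summing the x and y directions and scaling by D = 100 would put the diffusion spectrum on the negative real axis down to roughly -800, which is the kind of stiffness that stabilized explicit methods are designed for.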

The Full ODE Code

As a summary, here’s the full PDE code:

using OrdinaryDiffEq, RecursiveArrayTools, LinearAlgebra
 
# Define the constants for the PDE
const α₂ = 1.0
const α₃ = 1.0
const β₁ = 1.0
const β₂ = 1.0
const β₃ = 1.0
const r₁ = 1.0
const r₂ = 1.0
const D = 100.0
const γ₁ = 0.1
const γ₂ = 0.1
const γ₃ = 0.1
const N = 100
const X = reshape([i for i in 1:100 for j in 1:100],N,N)
const Y = reshape([j for i in 1:100 for j in 1:100],N,N)
const α₁ = 1.0.*(X.>=80)
 
const Mx = Array(Tridiagonal([1.0 for i in 1:N-1],[-2.0 for i in 1:N],[1.0 for i in 1:N-1]))
const My = copy(Mx)
Mx[2,1] = 2.0
Mx[end-1,end] = 2.0
My[1,2] = 2.0
My[end,end-1] = 2.0
 
# Define the initial condition as normal arrays
u0 = zeros(N,N,3)
 
const MyA = zeros(N,N);
const AMx = zeros(N,N);
const DA = zeros(N,N)
# Define the discretized PDE as an ODE function
function f(du,u,p,t)
   A = @view  u[:,:,1]
   B = @view  u[:,:,2]
   C = @view  u[:,:,3]
  dA = @view du[:,:,1]
  dB = @view du[:,:,2]
  dC = @view du[:,:,3]
  mul!(MyA,My,A)
  mul!(AMx,A,Mx)
  @. DA = D*(MyA + AMx)
  @. dA = DA + α₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end
 
# Solve the ODE
prob = ODEProblem(f,u0,(0.0,100.0))
sol = solve(prob,ROCK2(),progress=true,save_everystep=false,save_start=false)
 
using Plots; pyplot()
p1 = surface(X,Y,sol[end][:,:,1],title = "[A]")
p2 = surface(X,Y,sol[end][:,:,2],title = "[B]")
p3 = surface(X,Y,sol[end][:,:,3],title = "[C]")
plot(p1,p2,p3,layout=grid(3,1))

Making Use of GPU Parallelism

That was all using the CPU. How do we turn on GPU parallelism with DifferentialEquations.jl? Well, you don’t. DifferentialEquations.jl “doesn’t have GPU bits”. So wait… can we not do GPU parallelism? We can: this is the glory of type-genericness, especially in broadcasted operations. To make things use the GPU, we simply use a GPUArray: instead of zeros(N,M) we use CuArray(zeros(N,M)). CuArray naturally overrides broadcast such that dotted operations are performed on the GPU. DifferentialEquations.jl uses broadcast internally, and thus just by making the array a GPUArray, the array type will take over how all internal updates are performed and turn this algorithm into a fully GPU-parallelized algorithm that doesn’t require copying to the CPU. Wasn’t that simple?

From that you can probably also see how to multithread everything, or how to set everything up with distributed parallelism. You can make the ODE solvers do whatever you want by defining an array type where the broadcast does whatever special behavior you want.
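The idea can be sketched without any GPU at all. Below is a hypothetical fixed-step Euler loop (`euler!` and `decay!` are illustrative names, and this is not how DifferentialEquations.jl is implemented internally) that is written only with broadcast and never mentions a concrete array type, so the element type and storage of `u` decide how each update runs:

```julia
# A hypothetical fixed-step Euler loop written only with broadcast;
# it never mentions a concrete array type.
function euler!(u, rhs!, du, dt, nsteps)
    for _ in 1:nsteps
        rhs!(du, u)
        @. u = u + dt * du
    end
    return u
end

# Toy right-hand side du = -u, also written with broadcast only
function decay!(du, u)
    @. du = -u
    return nothing
end

u64 = ones(Float64, 4)
u32 = ones(Float32, 4)
euler!(u64, decay!, similar(u64), 0.1, 10)
euler!(u32, decay!, similar(u32), 0.1f0, 10)
```

The same function handles both the `Float64` and the `Float32` arrays. Passing a GPU array type in place of `u32` would follow the same pattern, with the dotted operations then carried out by that array type's broadcast.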

So to recap, the entire difference from above is changing to:

using CuArrays
const gMx = CuArray(Float32.(Mx))
const gMy = CuArray(Float32.(My))
const gα₁ = CuArray(Float32.(α₁))
gu0 = CuArray(Float32.(u0))
 
const gMyA = CuArray(zeros(Float32,N,N))
const gAMx = CuArray(zeros(Float32,N,N))
const gDA = CuArray(zeros(Float32,N,N))
function gf(du,u,p,t)
   A = @view  u[:,:,1]
   B = @view  u[:,:,2]
   C = @view  u[:,:,3]
  dA = @view du[:,:,1]
  dB = @view du[:,:,2]
  dC = @view du[:,:,3]
  mul!(gMyA,gMy,A)
  mul!(gAMx,A,gMx)
  @. gDA = D*(gMyA + gAMx)
  @. dA = gDA + gα₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end
 
prob2 = ODEProblem(gf,gu0,(0.0,100.0))
CuArrays.allowscalar(false) # makes sure none of the slow fallbacks are used
@time sol = solve(prob2,ROCK2(),progress=true,dt=0.003,save_everystep=false,save_start=false)

Go have fun.

And Stochastic PDEs?

Why not make it an SPDE? All that we need to do is extend each of the PDE equations to have a noise function. In this case, let’s use multiplicative noise on each reactant. This means that our noise update equation is:

function g(du,u,p,t)
   A = @view  u[:,:,1]
   B = @view  u[:,:,2]
   C = @view  u[:,:,3]
  dA = @view du[:,:,1]
  dB = @view du[:,:,2]
  dC = @view du[:,:,3]
  @. dA = γ₁*A
  @. dB = γ₂*B
  @. dC = γ₃*C
end

Now we just define and solve the system of SDEs:

prob = SDEProblem(f,g,u0,(0.0,100.0))
sol = solve(prob,SRIW1())

We can see the cool effect that diffusion dampens the noise in [A] but is unable to dampen the noise in [B] which results in a very noisy [C]. The stiff SPDE takes much longer to solve even using high order plus adaptivity because stochastic problems are just that much more difficult (current research topic is to make new algorithms for this!). It gets GPU’d just by using CuArray like before. But there we go: solving systems of stochastic PDEs using high order adaptive algorithms with within-method GPU parallelism. That’s gotta be a first? The cool thing is that nobody ever had to implement the GPU-parallelism either, it just exists by virtue of the Julia type system.

(Note: We can also use one of the SROCK methods for better performance here, but they will require a choice of dt)
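To make the role of the noise function concrete, here is a minimal Euler-Maruyama sketch for an SDE of the form du = f dt + g dW with diagonal noise. Everything in it is an assumption for illustration: `euler_maruyama!`, the toy drift, and the step size are hypothetical, and this simple scheme is shown only to explain what g contributes, not as a suitable solver for the stiff SPDE above:

```julia
using Random

# Hypothetical fixed-step Euler-Maruyama loop for du = f dt + g dW with
# diagonal noise: each entry of u gets its own independent Brownian increment.
function euler_maruyama!(u, f!, g!, dt, nsteps; rng = Random.default_rng())
    drift = similar(u)
    noise = similar(u)
    for _ in 1:nsteps
        f!(drift, u)
        g!(noise, u)
        dW = sqrt(dt) .* randn(rng, size(u)...)   # N(0, dt) increments
        @. u = u + drift*dt + noise*dW
    end
    return u
end

decay!(du, u) = (du .= .-u)             # toy drift
mult_noise!(du, u) = (du .= 0.1 .* u)   # multiplicative noise with γ = 0.1
no_noise!(du, u) = (du .= 0.0)          # switches the noise off

u_sde = euler_maruyama!(ones(4), decay!, mult_noise!, 0.01, 100)
u_ode = euler_maruyama!(ones(4), decay!, no_noise!, 0.01, 100)
```

With the noise switched off the loop reduces to plain explicit Euler, so `u_ode` is deterministic, while each entry of `u_sde` follows its own random path around that trend.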

Side Notes

Warning: This can take a while to solve! An explicit Runge-Kutta algorithm isn’t necessarily great here, though using a stiff solver on a problem of this size once again requires smartly choosing sparse linear solvers. The high order adaptive method is pretty much necessary though, since something like Euler-Maruyama is simply not stable enough to solve this at a reasonable dt. Also, the current algorithms are not so great at handling this problem. Good thing there’s a publication coming along with some new stuff…

Conclusion

So that’s where we’re at. GPU parallelism works because of abstract typing. But in some cases we need to help the GPU array libraries get up to snuff to handle all of the operations, and then we’ll really be in business! Of course there’s more optimizing that needs to be done, and we can do this by specializing code paths on bottlenecks as needed.

I think this is at least a nice proof of concept showing that Julia’s generic algorithms allow for one to not only take advantage of things like higher precision, but also take advantage of parallelism and extra hardware without having to re-write the underlying algorithm. There’s definitely more work that needs to be done, but I can see this usage of abstract array typing as being one of Julia’s “killer features” in the coming years as the GPU community refines its tools. As of May 2019, all of this GPU stuff is compatible with stiff solvers and linear solver choices (so that way it can make use of GPU-based Jacobian factorizations and Krylov methods), and comparable methods for SDEs also are implemented in DifferentialEquations.jl. A follow-up blog post will show how to best use GPUs with implicit methods.

Full Script

Here’s the full script for recreating everything:

#######################################################
### Solve the PDE
#######################################################
 
using OrdinaryDiffEq, RecursiveArrayTools, LinearAlgebra
 
# Define the constants for the PDE
const α₂ = 1.0
const α₃ = 1.0
const β₁ = 1.0
const β₂ = 1.0
const β₃ = 1.0
const r₁ = 1.0
const r₂ = 1.0
const D = 100.0
const γ₁ = 0.1
const γ₂ = 0.1
const γ₃ = 0.1
const N = 100
const X = reshape([i for i in 1:100 for j in 1:100],N,N)
const Y = reshape([j for i in 1:100 for j in 1:100],N,N)
const α₁ = 1.0.*(X.>=80)
 
const Mx = Array(Tridiagonal([1.0 for i in 1:N-1],[-2.0 for i in 1:N],[1.0 for i in 1:N-1]))
const My = copy(Mx)
Mx[2,1] = 2.0
Mx[end-1,end] = 2.0
My[1,2] = 2.0
My[end,end-1] = 2.0
 
# Define the initial condition as normal arrays
u0 = zeros(N,N,3)
 
const MyA = zeros(N,N);
const AMx = zeros(N,N);
const DA = zeros(N,N)
# Define the discretized PDE as an ODE function
function f(du,u,p,t)
   A = @view  u[:,:,1]
   B = @view  u[:,:,2]
   C = @view  u[:,:,3]
  dA = @view du[:,:,1]
  dB = @view du[:,:,2]
  dC = @view du[:,:,3]
  mul!(MyA,My,A)
  mul!(AMx,A,Mx)
  @. DA = D*(MyA + AMx)
  @. dA = DA + α₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end
 
# Solve the ODE
prob = ODEProblem(f,u0,(0.0,100.0))
@time sol = solve(prob,ROCK2(),progress=true,save_everystep=false,save_start=false)
 
using Plots; pyplot()
p1 = surface(X,Y,sol[end][:,:,1],title = "[A]")
p2 = surface(X,Y,sol[end][:,:,2],title = "[B]")
p3 = surface(X,Y,sol[end][:,:,3],title = "[C]")
plot(p1,p2,p3,layout=grid(3,1))
 
#######################################################
### Solve the PDE using CuArrays
#######################################################
 
using CuArrays
gu0 = CuArray(Float32.(u0))
const gMx = CuArray(Float32.(Mx))
const gMy = CuArray(Float32.(My))
const gα₁ = CuArray(Float32.(α₁))
 
const gMyA = CuArray(zeros(Float32,N,N))
const gAMx = CuArray(zeros(Float32,N,N))
const gDA = CuArray(zeros(Float32,N,N))
function gf(du,u,p,t)
   A = @view  u[:,:,1]
   B = @view  u[:,:,2]
   C = @view  u[:,:,3]
  dA = @view du[:,:,1]
  dB = @view du[:,:,2]
  dC = @view du[:,:,3]
  mul!(gMyA,gMy,A)
  mul!(gAMx,A,gMx)
  @. gDA = D*(gMyA + gAMx)
  @. dA = gDA + gα₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end
 
prob2 = ODEProblem(gf,gu0,(0.0,100.0))
CuArrays.allowscalar(false) # makes sure none of the slow fallbacks are used
@time sol = solve(prob2,ROCK2(),progress=true,save_everystep=false,save_start=false)
 
#######################################################
### Solve the SPDE
#######################################################
 
using StochasticDiffEq
 
function g(du,u,p,t)
   A = @view  u[:,:,1]
   B = @view  u[:,:,2]
   C = @view  u[:,:,3]
  dA = @view du[:,:,1]
  dB = @view du[:,:,2]
  dC = @view du[:,:,3]
  @. dA = γ₁*A
  @. dB = γ₂*B
  @. dC = γ₃*C
end
 
prob3 = SDEProblem(f,g,u0,(0.0,100.0))
sol = solve(prob3,SOSRI(),progress=true,save_everystep=false,save_start=false)
 
p1 = surface(X,Y,sol[end][:,:,1],title = "[A]")
p2 = surface(X,Y,sol[end][:,:,2],title = "[B]")
p3 = surface(X,Y,sol[end][:,:,3],title = "[C]")
plot(p1,p2,p3,layout=grid(3,1))
 
# Exercise: Do SPDE + GPU

7 thoughts on “Solving Systems of Stochastic PDEs and using GPUs in Julia”

Helmut Haensel
says:

November 15, 2019 at 2:16 pm

Hi Chris, really great post!

just a remark that the syntax for obtaining a full matrix has changed from full() to Array(). So

`const Mx = full(Tridiagonal([1.0 for i in 1:N-1],[-2.0 for i in 1:N],[1.0 for i in 1:N-1]))`

would become

`const Mx = Array(Tridiagonal([1.0 for i in 1:N-1],[-2.0 for i in 1:N],[1.0 for i in 1:N-1]))`

if you really want to calculate with full matrices…

- Christopher Rackauckas
  says:
  
  November 16, 2019 at 3:22 pm
  
  Yes, thanks! Yeah, you don’t really want to use matrices for this, it’s more of a demonstration. These days using GPUs is quite normal and pervasive in the library, so this blog post is somewhat antique.
  
Javier
says:

July 1, 2018 at 5:59 pm

Yes, thanks. I’m impressed by how much of a difference that makes.

Javier
says:

July 1, 2018 at 2:39 pm

Thank you for this great post! Gives a lot of insight to a newbie 🙂 I am curious about the performance on A_mul_B! sparse matrix multiplication vs doing it in for loops as in your gm_fast example. In my case, A_mul_B! seems to be much faster than for loops.
For instance, in this simple example:

N = 100
Mx = Bidiagonal(-ones(N),ones(N-1),false)
Mx[1] = 0
u0 = zeros(N)
du = similar(u0)

f1 = function(du,u,p,t)
t > 1 ? u[1]=1. : u[1] = 0.
A_mul_B!(du,Mx,u)
end

f2 = function(du,u,p,t)
t > 1 ? u[1]=1. : u[1] = 0.
@inbounds for i in 2:N
du[i] = u[i-1] - u[i]
end
end

In my machine, f1 is solved 30x faster and allocates 100x less. Am I doing something wrong in the for loop version?

Regards,
Javier

- Christopher Rackauckas
  says:
  
  July 1, 2018 at 3:09 pm
  
  Yes, you’re using N as a global which is making i not inferred and the whole thing slow. Make N a const or use length(u).
  
Jianghui Du
says:

December 20, 2017 at 9:19 pm

Great post! Very educational. I do have a few questions. Is there any reason that you converted the tridiagonal matrices to dense matrices? I thought linear algebra on diagonal matrices is optimized. Also you didn’t use Array Partition when using the Sundials solvers, is it because they are not compatible?

- Christopher Rackauckas
  says:
  
  December 23, 2017 at 3:55 pm
  
  Hey, thanks for the comments! No, the fact that I full’d them was actually a mistake. These should be left Tridiagonal. I copied that part off of a code that had periodic boundary conditions, in which case instead of reflecting over the boundary you pull the values from across the boundary which results in having a 1 on the top right and bottom left of the matrix (i.e. not tridiagonal). When writing this post I forgot to get rid of that part, and you’re right, not fulling it will make it faster.
  
  In fact, if you really want speed you should probably make it matrix-free. This is an example of doing that on the Gierer-Meinhardt model. We have a library DiffEqOperators.jl which builds matrix-free operators like this for finite difference PDEs by overloading the multiplication to perform a multithreaded stencil calculation, though I didn’t want to use that here because I wanted to show what the operator actually is (but that would be easier and more efficient). Additionally, one can make use of ParallelAccelerator.jl and its stencil calculation functionality which will be very efficient.
  
  As for the second part, yes. Sundials.jl is a wrapper over the C++ Sundials package, so it cannot use Julia-defined number or array types. No BigFloats, no ArrayPartitions, etc. This is one of the reasons why we have been systematically re-creating everything in Julia: it is able to be more efficient because the compile-time optimizations go deeper, and we get a lot more functionality out. But, Sundials’ BDF is a very good algorithm so for now we rely on that for solving discretizations of PDEs.
  