Direct Automatic Differentiation of (Differential Equation) Solvers vs Analytical Adjoints: Which is Better?


Automatic differentiation of a “solver” is a subject with many details that matter for doing it effectively. For this reason, there are many talks and courses that go into depth on the topic. I recently gave a talk on some of the latest work in differentiable simulation with the American Statistical Association, and have detailed notes on such adjoint derivations as part of the 18.337 Parallel Computing and Scientific Machine Learning graduate course at MIT. And there are entire organizations, like my SciML Open Source Software Organization, which work day in and day out on the development of new differentiable solvers.

I’ll give a brief summary of all my materials below.

Continuous vs Discrete Differentiation of Solvers

AD of a solver can be done in essentially two different ways: either directly performing automatic … READ MORE
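As a rough illustration of the two routes named in the heading above, here is a minimal Python sketch. Everything in it is an assumption made for illustration and not taken from the post: the test problem u' = -p*u, the function names, and the choice of integrators. The "discrete" route propagates a derivative through each step of an explicit Euler solver, giving the derivative of the numerical solution itself. The "continuous" route first derives the sensitivity equation s' = -p*s - u by hand and then solves it with whatever integrator is convenient (a midpoint rule here). For brevity the continuous route uses forward sensitivities and not an adjoint.

```python
import math

def discrete_sensitivity(p, u0=1.0, T=1.0, n=100):
    """Differentiate the Euler solver itself: carry du/dp through every step."""
    h = T / n
    u, du = u0, 0.0
    for _ in range(n):
        # Euler step u_next = u + h*(-p*u), differentiated w.r.t. p.
        # Tuple assignment ensures both updates use the old u and du.
        u, du = u + h * (-p * u), du + h * (-u - p * du)
    return u, du

def rhs(u, s, p):
    """Augmented system: the ODE u' = -p*u plus its hand-derived sensitivity s' = -p*s - u."""
    return -p * u, -p * s - u

def continuous_sensitivity(p, u0=1.0, T=1.0, n=100):
    """Solve the analytically derived sensitivity ODE with a midpoint integrator."""
    h = T / n
    u, s = u0, 0.0
    for _ in range(n):
        k1u, k1s = rhs(u, s, p)
        k2u, k2s = rhs(u + 0.5 * h * k1u, s + 0.5 * h * k1s, p)
        u, s = u + h * k2u, s + h * k2s
    return u, s

p = 2.0
_, d_discrete = discrete_sensitivity(p)
_, d_continuous = continuous_sensitivity(p)
d_exact = -1.0 * math.exp(-p * 1.0)  # d/dp of exp(-p*T) at T = 1
print(d_discrete, d_continuous, d_exact)
```

In this sketch the discrete result is the exact derivative of what the Euler solver returns, while the continuous result approximates the derivative of the true solution, with an error governed by how accurately the sensitivity equation is solved.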

Engineering Trade-Offs in Automatic Differentiation: from TensorFlow and PyTorch to Jax and Julia


To understand the differences between automatic differentiation libraries, let’s talk about the engineering trade-offs that were made. I would personally say that none of these libraries is “better” than another; they simply all make engineering trade-offs based on the domains and use cases they were aiming to satisfy. The easiest way to describe these trade-offs is to follow the evolution and see how each new library tweaked the trade-offs made by the previous one.

Early TensorFlow used a graph building system, i.e. it required users to essentially define variables in a specific graph language separate from the host language. You had to define “TensorFlow variables” and “TensorFlow ops”, and the AD would then be performed on this static graph. Control flow constructs were limited to the constructs that could be represented statically. For example, an `ifelse` function statement is very different from … READ MORE
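To make the static-graph restriction concrete, here is a toy Python sketch. It is a hypothetical mini graph language and not TensorFlow's API: `Node`, `placeholder`, `cond`, and `run` are all names invented for this illustration. Values are symbolic while the graph is being built, so the host language's `if` has nothing concrete to branch on, and a conditional has to be expressed as a graph node of its own.

```python
class Node:
    """A symbolic value in a hypothetical static graph (not a real library's API)."""
    def __init__(self, op, inputs=(), value=None):
        self.op, self.inputs, self.value = op, inputs, value

    def __mul__(self, other):
        return Node("mul", (self, as_node(other)))

    def __add__(self, other):
        return Node("add", (self, as_node(other)))

    def __gt__(self, other):
        return Node("gt", (self, as_node(other)))

    def __bool__(self):
        # At graph-build time there is no number yet, so Python cannot branch on it.
        raise TypeError("symbolic value has no truth value while the graph is being built")

def as_node(x):
    return x if isinstance(x, Node) else Node("const", value=x)

def placeholder(name):
    return Node("input", value=name)

def cond(pred, if_true, if_false):
    """Conditional expressed as a graph node instead of host-language control flow."""
    return Node("cond", (pred, if_true, if_false))

def run(node, feed):
    """Evaluate the finished graph with concrete inputs."""
    if node.op == "const":
        return node.value
    if node.op == "input":
        return feed[node.value]
    # Note: this toy evaluator computes every input, so both branches of a cond run.
    args = [run(i, feed) for i in node.inputs]
    if node.op == "mul":
        return args[0] * args[1]
    if node.op == "add":
        return args[0] + args[1]
    if node.op == "gt":
        return args[0] > args[1]
    if node.op == "cond":
        return args[1] if args[0] else args[2]
    raise ValueError(node.op)

x = placeholder("x")
y = cond(x > 0, x * 2, x + 10)   # the branch lives inside the graph
print(run(y, {"x": 3}))          # 6
print(run(y, {"x": -1}))         # 9

try:
    if x > 0:                    # host-language branch on a symbolic value
        pass
except TypeError as err:
    print("cannot branch at build time:", err)
```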

Useful Algorithms That Are Not Optimized By Jax, PyTorch, or Tensorflow


In some previous blog posts we described in detail how one can generalize automatic differentiation to automatically give stability enhancements and all sorts of other niceties by incorporating graph transformations into code generation. However, one thing we didn’t go into much is the limitation of these types of algorithms. This limitation is what we have termed “quasi-static”, which is the property that an algorithm can be reinterpreted as some static algorithm. It turns out that for very fundamental reasons, this is the same limitation that some major machine learning frameworks, such as Jax or TensorFlow, impose on the code that they can fully optimize. This led us to the question: are there algorithms which are not optimizable within this mindset, and why? The answer is now published at ICML 2021, so let’s dig into … READ MORE

Neural Jump SDEs (Jump Diffusions) and Neural PDEs


This is just an exploration of some new neural models I decided to jot down for safe keeping. DiffEqFlux.jl gives you the differentiable programming tools to allow you to use any DifferentialEquations.jl problem type (DEProblem) mixed with neural networks. We demonstrated this before, not just with neural ordinary differential equations, but also with things like neural stochastic differential equations and neural delay differential equations.

At the time we made DiffEqFlux, we were the “first to the gate” for many of these differential equation types and left it as an open question for people to find a use for these tools. And judging by the arXiv papers that went out days after NeurIPS submissions were due, it looks like people have now justified some machine learning use cases for them. There were two separate papers on neural … READ MORE