How chaotic is chaos? How some AI for Science / SciML papers are overstating accuracy claims


Just how chaotic are chaotic systems? Many of you may have heard of “the butterfly effect” but don’t quite know the mathematics behind such systems. What I want to demonstrate is the “sensitive dependence on initial conditions” property of chaotic systems and just how sensitive these systems are. The reason this has come up is that I have seen some AI papers claiming to be able to predict the timeseries of a chaotic system (many more can be found online too, I’m just highlighting a few random ones). What I want to bring to the forefront is an examination of what is really being claimed: just how hard is it to actually forecast a chaotic system? And if they aren’t doing that, what have they done instead?

Quick Understanding of Chaos: Sensitive Dependence and the Shadowing Lemma

First of all, let’s establish a baseline understanding of chaotic systems. They have many properties but the key one is sensitive dependence on initial conditions. What this means is that the difference between any two trajectories that start nearby grows exponentially (initial conditions being both the initial values of the system and the parameters). Let’s do an example. Hokay so, here’s the Lorenz equation:

$$\begin{aligned}
\frac{dx}{dt} &= \sigma (y - x) \\
\frac{dy}{dt} &= x(\rho - z) - y \\
\frac{dz}{dt} &= xy - \beta z
\end{aligned}$$

Dang, that is a sweet equation you might say? WRONG! This is a really nasty equation. Let’s solve it and plot the solution:

using OrdinaryDiffEqTsit5, Plots
function lorenz!(du, u, p, t)
    du[1] = p[1] * (u[2] - u[1])
    du[2] = u[1] * (p[2] - u[3]) - u[2]
    du[3] = u[1] * u[2] - p[3] * u[3]
end
u0 = [1.0; 0.0; 0.0]
p = [10.0, 28.0, 8 / 3]
tspan = (0.0, 100.0)
prob = ODEProblem(lorenz!, u0, tspan, p)
sol = solve(prob, Tsit5())
p1 = plot(sol)
p2 = plot(sol; idxs = (1,2,3))
plot(p1,p2)

This is using an adaptive ODE solver with default tolerances abstol=1e-6, reltol=1e-3, solving using a 5th order method. On the left you see the timeseries of (x(t), y(t), z(t)). On the right you see what’s known as a phase diagram, where the solution is instead shown in (x,y,z) coordinates and the time axis is not shown. You are probably familiar with the plot on the right, “the butterfly wings”.

But… this blog post isn’t going so well because I’ve already lied to you. That’s not the solution to the Lorenz equations. In fact, it’s not even close. “But wait, ODE solvers with high order to a tolerance… okay it has numerical error but just a little bit?” No, it’s not even close. To demonstrate this, let’s see what happens if I change a parameter just a little bit. Let’s change 10.0 to 10.000000000000001 and plot the two solutions on top of each other:

prob2 = ODEProblem(lorenz!, u0, tspan, [10.000000000000001, 28.0, 8/3])
sol2 = solve(prob2, Tsit5())
plot(sol, idxs=1); plot!(sol2, idxs=1)

Here I’m just plotting the x coordinate of the two different solutions. Notice that by around t=45 the two trajectories split and then you have effectively O(1) error from that point on. Initial conditions are the same way (parameters are just initial conditions of ODEs with 0 derivative). In fact, chaotic systems are so wild that even the slightest change to the computation makes the result completely different. For example, let’s change the absolute tolerance of the ODE solver from 1e-6 to 1f-6, that’s 10^-6 represented in 32-bit floats (converted back up to 64-bit floats) vs a tolerance of 10^-6 represented in 64-bit floats. For reference, 1e-6 - Float64(1f-6) = 2.5247572468697606e-15.

And that is more than enough to give you a completely different solution by t=30. For a chaotic system, a change of that size is already too big: it ends up causing O(1) error.
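If you want to poke at this without installing anything, here is a minimal sketch of the same parameter experiment. It assumes a hand-rolled fixed-step RK4 with dt = 1e-3 instead of the adaptive Tsit5 used above, and the names `rk4_step` and `separation` are mine, so the exact split time will not match the plot. The point is only that a change in the last digit of σ ends up as an O(1) gap:

```julia
# Lorenz right-hand side on plain tuples (no packages needed).
lorenz(u, p) = (p[1] * (u[2] - u[1]),
                u[1] * (p[2] - u[3]) - u[2],
                u[1] * u[2] - p[3] * u[3])

# One classical RK4 step. This is a stand-in for Tsit5, chosen for brevity.
function rk4_step(u, p, dt)
    k1 = lorenz(u, p)
    k2 = lorenz(u .+ (dt / 2) .* k1, p)
    k3 = lorenz(u .+ (dt / 2) .* k2, p)
    k4 = lorenz(u .+ dt .* k3, p)
    return u .+ (dt / 6) .* (k1 .+ 2 .* k2 .+ 2 .* k3 .+ k4)
end

# Integrate two copies that differ only in their parameters and
# record |x1(t) - x2(t)| after every step.
function separation(p1, p2; u0 = (1.0, 0.0, 0.0), dt = 1e-3, tend = 100.0)
    n = round(Int, tend / dt)
    ua, ub = u0, u0
    seps = Vector{Float64}(undef, n)
    for i in 1:n
        ua = rk4_step(ua, p1, dt)
        ub = rk4_step(ub, p2, dt)
        seps[i] = abs(ua[1] - ub[1])
    end
    return seps
end

dt = 1e-3
seps = separation((10.0, 28.0, 8 / 3), (10.000000000000001, 28.0, 8 / 3); dt = dt)
first_big = findfirst(>(1.0), seps)   # first step where the gap exceeds 1

println("max |x1 - x2| on (0,100): ", maximum(seps))
println("gap first exceeds 1 at t ≈ ", first_big === nothing ? "never" : first_big * dt)
```

The printed split time will move around if you change dt or the integrator, which is exactly the behavior being described.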

So okay, I lied to you at the beginning. That picture was not the solution to the Lorenz equation. It should be self-evident by now that little tiny errors cause completely incorrect pictures, and ODE solvers themselves are just approximations, so if we’re approximating each step with 1e-6 error then we’re definitely not seeing the right solution. But we see something, isn’t that odd? That’s the result of the shadowing lemma. Informally, it states that the perturbed solution we get due to small errors is not the real solution, but it is the solution for some other initial condition (/parameters) that is epsilon close. So when you solve the equation, you don’t get a solution for the parameters [10.0, 28.0, 8 / 3] with initial condition [1.0; 0.0; 0.0], but there is some initial condition and parameters, say [10.0, 28.00000000000000001, 8 / 3] and [1.0; 0.00000000000000000000001; 0.0], for which you get a “very close” approximation! In practice, it is not possible to know which values you actually solved for, but you know that they exist in a small epsilon ball around the ones you were trying to solve for.

There are two ways to think about this. One is that it’s a statement about the density of solutions on the chaotic attractor. There are all sorts of cool properties of chaotic attractors, such as that they live on non-integer dimensional objects called fractals, but what matters here is that for every point on the fractal, the nearby points are also solutions, and every little ball in the fractal contains a value from “almost every” possible solution (past Lyapunov time). This is a very formal way of saying that tiny perturbations effectively put you randomly in (x(t), y(t), z(t)) on the fractal when past Lyapunov time: the trajectories that start close diverge so much that you have a chance to be anywhere. In other words, past Lyapunov time you can never get a look at the trajectory you want, just random samples of other possible trajectories. Even though it’s deterministic, you effectively can only treat it probabilistically.

But secondly, you do actually get “A” trajectory. Not the right one, but you get one, and it shows you the attractor. That’s why the (x(t), y(t), z(t)) plot still looks interesting: you still get the general picture of the object on which the trajectories live, you just choose the wrong path through it.

So just how hard is it to get an accurate trajectory?

So okay, from that information about sensitive dependence and the shadowing lemma it should already be fairly clear why no machine learning algorithm trained to even 1e-6 error, or even 1e-8 error is going to give you anything remotely close to “accurate forecasts” of a chaotic system. It’s fairly clear that any statement about accurate forecasts of chaotic systems is pretty dubious beyond a Lyapunov time because the errors grow exponentially fast in any approximation, making it so that you effectively only have random sampling. But that has some fuzzy words, “effectively”. Here’s a question: just how accurate do you have to be in order to make a prediction past 1 Lyapunov time, or 2?

To give an answer to this question, let’s take the Lorenz equation. Normally it’s solved from (0,100) to get the cool pictures I showed, but for the sake of not requiring a large compute cluster run to solve this, let’s ask the question: how hard is it to get an accurate solution on (0,60)? This would be a way to really visualize how hard chaotic computations really are.

To do this, we’ll need some special ammo. Julia does have a lot of nice tools for validated Taylor integration with provable error bounds in order to formally analyze this kind of thing, but those are very compute heavy so we’re going to do something slightly more heuristic. Instead we’re going to use the DiffEqCallbacks.jl ProbInts callback with a high order integrator. The method is fairly straightforward: each step of an ODE solver has an error E, which is estimated by the adaptive solver, and thus perturbing each step by normally distributed random variables with mean zero and variance E gives you a distribution that converges to covering the correct solution in some mathematical sense. It’s just a heuristic, and you need “enough” trajectories to see all behaviors, which is strictly a property as N->infinity, but here we’re going to take N=100 since it tends to be good enough.
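Here is a rough, package-free sketch of that idea, to make the mechanism concrete. It is not the DiffEqCallbacks.jl implementation: I’m assuming a fixed-step RK4, a step-doubling error estimate (one full step versus two half steps), noise whose standard deviation equals that estimate, and only 20 trajectories to keep it quick. The names `noisy_trajectory` and `spread` are made up for this example.

```julia
using Random

lorenz(u, p) = (p[1] * (u[2] - u[1]),
                u[1] * (p[2] - u[3]) - u[2],
                u[1] * u[2] - p[3] * u[3])

# One classical RK4 step on tuples.
function rk4_step(u, p, dt)
    k1 = lorenz(u, p)
    k2 = lorenz(u .+ (dt / 2) .* k1, p)
    k3 = lorenz(u .+ (dt / 2) .* k2, p)
    k4 = lorenz(u .+ dt .* k3, p)
    return u .+ (dt / 6) .* (k1 .+ 2 .* k2 .+ 2 .* k3 .+ k4)
end

# Integrate one trajectory, kicking the state after every step with
# Gaussian noise scaled by a local error estimate (assumption: the
# estimate comes from step doubling, not from an embedded method).
function noisy_trajectory(rng, u0, p, dt, nsteps)
    u = u0
    xs = Vector{Float64}(undef, nsteps)
    for i in 1:nsteps
        full = rk4_step(u, p, dt)
        half = rk4_step(rk4_step(u, p, dt / 2), p, dt / 2)
        err = abs.(full .- half)   # per-component local error estimate
        u = half .+ err .* (randn(rng), randn(rng), randn(rng))
        xs[i] = u[1]
    end
    return xs
end

rng = MersenneTwister(1234)   # fixed seed so reruns give the same ensemble
p = (10.0, 28.0, 8 / 3)
dt = 1e-3
nsteps = 80_000               # integrate to t = 80
ntraj = 20

ens = [noisy_trajectory(rng, (1.0, 0.0, 0.0), p, dt, nsteps) for _ in 1:ntraj]

# Width of the ensemble in x at step i.
spread(i) = maximum(e[i] for e in ens) - minimum(e[i] for e in ens)

println("spread in x at t = 1:  ", spread(1_000))
println("spread in x at t = 80: ", spread(nsteps))
```

Early on the ensemble is a tight bundle, and late in the run the members are scattered across the attractor. The time at which the bundle falls apart is the heuristic estimate of how far that solver setup can be trusted.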

So okay, what we do is solve the ODE with this callback that perturbs each step by the error estimate. Let’s see this in action in the simple case from above:

using OrdinaryDiffEqTsit5, Plots, DiffEqCallbacks
function lorenz!(du, u, p, t)
    du[1] = p[1] * (u[2] - u[1])
    du[2] = u[1] * (p[2] - u[3]) - u[2]
    du[3] = u[1] * u[2] - p[3] * u[3]
end
u0 = [1.0; 0.0; 0.0]
p = [10.0, 28.0, 8 / 3]
tspan = (0.0, 100.0)
prob = ODEProblem(lorenz!, u0, tspan, p)
cb = AdaptiveProbIntsUncertainty(5)
ensemble_prob = EnsembleProblem(prob)
sim = solve(ensemble_prob, Tsit5(), trajectories = 100, callback = cb)
plot(sim, idxs = (0, 1), linealpha = 0.4)

What you are looking at here is just the x(t) coordinates over time for the 100 slightly perturbed trajectories. At around t=20, the error growth is large enough that some “fall” to the wrong part of the butterfly wing, and boom now it has O(1) error. Thus the default ODE solver on the Lorenz equation only gives you “the right” trajectory up to t=20, and then you end up on some shadow trajectory and are somewhere randomly on the attractor. This means that the “system” is pretty normal until t=20, and only afterwards it displays the chaotic behavior. Let’s remember that t=20 for later.

Now, let’s see what it takes to get the trajectory correct. Again, just for (0,60), about half of the time the “normal” plotting is done on. To do this, let’s take a bunch of different tolerances and see what we get. When I started this exercise, I very quickly ran out of digits of accuracy in standard 64-bit floating point numbers, so first I moved to MPFR BigFloats. These are really slow, it took ~4000 seconds to compute at 1e-20 accuracy. So then I moved to the new MultiFloats.jl which has a very recent SIMD-accelerated higher precision floating point type, where I used the 128 bit version. This looks like:

using OrdinaryDiffEq, DiffEqCallbacks, Plots, MultiFloats
function g(du, u, p, t)
    du[1] = p[1] * (u[2] - u[1])
    du[2] = u[1] * (p[2] - u[3]) - u[2]
    du[3] = u[1] * u[2] - p[3] * u[3]
end
u0 = Float64x2.([1.0; 0.0; 0.0])
tspan = Float64x2.((0.0, 60.0))
p = Float64x2.([10.0, 28.0, 8 / 3])
prob = ODEProblem(g, u0, tspan, p)
cb = AdaptiveProbIntsUncertainty(5)
ensemble_prob = EnsembleProblem(prob)
#tols = Float64x2.([1e-10,1e-12,1e-14,1e-16, 1e-18, 1e-20, 1e-22, 1e-24, 1e-26, 1e-28])
tols = Float64x2.([1e-30,1e-32])
ts = Float64[]
 
for tol in tols
    t = @elapsed sim = solve(ensemble_prob, Vern9(), abstol=tol, reltol=tol, trajectories = 100, callback = cb, dt=1e-6, saveat = Float64x2.(0:0.1:100))
    @show tol, t
    push!(ts, t)
    plot(sim, idxs = (0, 1), linealpha = 0.4)
    savefig("tol$(Float64(tol)).png")
end
 
#=
(tol, t) = (MultiFloat{Float64, 2}((1.0e-10, 0.0)), 3.061468)
(tol, t) = (MultiFloat{Float64, 2}((1.0e-12, 0.0)), 3.2470819)
(tol, t) = (MultiFloat{Float64, 2}((1.0e-14, 0.0)), 4.1461386)
(tol, t) = (MultiFloat{Float64, 2}((1.0e-16, 0.0)), 5.0764427)
(tol, t) = (MultiFloat{Float64, 2}((1.0e-18, 0.0)), 6.997277)
(tol, t) = (MultiFloat{Float64, 2}((1.0e-20, 0.0)), 11.0628261)
(tol, t) = (MultiFloat{Float64, 2}((1.0e-22, 0.0)), 16.3280154)
(tol, t) = (MultiFloat{Float64, 2}((1.0e-24, 0.0)), 25.8473729)
(tol, t) = (MultiFloat{Float64, 2}((1.0e-26, 0.0)), 48.3387268)
(tol, t) = (MultiFloat{Float64, 2}((1.0e-28, 0.0)), 86.6352988)
(tol, t) = (MultiFloat{Float64, 2}((1.0e-30, 0.0)), 187.174455)
(tol, t) = (MultiFloat{Float64, 2}((1.0e-32, 0.0)), 500.8718445)
=#

As you can see, the compute time grows pretty rapidly as the tolerance is tightened. Now let’s see the plots. At 1e-10:

at 1e-14 (i.e. the last you can do with 64-bit floating point accuracy)

at 1e-18

at 1e-24

at 1e-30

and at 1e-32 (note eps(Float64x2) = 4.9303806576313238e-32 so this is the very edge of 128-bit floating point accuracy)

So what this shows you is:

  1. With 64-bit floating point numbers, you can only get to about t=35 before the error growth makes it not possible to predict the trajectory
  2. Getting to (0,60) required ~1e-32 accuracy per step, the very edge of what you can do with 128-bit floats
  3. Getting further out gets exponentially harder: each few seconds of accuracy is gained by increasing the accuracy by orders of magnitude
  4. Getting a reference trajectory to know “I solved this chaotic system” as a baseline to test a chaotic forecast against takes specialized software (high order ODE solvers compatible with very fast extended precision arithmetic, which only has an implementation in Julia, and even then this faster one is only a few months old)
  5. Getting to the normal (0,100) requires more precision and crashed my workstation (I can do this on an HPC etc., but that seemed like too much work for a blog post so I’ll stop here. But you have the codes: use Float64x4 and tighter tolerances, the rest is left to the reader).
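
Point 3 is what you’d expect from a back-of-the-envelope estimate. As a rough sketch, assume an error of size δ0 grows at the average rate given by the largest Lyapunov exponent λ, and call the forecast lost once the error reaches some O(1) size Δ. The value λ ≈ 0.9 for these Lorenz parameters is the commonly quoted figure, not something computed in this post:

$$\begin{aligned}
\delta(t) &\approx \delta_0 e^{\lambda t} \\
t_{\text{horizon}} &\approx \frac{1}{\lambda} \ln\frac{\Delta}{\delta_0} \\
\frac{\ln 10}{\lambda} &\approx \frac{2.3}{0.9} \approx 2.6 \text{ time units per extra digit}
\end{aligned}$$

So the horizon grows only linearly in the number of digits: with Δ = 1 this gives roughly t ≈ 41 for δ0 = 1e-16 and t ≈ 82 for δ0 = 1e-32. The experiment above loses the trajectory earlier than that (about 35 and 60), which is plausible since the error is injected at every step rather than once and local growth rates can be faster than the average, so treat the formula as an optimistic estimate of what each digit buys you.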

Just as a ballpark measure of difficulty, the specialized multifloats used here solved the 1e-20 case approximately 400x faster than MPFR BigFloats, so the 1e-32 run to get a reference on (0,60) without specialized floating point numbers would have taken roughly 500*400 seconds ≈ 2.3 days to compute. So probably 2 weeks or so for the (0,100) version on an HPC with large enough memory, for each chaotic trajectory you want to get the accuracy of? And this is the heuristic way; the validated arithmetic way is probably 10x-100x on top of that if you want to be sure.

So what are the ML papers doing to claim good predictive accuracy on chaotic systems?

So okay, clearly the ML papers didn’t do any of that. They don’t mention what HPCs they are using, how many digits of accuracy, the specialized higher-precision and high order integrators etc. And they are testing a neural network, usually in PyTorch, with 32-bit floating point accuracy, something that fundamentally does not have enough digits to even store an accurate enough measurement of the state to produce an accurate solution to the most basic chaotic test case everyone shows. So… they clearly are not “forecasting” the chaotic trajectory: they never computed a reference trajectory to test against, and their forecaster does not have enough digits to store a solution that could be tested against it.

So what did they do? Well, let’s look for example at DeepXDE for a second. This paper doesn’t actually reference chaos, but it’s a nice example because it’s a popular paper (a known physics-informed neural network software) that people have pointed me to in order to say “this paper shows that PINNs work for solving inverse problems on chaotic systems”. Let’s see what they did:

Did you catch that? It’s subtle, but you now know enough to understand what happened here: “the observations are produced by solving the above system to t = 3”. Remember from above, the chaotic properties of the Lorenz equation “start” at t=20 or so, and t=60 is a good “it’s very chaotic” kind of point. At t=3, you will not have seen the error growth, so it’s effectively not a chaotic system. Now I will commend the authors for not making a claim here that their software works for chaotic systems, but they did choose the most classic chaotic system as a test case and then changed the setup so it wasn’t actually a test on a chaotic system, so it’s clear why it’s misleading. And I might be the one to blame here: I assume Lu Lu chose this case due to some benchmarks I have published for a long time on inverse problems on Lorenz, where that benchmark is the only one I know that chooses (0,3) to start. But then it has a second section on (0,30) to demonstrate the failure of global optimization methods in that case. The point was to show the two in conjunction: the parameter estimation works on (0,3) but doesn’t work on (0,30), and having read this blog post you probably now understand why that would be the case. But still, all of those facts together, this subtlety, etc. is what then leads to “oh, I thought the main PINN papers showed it works for chaotic systems?” This subtle difference of choosing a different end time completely changes the context, and we should all be careful about that.

So let’s take a look at another paper I mentioned at the top, “Modeling chaotic Lorenz ODE System using Scientific Machine Learning”. This one comes from one of my previous students whom I respect very much, he does lots of great work. But he also made this same mistake (sorry Raj, it’s just a great example of this). Figure 3 is:

Trained until t=10, forecast until t=15. Chaos solved. You now know enough to understand why we would say “this is just neural ODEs on a non-chaotic system” even though, yes it’s “the” classic chaos example, but it’s not an example of forecasting the chaotic system. It is a chaotic system but avoids the chaos.

And finally to a paper that has been getting a lot of hype recently: Panda: A pretrained forecast model for universal representation of chaotic dynamics. Let’s take a look at a part of figure 11:

Now this isn’t exactly the Lorenz equation because their architecture predicts for new chaotic systems, etc., but it’s clear from the figure that it’s a set of equations similar to the Lorenz equations. And in this case, you can see that the trajectory only swaps branches once in the forecast. From the tests above we know that you need a few mixing events before you actually hit the chaotic behavior. So Panda, while clearly giving some pretty good predictions, doesn’t actually deliver on the claim that “Panda exhibits emergent properties: zero-shot forecasting of unseen real world chaotic systems”. It does good forecasting of systems that are chaotic in some regimes, but over forecast windows that never hit the chaotic properties. And if it’s not set up to be in the chaotic regime, then it’s just any other ODE, all here being non-stiff ODEs, so this is zero-shot forecasting of small non-stiff ODEs. While that is still very interesting, I see a lot of people sending this around saying “this shows AI can do what no classical method can do, it can make accurate forecasts of chaotic systems!”. No honey, and now you understand why that’s not going to be possible but also exactly how you were misled.

Note: Please Be Nice

Now, I will give everyone a pass here. Again, all of the researchers whose papers I chose to mention here are people who I think do very good research and should be commended for much of their work. But the reason why I point this out is exactly that: many very good researchers are making the very same mistake over and over. This means that what is going on is not obvious, and someone needed to write a blog post to make it very clear to everyone what exactly is going on and what we should do. Please do not email these people saying “Chris called you out for fraud!” etc., no. Chaos isn’t easy to understand. Lots of smart people made the same mistake. Give everyone a pass: it seems like even reviewers aren’t catching this! But hopefully if this happens again, point them to this blog. (Also, let’s be clear: the DeepXDE paper doesn’t claim that PINNs make good predictions on chaotic systems, that’s just a common misread by others.)

But SINDy worked on a chaotic system?

Yes, Steven Brunton’s “Discovering governing equations from data by sparse identification of nonlinear dynamical systems” did learn the Lorenz equations from data! The method works like this: if you have points which are finely spaced enough, you can put a spline through them to get derivative estimates, and directly learn the map u’=f(u,p,t) without ever integrating in time (remember, this sensitive dependence is a time integration issue beyond Lyapunov time). From there they show they get the correct form of the equations. They never validate or say that from the learned equations they then have good predictions on (0,100), which again would be impossible because 1 digit of accuracy loss in the parameter values would be enough to make that not true (and the data generating process isn’t accurate enough to make that statement either). The paper states: “The algorithm not only identifies the correct terms in the dynamics, but it accurately determines the coefficients to within .03% of the true values”. The learned model wouldn’t predict long trajectories well, but it would get you something on the attractor and it’s the right function. So this paper checks out, and it works because their method does not do any long time integration.
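To see why no long time integration is needed, here is a hedged sketch of that style of fit. It is not the authors’ code: I’m assuming central finite differences instead of splines for the derivative estimates, a hand-picked library of monomials up to degree 2, a simple sequentially thresholded least squares with an arbitrary threshold of 0.5, and clean data generated by a fixed-step RK4. The names `simulate` and `stlsq` are mine.

```julia
using LinearAlgebra

lorenz(u, p) = (p[1] * (u[2] - u[1]),
                u[1] * (p[2] - u[3]) - u[2],
                u[1] * u[2] - p[3] * u[3])

function rk4_step(u, p, dt)
    k1 = lorenz(u, p)
    k2 = lorenz(u .+ (dt / 2) .* k1, p)
    k3 = lorenz(u .+ (dt / 2) .* k2, p)
    k4 = lorenz(u .+ dt .* k3, p)
    return u .+ (dt / 6) .* (k1 .+ 2 .* k2 .+ 2 .* k3 .+ k4)
end

# Finely sampled "measurements": one row per time point, columns x, y, z.
function simulate(p, u0, dt, n)
    U = Matrix{Float64}(undef, n, 3)
    u = u0
    for i in 1:n
        u = rk4_step(u, p, dt)
        U[i, 1], U[i, 2], U[i, 3] = u
    end
    return U
end

dt = 1e-3
U = simulate((10.0, 28.0, 8 / 3), (1.0, 0.0, 0.0), dt, 20_000)

# Local derivative estimates from neighboring samples only.
dU = (U[3:end, :] .- U[1:end-2, :]) ./ (2dt)
x, y, z = U[2:end-1, 1], U[2:end-1, 2], U[2:end-1, 3]

# Candidate terms for the right-hand side.
terms = ["1", "x", "y", "z", "x^2", "xy", "xz", "y^2", "yz", "z^2"]
Theta = hcat(ones(length(x)), x, y, z, x .^ 2, x .* y, x .* z, y .^ 2, y .* z, z .^ 2)

# Least squares, zero out small coefficients, refit on what is left.
function stlsq(Theta, dU; threshold = 0.5, iters = 10)
    Xi = Theta \ dU
    for _ in 1:iters
        small = abs.(Xi) .< threshold
        Xi[small] .= 0.0
        for k in 1:size(dU, 2)
            active = .!small[:, k]
            Xi[active, k] = Theta[:, active] \ dU[:, k]
        end
    end
    return Xi
end

Xi = stlsq(Theta, dU)
for k in 1:3
    active = findall(!iszero, Xi[:, k])
    rhs = join(["$(round(Xi[j, k], digits = 3)) * $(terms[j])" for j in active], " + ")
    println("d", ("x", "y", "z")[k], "/dt = ", rhs)
end
```

Every row of the regression uses only a state and a derivative estimated from its immediate neighbors, so the exponential error growth of a long forecast never enters the fit.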

PS: Interesting little find

When gathering notes for this I realized that I should mention this paper How PINNs cheat: Predicting chaotic motion of a double pendulum. Quote: “Our results demonstrate that PINNs do not exhibit any sensitivity to perturbations in the initial condition. Instead, the PINN optimization consistently converges to physically correct solutions that violate the initial condition only marginally, but diverge significantly from the desired solution due to the chaotic nature of the system. In fact, the PINN predictions primarily exhibit low-frequency components with a smaller magnitude of higher-order derivatives, which favors lower physics loss values compared to the desired solution. We thus hypothesize that the PINNs “cheat” by shifting the initial conditions to values that correspond to physically correct solutions that are easier to learn.”

That’s the shadowing lemma.

They use the double pendulum system, another classic chaotic system but only take it to t=5 so that’s really not far enough to fully see the chaos, but still is far enough to get some drift. And what they effectively find is that the PINN learns to use a small perturbation to the initial condition to match the trajectory. The shadowing lemma as stated earlier is the lemma that states that such an initial condition will always exist! Just a neat little find I thought should be mentioned.

Conclusion: So what can you actually do with chaotic systems?

So finally, we see that accurate long-time forecasts of chaotic systems are simply not possible. I hate to break it to you, you cannot and will not make accurate forecasts of a chaotic system with some ML model with 32-bit accuracy on a GPU. Just storing x(5.0) at 32 bits of accuracy already means you don’t have enough information to accurately start a forecast from there to get to the correct value at t=25. For Lorenz (0,100), the standard picture, you need more than 128 bits to store the state at each time to even have a chance of having enough information to compute the next step accurately enough (I would guess you need probably 512 bits? (0,60) was 128 but it gets exponentially harder). So it’s very clear that if someone claims this, they didn’t pass the smell test unless the paper has a whole methods section on how they did everything in 512-bit floats.

So what can you actually do? If you feel hopeless, don’t. Because in this blog post I mentioned a few things that are hopeful. It comes from the shadowing lemma and ergodic theory. Recall that although you cannot get a forecast of the right trajectory, you do still get some shadow trajectory on the attractor. Interesting right? And you also get things like (the average of x(t) as t->infinity), you get properties like those accurately estimated w.r.t. the accuracy of the ODE solver and the length of the trajectory (note for math folks: Lorenz is known to not be hyperbolic because it’s dissipative, but it’s still ergodic, just harder to prove). So properties of chaotic attractors and statistical quantities are knowable and potentially learnable! Remember, long-time trajectories of chaotic systems can only be thought of in a probabilistic sense because of all of the mixing that goes on. But can you learn those distributions? Maybe. Automatic differentiation (which relies on having a correct forward pass, which we know isn’t true for chaotic systems) needs to be changed or corrected in order to be derivatives of ergodic properties on chaotic systems, so there’s work in AD/differentiable programming that can be done to make the learning better (fun fact: Lyapunov exponents are the exponential growth of errors in the tangent space, and forward-mode AD uses a pushforward of derivatives in the tangent space… so you have exponential growth of errors there for any chaotic system QED). Maybe the Panda architecture is really good at zero-shot predicting the attractor properties of chaotic systems? I think there’s a lot of good work to do in Scientific Machine Learning on chaotic systems, and this is where you would look.

So let’s drop all of this talk of L2 time series forecasting error on chaotic systems and turn to ergodic properties.

5 thoughts on “How chaotic is chaos? How some AI for Science / SciML papers are overstating accuracy claims”

  1. Yasir Çatal says:

    Terrific work as always. Question: How much of a problem is this in the case of inverse problems or global optimization? I’m assuming doing inverse problem solutions based on just the time series would be super difficult if not hopeless, but using certain statistical properties such as power spectrum or cross spectral density would be more resistant to the problem maybe?


    • Very good question. What I mention at the bottom is a big issue for inverse problems with shooting methods since the derivative estimates given by automatic differentiation, forward sensitivity, and continuous adjoint methods are not convergent and diverge to infinity, and thus gradient-based optimization has major issues. All fitting needs to be changed to be fitting in a statistical sense, and in order to do that you need to differentiate those statistical quantities with respect to p, which needs different adjoint methods: the shadow adjoint methods in Frank’s blog post and documented in SciMLSensitivity.jl. Or you can rephrase inverse problems to not rely on simulating beyond Lyapunov times, such as with multiple shooting or certain collocation approaches, which can learn not because of matching the time series but instead because getting the statistical properties correct implies correct parameters (and vice versa, with identifiability). For example, take u’=f(u,p) and get that right at a bunch of random points along the trajectory, i.e. if you have data (u_i, u’_i). Then it will work not because it’s fitting a long trajectory, but instead because there’s a statistical relationship implied by ergodicity between u, u’, and p. You can also, as you say, learn predictions of statistics like power spectra or cross spectral densities, which means you also need to get the gradient of these quantities w.r.t. the parameters, which can be the hard part. Global optimization or derivative-free optimization can make it so you don’t have to derive such gradient estimates, but those converge slower and do not scale well with larger dimensionality of the parameters.

      In order to stabilize things for scientific machine learning though you can need other tricks. I’m going to have a paper coming out in a month or so that shows a few tricks that can be done to help these kinds of cases.


  2. Walker Powell says:

    I have seen them mention off-hand in a few instances that they resort to filtering to get derivative estimates when things are too noisy. I think it is generally underemphasized in SciML and UQ literature the degree to which various regression, sensitivity analysis, and UQ problems are sensitive not just to data fidelity but also to properties of the underlying dynamics. Chaos tends to throw a wrench in everything


  3. Walker Powell says:

    I don’t think the authors have ever noted this interpretation, but when trained on sufficiently long/many trajectories on a strange attractor, SINDy can be written in a very ergodic fashion. When summing the L2 regression error over every point on a trajectory, you can view it as integration against an empirical measure that can be made to approximate the ergodic measure on the strange attractor. So you replace any sort of time series regression with state space L2(\mu) regression with a \mu that captures many features of the attractor geometry and on-attractor dynamics. The shadowing lemma allows one to be more confident in estimating this \mu from simulated trajectories compared to pointwise estimates of individual trajectories. There has also been some work on building chaotic benchmarks that span a wide range of attractor properties (William Gilpin has a nice repository of this stuff) so hopefully more people can begin testing and validating on this stuff soon.


    • Yes very good point! I wasn’t entirely sure how much time to spend on that digression but yes because it’s using a local property (the state/derivative map values at a bunch of different effectively random points in the attractor), it builds a dataset that is effectively able to measure ergodic properties (i.e. just the statistical correlation between state and derivative) which is something that converges, and that’s why it works. Now, it does require that the points are sampled close enough to get accurate derivative estimates, and that’s one major assumption I don’t think is really highlighted, as if you sample time points beyond a Lyapunov time then the derivative estimates from the splines are junk. But if you do satisfy that assumption then it will converge as the time series of data grows infinitely long to being an accurate estimate and the way you’d prove it is by showing it guarantees learning a system with the same ergodic properties.

