Description
This section is broken (again?) https://turinglang.org/v0.31/tutorials/10-bayesian-differential-equations/#inference-of-a-stochastic-differential-equation
The plot in the tutorials shows no sampling (the chains are just flat lines).
Running the section gives this warning (over and over):
┌ Warning: Incorrect ϵ = NaN; ϵ_previous = 0.07500000000000001 is used instead.
└ @ AdvancedHMC.Adaptation ~/.julia/packages/AdvancedHMC/AlvV4/src/adaptation/stepsize.jl:131
The summary is just the initial parameters:
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat ess_per_sec
Symbol Float64 Float64 Float64 Float64 Float64 Float64 Float64
σ 1.5000 0.0000 NaN NaN NaN NaN NaN
α 1.3000 0.0000 0.0000 NaN NaN NaN NaN
β 1.2000 0.0000 0.0000 NaN NaN NaN NaN
γ 2.7000 0.0000 0.0000 NaN NaN NaN NaN
δ 1.2000 0.0000 0.0000 NaN NaN NaN NaN
ϕ1 0.1200 0.0000 0.0000 NaN NaN NaN NaN
ϕ2 0.1200 0.0000 0.0000 NaN NaN NaN NaN
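For reference, the SDE section fits a Lotka-Volterra model with multiplicative noise; roughly this setup (paraphrasing the tutorial, exact values may differ):

```julia
using DifferentialEquations

# Lotka-Volterra drift.
function lotka_volterra!(du, u, p, t)
    α, β, γ, δ = p[1], p[2], p[3], p[4]
    du[1] = α * u[1] - β * u[1] * u[2]
    du[2] = δ * u[1] * u[2] - γ * u[2]
end

# Multiplicative noise on each species.
function multiplicative_noise!(du, u, p, t)
    ϕ1, ϕ2 = p[5], p[6]
    du[1] = ϕ1 * u[1]
    du[2] = ϕ2 * u[2]
end

u0 = [1.0, 1.0]
tspan = (0.0, 10.0)
p = [1.5, 1.0, 3.0, 1.0, 0.1, 0.1]  # α, β, γ, δ, ϕ1, ϕ2 (assumed values)
prob_sde = SDEProblem(lotka_volterra!, multiplicative_noise!, u0, tspan, p)
```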
(turing_sde) pkg> st
Project turing_sde v0.1.0
Status `~/repos/scratch/julia/turing_sde/Project.toml`
[0c46a032] DifferentialEquations v7.13.0
[f3b207a7] StatsPlots v0.15.7
[fce5fe82] Turing v0.31.3
[37e2e46d] LinearAlgebra
julia> versioninfo()
Julia Version 1.10.3
Commit 0b4590a5507 (2024-04-30 10:59 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: macOS (arm64-apple-darwin22.4.0)
CPU: 10 × Apple M1 Max
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)
Activity
devmotion commented on May 6, 2024
Possibly related: #420
yebai commented on May 7, 2024
The error message indicates that NUTS struggles to find a suitable leapfrog integration step size. This is often caused by challenging geometry in the target distribution. If that is the case, we might need to tweak the prior and its parameterization. However, it is then strange that the model worked previously.

torfjelde commented on May 7, 2024
Another question I think is worth raising: is this example even worth including in the tutorial? It's a bit unclear what it's actually inferring, no? It feels like it's almost an ABC approach where we are also inferring the epsilon?
(Someone called the Turing.jl docs out for this at BayesComp 2023 actually, haha)
jerlich commented on May 10, 2024
Btw, the posterior estimates for the ODE in this tutorial are also not robust: I tried different generative parameter values, and the posterior credible intervals do not include the generative parameters.
torfjelde commented on May 10, 2024
What do you say @devmotion @yebai ? Should we just drop it from the tutorial?
yebai commented on May 10, 2024
I know some people find this useful for illustrating how to use DiffEqs in Turing -- maybe we can try to improve it?
torfjelde commented on May 10, 2024
I'm not suggesting removing the tutorial; I'm suggesting we remove only the SDE example.
Can someone explain what we're actually doing in the SDE example? We're somehow defining a likelihood based on the mean of the SDE simulations, but also treating the variance of the likelihood as a parameter to be inferred? Does this make sense to do? Seems almost like an ABC thingy but without decreasing the "epsilon" (which is the variance being inferred here)
devmotion commented on May 10, 2024
We don't base it on the mean of the SDE simulations, do we? I don't remember the previous iterations of the example, but based on its explanation the intention was to compute the likelihood of a single SDE simulation based on all trajectories of the SDE simulations ("data"). So indeed the use of odedata seems unintentional, and the model was supposed to be based on sdedata.

Additionally, the example mentions SGHMC might be more suitable - I wonder why we don't use it?
But generally I agree: We should fix broken examples or remove them temporarily. Failing examples give a bad impression.
Related: I had completely forgotten, but based on a suggestion by @mschauer I had played around with the preconditioned Crank-Nicolson algorithm in the context of SDEs a few years ago (with AdvancedMH but also with DynamicPPL/Turing): https://gist.github.com/devmotion/37d8d706938364eeb900bc3678860da6 I haven't touched it in a while, so I assume it might require a few updates by now.
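For context, the pCN proposal for a standard Gaussian prior (with step-size parameter 0 < β ≤ 1) is simply:

```julia
# pCN proposal: x' = sqrt(1 - β^2) * x + β * ξ with ξ ~ N(0, I).
# It leaves N(0, I) invariant, so the MH acceptance ratio involves only
# the likelihood; handy for SDE models where the driving noise is Gaussian.
pcn_propose(x, β) = sqrt(1 - β^2) .* x .+ β .* randn(length(x))
```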
gjgress commented on May 16, 2024
I'm currently working on using Turing to fit the parameters of an SDE, so this topic is of particular interest to me.
Since I already am testing out MCMC samplers with my model, maybe I can help with fixing the current example.
But, I don't quite understand the goal of the example either. When I first saw the example in the documentation, I assumed the point was to simulate multiple trajectories of the stochastic differential equation, calculate the likelihood of each trajectory based on the data independently, then combine them into one likelihood (by multiplying the likelihoods). Is this the goal? Or how exactly are they using multiple trajectories to calculate the likelihood?
The way it is currently using odedata also doesn't make sense to me, but it seems that may be a mistake.

torfjelde commented on May 16, 2024
Oh no, you're completely right! 👍 I don't know why I had this impression, but it was clearly wrong 🤦
Lovely:)
No, this is effectively what the example is doing:)
This is probably a mistake, yeah.
That indeed seems like it would be very useful here!
gjgress commented on May 22, 2024
So I am working on this example and I'm still trying to figure out where in the process things go wrong.
I experimented with the priors and initial parameters, but even when set to the true values, the sampler still fails, so I think the problem runs deeper.
During my troubleshooting I tried samplers other than NUTS, and when I attempted to use DynamicHMC I got an error that may be insightful. To be honest, though, I'm unfamiliar with the internals of Turing and DynamicHMC, so maybe someone else here can give a little more insight.
Here is the code I used. Notable differences are that I generated sdedata by simulating the equations, taking the mean of the trajectories, then adding noise to the mean. Of course I also added DynamicHMC.NUTS() as an external sampler.
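In sketch form, something like this (sde_model here is a stand-in for the actual Turing model and data; whether externalsampler accepts DynamicHMC.NUTS() directly may depend on the Turing version):

```julia
using Turing, DynamicHMC

# `sde_model` is a hypothetical stand-in for the tutorial's Turing model.
# Drive DynamicHMC's NUTS through Turing's external-sampler interface.
chain = sample(sde_model, externalsampler(DynamicHMC.NUTS()), 500)
```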
The error that resulted is as follows:
Somehow the sampler is choosing negative values for the starting point, which of course causes any density calculations to be non-finite, so choosing an initial step size is impossible. I suspect that this is what is going on with the default NUTS sampler.
When researching this error, I came across this issue in DiffEqBayes, which may be of relevance.
Any insight as to why this might be happening?
gjgress commented on May 23, 2024
Okay, so update: I noticed that the simulations from the sampled parameters in the model were not being run as an ensemble, but rather as a single event. Since the intention is to use all the trajectories together to compute the likelihood, I computed 1000 trajectories and then calculated the likelihood from them in a for-loop. Somehow, this change fixed the initial step-size issue. I cannot figure out why this would be the case, since each observation in the loop should have the same types as in the original code; one would expect that running an ensemble with one trajectory would have the same output as the original code I ran.
Here is the updated code I ran:
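In outline, it has roughly this shape (a simplified sketch; the priors, solver, and save grid here are assumptions rather than the exact code):

```julia
using Turing, DifferentialEquations, LinearAlgebra
import SciMLBase

@model function fitlv_sde(data, prob)
    # Priors (illustrative values; the tutorial's exact choices may differ).
    σ ~ InverseGamma(2, 3)
    α ~ truncated(Normal(1.3, 0.5); lower=0.5, upper=2.5)
    β ~ truncated(Normal(1.2, 0.25); lower=0.5, upper=2.0)
    γ ~ truncated(Normal(3.0, 0.25); lower=1.0, upper=4.0)
    δ ~ truncated(Normal(1.0, 0.25); lower=0.5, upper=2.0)
    ϕ1 ~ truncated(Normal(0.12, 0.3); lower=0.05, upper=0.25)
    ϕ2 ~ truncated(Normal(0.12, 0.3); lower=0.05, upper=0.25)

    # Simulate an ensemble of trajectories under the sampled parameters
    # instead of a single solve.
    p = [α, β, γ, δ, ϕ1, ϕ2]
    ensembleprob = EnsembleProblem(remake(prob; p=p))
    predicted = solve(ensembleprob, SOSRI(); saveat=0.1, trajectories=1000)

    # Reject the whole parameter draw if any trajectory failed to solve.
    for traj in predicted
        if !SciMLBase.successful_retcode(traj.retcode)
            Turing.@addlogprob! -Inf
            return nothing
        end
    end

    # Combine the per-trajectory likelihoods by multiplying them
    # (i.e. summing the log-likelihoods).
    for traj in predicted
        for i in 1:size(data, 2)
            data[:, i] ~ MvNormal(traj.u[i], σ^2 * I)
        end
    end
end
```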
I don't have an output to show for it because the sampling takes a long time on my machine (it's been a few hours and it's still only at 8%), but it did give this little tidbit which suggests that the problem was solved.
EDIT:
Ah! Having reviewed my last two comments, I think I have figured it out.
I made a change in the code that I thought was insignificant, but I think was actually quite important.
In the original example, there is a line to exit early if the simulation does not succeed:
However, the original way that return codes are handled by SciMLBase has been deprecated. The return codes are now of type Enum(Int), so predicted.retcode !== :Success always evaluates to true (an enum value is never identical to a Symbol), and so the log probability was always set to -Inf. Instead, the line should compare against the enum, as in the sketch after this paragraph.

By the way, how should this be handled with the ensemble simulation? In my updated version, I computed 1000 trajectories rather than only one (as is done in the original example). We can iterate over the 1000 trajectories and check for failed return codes, but if even just one out of a thousand fails, the entire set of simulations is thrown out. Is this a result we want?
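The fix, in sketch form (the exact line in the merged version may differ):

```julia
import SciMLBase
using Turing

# (Inside the @model body, where `predicted` is the solve result.)
# Return codes are now enums (SciMLBase.ReturnCode), so compare against
# the enum value instead of the old Symbol:
if predicted.retcode !== SciMLBase.ReturnCode.Success
    Turing.@addlogprob! -Inf
    return nothing
end

# Equivalently, use the helper SciMLBase provides:
if !SciMLBase.successful_retcode(predicted.retcode)
    Turing.@addlogprob! -Inf
    return nothing
end
```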
torfjelde commented on May 23, 2024
Lovely stuff @gjgress !
Thank you for discovering this 🙏
Regarding many vs. a single trajectory, I believe we have the following scenarios:
Single trajectory
This is using a random likelihood, which is going to be invalid for something like NUTS if you don't also infer the noise used in the SDE solve (as @devmotion is doing in his PCN example).
Hence, the example in the tutorial is incorrect. Ofc we can also view it as a one-sample approximation to the multiple-trajectories approach, but this will have too high a variance to be valid for NUTS (we're entering the territory of pseudo-marginal samplers).
Multiple trajectories
Here there are two things we can do: a) use the expectation of predicted as the mean in our likelihood, i.e. we end up approximating the likelihood with a Gaussian centred at the ensemble mean; or b) if we do what you did in your last example, i.e. computing the likelihood for every realization, you're approximating the likelihood where the expectation is taken over predicted (this requires pseudo-marginal samplers, but if the number of samples is sufficiently high so that the variance of the estimator is sufficiently low, then we can probably get away with NUTS or the like). It seems to me that what we want here is the latter case; both approximations are written out below.
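In symbols (a reconstruction; notation assumed: data $y$, parameters $\theta$, simulated trajectories $x_1, \dots, x_N \sim p(x \mid \theta)$, observation noise $\sigma^2$):

$$\text{a)}\quad p(y \mid \theta) \approx \mathcal{N}\!\Big(y \,\Big|\, \frac{1}{N}\sum_{i=1}^{N} x_i,\ \sigma^2 I\Big)$$

$$\text{b)}\quad p(y \mid \theta) = \mathbb{E}_{x \sim p(x \mid \theta)}\big[p(y \mid x)\big] \approx \frac{1}{N}\sum_{i=1}^{N} p(y \mid x_i)$$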
So all in all, yes, I think you're correct that we want to use multiple trajectories and compute the likelihood for each of the realizations, i.e. your last example code :)
torfjelde commented on May 23, 2024
Ideally we should reject all the trajectories if even one of them fails to solve (since this means the particular realization of the parameters resulted in "invalid" trajectories). It might also be crucial to the adaptation of the parameters in something like NUTS.

Note that you might also be able to speed this computation up significantly by using threads (if you're not already).
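For instance, with the ensemble setup sketched earlier (ensembleprob is assumed from that sketch), the trajectories can be simulated in parallel by passing an ensemble algorithm to solve:

```julia
# EnsembleThreads() distributes the trajectory solves across the
# available Julia threads.
predicted = solve(ensembleprob, SOSRI(), EnsembleThreads();
                  saveat=0.1, trajectories=1000)
```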
gjgress commented on May 23, 2024
@torfjelde Thanks for the insight on the statistical side of things! Glad to hear that my approach works best here after all.
With all this said, is there anything in my last example that ought to be changed for the documentation (aside from the comments I put here and there)?
If it is fine, I can submit a pull request for the documentation. I'll also regenerate the plots that were used in the example.
By the way, are there any pseudo-marginal samplers available in Turing or related libraries?
This was my conclusion as well, but I don't have much experience with this type of problem, so I wanted to be sure 👍
I already am for my project, but I wanted to reproduce the code exactly as written when I was testing this example. I may use threads when generating the plots for the documentation.
yebai commented on May 23, 2024
The SDE example has been extracted into a new tutorial and converted to a Quarto notebook. Feel free to push changes to the PR #446
runzegan commented on May 24, 2024
I found the 'entire trajectory simulation from prior'-based inference used in the tutorial's SDE example quite unusual from a statistical signal processing perspective. Generally, the variance of the SDE's prior can become unboundedly large at large time steps (unless the process is stationary). This means that simulating the entire trajectory can result in inefficient samples that are very far from the observations at large time steps.

To address this issue, we usually use sequential Monte Carlo methods when the parameter is fixed. This will simulate/infer p(trajectories | data, parameter) incrementally, point by point or piece by piece, and produce an (unnormalised) likelihood approximation that can be used for correcting the parameter.

Generally, an SDE defines a state-space model prior for time series, making the parameter estimation of an SDE a static parameter estimation problem in state-space models. For a review, see "On Particle Methods for Parameter Estimation in State-Space Models". This is also precisely the problem that PMCMC was designed to address. A canonical PMCMC should produce reliable results for this task, albeit perhaps inefficiently.
odewit8 commented on Dec 2, 2024
Any updates on this? Not sure where I can find the updated notebook on SDEs, would be appreciated.
I guess it would be nice to have an example that just does Bayesian inference for the diffusion coefficient of a Brownian motion, given the single-vs-multiple trajectories and ensemble interpretations, in contrast with ODEs.
Also: I got the following error trying the retcode solution:
type EnsembleSolution has no field retcode
gjgress commented on Dec 3, 2024
@odewit8 The branch has been merged, but I think there is also the additional step of running the Quarto notebook and regenerating the webpage, which has not been done yet (not sure how the process works, correct me if I'm wrong).
That being said, you can view the merge and use the code I wrote there as a reference.
The EnsembleSolution itself won't have a retcode; each trajectory of the EnsembleSolution has its own retcode. You'll see in the merged branch that I replaced the retcode check with a per-trajectory check on predicted[i] to rectify this, as sketched below.
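That is, something like this sketch:

```julia
# (Inside the @model body, where `predicted` is an EnsembleSolution.)
# The ensemble container carries no retcode of its own, so check each
# trajectory individually.
for i in 1:length(predicted)
    if !SciMLBase.successful_retcode(predicted[i].retcode)
        Turing.@addlogprob! -Inf
        return nothing
    end
end
```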