Description
This section is broken (again?) https://turinglang.org/v0.31/tutorials/10-bayesian-differential-equations/#inference-of-a-stochastic-differential-equation
The plot in the tutorials shows no sampling (the chains are just flat lines).
Running the section gives this warning (over and over):
┌ Warning: Incorrect ϵ = NaN; ϵ_previous = 0.07500000000000001 is used instead.
└ @ AdvancedHMC.Adaptation ~/.julia/packages/AdvancedHMC/AlvV4/src/adaptation/stepsize.jl:131
The summary is just the initial parameters:
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat ess_per_sec
Symbol Float64 Float64 Float64 Float64 Float64 Float64 Float64
σ 1.5000 0.0000 NaN NaN NaN NaN NaN
α 1.3000 0.0000 0.0000 NaN NaN NaN NaN
β 1.2000 0.0000 0.0000 NaN NaN NaN NaN
γ 2.7000 0.0000 0.0000 NaN NaN NaN NaN
δ 1.2000 0.0000 0.0000 NaN NaN NaN NaN
ϕ1 0.1200 0.0000 0.0000 NaN NaN NaN NaN
ϕ2 0.1200 0.0000 0.0000 NaN NaN NaN NaN
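For reference, the SDE section fits a Lotka-Volterra model with multiplicative noise; roughly this setup (paraphrasing the tutorial, exact values may differ):

```julia
using DifferentialEquations

# Lotka-Volterra drift.
function lotka_volterra!(du, u, p, t)
    α, β, γ, δ = p[1], p[2], p[3], p[4]
    du[1] = α * u[1] - β * u[1] * u[2]
    du[2] = δ * u[1] * u[2] - γ * u[2]
end

# Multiplicative noise on each species.
function multiplicative_noise!(du, u, p, t)
    ϕ1, ϕ2 = p[5], p[6]
    du[1] = ϕ1 * u[1]
    du[2] = ϕ2 * u[2]
end

u0 = [1.0, 1.0]
tspan = (0.0, 10.0)
p = [1.5, 1.0, 3.0, 1.0, 0.1, 0.1]  # α, β, γ, δ, ϕ1, ϕ2 (assumed values)
prob_sde = SDEProblem(lotka_volterra!, multiplicative_noise!, u0, tspan, p)
```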
(turing_sde) pkg> st
Project turing_sde v0.1.0
Status `~/repos/scratch/julia/turing_sde/Project.toml`
[0c46a032] DifferentialEquations v7.13.0
[f3b207a7] StatsPlots v0.15.7
[fce5fe82] Turing v0.31.3
[37e2e46d] LinearAlgebra
julia> versioninfo()
Julia Version 1.10.3
Commit 0b4590a5507 (2024-04-30 10:59 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: macOS (arm64-apple-darwin22.4.0)
CPU: 10 × Apple M1 Max
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)
Activity
devmotion commented on May 6, 2024
Possibly related: #420
yebai commented on May 7, 2024
The error message indicates that NUTS struggles to find a suitable leapfrog integration step size. This is often caused by challenging geometry in the target distribution. If that is the case, we might need to tweak the prior and its parameterization. However, it is then strange that the model worked previously.

torfjelde commented on May 7, 2024
Another question I think is worth raising: is this example even worth including in the tutorial? It's a bit unclear what it's actually inferring, no? It feels like it's almost an ABC approach where we are also inferring the epsilon?
(Someone called the Turing.jl docs out for this at BayesComp 2023 actually, haha)
jerlich commented on May 10, 2024
Btw, the posterior estimates for the ODE in this tutorial are also not robust: I tried different generative parameter values, and the posterior credible intervals do not include the generative parameters.
torfjelde commented on May 10, 2024
What do you say @devmotion @yebai ? Should we just drop it from the tutorial?
yebai commented on May 10, 2024
I know some people find this useful for illustrating how to use DiffEqs in Turing -- maybe we can try to improve it?
torfjelde commented on May 10, 2024
I'm not suggesting removing the tutorial; I'm suggesting we remove only the SDE example.
Can someone explain what we're actually doing in the SDE example? We're somehow defining a likelihood based on the mean of the SDE simulations, but also treating the variance of the likelihood as a parameter to be inferred? Does this make sense to do? Seems almost like an ABC thingy but without decreasing the "epsilon" (which is the variance being inferred here)
devmotion commented on May 10, 2024
We don't base it on the mean of the SDE simulations, do we? I don't remember the previous iterations of the example, but based on its explanation the intention was to compute the likelihood of a single SDE simulation based on all trajectories of the SDE simulations ("data"). So indeed the use of odedata seems unintentional, and the model was supposed to be based on sdedata.

Additionally, the example mentions SGHMC might be more suitable - I wonder why we don't use it?
But generally I agree: We should fix broken examples or remove them temporarily. Failing examples give a bad impression.
Related: I had completely forgotten, but based on a suggestion by @mschauer I had played around with the preconditioned Crank-Nicolson algorithm in the context of SDEs a few years ago (with AdvancedMH but also with DynamicPPL/Turing): https://gist.github.com/devmotion/37d8d706938364eeb900bc3678860da6 I haven't touched it in a while, so I assume it might require a few updates by now.
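For context, the pCN proposal for a standard Gaussian prior (with step-size parameter 0 < β ≤ 1) is simply:

```julia
# pCN proposal: x' = sqrt(1 - β^2) * x + β * ξ with ξ ~ N(0, I).
# It leaves N(0, I) invariant, so the MH acceptance ratio involves only
# the likelihood; handy for SDE models where the driving noise is Gaussian.
pcn_propose(x, β) = sqrt(1 - β^2) .* x .+ β .* randn(length(x))
```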
gjgress commented on May 16, 2024
I'm currently working on using Turing to fit the parameters of an SDE, so this topic is of particular interest to me.
Since I already am testing out MCMC samplers with my model, maybe I can help with fixing the current example.
But, I don't quite understand the goal of the example either. When I first saw the example in the documentation, I assumed the point was to simulate multiple trajectories of the stochastic differential equation, calculate the likelihood of each trajectory based on the data independently, then combine them into one likelihood (by multiplying the likelihoods). Is this the goal? Or how exactly are they using multiple trajectories to calculate the likelihood?
The way it is currently using odedata also doesn't make sense to me, but it seems that may be a mistake.

torfjelde commented on May 16, 2024
Oh no, you're completely right! 👍 I don't know why I had this impression, but it was clearly wrong 🤦
Lovely:)
No, this is effectively what the example is doing:)
This is probably a mistake, yeah.
That indeed seems like it would be very useful here!
gjgress commented on May 22, 2024
So I am working on this example and I'm still trying to figure out where in the process things go wrong.
I experimented with the priors and initial parameters, but even when set to the true values, the sampler still fails, so I think the problem runs deeper.
During my troubleshooting I tried samplers other than NUTS, and when I attempted to use DynamicHMC I got an error that may be insightful. To be honest, though, I'm unfamiliar with the internals of Turing and DynamicHMC, so maybe someone else here can give a little more insight.
Here is the code I used. Notable differences are that I generated sdedata by simulating the equations, taking the mean of the trajectories, then adding noise to the mean. Of course I also added DynamicHMC.NUTS() as an external sampler.
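In sketch form, something like this (sde_model here is a stand-in for the actual Turing model and data; whether externalsampler accepts DynamicHMC.NUTS() directly may depend on the Turing version):

```julia
using Turing, DynamicHMC

# `sde_model` is a hypothetical stand-in for the tutorial's Turing model.
# Drive DynamicHMC's NUTS through Turing's external-sampler interface.
chain = sample(sde_model, externalsampler(DynamicHMC.NUTS()), 500)
```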
The error that resulted is as follows:
Somehow the sampler is choosing negative values for the starting point, which of course causes any density calculations to be non-finite, so choosing an initial step size is impossible. I suspect that this is what is going on with the default NUTS sampler.
When researching this error, I came across this issue in DiffEqBayes, which may be of relevance.
Any insight as to why this might be happening?
gjgress commented on May 23, 2024
Okay, so update: I noticed that the simulations from the sampled parameters in the model were not being run as an ensemble, but rather as a single event. Since the intention is to use all the trajectories together to compute the likelihood, I computed 1000 trajectories and then calculated the likelihood from them in a for-loop. Somehow, this change fixed the initial step-size issue. I cannot figure out why this would be the case, since each observation in the loop should have the same types as in the original code; one would expect that running an ensemble with one trajectory would have the same output as the original code I ran.
Here is the updated code I ran:
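In outline, it has roughly this shape (a simplified sketch; the priors, solver, and save grid here are assumptions rather than the exact code):

```julia
using Turing, DifferentialEquations, LinearAlgebra
import SciMLBase

@model function fitlv_sde(data, prob)
    # Priors (illustrative values; the tutorial's exact choices may differ).
    σ ~ InverseGamma(2, 3)
    α ~ truncated(Normal(1.3, 0.5); lower=0.5, upper=2.5)
    β ~ truncated(Normal(1.2, 0.25); lower=0.5, upper=2.0)
    γ ~ truncated(Normal(3.0, 0.25); lower=1.0, upper=4.0)
    δ ~ truncated(Normal(1.0, 0.25); lower=0.5, upper=2.0)
    ϕ1 ~ truncated(Normal(0.12, 0.3); lower=0.05, upper=0.25)
    ϕ2 ~ truncated(Normal(0.12, 0.3); lower=0.05, upper=0.25)

    # Simulate an ensemble of trajectories under the sampled parameters
    # instead of a single solve.
    p = [α, β, γ, δ, ϕ1, ϕ2]
    ensembleprob = EnsembleProblem(remake(prob; p=p))
    predicted = solve(ensembleprob, SOSRI(); saveat=0.1, trajectories=1000)

    # Reject the whole parameter draw if any trajectory failed to solve.
    for traj in predicted
        if !SciMLBase.successful_retcode(traj.retcode)
            Turing.@addlogprob! -Inf
            return nothing
        end
    end

    # Combine the per-trajectory likelihoods by multiplying them
    # (i.e. summing the log-likelihoods).
    for traj in predicted
        for i in 1:size(data, 2)
            data[:, i] ~ MvNormal(traj.u[i], σ^2 * I)
        end
    end
end
```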
I don't have an output to show for it because the sampling takes a long time on my machine (it's been a few hours and it's still only at 8%), but it did give this little tidbit which suggests that the problem was solved.
EDIT:
Ah! Having reviewed my last two comments, I think I have figured it out.
I made a change in the code that I thought was insignificant, but I think was actually quite important.
In the original example, there is a line to exit early if the simulation does not succeed:
However, the original way that return codes are handled by SciMLBase has been deprecated. The return codes are now of type Enum(Int), so predicted.retcode !== :Success always evaluates to true (an enum value is never identical to a Symbol), and so the log probability was always set to -Inf. Instead, the line should compare against the enum, as in the sketch after this paragraph.

By the way, how should this be handled with the ensemble simulation? In my updated version, I computed 1000 trajectories rather than only one (as is done in the original example). We can iterate over the 1000 trajectories and check for failed return codes, but if even just one out of a thousand fails, the entire set of simulations is thrown out. Is this a result we want?
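The fix, in sketch form (the exact line in the merged version may differ):

```julia
import SciMLBase
using Turing

# (Inside the @model body, where `predicted` is the solve result.)
# Return codes are now enums (SciMLBase.ReturnCode), so compare against
# the enum value instead of the old Symbol:
if predicted.retcode !== SciMLBase.ReturnCode.Success
    Turing.@addlogprob! -Inf
    return nothing
end

# Equivalently, use the helper SciMLBase provides:
if !SciMLBase.successful_retcode(predicted.retcode)
    Turing.@addlogprob! -Inf
    return nothing
end
```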
torfjelde commented on May 23, 2024
Lovely stuff @gjgress !
Thank you for discovering this 🙏
Regarding many vs. a single trajectory, I believe we have the following scenarios:
Single trajectory
This is using a random likelihood, which is going to be invalid for something like NUTS if you don't also infer the noise used in the SDE solve (as @devmotion is doing in his PCN example).
Hence, the example in the tutorial is incorrect. Ofc we can also view it as a one-sample approximation to the multiple-trajectories approach, but this will have too high a variance to be valid for NUTS (we're entering the territory of pseudo-marginal samplers).
Multiple trajectories
Here there are two things we can do: a) use the expectation of predicted as the mean in our likelihood, i.e. we end up approximating the likelihood with a Gaussian centred at the ensemble mean; or b) if we do what you did in your last example, i.e. computing the likelihood for every realization, you're approximating the likelihood where the expectation is taken over predicted (this requires pseudo-marginal samplers, but if the number of samples is sufficiently high so that the variance of the estimator is sufficiently low, then we can probably get away with NUTS or the like). It seems to me that what we want here is the latter case; both approximations are written out below.
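In symbols (a reconstruction; notation assumed: data $y$, parameters $\theta$, simulated trajectories $x_1, \dots, x_N \sim p(x \mid \theta)$, observation noise $\sigma^2$):

$$\text{a)}\quad p(y \mid \theta) \approx \mathcal{N}\!\Big(y \,\Big|\, \frac{1}{N}\sum_{i=1}^{N} x_i,\ \sigma^2 I\Big)$$

$$\text{b)}\quad p(y \mid \theta) = \mathbb{E}_{x \sim p(x \mid \theta)}\big[p(y \mid x)\big] \approx \frac{1}{N}\sum_{i=1}^{N} p(y \mid x_i)$$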
So all in all, yes, I think you're correct that we want to use multiple trajectories and compute the likelihood for each of the realizations, i.e. your last example code :)
torfjelde commented on May 23, 2024
Ideally we should reject all the trajectories if even one of them fails to solve (since this means the particular realization of the parameters resulted in "invalid" trajectories). It might also be crucial to the adaptation of the parameters in something like NUTS.

Note that you might also be able to speed this computation up significantly by using threads (if you're not already).
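For instance, with the ensemble setup sketched earlier (ensembleprob is assumed from that sketch), the trajectories can be simulated in parallel by passing an ensemble algorithm to solve:

```julia
# EnsembleThreads() distributes the trajectory solves across the
# available Julia threads.
predicted = solve(ensembleprob, SOSRI(), EnsembleThreads();
                  saveat=0.1, trajectories=1000)
```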
gjgress commented on May 23, 2024
@torfjelde Thanks for the insight on the statistical side of things! Glad to hear that my approach works best here after all.
With all this said, is there anything in my last example that ought to be changed for the documentation (aside from the comments I put here and there)?
If it is fine, I can submit a pull request for the documentation. I'll also regenerate the plots that were used in the example.
By the way, are there any pseudo-marginal samplers available in Turing or related libraries?
This was my conclusion as well, but I don't have much experience with this type of problem, so I wanted to be sure 👍
I already am for my project, but I wanted to reproduce the code exactly as written when I was testing this example. I may use threads when generating the plots for the documentation.
yebai commented on May 23, 2024
The SDE example has been extracted into a new tutorial and converted to a Quarto notebook. Feel free to push changes to the PR #446
runzegan commented on May 24, 2024
I found the 'entire trajectory simulation from prior'-based inference used in the tutorial's SDE example quite unusual from a statistical signal processing perspective. Generally, the variance of the SDE's prior can become unboundedly large at large time steps (unless the process is stationary). This means that simulating the entire trajectory can result in inefficient samples that are very far from the observations at large time steps.

To address this issue, we usually use sequential Monte Carlo methods when the parameter is fixed. This will simulate/infer p(trajectories | data, parameter) incrementally, point by point or piece by piece, and produce an (unnormalised) likelihood approximation that can be used for correcting the parameter.

Generally, an SDE defines a state-space model prior for time series, making the parameter estimation of an SDE a static parameter estimation problem in state-space models. For a review, see "On Particle Methods for Parameter Estimation in State-Space Models". This is also precisely the problem that PMCMC was designed to address. A canonical PMCMC should produce reliable results for this task, albeit perhaps inefficiently.
odewit8 commented on Dec 2, 2024
Any updates on this? Not sure where I can find the updated notebook on SDEs, would be appreciated.
I guess it would be nice to have an example that just does Bayesian inference for the diffusion coefficient of a Brownian motion, given the single-vs-multiple trajectories and ensemble interpretations, in contrast with ODEs.
Also: I got the following error trying the retcode solution:
type EnsembleSolution has no field retcode
gjgress commented on Dec 3, 2024
@odewit8 The branch has been merged, but I think there is also the additional step of running the Quarto notebook and regenerating the webpage, which has not been done yet (not sure how the process works, correct me if I'm wrong).
That being said, you can view the merge and use the code I wrote there as a reference.
The EnsembleSolution itself won't have a retcode; each trajectory of the EnsembleSolution has its own retcode. You'll see in the merged branch that I replaced the retcode check with a per-trajectory check on predicted[i] to rectify this, as sketched below.
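That is, something like this sketch:

```julia
# (Inside the @model body, where `predicted` is an EnsembleSolution.)
# The ensemble container carries no retcode of its own, so check each
# trajectory individually.
for i in 1:length(predicted)
    if !SciMLBase.successful_retcode(predicted[i].retcode)
        Turing.@addlogprob! -Inf
        return nothing
    end
end
```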