
Variables with shared inputs are always resampled from the prior in sample_posterior_predictive #6047

@jcstucki

Description of your problem

Using the model.add_coord method appears to break pm.sample_posterior_predictive.

Here’s a simple example, estimating the loc and scale of three independent normals:


import numpy as np
import pandas as pd
import aesara
import pymc as pm

# Generate data
data = np.random.normal(loc=np.array([3, 5, 8]), scale=np.array([1.1, 6.3, 9.1]), size=(1000, 3))

# Model 1: No coords
with pm.Model() as no_coords_model:
    mu = pm.Normal('mu', mu=0, sigma=10, size=3)
    sigma = pm.HalfNormal('sigma', sigma=10, size=3)
    
    ll = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)
    no_coords_trace = pm.sample()
    no_coords_post = pm.sample_posterior_predictive(no_coords_trace)

# Model 2: Context manager
coords = {'name': ['A', 'B', 'C']}
with pm.Model(coords=coords) as context_model:
    mu = pm.Normal('mu', mu=0, sigma=10, dims=['name'])
    sigma = pm.HalfNormal('sigma', sigma=10, dims=['name'])
    ll = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)
    
    context_trace = pm.sample()
    context_post = pm.sample_posterior_predictive(context_trace)

# Model 3: Within model
with pm.Model() as within_model:
    within_model.add_coord('name', ['A', 'B', 'C'], mutable=True)
    mu = pm.Normal('mu', mu=0, sigma=10, dims=['name'])
    sigma = pm.HalfNormal('sigma', sigma=10, dims=['name'])
    
    ll = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)
    within_trace = pm.sample()
    within_post = pm.sample_posterior_predictive(within_trace)



traces = [no_coords_trace, context_trace, within_trace]
mus = [trace.posterior.mu.values[..., i].mean() for trace in traces for i in range(3)]
sigmas = [trace.posterior.sigma.values[..., i].mean() for trace in traces for i in range(3)]
post_df = pd.DataFrame(np.c_[mus, sigmas], columns=['mu', 'sigma'], index=pd.MultiIndex.from_product([['no_coords', 'context', 'within'], ['A', 'B', 'C']]))
print(post_df.unstack(1).to_string())

                 mu                         sigma                    
                  A         B         C         A         B         C
context    2.977460  4.982624  7.826642  1.081710  6.287514  9.165928
no_coords  2.976785  4.984743  7.827109  1.081657  6.289910  9.174939
within     2.976568  4.990646  7.825051  1.081552  6.286198  9.167916


pps = [no_coords_post, context_post, within_post]
mean_value = [post.posterior_predictive.obs.values[..., i].mean() for post in pps for i in range(3)]
post_df = pd.DataFrame(mean_value, columns=['mean_ppc'], index=pd.MultiIndex.from_product([['no_coords', 'context', 'within'], ['A', 'B', 'C']]))

           mean_ppc                    
                  A         B         C
context    2.977167  4.985852  7.825006
no_coords  2.976837  4.982244  7.818495
within    -0.045788 -0.594845 -0.270400

The dims on within_post are the same as the others, but it seems like totally wrong values are getting sampled. Tellingly, the mean of each distribution is not the mean of the means, suggesting it's not a case of correct values being shuffled.

This is a serious blocker when attempting out-of-sample prediction, since coords need to be set somehow, and (to my knowledge) they can't be re-added to the model context after instantiation.
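The "not just shuffled" reasoning can be checked numerically. A standalone numpy sketch, independent of PyMC (the loc/scale values mirror the data-generating parameters above):

```python
import numpy as np

rng = np.random.default_rng(42)

# Draws that look like a correct posterior predictive for the three columns
good = rng.normal(loc=[3.0, 5.0, 8.0], scale=[1.1, 6.3, 9.1], size=(1000, 3))

# If the sampler were merely shuffling correct values between columns,
# the grand mean over all entries would be preserved exactly.
shuffled = good[:, rng.permutation(3)]
assert np.isclose(good.mean(), shuffled.mean())

# The reported within-model column means (~ -0.05, -0.59, -0.27) average to
# roughly zero, far from the grand mean of ~5.3, so the draws cannot be a
# reshuffling of correct values.
print(round(good.mean(), 1))
```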

Versions and main components

  • PyMC/PyMC3 Version: 4.1.3
  • Aesara/Theano Version: 2.7.7
  • Python Version: 3.8
  • Operating system: Mac OS
  • How did you install PyMC/PyMC3: pip


michaelosthege (Member) commented on Aug 13, 2022

This sounds serious!

If you look at the control flow of Model.__init__(coords=...) you can see that internally it gets routed into Model.add_coords and thereby Model.add_coord.
But in the example you posted, there's a mutable=True setting, which creates the difference.
Via Model(coords=...) the dimension will be immutable by default.

So I would hypothesize that this is an issue with mutable dims.
And since the likelihood is broadcast from (3,) to the observed shape of (1000, 3), this is probably (yet another) broadcasting bug.

👉 Does it go away if you pass mutable=False in the third example?

👉 What if instead of size=3 in the first example you do three = aesara.shared(3) and size=three?

👉 What if in the third example you manually broadcast the parameters already? (ll = pm.Normal('obs', mu=mu[None, :], sigma=sigma[None, :], observed=data))

jessegrabowski (Member) commented on Aug 15, 2022

Reporting back on the three suggestions:

  1. Passing mutable=False in the third example fixes the bug.
  2. Passing a shared variable to size in the first example causes the bug.
  3. Manually broadcasting does not fix the bug, at least not by expanding on the first dimension.

A note on (2). This code snippet:

three = aesara.shared(3)
pm.Normal('normal', size=three)

Raised a TypeError for me on version 4.1.4 (Windows). I instead had to pass the size as a 1-tuple (size=(three,)). I don't know if this is relevant (it doesn't appear to be in simple tests of pm.Normal(...).eval()), but I still thought it worth mentioning.

michaelosthege (Member) commented on Aug 15, 2022

Thanks @jessegrabowski, I'd say this confirms my hypothesis: it's a bug related to broadcasted, symbolic sizes.
@pymc-devs/dev-core, can someone familiar with aeppl take a look at this? It does sound a little dangerous.

I instead had to pass the size in as a 1-tuple (size=(three, )). I don't know if this is relevant. It doesn't appear to be in simple tests of pm.Normal(...).eval(), but I still thought it worth mentioning.

This sounds like a smaller bug, whereby a scalar shared variable is not automatically wrapped in a tuple.
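A fix might normalize `size` the way the reporter did by hand. Purely as an illustration (this `normalize_size` helper is hypothetical, not PyMC API), the auto-wrapping could look like:

```python
def normalize_size(size):
    """Hypothetical helper: wrap a bare scalar (including a 0-d shared
    variable) passed as `size` into a 1-tuple before shape handling."""
    if size is None:
        return ()
    if isinstance(size, (tuple, list)):
        return tuple(size)
    # Scalars and 0-d symbolic variables become a length-1 tuple,
    # mirroring the manual size=(three,) workaround.
    return (size,)

print(normalize_size(3))       # (3,)
print(normalize_size([3, 2]))  # (3, 2)
print(normalize_size(None))    # ()
```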
This is the test case where we're testing these "lazy" flavors of passing scalars. The test doesn't cover passing a (symbolic) scalar to size yet.

def test_lazy_flavors(self):
    assert pm.Uniform.dist(2, [4, 5], size=[3, 2]).eval().shape == (3, 2)
    assert pm.Uniform.dist(2, [4, 5], shape=[3, 2]).eval().shape == (3, 2)
    with pm.Model(coords=dict(town=["Greifswald", "Madrid"])):
        assert pm.Normal("n1", mu=[1, 2], dims="town").eval().shape == (2,)
        assert pm.Normal("n2", mu=[1, 2], dims=["town"]).eval().shape == (2,)

ricardoV94 (Member) commented on Aug 15, 2022

Perhaps the variable mu is being resampled by mistake. #5973 could help

jessegrabowski (Member) commented on Aug 20, 2022

As an additional test, I changed the mean of the prior distribution placed over the estimated parameters. It seems that, when using model.add_coord, pm.sample_posterior_predictive actually samples from the prior. Here are my tests:

# Model 3: Within model
with pm.Model() as within_model:
    within_model.add_coord('name', ['A', 'B', 'C'], mutable=True)
    mu = pm.Normal('mu', mu=3, sigma=10, dims=['name'])
    sigma = pm.HalfNormal('sigma', sigma=10, dims=['name'])
    
    ll = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)
    within_trace = pm.sample()
    within_post = pm.sample_posterior_predictive(within_trace)
    print(within_post.posterior_predictive.obs.mean())
    >>> 2.83320012

with pm.Model() as within_model:
    within_model.add_coord('name', ['A', 'B', 'C'], mutable=True)
    mu = pm.Normal('mu', mu=-10, sigma=10, dims=['name'])
    sigma = pm.HalfNormal('sigma', sigma=10, dims=['name'])
    
    ll = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)
    within_trace = pm.sample()
    within_post = pm.sample_posterior_predictive(within_trace)
    print(within_post.posterior_predictive.obs.mean())
    >>> -9.98125202

No idea why that would be happening, but it seems to suggest it's not a shape or broadcasting issue. Instead, when a coord is a SharedVariable, the prior is being sampled as if it were the posterior.

jessegrabowski (Member) commented on Aug 20, 2022

Here are some more thorough outputs showing that pm.sample_posterior_predictive samples from the prior for all variables, and that the trigger is a shared variable in the shape:

with pm.Model() as model:
    three = aesara.shared(3)
    mu = pm.Normal('mu', mu=-10, sigma=10, shape=(three, ))
    sigma = pm.HalfNormal('sigma', sigma=11, shape=(three, ))
    
    ll = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)
    trace = pm.sample()
    post = pm.sample_posterior_predictive(trace, var_names=['mu', 'sigma', 'obs'])

Compare posterior and posterior_predictive for mu:

# Posterior
print(trace.posterior.mu.mean(dim=['chain', 'draw']).values)
print(trace.posterior.mu.std(dim=['chain', 'draw']).values)

# Output
# [3.01789945 5.26561783 8.11427295]
# [0.03474686 0.20847781 0.29282972]

# Posterior Predictive
print(post.posterior_predictive.mu.mean(dim=['chain', 'draw']).values)
print(post.posterior_predictive.mu.std(dim=['chain', 'draw']).values)

# Output:
# [-10.33329768  -9.96841615  -9.62960897]
# [ 9.6122322  10.23897464 10.21420589]

And for sigma:

# Posterior
print(trace.posterior.sigma.mean(dim=['chain', 'draw']).values)
print(trace.posterior.sigma.std(dim=['chain', 'draw']).values)
# Output 
# [1.08169416 6.39748516 9.09000896]
# [0.02457604 0.14622595 0.20021742]

# Posterior Predictive
print(post.posterior_predictive.sigma.mean(dim=['chain', 'draw']).values)
print(post.posterior_predictive.sigma.std(dim=['chain', 'draw']).values)

# Output
# [9.01756853 8.45109316 8.63402421]
# [6.54332978 6.36108147 6.72434589]

lucianopaz (Member) commented on Aug 20, 2022

Yeah, we hadn't thought about SharedVariable coordinates when we designed the volatility propagation rule in sample_posterior_predictive. We will have to change it slightly. If the user supplies an InferenceData object, we can retrieve the constant data group and the coord values from the posterior group, and check whether the tensor values at the time we call sample_posterior_predictive have changed. If they have changed, we mark them as volatile; if they haven't, we don't. That should fix this issue.
The problem I see is what to do when the input isn't an InferenceData object. Maybe we should assume that coords are not volatile but other tensors are?
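The proposed rule could be sketched roughly as follows. This is a standalone numpy illustration with a hypothetical `is_volatile` helper, not the actual implementation:

```python
import numpy as np

def is_volatile(stored, current):
    """Hypothetical check: a shared input is marked volatile (and hence
    resampled) only if its value has changed since sampling, as recorded
    in the InferenceData's constant_data group or posterior coords."""
    stored, current = np.asarray(stored), np.asarray(current)
    if stored.shape != current.shape:
        return True  # resized data must be volatile
    return not np.array_equal(stored, current)

# Unchanged coord values: not volatile, posterior draws can be reused.
print(is_volatile(["A", "B", "C"], ["A", "B", "C"]))       # False
# Changed or resized coords: volatile, downstream nodes must be resampled.
print(is_volatile(["A", "B", "C"], ["A", "B", "D"]))       # True
print(is_volatile(["A", "B", "C"], ["A", "B", "C", "D"]))  # True
```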

ricardoV94 (Member) commented on Aug 20, 2022

This is not a shape problem per se. The question is what should the default be when any input to a model variable is a shared variable.

with pm.Model() as m:
    mu = pm.MutableData("mu", 0)
    x = pm.Normal("x", mu)
    y = pm.Normal("y", x, observed=[10, 10, 10])

    idata = pm.sample()
    pp = pm.sample_posterior_predictive(idata)

In that example x will be resampled from the prior again, and so will y by association, which has nothing to do with posterior prediction.

The reason we made these changes was to resample deterministics in GLM models where you have mu = x @ beta and x is mutable. We should check whether it's possible to restrict the volatile logic to deterministics above the likelihood, or to variables in var_names, rather than applying it to all variables in the model by default.
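The GLM use case described here, recomputing mu = x @ beta for new x while keeping the posterior draws of beta, can be illustrated with plain numpy (all shapes and values below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend posterior draws of beta from a fitted GLM: (draws, features)
beta_draws = rng.normal(loc=[1.0, -2.0], scale=0.1, size=(500, 2))

# New out-of-sample design matrix, swapped in via MutableData: (obs, features)
X_new = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])

# This is the behaviour the volatile logic targets: the deterministic
# mu = X @ beta must be recomputed against the new X, while beta itself
# keeps its posterior draws rather than being resampled from the prior.
mu_new = beta_draws @ X_new.T
print(mu_new.shape)  # (500, 3)
```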

ricardoV94 changed the title from "Using the model.add_coord method appears to break pm.sample_posterior_predictive" to "Variables with shared inputs are always resampled from the prior in sample_posterior_predictive" on Aug 20, 2022
jbh1128d1 commented on Aug 29, 2022

I'm having issues getting out-of-sample predictions with coords changing for the out-of-sample data as well. Any progress on this issue?

ricardoV94 (Member) commented on Aug 30, 2022

For now, the workaround is not to use MutableData for inputs of unobserved distributions.
