Variables with shared inputs are always resampled from the prior in sample_posterior_predictive
Description of your problem
Using the `model.add_coord` method appears to break `pm.sample_posterior_predictive`.
Here’s a simple example, estimating the loc and scale of three independent normals:
```python
import numpy as np
import pandas as pd
import pymc as pm

# Generate data
data = np.random.normal(loc=np.array([3, 5, 8]), scale=np.array([1.1, 6.3, 9.1]), size=(1000, 3))

# Model 1: No coords
with pm.Model() as no_coords_model:
    mu = pm.Normal('mu', mu=0, sigma=10, size=3)
    sigma = pm.HalfNormal('sigma', sigma=10, size=3)
    ll = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)
    no_coords_trace = pm.sample()
    no_coords_post = pm.sample_posterior_predictive(no_coords_trace)

# Model 2: Context manager
coords = {'name': ['A', 'B', 'C']}
with pm.Model(coords=coords) as context_model:
    mu = pm.Normal('mu', mu=0, sigma=10, dims=['name'])
    sigma = pm.HalfNormal('sigma', sigma=10, dims=['name'])
    ll = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)
    context_trace = pm.sample()
    context_post = pm.sample_posterior_predictive(context_trace)

# Model 3: Within model
with pm.Model() as within_model:
    within_model.add_coord('name', ['A', 'B', 'C'], mutable=True)
    mu = pm.Normal('mu', mu=0, sigma=10, dims=['name'])
    sigma = pm.HalfNormal('sigma', sigma=10, dims=['name'])
    ll = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)
    within_trace = pm.sample()
    within_post = pm.sample_posterior_predictive(within_trace)

# Compare posterior estimates across the three models
traces = [no_coords_trace, context_trace, within_trace]
mus = [trace.posterior.mu.values[..., i].mean() for trace in traces for i in range(3)]
sigmas = [trace.posterior.sigma.values[..., i].mean() for trace in traces for i in range(3)]
post_df = pd.DataFrame(np.c_[mus, sigmas], columns=['mu', 'sigma'],
                       index=pd.MultiIndex.from_product([['no_coords', 'context', 'within'],
                                                         ['A', 'B', 'C']]))
print(post_df.unstack(1).to_string())
```

```
                  mu                          sigma                    
                   A         B         C          A         B         C
context     2.977460  4.982624  7.826642   1.081710  6.287514  9.165928
no_coords   2.976785  4.984743  7.827109   1.081657  6.289910  9.174939
within      2.976568  4.990646  7.825051   1.081552  6.286198  9.167916
```
```python
# Compare posterior-predictive means across the three models
pps = [no_coords_post, context_post, within_post]
mean_value = [post.posterior_predictive.obs.values[..., i].mean() for post in pps for i in range(3)]
post_df = pd.DataFrame(mean_value, columns=['mean_ppc'],
                       index=pd.MultiIndex.from_product([['no_coords', 'context', 'within'],
                                                         ['A', 'B', 'C']]))
print(post_df.unstack(1).to_string())
```

```
           mean_ppc                    
                  A         B         C
context    2.977167  4.985852  7.825006
no_coords  2.976837  4.982244  7.818495
within    -0.045788 -0.594845 -0.270400
```
"The dims on within_post are the same as the others, but it seems like totally wrong values are getting sampled. It is telling that the mean of each distribution is not the mean of the means, suggesting it’s not a case of correct values being shuffled."
This is pretty breaking when attempting out-of-sample predictions, since coords need to be set somehow, and (to my knowledge) a coord can't be re-added to the model context after instantiation.
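For context, the workflow this blocks looks roughly like the following (my illustrative sketch of the reporter's use case, not code from the report):

```python
import pymc as pm

# `data` as generated in the example above. Fit with a mutable 'name'
# coord precisely so that it can be swapped out for prediction later.
with pm.Model() as oos_model:
    oos_model.add_coord('name', ['A', 'B', 'C'], mutable=True)
    mu = pm.Normal('mu', mu=0, sigma=10, dims=['name'])
    sigma = pm.HalfNormal('sigma', sigma=10, dims=['name'])
    obs = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)
    oos_trace = pm.sample()

# The next step would be to point 'name' at new labels/data and call
# pm.sample_posterior_predictive -- which is exactly where this bug bites.
```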
Versions and main components
- PyMC/PyMC3 Version: 4.1.3
- Aesara/Theano Version: 2.7.7
- Python Version: 3.8
- Operating system: macOS
- How did you install PyMC/PyMC3: pip
michaelosthege commented on Aug 13, 2022

This sounds serious!

If you look at the control flow of `Model.__init__(coords=...)`, you can see that internally it gets routed into `Model.add_coords` and thereby `Model.add_coord`. But in the example you posted, there's a `mutable=True` setting which creates the difference. Via `Model(coords=...)` the dimension will be immutable by default. So I would hypothesize that this is an issue with mutable dims.

And since the likelihood is broadcast from (3,) to an observed of (1000, 3), this is probably (yet another) broadcasting bug.

👉 Does it go away if you pass `mutable=False` in the third example?
👉 What if instead of `size=3` in the first example you do `three = aesara.shared(3)` and `size=three`? (See the sketch after this list.)
👉 What if in the third example you manually broadcast the parameters already? (`ll = pm.Normal('obs', mu=mu[None, :], sigma=sigma[None, :], observed=data)`)
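A minimal sketch of that second experiment (the model name here is illustrative, not from the thread):

```python
import aesara
import pymc as pm

# A symbolic (shared) scalar in place of a literal size
three = aesara.shared(3)

with pm.Model() as symbolic_size_model:
    # Note the follow-up below: on some versions this needs a 1-tuple,
    # i.e. size=(three,), rather than the bare scalar
    mu = pm.Normal('mu', mu=0, sigma=10, size=(three,))
```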
jessegrabowski commented on Aug 15, 2022
Reporting back on the three suggestions:

`mutable=False` in the third example fixes the bug.

A note on (2): the `three = aesara.shared(3)` / `size=three` snippet raised a TypeError for me on version 4.1.4 (Windows). I instead had to pass the size in as a 1-tuple (`size=(three,)`). I don't know if this is relevant; it doesn't appear to be in simple tests of `pm.Normal(...).eval()`, but I still thought it worth mentioning.

michaelosthege commented on Aug 15, 2022
Thanks @jessegrabowski, I'd say this confirms my hypothesis: it's a bug related to broadcasted, symbolic sizes.

@pymc-devs/dev-core can someone familiar with `aeppl` take a look at this? It does sound a little dangerous.

This sounds like a smaller bug whereby a shared variable is not automatically wrapped in a `tuple`.

This is the test case where we're testing these "lazy" flavors of passing scalars. The test doesn't cover passing a (symbolic) scalar to `size` yet:

pymc/pymc/tests/test_shape_handling.py, lines 439 to 444 in 906fcdc
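The kind of normalization being described might look like this (a sketch under my own naming, not PyMC's actual implementation):

```python
from aesara.graph.basic import Variable

def normalize_size(size):
    """Wrap a scalar size spec -- a Python int or a symbolic scalar such as
    aesara.shared(3) -- into a tuple, leaving tuple/list specs untouched."""
    if size is None:
        return None
    if isinstance(size, (int, Variable)):
        return (size,)
    return tuple(size)
```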
ricardoV94 commented on Aug 15, 2022

Perhaps the variable `mu` is being resampled by mistake. #5973 could help.

jessegrabowski commented on Aug 20, 2022
As an additional test, I changed the mean of the prior distribution I put over the estimated parameters. It seems that, when using `model.add_coord`, `pm.sample_posterior_predictive` actually samples from the prior. Here are my tests:

No idea why that would be happening, but it seems to suggest it's not a shape or broadcasting issue. Instead, when a coord is a SharedVariable, the prior is being sampled as if it were the posterior.
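A sketch of the diagnostic being described, with an illustrative prior mean placed far from the data:

```python
import pymc as pm

# `data` as in the example above. Shift the prior far from the data: if
# sample_posterior_predictive is (incorrectly) drawing from the prior, the
# predictive means will sit near 100 instead of near ~3, ~5, ~8.
with pm.Model() as diag_model:
    diag_model.add_coord('name', ['A', 'B', 'C'], mutable=True)
    mu = pm.Normal('mu', mu=100, sigma=1, dims=['name'])
    sigma = pm.HalfNormal('sigma', sigma=10, dims=['name'])
    obs = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)
    diag_trace = pm.sample()
    diag_post = pm.sample_posterior_predictive(diag_trace)

print(diag_post.posterior_predictive.obs.mean(('chain', 'draw')).values)
```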
jessegrabowski commented on Aug 20, 2022

Here are some more thorough outputs to show that `pm.sample_posterior_predictive` is sampling from the prior for all variables, and that the trigger is a shared variable on the shape.

Compare posterior and posterior_predictive for mu:

And for sigma:
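The comparison can be reproduced along these lines (a sketch; the `var_names` usage is my reconstruction of how mu and sigma end up in the predictive group):

```python
import pymc as pm

# Reusing within_model / within_trace from the original example: include mu
# and sigma in the predictive sampling so they can be compared directly
with within_model:
    ppc_all = pm.sample_posterior_predictive(within_trace, var_names=['mu', 'sigma', 'obs'])

# Under the bug, the predictive mu matches the prior, not the posterior
print(within_trace.posterior['mu'].mean(('chain', 'draw')).values)
print(ppc_all.posterior_predictive['mu'].mean(('chain', 'draw')).values)
```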
lucianopaz commented on Aug 20, 2022

Yeah, we hadn't thought about shared-value coordinates when we designed the volatility propagation rule in sample_posterior_predictive. We will have to change it slightly. If the user supplies an InferenceData object, we can retrieve the constant data group and the coord values from the posterior group, and check whether the tensor values at the time we call sample_posterior_predictive have changed. If they have changed, we mark them as volatile; if they haven't, we don't. That should fix this issue.

The problem I see is what we should do when the input isn't an InferenceData object. Maybe we should assume that coords are not volatile but other tensors are?
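A sketch of that proposed check (the helper name and storage lookups are my assumptions, not the eventual implementation):

```python
import numpy as np

def coord_is_volatile(shared_var, idata):
    """Hypothetical helper: treat a shared input as volatile only if its
    current value differs from the value recorded at sampling time."""
    name = shared_var.name
    constant_data = getattr(idata, 'constant_data', None)
    if constant_data is not None and name in constant_data:
        recorded = constant_data[name].values
    elif name in idata.posterior.coords:
        recorded = np.asarray(idata.posterior.coords[name].values)
    else:
        return True  # nothing recorded -> conservatively mark as volatile
    return not np.array_equal(np.asarray(shared_var.get_value()), recorded)
```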
ricardoV94 commented on Aug 20, 2022

This is not a shape problem per se. The question is what the default should be when any input to a model variable is a shared variable.

In that example, `x` will be resampled from the prior again, and so will `y` by association, which has nothing to do with posterior predictive.

The reason we made these changes was to resample deterministics in GLM models where you have `mu = x @ beta` and `x` is mutable (see the sketch below). We should check whether it's possible to restrict the volatile logic to deterministics above the likelihood, or to variables in `var_names`, and not apply it to all variables in the model by default.
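The GLM pattern being referred to looks roughly like this (an illustrative sketch, not code from the thread):

```python
import numpy as np
import pymc as pm

with pm.Model() as glm_model:
    # A mutable predictor: swapping in new data is the whole point, so the
    # deterministic mu downstream of it must be recomputed ("volatile")
    # in sample_posterior_predictive
    x = pm.MutableData('x', np.random.normal(size=(100, 2)))
    beta = pm.Normal('beta', mu=0, sigma=1, size=2)
    mu = pm.Deterministic('mu', x @ beta)
    y = pm.Normal('y', mu=mu, sigma=1, observed=np.random.normal(size=100))
```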
The issue title was changed from "Using the model.add_coord method appears to break pm.sample_posterior_predictive" to "Variables with shared inputs are always resampled from the prior in sample_posterior_predictive".

jbh1128d1 commented on Aug 29, 2022
I'm also having issues getting out-of-sample predictions when the coords change for the out-of-sample data. Any progress on this issue?
ricardoV94 commented on Aug 30, 2022

For now, the solution is to not use MutableData for inputs of unobserved distributions.
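Applied to the original example, that workaround amounts to keeping the coord immutable (a sketch):

```python
import pymc as pm

# Either pass coords at construction time (immutable by default)...
coords = {'name': ['A', 'B', 'C']}
with pm.Model(coords=coords) as model_a:
    ...

# ...or add the coord inside the model without making it mutable
with pm.Model() as model_b:
    model_b.add_coord('name', ['A', 'B', 'C'], mutable=False)
```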