
BUG: Issue with Ordered Transform in Ordered Logistic API docs example #6610


Closed
NathanielF opened this issue Mar 17, 2023 · 8 comments

@NathanielF
Contributor

Describe the issue:

The API docs for the OrderedLogistic class recommend using the ordered transform to provide cutpoints for the ordinal regression. However, the provided example breaks with an error reporting that the random variable for the cutpoints lacks a shape.

image

On the latest version:
image

Reproducible code example:

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
import pytensor as pt

# Generate data for a simple 1 dimensional example problem
n1_c = 300; n2_c = 300; n3_c = 300
cluster1 = np.random.randn(n1_c) + -1
cluster2 = np.random.randn(n2_c) + 0
cluster3 = np.random.randn(n3_c) + 2

x = np.concatenate((cluster1, cluster2, cluster3))
y = np.concatenate((1*np.ones(n1_c),
                    2*np.ones(n2_c),
                    3*np.ones(n3_c))) - 1

# Ordered logistic regression
with pm.Model() as model:
    cutpoints = pm.Normal("cutpoints", mu=[-1,1], sigma=10, shape=2, 
                          transform=pm.distributions.transforms.Ordered)
    y_ = pm.OrderedLogistic("y", cutpoints=cutpoints, eta=x, observed=y)
    idata = pm.sample()

Error message:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[10], line 14
     12 # Ordered logistic regression
     13 with pm.Model() as model:
---> 14     cutpoints = pm.Normal("cutpoints", mu=[-1,1], sigma=10, shape=2, 
     15                           transform=pm.distributions.transforms.Ordered)
     16     y_ = pm.OrderedLogistic("y", cutpoints=cutpoints, eta=x, observed=y)
     17     idata = pm.sample()

File ~/mambaforge/envs/pymc_examples_new/lib/python3.11/site-packages/pymc/distributions/distribution.py:312, in Distribution.__new__(cls, name, rng, dims, initval, observed, total_size, transform, *args, **kwargs)
    308         kwargs["shape"] = tuple(observed.shape)
    310 rv_out = cls.dist(*args, **kwargs)
--> 312 rv_out = model.register_rv(
    313     rv_out,
    314     name,
    315     observed,
    316     total_size,
    317     dims=dims,
    318     transform=transform,
    319     initval=initval,
    320 )
    322 # add in pretty-printing support
    323 rv_out.str_repr = types.MethodType(str_for_dist, rv_out)
...
---> 95     y = at.zeros(value.shape)
     96     y = at.inc_subtensor(y[..., 0], value[..., 0])
     97     y = at.inc_subtensor(y[..., 1:], at.log(value[..., 1:] - value[..., :-1]))

AttributeError: 'RandomGeneratorSharedVariable' object has no attribute 'shape'

PyMC version information:

Last updated: Fri Mar 17 2023

Python implementation: CPython
Python version : 3.11.0
IPython version : 8.11.0

pytensor: 2.10.1

numpy : 1.24.2
arviz : 0.15.1
matplotlib: 3.7.1
pandas : 1.5.3
pymc : 5.1.1
pytensor : 2.10.1

Watermark: 2.3.1

Context for the issue:

I was going to try to write up docs on the technique of ordinal regression, but I think the failure of the ordered transform makes the entire class of models less straightforward to implement. I'm pretty sure it's related to this line:

x = pt.zeros(value.shape)

But I don't know enough about the random variable implementation to know why the shape attribute is not available now.
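For context, the forward method shown in the traceback keeps the first element of the value and replaces each later element with the log of the gap to its predecessor. A minimal pure-Python sketch of that mapping and its inverse follows; the function names ordered_forward and ordered_backward are illustrative assumptions, not PyMC's API.

```python
import math


def ordered_forward(value):
    """Map a strictly increasing sequence to unconstrained reals.

    Mirrors the arithmetic in the traceback: y[0] = value[0] and
    y[i] = log(value[i] - value[i - 1]) for i >= 1.
    """
    out = [value[0]]
    for prev, cur in zip(value, value[1:]):
        out.append(math.log(cur - prev))
    return out


def ordered_backward(unconstrained):
    """Invert the mapping: accumulate exp() of every entry after the first."""
    out = [unconstrained[0]]
    for step in unconstrained[1:]:
        out.append(out[-1] + math.exp(step))
    return out


cutpoints = [-1.0, 1.0]                      # same means as the example above
unconstrained = ordered_forward(cutpoints)   # first entry kept, second is log(2)
recovered = ordered_backward(unconstrained)  # round trip back to the cutpoints
```

Because every increment on the way back is exp(something), any unconstrained vector maps to a strictly increasing one, which is why a sampler can move freely in the transformed space while the cutpoints stay ordered.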

@ricardoV94
Member

Can you share the whole traceback?

@NathanielF
Contributor Author

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[47], line 14
     12 # Ordered logistic regression
     13 with pm.Model() as model:
---> 14     cutpoints = pm.Normal("cutpoints", mu=[-1,1], sigma=10, shape=2,
     15                           transform=pm.distributions.transforms.Ordered)
     16     y_ = pm.OrderedLogistic("y", cutpoints=cutpoints, eta=x, observed=y)
     17     idata = pm.sample()

File ~/mambaforge/envs/pymc_examples_new/lib/python3.11/site-packages/pymc/distributions/distribution.py:312, in Distribution.__new__(cls, name, rng, dims, initval, observed, total_size, transform, *args, **kwargs)
    308         kwargs["shape"] = tuple(observed.shape)
    310 rv_out = cls.dist(*args, **kwargs)
--> 312 rv_out = model.register_rv(
    313     rv_out,
    314     name,
    315     observed,
    316     total_size,
    317     dims=dims,
    318     transform=transform,
    319     initval=initval,
    320 )
    322 # add in pretty-printing support
    323 rv_out.str_repr = types.MethodType(str_for_dist, rv_out)

File ~/mambaforge/envs/pymc_examples_new/lib/python3.11/site-packages/pymc/model.py:1328, in Model.register_rv(self, rv_var, name, observed, total_size, dims, transform, initval)
   1326         raise ValueError("total_size can only be passed to observed RVs")
   1327     self.free_RVs.append(rv_var)
-> 1328     self.create_value_var(rv_var, transform)
   1329     self.add_named_variable(rv_var, dims)
   1330     self.set_initval(rv_var, initval)

File ~/mambaforge/envs/pymc_examples_new/lib/python3.11/site-packages/pymc/model.py:1521, in Model.create_value_var(self, rv_var, transform, value_var)
   1518         value_var.tag.test_value = rv_var.tag.test_value
   1519 else:
   1520     # Create value variable with the same type as the transformed RV
-> 1521     value_var = transform.forward(rv_var, *rv_var.owner.inputs).type()
   1522     value_var.name = f"{rv_var.name}_{transform.name}__"
   1523     value_var.tag.transform = transform

File ~/mambaforge/envs/pymc_examples_new/lib/python3.11/site-packages/pymc/distributions/transforms.py:95, in Ordered.forward(self, value, *inputs)
     94 def forward(self, value, *inputs):
---> 95     y = at.zeros(value.shape)
     96     y = at.inc_subtensor(y[..., 0], value[..., 0])
     97     y = at.inc_subtensor(y[..., 1:], at.log(value[..., 1:] - value[..., :-1]))

AttributeError: 'RandomGeneratorSharedVariable' object has no attribute 'shape'

@ricardoV94
Member

Ah, maybe Ordered is the transform class rather than an initialized transform? Try using distributions.transforms.univariate_ordered or ...ordered instead.

I think the docstring example uses the latter (lowercase o for the instance, capital O for the class).
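The class-versus-instance mix-up seems to explain the exact error text. When forward is looked up on the class, nothing is bound to self, so the call transform.forward(rv_var, *rv_var.owner.inputs) from the traceback shifts every argument one slot to the left and value ends up being the first input, which the error message suggests is the RNG variable. The sketch below reproduces that shift in plain Python; Ordered, FakeRV, and FakeRNG are hypothetical stand-ins, not PyMC objects.

```python
class Ordered:
    """Hypothetical stand-in for a transform class (not PyMC's implementation)."""

    def forward(self, value, *inputs):
        # Like the real method, this only needs `value.shape`.
        return value.shape


class FakeRV:
    """Stand-in for the random variable; it has a shape."""

    shape = (2,)


class FakeRNG:
    """Stand-in for the RNG input; it deliberately has no `shape`."""


rv, rng = FakeRV(), FakeRNG()


def call_forward(transform):
    # Same call shape as in the traceback: forward(rv_var, *rv_var.owner.inputs)
    return transform.forward(rv, rng)


# Instance: `self` is the transform and `value` is the random variable.
ok = call_forward(Ordered())

# Class: `rv` lands in the `self` slot and `rng` becomes `value`.
try:
    call_forward(Ordered)
    error = None
except AttributeError as exc:
    error = str(exc)
```

Here ok holds the stand-in shape, while error reports that the FakeRNG object has no attribute named shape, matching the form of the AttributeError in the report.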

@NathanielF
Contributor Author

That looks right now:
image

image

@ricardoV94
Member

ricardoV94 commented Mar 18, 2023

The warning might come from a bug we just fixed in PyMC 5.1.2.

Not sure what the plot tells you? Good or bad?

@ricardoV94
Member

ricardoV94 commented Mar 18, 2023

Ah you said it looks right. Shall we close the issue?

Maybe we can update the docs to use univariate_ordered?

That reduces the chance of typos, and it's actually the right one going forward.

@NathanielF
Contributor Author

I think the plot looks good. Yes, I'll open a PR now to change the docstring.

@ricardoV94 ricardoV94 added docs and removed bug labels Mar 18, 2023
NathanielF added a commit to NathanielF/pymc that referenced this issue Mar 18, 2023
@ricardoV94
Member

Closed via #6611
