Skip to content

propose new ConstrainedUniform distribution #32

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
drbenvincent opened this issue Nov 20, 2021 · 3 comments
Open

propose new ConstrainedUniform distribution #32

drbenvincent opened this issue Nov 20, 2021 · 3 comments

Comments

@drbenvincent
Copy link

drbenvincent commented Nov 20, 2021

Following the discussion #5066 (with input from @aseyboldt), I propose a new ConstrainedUniform distribution. An ideal use case for this is as a distribution for cutpoints in an ordinal regression.

The idea is that the distribution would be a vector of N>2 uniformly distributed values where the first entry is constrained to take on a given min value and the final entry is constrained to take on a given max value. The first value is hard constrained to take on a particular value, and the final value is also constrained to take on a particular value - because of the use of the Dirichlet (which sums to 1). Use of the cumsum was also found to be necessary to enforce ordering and avoid divergences in sampling.

A ConstrainedUniform(N) would have N-2 degrees of freedom.

Minimum working example

The new distribution would be along these lines

def constrainedUniform(N, min=0, max=1):
    return pm.Deterministic('theta',
                             at.concatenate([
                                 np.ones(1)*min,
                                 at.extra_ops.cumsum(pm.Dirichlet("theta_unknown", a=np.ones(N-2))) (min+(max-min))
                             ])
                           )

If you had the following observations

y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3,
       3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6])

K = len(np.unique(y))
print(f"K = {K} categories")

Then you could have a pleasingly concise ordinal regression model

with pm.Model() as model:
    theta = constrainedUniform(K)
    mu = pm.Normal('mu', mu=K/2, sigma=K)
    sigma = pm.HalfNormal("sigma", 1)
    pm.OrderedProbit("y_obs", cutpoints=theta, eta=mu, sigma=sigma, observed=y)

Note that this amounts to a very simple 'intercept only' model, where mu is this intercept. There are no predictor variables in this example

Implementation issues

When converting that function constrainedUniform into a PyMC distribution, then at the very least, you'd have to consider:

@tomicapretto
Copy link

@ricardoV94
Copy link
Member

Sounds like a good candidate for pymc-experimental

@ricardoV94
Copy link
Member

This should be a simple helper that looks like a PyMC distribution (but is not one) with new and dist methods

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants