
Discrete choice #544


Merged 27 commits into pymc-devs:main on Jul 13, 2023

Conversation

@NathanielF (Contributor) commented on Apr 26, 2023:

Discrete Choice Modelling

Working out the details of how to represent the various incarnations of discrete choice models in PyMC, building up to the hierarchical (random coefficients) logit described, for instance, in Jim Savage's Stan work here: https://khakieconomics.github.io/2019/03/17/Logit-models-of-discrete-choice.html. These models are quite distinct from reinforcement-learning strategies: they are used to estimate aggregate demand and supply curves in differentiated goods markets and, at the individual level, the substitution patterns customers exhibit between goods.

This is a draft primarily because I need to find an efficient representation of the models in PyMC and write up more detail motivating them.
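For readers unfamiliar with the setup, here is a minimal NumPy sketch of the structure these models share (hypothetical attribute values and coefficients, not the notebook's actual model): each alternative gets a linear utility, and the choice probabilities are the softmax of the utilities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 5 decision makers choosing among 3 heating systems.
n_obs, n_alts = 5, 3
price = rng.uniform(1.0, 3.0, size=(n_obs, n_alts))  # alternative-specific attribute
alpha = np.array([0.5, 0.0, -0.2])                    # alternative-specific intercepts
beta_price = -1.0                                     # price should lower utility

# Linear utility for each (observation, alternative) pair.
u = alpha + beta_price * price                        # shape (n_obs, n_alts)

# Softmax of the utilities gives the choice probabilities.
u_centered = u - u.max(axis=1, keepdims=True)         # stabilise the exponentials
p = np.exp(u_centered) / np.exp(u_centered).sum(axis=1, keepdims=True)

# Each decision maker picks one alternative with those probabilities.
choices = np.array([rng.choice(n_alts, p=row) for row in p])
```

The hierarchical (random coefficients) variant then lets `alpha` and `beta_price` vary by decision maker with a shared hyper-prior, which is what the Stan post linked above builds up to.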

Helpful links

@review-notebook-app bot: Check out this pull request on ReviewNB to see visual diffs and provide feedback on the Jupyter notebooks.

@twiecki (Member) commented on Jun 9, 2023:

Can you use dims?



@NathanielF (Contributor, Author) replied:

Hi @twiecki,

To be honest, I think this one needs quite a bit more work. The modelling is brittle to different randomly generated data, and I'm not sure I have the right approach in general.

I'm experimenting with it when I can. I think this is an interesting class of models, but getting the translation from Stan right is proving trickier than I hoped.

@ricardoV94 (Member) commented on Jun 14, 2023:

You can use pm.math.softmax to avoid that annoying warning
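For context, the usual numerical issue with a hand-rolled softmax is overflow in the exponentials; whatever the exact warning here, the subtract-max identity is why a dedicated helper like `pm.math.softmax` is preferable. A NumPy sketch of the trick (illustrative only):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtracting the row max leaves the result unchanged (the common
    # factor exp(-max) cancels) but keeps the exponentials from overflowing.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

u = np.array([[1000.0, 1001.0, 1002.0]])  # naive np.exp(u) overflows to inf
p = softmax(u, axis=1)                    # ≈ [[0.090, 0.245, 0.665]]
```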



@NathanielF (Contributor, Author) replied:

Thanks, yeah, I'll adjust that. I think the modelling strategy is coming along now; I got overly hung up on the long-data format Stan was using. I'm sticking with the wide-data format for the minute: it seems more transparent to me and better pedagogically.

@NathanielF NathanielF marked this pull request as ready for review June 16, 2023 10:42
@NathanielF (Contributor, Author) commented:

Marking this one as ready for review, @twiecki and @ricardoV94.

I've found a representation of the discrete choice models that I'm pretty happy with. I'm demonstrating them on two different but canonical data sets:

(1) The choice of heating systems, where I focus on specifying the utility matrix for the individual alternatives.
(2) The choice of crackers, with repeated decisions by the same decision maker.

In (1) I demonstrate how you can add alternative-specific parameters, e.g. intercepts and beta parameters for income on the specific alternative.

In (2) I focus on how you can add person-specific modifications of utility, and how you can use prior constraints in the Bayesian context to ensure that the parameter estimates "make sense", i.e. negative parameter estimates for the effect of price on utility.

All models fit well, and in reasonable time. However, in (2) I've truncated the data set a little because I ran into a bracket-nesting error on the full data set. I'd be keen to know how to replace my for-loop here with scan, but I wasn't sure how to do that.

I think I'll likely add some more to the text write-up, but would like some interim feedback on the modelling design if you have any.

@drbenvincent in case of interest.

@NathanielF NathanielF requested a review from ricardoV94 June 16, 2023 10:52

review-notebook-app bot commented Jun 19, 2023


ricardoV94 commented on 2023-06-19T17:19:18Z
----------------------------------------------------------------

Line #21: `p_ = pm.Deterministic("p", pm.math.softmax(s, axis=1), dims=("obs", "alts_probs"))`

We shouldn't wrap anything in a Deterministic that we are not going to use. For large models/datasets this can slowdown sampling quite a lot and make it seem "slower" than it actually is.


NathanielF commented on 2023-06-19T17:42:53Z
----------------------------------------------------------------

But I do use the probabilities 'p' in nearly every plot; it's one of the main quantities of interest. I can remove the Deterministic wrap on the utilities 'u' for the reason you mention...

NathanielF commented on 2023-06-19T19:51:34Z
----------------------------------------------------------------

Removed any redundant Deterministics.
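The trade-off discussed above can also be sidestepped for quantities that are cheap to recompute: instead of wrapping them in `pm.Deterministic` inside the model, derive them from the posterior draws after sampling. A NumPy sketch with stand-in arrays (with a real trace you would apply the same function to the posterior utility draws):

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Stand-in for posterior draws of the utilities: (chain, draw, obs, alternative).
rng = np.random.default_rng(1)
u_draws = rng.normal(size=(2, 100, 50, 3))

# Derived quantity computed once after sampling, rather than at every
# evaluation of the model graph via pm.Deterministic.
p_draws = softmax(u_draws, axis=-1)
```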


review-notebook-app bot commented Jun 19, 2023


ricardoV94 commented on 2023-06-19T17:19:18Z
----------------------------------------------------------------

Define what is a "marginal rate of substitution"?


NathanielF commented on 2023-06-19T17:43:40Z
----------------------------------------------------------------

Yep. Will adjust next commit.

NathanielF commented on 2023-06-19T19:50:32Z
----------------------------------------------------------------

Resolved
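For reference, the definition being requested: with utility linear in attributes, the marginal rate of substitution between two attributes is the ratio of their marginal utilities, which reduces to the ratio of coefficients (willingness to pay is the special case where the denominator attribute is price):

```latex
\mathrm{MRS}_{jk}
  = \frac{\partial U / \partial x_j}{\partial U / \partial x_k}
  = \frac{\beta_j}{\beta_k}
  \qquad \text{for } U = \sum_m \beta_m x_m .
```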


review-notebook-app bot commented Jun 19, 2023


ricardoV94 commented on 2023-06-19T17:19:19Z
----------------------------------------------------------------

Line #20: `for id, indx in zip(uniques, range(len(uniques))):`

Why is the loop needed? I couldn't immediately see why fancy indexing can't do the job.

Do you have a non-square matrix of some sort? If so, can it be padded with zeros (or whatever) to make it work as if it were square?

Reducing the number of stacks would probably speed this kind of model up a lot.


NathanielF commented on 2023-06-19T17:53:30Z
----------------------------------------------------------------

It's possibly not needed; I wrote it in the way that was most intuitive to me. I'd be keen to understand how a fancy-index approach could work.

I don't think we have any non-square matrices that we're looking to construct, but we do have varying-length sets of rows per person. So we might have something like:

| Person ID | Choice ID | Choice  |
|-----------|-----------|---------|
| Person 1  | 1         | Nabisco |
| Person 1  | 2         | Keebler |
| Person 2  | 1         | Nabisco |
| Person 2  | 2         | Nabisco |
| Person 2  | 3         | Nabisco |

So I'm building per-person equations, and I want to add an alternative-specific beta coefficient to the alternative-specific intercept alpha. In this case I'd need one person-specific beta coefficient applied to the first two rows, and another to the next three. It was just a lot of indexes to keep in my head, but if you have ideas about how to make it cleaner, that'd be great! I'll experiment a little more; the loop was primarily because it was easier for me to think through and follow.
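The fancy-indexing alternative the review is pointing at can be sketched as follows (NumPy, with invented person IDs and coefficients matching the table above): an integer index array maps each row to its person, so indexing the per-person coefficient vector with it broadcasts the right beta onto the right rows without a loop.

```python
import numpy as np

# One row per (person, choice occasion); persons have varying numbers of rows.
person_idx = np.array([0, 0, 1, 1, 1])        # Person 1 twice, Person 2 three times
price = np.array([2.5, 2.7, 2.4, 2.6, 2.8])   # an attribute of one alternative

alpha = 0.3                                    # alternative-specific intercept
beta_person = np.array([-1.0, -0.5])           # one price coefficient per person

# Fancy indexing: beta_person[person_idx] has one entry per row, so each
# person's coefficient lands on that person's rows in one vectorised step.
u = alpha + beta_person[person_idx] * price    # → [-2.2, -2.4, -0.9, -1.0, -1.1]
```

The same pattern works on PyMC tensors, which is what makes the per-person for-loop unnecessary.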

NathanielF commented on 2023-06-19T19:00:36Z
----------------------------------------------------------------

Yeah, OK, I think I have it now.

NathanielF commented on 2023-06-19T19:51:11Z
----------------------------------------------------------------

Updated in latest commit. Thanks for the push. This is much neater.


@NathanielF (Contributor, Author) replied:

Oh, actually, I think I see it now!

…d indexing for final model instead of for loop.

@NathanielF (Contributor, Author) commented:

I've incorporated all those changes now, @ricardoV94; thanks for the nudge on the indexing. It's much neater now than using the for-loop. Plus, I was able to use the full crackers dataset and it fits fast!

@NathanielF NathanielF requested a review from twiecki June 26, 2023 08:51
@NathanielF (Contributor, Author) commented:

Giving this one another gentle nudge: @twiecki and @ricardoV94.

Let me know if there is any concern or anything outstanding.

@OriolAbril (Member) left a comment:

Added a couple of xarray comments; I've only skimmed the notebook so far, but it looks great.

@NathanielF (Contributor, Author) commented:

Made those changes, @OriolAbril. Thanks for the nudge; much neater.

@NathanielF NathanielF requested a review from OriolAbril July 12, 2023 09:09
@OriolAbril (Member) left a comment:

Left some minor comments after reading the notebook more closely. This is a great notebook!

@NathanielF (Contributor, Author) commented on Jul 13, 2023:

Thanks @OriolAbril for the review. I've made those changes now. I've slightly updated, and I think improved, the final plot, and it should be ready for merging now.

```python
interval_dfs = pd.concat(interval_dfs, axis=1)
interval_dfs = interval_dfs.sort_values("means_nabisco")
interval_dfs.head()
```
@OriolAbril (Member) commented:

It looks like the fact this is a DataFrame isn't really used. A potential alternative could be:

```python
beta_individual = idata_m4["posterior"]["beta_individual"].rename(alt_intercept="brand")
predicted = beta_individual.mean(("chain", "draw"))
predicted = predicted.sortby(predicted.sel(brand="nabisco"))
ci_lb = beta_individual.quantile(0.025, ("chain", "draw")).sortby(predicted.sel(brand="nabisco"))
ci_ub = beta_individual.quantile(0.975, ("chain", "draw")).sortby(predicted.sel(brand="nabisco"))
```

Then use `predicted.sel(brand=brand)` below instead of `df[f"means_{brand}"]` and similar. I haven't tried it, and I might have mixed up the key-value swapping in `rename`. Let me know what you think: is the code still clear?

@NathanielF (Contributor, Author) replied:

Thanks, @OriolAbril, I've changed that again. It's much less verbose to use the xarray approach. I guess the trade-off for the reader is familiarity with pandas: I often think that if I can just show the head of the data frame, the reader will more quickly get what I'm doing. In this case I don't think it's crucial which approach is used, and I've swapped to the xarrays.

@OriolAbril (Member) replied:

I am not opposed to DataFrames; here the data has already been reduced significantly, so it is not very cumbersome to convert to one. I always try to check the xarray alternative for two reasons.

First and foremost, because if we identify tasks that are too difficult to do with xarray objects, we can work on fixing that. It also helps me grok xarray (to the point that, after some time, I can now confidently post xarray code I haven't run with some guarantee it will run or nearly run).

Second, because we all know numpy and pandas very well (or most of us quite well at least), we definitely know how to do X with numpy/pandas, so converting our data to them is often faster and easier (even if more verbose), because we know that afterwards we'll be able to do X, whereas fighting with xarray and its docs might end up taking twice as long. However, if the docs do this, I feel users get the impression that it is not worth it to fight with xarray objects/docs, and they have no easily accessible examples of how those objects are useful to them, which is incoherent with the default returned object being an InferenceData.

@NathanielF (Contributor, Author) replied:

Makes total sense!

Thanks again for your time on this one! I really appreciate it.

@OriolAbril OriolAbril merged commit b94c073 into pymc-devs:main Jul 13, 2023
@OriolAbril (Member) commented:

Also, not sure if you are already aware, but while the author metadata isn't very prominently displayed, it is used for aggregating posts by author for example, here is the link to your page: https://www.pymc.io/projects/examples/en/latest/blog/author/nathaniel-forde.html

@NathanielF (Contributor, Author) replied:

> Also, not sure if you are already aware, but while the author metadata isn't very prominently displayed, it is used for aggregating posts by author for example, here is the link to your page: https://www.pymc.io/projects/examples/en/latest/blog/author/nathaniel-forde.html

I did not know that! That's lovely to see!

4 participants