Discrete choice #544
Conversation
Check out this pull request on ReviewNB: see visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB.
Hi @twiecki
To be honest, I think this one needs quite a bit more work. The modelling is brittle to different randomly generated data and I'm not sure I have the right approach in general.
Experimenting with it when I can. I think this is an interesting class of models, but getting the translation from Stan right is proving a bit trickier than I hoped.
Thanks, yeah, I'll adjust that. I think the modelling strategy is coming along now. I got overly hung up on the long-data format Stan was using. I'm sticking with the wide data format for the minute; it seems more transparent to me and better pedagogically.
Marking this one as ready for review @twiecki and @ricardoV94. I've found a representation of the discrete choice models that I'm pretty happy with, and I'm demonstrating it on two different but canonical data sets. In (1), on the choice of heating systems, I focus on specifying the utility matrix for the individual alternatives and demonstrate how you can add alternative-specific parameters, e.g. intercepts and beta parameters for income on a specific alternative. In (2) I focus on how you can add person-specific modifications of utility and how you can use prior constraints in the Bayesian context to ensure that the parameter estimates "make sense", i.e. negative parameter estimates for the effect of price on utility. All models fit well, and in reasonable time. However, in (2) I've truncated the data set a little because I ran into a bracket-nesting error on the full data set. I'd be keen to know how to replace my for-loop there with scan, but I wasn't sure how to do that. I think I'll likely add some more to the text write-up, but I would like some interim feedback, if you have any, on the modelling design. @drbenvincent in case of interest.
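To make the strategy described above concrete, here is a minimal sketch of the wide-format utility/softmax setup on simulated data. This is not the notebook's code: the fake data, the coords, and names like `beta_install`/`beta_operate` are placeholders, and the negative-truncated priors only illustrate the "make sense" constraint on cost effects.

```python
import numpy as np
import pymc as pm

# Simulated wide-format data: one row per decision maker,
# one column per alternative for each attribute.
rng = np.random.default_rng(0)
N, A = 250, 3
install_cost = rng.normal(8.0, 1.0, size=(N, A))   # e.g. installation cost (rescaled)
operate_cost = rng.normal(2.0, 0.3, size=(N, A))   # e.g. operating cost (rescaled)
observed_choice = rng.integers(0, A, size=N)        # index of the chosen alternative

coords = {"alts_probs": ["gas", "electric", "heat_pump"], "obs": np.arange(N)}
with pm.Model(coords=coords) as model:
    # alternative-specific intercepts
    alpha = pm.Normal("alpha", 0, 1, dims="alts_probs")
    # prior constraint: higher cost should lower utility, so keep the betas negative
    beta_install = pm.TruncatedNormal("beta_install", mu=0, sigma=1, upper=0)
    beta_operate = pm.TruncatedNormal("beta_operate", mu=0, sigma=1, upper=0)

    # (N, A) utility matrix: one utility per person-alternative pair
    u = alpha + beta_install * install_cost + beta_operate * operate_cost
    # softmax across alternatives turns utilities into choice probabilities
    p = pm.Deterministic("p", pm.math.softmax(u, axis=1), dims=("obs", "alts_probs"))
    y = pm.Categorical("y", p=p, observed=observed_choice, dims="obs")

    idata = pm.sample()
```

The alternative-specific pieces (here just `alpha`) can be extended with, say, per-alternative income effects, and person-specific terms can be added to `u` through the same broadcasting.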
ricardoV94 commented on 2023-06-19T17:19:18Z (ReviewNB), on Line #21: `p_ = pm.Deterministic("p", pm.math.softmax(s, axis=1), dims=("obs", "alts_probs"))`
We shouldn't wrap anything in a `Deterministic` […]

NathanielF commented on 2023-06-19T17:42:53Z
But I do use the probabilities `p` in nearly every plot; it's one of the main quantities of interest. I can remove the `Deterministic` wrap on the utilities `u` for the reason you mention.

NathanielF commented on 2023-06-19T19:51:34Z
Removed any redundant Deterministics.
ricardoV94 commented on 2023-06-19T17:19:18Z (ReviewNB)
Define what a "marginal rate of substitution" is?

NathanielF commented on 2023-06-19T17:43:40Z
Yep. Will adjust in the next commit.

NathanielF commented on 2023-06-19T19:50:32Z
Resolved.
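For readers following this thread: in these models the marginal rate of substitution between two attributes is the ratio of their utility coefficients, i.e. how much of one attribute a decision maker will give up for a unit of another while holding utility constant. In the standard textbook formulation (summarised here, not quoted from the notebook):

$$
\mathrm{MRS}_{x_1, x_2} \;=\; \frac{\partial U / \partial x_1}{\partial U / \partial x_2} \;=\; \frac{\beta_{x_1}}{\beta_{x_2}}
$$

When $x_2$ is price, this ratio is commonly read as the willingness to pay for attribute $x_1$.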
ricardoV94 commented on 2023-06-19T17:19:19Z (ReviewNB), on Line #20: `for id, indx in zip(uniques, range(len(uniques))):`
Why is the loop needed? I couldn't immediately grasp why fancy indexing can't do the job. Do you have a non-square matrix of some sort? If so, can it be padded with zeros or whatever to make it work as if it were square? Reducing the number of stacks would probably speed up this kind of model a lot.

NathanielF commented on 2023-06-19T17:53:30Z
It's possibly not needed. I wrote it in the way that was most intuitive to me. I'd be keen to understand how a fancy indexing approach could work.

I don't think we have any non-square matrices that we're looking to construct, but we do have varying-length sets of rows per person. So we might have something like:

| Person ID | Choice ID | Choice  |
|-----------|-----------|---------|
| Person 1  | 1         | Nabisco |
| Person 1  | 2         | Keebler |
| Person 2  | 1         | Nabisco |
| Person 2  | 2         | Nabisco |
| Person 2  | 3         | Nabisco |

So I'm building the equations per person, and I want to add an alternative-specific beta coefficient to the alternative-specific intercept alpha. In this case I'd need one person-specific beta coefficient applied to the first two rows and another to the next three. It was just a lot of indexes to keep in my head, but if you have ideas about how to make it cleaner, that'd be great! I'll experiment a little more; the loop was primarily because it was easier for me to think through and follow.

NathanielF commented on 2023-06-19T19:00:36Z
Yeah, ok, I think I have it now.

NathanielF commented on 2023-06-19T19:51:11Z
Updated in the latest commit. Thanks for the push. This is much neater.
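For anyone reading this thread later, the fancy-indexing idea can be illustrated with a toy example mirroring the table above (hypothetical numbers, not the notebook's code). Indexing a per-person coefficient vector with the row-level person index repeats each coefficient over that person's rows, so no Python loop over people is needed:

```python
import numpy as np

person_idx = np.array([0, 0, 1, 1, 1])        # person for each choice row, as in the table
beta_person = np.array([0.3, -1.2])           # one coefficient per person
price = np.array([2.9, 3.1, 2.5, 2.7, 3.0])   # a row-level attribute

# beta_person[person_idx] has one entry per row, aligned with `price`
utility_contribution = beta_person[person_idx] * price
print(utility_contribution)                    # [ 0.87  0.93 -3.   -3.24 -3.6 ]
```

The same indexing works on a PyMC random variable, e.g. something like `beta_individual[person_idx]`, which avoids stacking per-person sub-matrices inside a loop.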
Oh, actually I think I see it now!
…d indexing for final model instead of for loop.
…e plot. Tidying
Incorporated all those changes now @ricardoV94, thanks for the nudge on the indexing. It's much neater now than using the for loop. Plus I was able to use the full crackers dataset and it fits fast!
Giving this one another gentle nudge, @twiecki and @ricardoV94. Let me know if there is any concern or anything outstanding.
Added a couple of xarray comments; I've only skimmed the notebook so far, but it looks great.
(Two inline review comments on examples/generalized_linear_models/GLM-discrete-choice_models.myst.md, both now outdated and resolved.)
Made those changes, @OriolAbril. Thanks for the nudge. Much neater.
Left some minor comments after reading the notebook more closely. This is a great notebook!
(Three inline review comments on examples/generalized_linear_models/GLM-discrete-choice_models.myst.md, all now outdated and resolved.)
Thanks @OriolAbril for the review. I've made those changes now. I've also slightly updated, and I think improved, the final plot, and I think it should be ready for merging now.
```python
interval_dfs = pd.concat(interval_dfs, axis=1)
interval_dfs = interval_dfs.sort_values("means_nabisco")
interval_dfs.head()
```
It looks like the fact that this is a DataFrame isn't really used. A potential alternative could be:
```python
beta_individual = idata_m4["posterior"]["beta_individual"].rename(alt_intercept="brand")
predicted = beta_individual.mean(("chain", "draw"))
predicted = predicted.sortby(predicted.sel(brand="nabisco"))
ci_lb = beta_individual.quantile(0.025, ("chain", "draw")).sortby(predicted.sel(brand="nabisco"))
ci_ub = beta_individual.quantile(0.975, ("chain", "draw")).sortby(predicted.sel(brand="nabisco"))
```
Then use `predicted.sel(brand=brand)` below instead of `df["means_{brand}"]`, and similar. I haven't tried it and might have messed up the rename key/value swapping. Let me know what you think: is the code still clear?
Thanks @OriolAbril, I've changed that again. It's much less verbose to use the xarray approach. I guess I think the tradeoff for the reader is in familiarity with pandas; I often think that if I can just show the head of the data frame, the reader will more quickly get what I'm doing. In this case I don't think it's crucial which approach is used, and I've swapped to the xarray version.
I am not opposed to DataFrames; here the data has already been reduced significantly, so it is not very cumbersome to convert to one. I always try to check the xarray alternative for two reasons.
First and foremost, because if we identify tasks that are too difficult to do with xarray objects, we can work on fixing that. It also helps me grok xarray (to the point that, after some time, I can now confidently post xarray code I haven't run with some guarantee it will run or nearly run).
Second, because we all know numpy and pandas very well (or most of us quite well at least), so we definitely know how to do X with numpy/pandas, and converting our data to those is often faster/easier (even if more verbose) because we know we'll be able to do X afterwards, whereas if we have to fight with xarray and its docs it might end up taking twice as long. However, if the docs do this, I feel users get the impression that it is not worth it to fight with xarray objects/docs, and they have no easily accessible examples of how those objects are useful to them, which is inconsistent with the default returned object being an InferenceData.
Makes total sense!
Thanks again for your time on this one! I really appreciate it.
Also, not sure if you are already aware, but while the author metadata isn't very prominently displayed, it is used for aggregating posts by author. For example, here is the link to your page: https://www.pymc.io/projects/examples/en/latest/blog/author/nathaniel-forde.html
I did not know that! That's lovely to see!
Discrete Choice Modelling
Working out some of the details of how to represent the various incarnations of discrete choice models in PyMC, building up to the hierarchical or random-coefficients logit described, for instance, in Jim Savage's work in Stan here: https://khakieconomics.github.io/2019/03/17/Logit-models-of-discrete-choice.html. These models are quite distinct from reinforcement learning strategies; they are used to estimate aggregate demand and supply curves in differentiated goods markets and, at a more individual level, to look at the substitution patterns customers have between goods.
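For orientation, the basic conditional logit underlying this work takes the probability that individual $i$ chooses alternative $j$ to be a softmax over alternative-specific utilities, and the random-coefficients (mixed logit) version lets the taste parameters vary by individual. In the standard textbook form (summarised here, not quoted from the notebook or Savage's post):

$$
u_{ij} = \alpha_j + x_{ij}^\top \beta_i, \qquad
P(y_i = j) = \frac{\exp(u_{ij})}{\sum_{k} \exp(u_{ik})}, \qquad
\beta_i \sim \mathcal{N}(\mu_\beta, \Sigma_\beta)
$$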
This is a draft primarily because I need to find an efficient representation of the models in PyMC and write up more detail motivating the models.
Helpful links