
Discrete choice #544


Merged 27 commits into pymc-devs:main on Jul 13, 2023

Conversation

@NathanielF (Contributor) commented on Apr 26, 2023:

Discrete Choice Modelling

Working out the details of how to represent the various incarnations of discrete choice models in PyMC, building up to the hierarchical (random coefficients) logit described, for instance, in Jim Savage's Stan work here: https://khakieconomics.github.io/2019/03/17/Logit-models-of-discrete-choice.html. These models are quite distinct from reinforcement-learning strategies: they are used to estimate aggregate demand and supply curves in differentiated goods markets and, at the individual level, the substitution patterns customers exhibit between goods.

This is a draft primarily because I need to find an efficient representation of the models in PyMC and write up more detail motivating them.
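For readers unfamiliar with the setup, here is a minimal NumPy sketch of the structure these models share (hypothetical attribute values and coefficients, not the notebook's actual model): each alternative gets a linear utility, and the choice probabilities are the softmax of the utilities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 5 decision makers choosing among 3 heating systems.
n_obs, n_alts = 5, 3
price = rng.uniform(1.0, 3.0, size=(n_obs, n_alts))  # alternative-specific attribute
alpha = np.array([0.5, 0.0, -0.2])                    # alternative-specific intercepts
beta_price = -1.0                                     # price should lower utility

# Linear utility for each (observation, alternative) pair.
u = alpha + beta_price * price                        # shape (n_obs, n_alts)

# Softmax of the utilities gives the choice probabilities.
u_centered = u - u.max(axis=1, keepdims=True)         # stabilise the exponentials
p = np.exp(u_centered) / np.exp(u_centered).sum(axis=1, keepdims=True)

# Each decision maker picks one alternative with those probabilities.
choices = np.array([rng.choice(n_alts, p=row) for row in p])
```

The hierarchical (random coefficients) variant then lets `alpha` and `beta_price` vary by decision maker with a shared hyper-prior, which is what the Stan post linked above builds up to.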

Helpful links

@review-notebook-app bot: Check out this pull request on ReviewNB to see visual diffs and provide feedback on the Jupyter notebooks.

@twiecki (Member) commented on Jun 9, 2023:

Can you use dims?



@NathanielF (Contributor, Author) replied:

Hi @twiecki,

To be honest, I think this one needs quite a bit more work. The modelling is brittle to different randomly generated data, and I'm not sure I have the right approach in general.

I'm experimenting with it when I can. I think this is an interesting class of models, but getting the translation from Stan right is proving trickier than I hoped.

@ricardoV94 (Member) commented on Jun 14, 2023:

You can use pm.math.softmax to avoid that annoying warning
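For context, the usual numerical issue with a hand-rolled softmax is overflow in the exponentials; whatever the exact warning here, the subtract-max identity is why a dedicated helper like `pm.math.softmax` is preferable. A NumPy sketch of the trick (illustrative only):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtracting the row max leaves the result unchanged (the common
    # factor exp(-max) cancels) but keeps the exponentials from overflowing.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

u = np.array([[1000.0, 1001.0, 1002.0]])  # naive np.exp(u) overflows to inf
p = softmax(u, axis=1)                    # ≈ [[0.090, 0.245, 0.665]]
```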



@NathanielF (Contributor, Author) replied:

Thanks, yeah, I'll adjust that. I think the modelling strategy is coming along now; I got overly hung up on the long-data format Stan was using. I'm sticking with the wide-data format for the minute: it seems more transparent to me and better pedagogically.

@NathanielF NathanielF marked this pull request as ready for review June 16, 2023 10:42
@NathanielF (Contributor, Author) commented:

Marking this one as ready for review, @twiecki and @ricardoV94.

I've found a representation of the discrete choice models that I'm pretty happy with. I'm demonstrating them on two different but canonical data sets:

(1) The choice of heating systems, where I focus on specifying the utility matrix for the individual alternatives.
(2) The choice of crackers, with repeated decisions by the same decision maker.

In (1) I demonstrate how you can add alternative-specific parameters, e.g. intercepts and beta parameters for income on the specific alternative.

In (2) I focus on how you can add person-specific modifications of utility, and how you can use prior constraints in the Bayesian context to ensure that the parameter estimates "make sense", i.e. negative parameter estimates for the effect of price on utility.

All models fit well, and in reasonable time. However, in (2) I've truncated the data set a little because I ran into a bracket-nesting error on the full data set. I'd be keen to know how to replace my for-loop here with scan, but I wasn't sure how to do that.

I think I'll likely add some more to the text write-up, but would like some interim feedback on the modelling design if you have any.

@drbenvincent in case of interest.

@NathanielF NathanielF requested a review from ricardoV94 June 16, 2023 10:52

review-notebook-app bot commented Jun 19, 2023


ricardoV94 commented on 2023-06-19T17:19:18Z
----------------------------------------------------------------

Line #21: `p_ = pm.Deterministic("p", pm.math.softmax(s, axis=1), dims=("obs", "alts_probs"))`

We shouldn't wrap anything in a Deterministic that we are not going to use. For large models/datasets this can slowdown sampling quite a lot and make it seem "slower" than it actually is.


NathanielF commented on 2023-06-19T17:42:53Z
----------------------------------------------------------------

But I do use the probabilities 'p' in nearly every plot; it's one of the main quantities of interest. I can remove the Deterministic wrap on the utilities 'u' for the reason you mention...

NathanielF commented on 2023-06-19T19:51:34Z
----------------------------------------------------------------

Removed any redundant Deterministics.
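The trade-off discussed above can also be sidestepped for quantities that are cheap to recompute: instead of wrapping them in `pm.Deterministic` inside the model, derive them from the posterior draws after sampling. A NumPy sketch with stand-in arrays (with a real trace you would apply the same function to the posterior utility draws):

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Stand-in for posterior draws of the utilities: (chain, draw, obs, alternative).
rng = np.random.default_rng(1)
u_draws = rng.normal(size=(2, 100, 50, 3))

# Derived quantity computed once after sampling, rather than at every
# evaluation of the model graph via pm.Deterministic.
p_draws = softmax(u_draws, axis=-1)
```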


review-notebook-app bot commented Jun 19, 2023


ricardoV94 commented on 2023-06-19T17:19:18Z
----------------------------------------------------------------

Define what is a "marginal rate of substitution"?


NathanielF commented on 2023-06-19T17:43:40Z
----------------------------------------------------------------

Yep. Will adjust next commit.

NathanielF commented on 2023-06-19T19:50:32Z
----------------------------------------------------------------

Resolved
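For reference, the definition being requested: with utility linear in attributes, the marginal rate of substitution between two attributes is the ratio of their marginal utilities, which reduces to the ratio of coefficients (willingness to pay is the special case where the denominator attribute is price):

```latex
\mathrm{MRS}_{jk}
  = \frac{\partial U / \partial x_j}{\partial U / \partial x_k}
  = \frac{\beta_j}{\beta_k}
  \qquad \text{for } U = \sum_m \beta_m x_m .
```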


review-notebook-app bot commented Jun 19, 2023


ricardoV94 commented on 2023-06-19T17:19:19Z
----------------------------------------------------------------

Line #20: `for id, indx in zip(uniques, range(len(uniques))):`

Why is the loop needed? I couldn't immediately see why fancy indexing can't do the job.

Do you have a non-square matrix of some sort? If so, can it be padded with zeros (or whatever) to make it work as if it were square?

Reducing the number of stacks would probably speed this kind of model up a lot.


NathanielF commented on 2023-06-19T17:53:30Z
----------------------------------------------------------------

It's possibly not needed; I wrote it in the way that was most intuitive to me. I'd be keen to understand how a fancy-index approach could work.

I don't think we have any non-square matrices that we're looking to construct, but we do have varying-length sets of rows per person. So we might have something like:

| Person ID | Choice ID | Choice  |
|-----------|-----------|---------|
| Person 1  | 1         | Nabisco |
| Person 1  | 2         | Keebler |
| Person 2  | 1         | Nabisco |
| Person 2  | 2         | Nabisco |
| Person 2  | 3         | Nabisco |

So I'm building per-person equations, and I want to add an alternative-specific beta coefficient to the alternative-specific intercept alpha. In this case I'd need one person-specific beta coefficient applied to the first two rows, and another to the next three. It was just a lot of indexes to keep in my head, but if you have ideas about how to make it cleaner, that'd be great! I'll experiment a little more; the loop was primarily because it was easier for me to think through and follow.
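The fancy-indexing alternative the review is pointing at can be sketched as follows (NumPy, with invented person IDs and coefficients matching the table above): an integer index array maps each row to its person, so indexing the per-person coefficient vector with it broadcasts the right beta onto the right rows without a loop.

```python
import numpy as np

# One row per (person, choice occasion); persons have varying numbers of rows.
person_idx = np.array([0, 0, 1, 1, 1])        # Person 1 twice, Person 2 three times
price = np.array([2.5, 2.7, 2.4, 2.6, 2.8])   # an attribute of one alternative

alpha = 0.3                                    # alternative-specific intercept
beta_person = np.array([-1.0, -0.5])           # one price coefficient per person

# Fancy indexing: beta_person[person_idx] has one entry per row, so each
# person's coefficient lands on that person's rows in one vectorised step.
u = alpha + beta_person[person_idx] * price    # → [-2.2, -2.4, -0.9, -1.0, -1.1]
```

The same pattern works on PyMC tensors, which is what makes the per-person for-loop unnecessary.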

NathanielF commented on 2023-06-19T19:00:36Z
----------------------------------------------------------------

Yeah, OK, I think I have it now.

NathanielF commented on 2023-06-19T19:51:11Z
----------------------------------------------------------------

Updated in latest commit. Thanks for the push. This is much neater.


@NathanielF (Contributor, Author) replied:

Oh, actually, I think I see it now!

…d indexing for final model instead of for loop.

@NathanielF (Contributor, Author) commented:

I've incorporated all those changes now, @ricardoV94; thanks for the nudge on the indexing. It's much neater now than using the for-loop. Plus, I was able to use the full crackers dataset and it fits fast!

@NathanielF NathanielF requested a review from twiecki June 26, 2023 08:51
@NathanielF (Contributor, Author) commented:

Giving this one another gentle nudge: @twiecki and @ricardoV94.

Let me know if there is any concern or anything outstanding.

@OriolAbril (Member) left a comment:

Added a couple of xarray comments; I've only skimmed the notebook so far, but it looks great.

@NathanielF (Contributor, Author) commented:

Made those changes, @OriolAbril. Thanks for the nudge; much neater.

@NathanielF NathanielF requested a review from OriolAbril July 12, 2023 09:09
@OriolAbril (Member) left a comment:

Left some minor comments after reading the notebook more closely. This is a great notebook!

@NathanielF (Contributor, Author) commented on Jul 13, 2023:

Thanks @OriolAbril for the review. I've made those changes now. I've slightly updated, and I think improved, the final plot, and it should be ready for merging now.

```python
interval_dfs = pd.concat(interval_dfs, axis=1)
interval_dfs = interval_dfs.sort_values("means_nabisco")
interval_dfs.head()
```
@OriolAbril (Member) commented:

It looks like the fact this is a DataFrame isn't really used. A potential alternative could be:

```python
beta_individual = idata_m4["posterior"]["beta_individual"].rename(alt_intercept="brand")
predicted = beta_individual.mean(("chain", "draw"))
predicted = predicted.sortby(predicted.sel(brand="nabisco"))
ci_lb = beta_individual.quantile(0.025, ("chain", "draw")).sortby(predicted.sel(brand="nabisco"))
ci_ub = beta_individual.quantile(0.975, ("chain", "draw")).sortby(predicted.sel(brand="nabisco"))
```

Then use `predicted.sel(brand=brand)` below instead of `df[f"means_{brand}"]` and similar. I haven't tried it, and I might have mixed up the key-value swapping in `rename`. Let me know what you think: is the code still clear?

@NathanielF (Contributor, Author) replied:

Thanks, @OriolAbril, I've changed that again. It's much less verbose to use the xarray approach. I guess the trade-off for the reader is familiarity with pandas: I often think that if I can just show the head of the data frame, the reader will more quickly get what I'm doing. In this case I don't think it's crucial which approach is used, and I've swapped to the xarrays.

@OriolAbril (Member) replied:

I am not opposed to DataFrames; here the data has already been reduced significantly, so it is not very cumbersome to convert to one. I always try to check the xarray alternative for two reasons.

First and foremost, because if we identify tasks that are too difficult to do with xarray objects, we can work on fixing that. It also helps me grok xarray (to the point that, after some time, I can now confidently post xarray code I haven't run with some guarantee it will run or nearly run).

Second, because we all know numpy and pandas very well (or most of us quite well at least), we definitely know how to do X with numpy/pandas, so converting our data to them is often faster and easier (even if more verbose), because we know that afterwards we'll be able to do X, whereas fighting with xarray and its docs might end up taking twice as long. However, if the docs do this, I feel users get the impression that it is not worth it to fight with xarray objects/docs, and they have no easily accessible examples of how those objects are useful to them, which is incoherent with the default returned object being an InferenceData.

@NathanielF (Contributor, Author) replied:

Makes total sense!

Thanks again for your time on this one! I really appreciate it.

@OriolAbril OriolAbril merged commit b94c073 into pymc-devs:main Jul 13, 2023
@OriolAbril (Member) commented:

Also, not sure if you are already aware, but while the author metadata isn't very prominently displayed, it is used for aggregating posts by author for example, here is the link to your page: https://www.pymc.io/projects/examples/en/latest/blog/author/nathaniel-forde.html

@NathanielF (Contributor, Author) replied:

> Also, not sure if you are already aware, but while the author metadata isn't very prominently displayed, it is used for aggregating posts by author for example, here is the link to your page: https://www.pymc.io/projects/examples/en/latest/blog/author/nathaniel-forde.html

I did not know that! That's lovely to see!

4 participants