Integration with InferenceObjects.jl #381

Closed
@sethaxen

On Twitter, @yebai suggested adding integration with InferenceObjects to MCMCChains: https://twitter.com/Hong_Ge2/status/1560343482216103938. I'm opening this issue for further discussion.

InferenceObjects.InferenceData is the storage format for Monte Carlo draws used by ArviZ.jl. Along with Python's arviz.InferenceData, it follows the cross-language InferenceData schema. PyMC uses Python's implementation as its official sample storage format. InferenceData can be serialized to NetCDF to standardize communicating results of Bayesian analyses across languages and PPLs. In Julia, it is built on DimensionalData. See example usage and plotting examples (using the Tables interface).

@yebai's suggestion is ultimately to deprecate Chains in favor of InferenceData. I see several upsides to this approach:

  1. Chains is based on the somewhat outdated AxisArrays, while DimensionalData is more modern.
  2. Chains flattens all draws and sampling statistics into a single 3D float array, which discards a lot of the structure of the sampled types (which may themselves be multidimensional or have non-float eltypes, such as Int or even Cholesky).
  3. InferenceData's features are a superset of Chains's. It can stay closer to the original structure of the user's samples by using named dimensions, and it also supports storing other metadata as well as prior, predictive, log-likelihood, and warmup draws, along with the original data.
  4. InferenceObjects is a relatively light dependency (~0.12-0.2s load time on Julia v1.7-1.8, vs 1.7-3.6s for MCMCChains), so it would not add much to MCMCChains's load time.
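
To make point 2 concrete, here is a rough sketch of the difference between the two layouts. It is plain Python with no dependencies and is only an illustration, not how MCMCChains or InferenceObjects are implemented: `make_draw`, `flatten_draw`, the parameter names, the row-major ordering of the matrix entries, and the `(chain, draw, ...)` dimension names are all made up for this example.

```python
# Hypothetical sampler output: 2 chains x 3 draws. Each draw holds a float,
# an int, and a 2x2 matrix (all names and values are invented for illustration).
def make_draw(chain, i):
    return {
        "mu": 0.5 * i + chain,
        "k": i + chain,
        "Sigma": [[1.0 + i, 0.1], [0.1, 2.0 + chain]],
    }

n_chains, n_draws = 2, 3
draws = [[make_draw(c, i) for i in range(n_draws)] for c in range(n_chains)]

def flatten_draw(draw):
    """Flatten one structured draw into parallel lists of names and floats."""
    names, values = [], []
    for key, val in draw.items():
        if isinstance(val, list):  # matrix: one flat column per entry
            for r, row in enumerate(val, start=1):
                for c, x in enumerate(row, start=1):
                    names.append(f"{key}[{r},{c}]")
                    values.append(float(x))
        else:  # scalar: coerced to float, so the int-ness of `k` is lost
            names.append(key)
            values.append(float(val))
    return names, values

# Flattened layout: a single draws x parameters x chains block of floats.
names = flatten_draw(draws[0][0])[0]
flat = [
    [[flatten_draw(draws[c][i])[1][p] for c in range(n_chains)]
     for p in range(len(names))]
    for i in range(n_draws)
]

# Structured layout: one entry per variable, with named dimensions and the
# original element types and shapes kept as they were.
structured = {
    key: {
        "dims": ("chain", "draw") + extra_dims,
        "values": [[draws[c][i][key] for i in range(n_draws)]
                   for c in range(n_chains)],
    }
    for key, extra_dims in [("mu", ()), ("k", ()), ("Sigma", ("row", "col"))]
}

print(names)
print(type(flat[0][1][0]).__name__, type(structured["k"]["values"][0][0]).__name__)
```

In the flattened layout, the shape of `Sigma` can only be recovered by parsing the column names, and `k` comes back as a float. In the structured layout, each variable keeps its own dimensions and element type.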

Currently ArviZ.jl has a converter from_mcmcchains, which is used to convert Chains to InferenceData. Integration between Chains and InferenceData might look like the following steps:

  1. Move ArviZ.from_mcmcchains here (with a better name)
  2. Make InferenceData a supported chain_type for AbstractMCMC.sample (https://beta.turing.ml/AbstractMCMC.jl/dev/api/#Chains), which would bypass Chains's flattening entirely. I'm not sure this should live here, but it should not live in InferenceObjects.
