Integration with InferenceObjects.jl

On Twitter, @yebai suggested adding integration with [InferenceObjects](https://github.com/arviz-devs/InferenceObjects.jl) to MCMCChains: https://twitter.com/Hong_Ge2/status/1560343482216103938. I'm opening this issue for further discussion.

`InferenceObjects.InferenceData` is the storage format for Monte Carlo draws used by ArviZ.jl. Along with Python's `arviz.InferenceData`, it follows the cross-language [InferenceData schema](https://python.arviz.org/en/latest/schema/schema.html). PyMC uses Python's implementation as its official sample storage format. `InferenceData` can be serialized to NetCDF to standardize communicating results of Bayesian analyses across languages and PPLs. In Julia, it is built on DimensionalData. See [example usage](https://julia.arviz.org/stable/working_with_inference_data/) and [plotting examples](https://julia.arviz.org/stable/creating_custom_plots/) (using the Tables interface).

@yebai's suggestion is ultimately to deprecate `Chains` in favor of `InferenceData`. I see several upsides to this approach:
1. `Chains` is based on the somewhat outdated AxisArrays, while DimensionalData is more modern.
2. `Chains` flattens all draws and sampling statistics into a single 3D float array, which discards a lot of the structure of the sampled types (which may themselves be multidimensional or have non-float eltypes, such as `Int` or even `Cholesky`).
3. `InferenceData`'s features are a superset of `Chains`'s. It can stay closer to the original structure of the user's samples by using named dimensions, and it can also store other metadata, prior, predictive, log-likelihood, and warmup draws, as well as the original data.
4. `InferenceObjects` is a relatively light dependency (~0.12-0.2 s load time on Julia v1.7-1.8, versus 1.7-3.6 s for MCMCChains), so it would not add much to MCMCChains's load time.
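
To make points 2 and 3 concrete, here is a minimal sketch with random numbers standing in for draws of a 2×2 matrix-valued parameter `x`. It assumes a recent InferenceObjects release in which `from_namedtuple` expects arrays laid out as `(draw, chain, shape...)`; older releases used a different dimension order, so treat the layout as an assumption. The dimension names `:row` and `:col` are made up for illustration.

```julia
using MCMCChains, InferenceObjects

ndraws, nchains = 100, 2

# Chains: one column per matrix entry; the 2×2 shape survives only in the names.
flat_names = [Symbol("x[1,1]"), Symbol("x[2,1]"), Symbol("x[1,2]"), Symbol("x[2,2]")]
chn = Chains(randn(ndraws, length(flat_names), nchains), flat_names)
chn_size = size(chn)  # (iterations, flattened parameters, chains)

# InferenceData: `x` stays a single variable with its own named dimensions.
# Assumed layout: (draw, chain, row, col).
idata = from_namedtuple(
    (; x=randn(ndraws, nchains, 2, 2));
    dims=(; x=[:row, :col]),  # hypothetical dimension names
)
x_size = size(idata.posterior.x)
```

With `Chains`, recovering the matrix means parsing the indices back out of the parameter names; with `InferenceData`, the shape is part of the stored variable.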

ArviZ.jl currently provides a converter, `from_mcmcchains`, that converts `Chains` to `InferenceData`. Integration between `Chains` and `InferenceData` might involve the following steps:
1. Move `ArviZ.from_mcmcchains` here (with a better name)
2. Make `InferenceData` a supported `chain_type` for `AbstractMCMC.sample` (https://beta.turing.ml/AbstractMCMC.jl/dev/api/#Chains), which would bypass `Chains`'s flattening entirely. I'm not sure this should live here, but it should not live in InferenceObjects.
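
For reference, a rough sketch of what the two steps look like from the user's side. The first part uses the existing `ArviZ.from_mcmcchains` converter on a `Chains` object filled with random numbers. The second part is only a comment, because passing `InferenceData` as `chain_type` is the proposal here, not something that is assumed to work today; `model` and `sampler` are placeholders.

```julia
using ArviZ, MCMCChains

# Stand-in for real sampler output: 100 draws of 2 parameters from 4 chains.
chn = Chains(randn(100, 2, 4), [:a, :b])

# Step 1: the conversion as it exists in ArviZ.jl today.
idata = from_mcmcchains(chn)
posterior = idata.posterior  # group holding the posterior draws of `a` and `b`

# Step 2 (hypothetical, proposed in this issue): skip `Chains` entirely.
# idata = sample(model, sampler, 1_000; chain_type=InferenceData)
```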
