Description
On Twitter, @yebai suggested adding integration with InferenceObjects to MCMCChains: https://twitter.com/Hong_Ge2/status/1560343482216103938. I'm opening this issue for further discussion.
InferenceObjects.InferenceData is the storage format for Monte Carlo draws used by ArviZ.jl. Along with Python's arviz.InferenceData, it follows the cross-language InferenceData schema. PyMC uses Python's implementation as its official sample storage format. InferenceData can be serialized to NetCDF, which standardizes how the results of Bayesian analyses are communicated across languages and PPLs. In Julia, it is built on DimensionalData. See example usage and plotting examples (using the Tables interface).
@yebai's suggestion is ultimately to deprecate Chains in favor of InferenceData. I see several upsides to this approach:
- Chains is based on the somewhat outdated AxisArrays, while DimensionalData is more modern.
- Chains flattens all draws and sampling statistics into a single 3D float array. This discards much of the structure of the sampled types, which may themselves be multidimensional or have non-float eltypes such as Int or even Cholesky.
- InferenceData's features are a superset of Chains's. With named dimensions, it can stay closer to the original structure of the user's samples. It also supports storing other metadata and can store prior, predictive, log-likelihood, and warmup draws, as well as the original data.
- InferenceObjects is a relatively light dependency (~0.12-0.2 s load time on Julia v1.7-1.8, versus 1.7-3.6 s for MCMCChains), so it would not add much to MCMCChains's load time.
Currently, ArviZ.jl has a converter, from_mcmcchains, that converts Chains to InferenceData. Integrating Chains and InferenceData might involve the following steps:
- Move ArviZ.from_mcmcchains here (with a better name).
- Make InferenceData a supported chain_type for AbstractMCMC.sample (https://beta.turing.ml/AbstractMCMC.jl/dev/api/#Chains), which would bypass Chains's flattening entirely. I'm not sure this should live here, but it should not live in InferenceObjects.