Description
Hi!
I would love to use this crate in my application, but I have one issue with it: the FFI bindings are generated at compile time, with bindgen, for the specific MPI implementation in use.
While I like bindgen, it has a lot of issues when used at compile time: the final user needs libclang installed, at the right version and in the right place, for code generation to work.
I think this can be a big hurdle when trying to use this crate. Here are a few examples:
- I had a hard time just compiling the example, because I am using CentOS 7, which comes with clang 3.4, which is too old for bindgen. Then I tried using linuxbrew to get a more modern libclang, and got weird errors concerning GLIBC_VERSION symbols.
- On a big HPC cluster, there is a very low probability of having a modern libclang installed, and most users will not bother trying to install one. This prevents them from using any application based on this crate.
- Just on this repo, issues #1 (Compilation failed, 'stddef.h' file not found), #6 (Bindgen Cargo dependencies cause rsmpi build to fail), #20 (I couldn't compile example) and #22 (No-longer builds with openMPI 1.1.x) are build errors related to bindgen.
Why is bindgen used
This is my understanding of why bindgen is used at compile time in this crate; please correct me if I am wrong!
MPI does not have a stable ABI, only a specification and a C API. This crate uses a small C shim to ensure `#define`d symbols are available to Rust, and bindgen to parse the MPI headers and generate the corresponding FFI declarations for Rust.
Removing bindgen dependency
I think it would be possible to remove the bindgen dependency at build time by pre-generating the FFI declarations for the different MPI implementations and versions, and then detecting which one to use in `build.rs` by parsing the output of `mpicc`.
Of course, generating the declarations for every single implementation and every single version is not practical, so one could generate declarations for a few implementation/version pairs (starting with the latest releases of OpenMPI and MPICH, for example) and default to bindgen in all other cases. This would keep the benefit of an easy way to use this crate even with exotic MPI implementations, while giving smaller build times and a simpler build in 80% of the cases. A rough sketch of the detection step is below.
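To make this concrete, here is a minimal sketch of what the detection might look like. Everything in it is an assumption on my part: the `mpi_impl` cfg values are invented, and the wrapper flags (`-show` for MPICH, `-showme` for Open MPI) would need to be verified for each implementation we want to support.

```rust
// build.rs (hypothetical sketch, not an actual rsmpi build script)
use std::process::Command;

/// Ask the mpicc wrapper to describe the underlying command line.
/// MPICH understands `-show` and Open MPI understands `-showme`,
/// so we try both and concatenate whatever comes back.
fn wrapper_description() -> String {
    ["-show", "-showme"]
        .iter()
        .filter_map(|flag| Command::new("mpicc").arg(flag).output().ok())
        .map(|out| String::from_utf8_lossy(&out.stdout).to_lowercase())
        .collect()
}

fn main() {
    let desc = wrapper_description();
    // Select pre-generated FFI declarations when we recognize the
    // implementation, and fall back to running bindgen otherwise.
    let implementation = if desc.contains("openmpi") || desc.contains("open-mpi") {
        "openmpi"
    } else if desc.contains("mpich") {
        "mpich"
    } else {
        "bindgen"
    };
    println!("cargo:rustc-cfg=mpi_impl=\"{}\"", implementation);
}
```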
Please tell me what you think of this! Is there something I overlooked?
If you agree with this proposal, I could try to implement it, I have some experience with bindgen and Rust FFI.
Activity
bsteinb commented on Nov 6, 2017
Hi there!
I understand your concerns about having `bindgen` as a build dependency. There is indeed some tension between the policies regarding selection of software versions in HPC environments (where, in my experience, a somewhat conservative choice is often made) and having a quite recent version of Clang as a requirement. I do not think it is quite as bad as you make it seem (I am not convinced #20 or #22 are connected to `bindgen`), but I concede that not having `bindgen` as a build dependency would be less painful (for build times alone).

Your observations as to why `bindgen` is used in `rsmpi` are correct, as is your conjecture that the FFI declarations could – in principle – be pre-generated and shipped along with `rsmpi`. The problem with this strategy is precisely as you say. Also, I rarely get around to working on `rsmpi`, and I do not think the project is at a point where I should devote my time to simplifying the installation procedure for production users. However, if you offer to make a contribution in this area, I am inclined to accept it, especially since – as it only concerns the build infrastructure – it should not influence refactorings of the library itself.

An acceptable contribution should – I think – contain at least the following:

- Both mechanisms (pre-generated FFI declarations and `bindgen`) should still be available, one the default (preferably the pre-generated FFI declarations) and the other via a cargo feature.
- A `bindgen`-based mechanism to add new pre-generated FFI declarations.

If this list of requirements makes this task too daunting, I completely understand; I already admitted that I am also not willing to work on this (at least for now). However, I feel like anything less would only make this a brittle work-around. If you do still want to work on it, go for it!
Something that could make this task easier is this initiative by the MPICH project, which aims to offer a somewhat stable ABI across certain versions of MPICH and various other MPI libraries based on it: http://www.mpich.org/abi/. However, the information on that page seems to be a bit stale. Similar information for Open MPI can be found here: https://www.open-mpi.org/software/ompi/versions/.
Luthaf commented on Nov 6, 2017
Thank you for the quick answer!
Yes, this is how I see this implemented too.
I feel like this would be the hardest part. I don't really know that much about MPI, so I guess the following would be relevant:
I don't see how the rustc version is relevant here; the FFI will always work the same, due to backward compatibility requirements. But as we are talking about C software here, maybe the C compiler used will have an influence? Or maybe some compiler flags too (like how, in Fortran, you can specify the default size of integers on the command line)? Or is this all abstracted by `mpicc`?

This should be as easy as copying the generated file and adding some lines in `build.rs` to emit the corresponding `cfg` (a sketch of what I mean is below).

I did not know about this, this is very nice! Does this mean that all the implementations listed on the MPICH page are ABI compatible? And maybe the compatibility extends to the following compatible releases (for some definition of compatible ^^)
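For example, something along these lines; the file names and cfg values are invented and would have to match whatever `build.rs` emits:

```rust
// In mpi-sys's lib.rs (hypothetical layout). Each cfg value corresponds
// to a `cargo:rustc-cfg=mpi_impl="..."` line printed by build.rs.
#[cfg(mpi_impl = "openmpi")]
include!("pre_generated/openmpi.rs");

#[cfg(mpi_impl = "mpich")]
include!("pre_generated/mpich.rs");

// Fallback: declarations generated by bindgen at build time into OUT_DIR.
#[cfg(mpi_impl = "bindgen")]
include!(concat!(env!("OUT_DIR"), "/bindings.rs"));
```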
bsteinb commented on Nov 7, 2017
I agree this is probably the largest chunk of work. It is not really about MPI though. None of this is specified by the standard.
Yes, note that MPI version here means MPI library version (as in Open MPI 1.10.0) not the version of the MPI standard.
Yeah, I am pretty sure that it is not of concern at the moment. I do think `bindgen` can emit things that go beyond just type declarations, like `impl`s for `Clone`. One could think of a scenario where this extra stuff could in the future come to rely on `rustc` features that are not backwards compatible. Probably I am just being too paranoid.

Your guess is as good as mine. I would say that if the headers of an MPI library built with two different C compilers are the same, then the compiler does not matter.

One other thing I just thought of: it might not be legal to distribute pre-generated FFI declarations that are based on the header files of some of the commercial MPI libraries. E.g. the `mpi.h` shipped with Intel MPI contains some writing that makes me very reluctant to distribute something that is based on it.

Luthaf commented on Nov 13, 2017
I had not thought of this =/ Yeah, it might be hard to distribute some of the bindings.
For Intel MPI specifically, it looks like it is based on MPICH, so we might get around the issue by using the same bindings for MPICH and Intel.
But maybe rust-lang/rust-bindgen#918 is a better solution to this problem. Bundling libclang would make most of my initial problems go away. I'll try to investigate both solutions.
bsteinb commented on Jan 30, 2018
There has been no movement here or over in the `bindgen` issue (which I have subscribed to) since November. Closing this for now.

AndrewGaspar commented on Mar 1, 2018
I've got an idea for a perhaps more tractable solution to this problem.
We could add some tool that generates a vendored version of `mpi-sys` (as a tarball or something like that). Then the user can use the `[patch]` directive to replace `mpi-sys` with their vendored version. My impression is that HPC systems tend to have a finite set of compiler/MPI-version tuples, so you could easily pre-generate the `mpi-sys` for each MPI version you need once. When new MPI versions or compiler versions are added, you can just re-vendor.

The benefits of this are:

Downsides:

- It pushes management of `mpi-sys` versions on to the user.

[0] Produced using `cargo graph --optional-deps false` (note: libffi also depends on bindgen).

Luthaf commented on Mar 2, 2018
This should work, but I am not sure it is the best solution.
This is a bit of a bummer, because the idea was to completely get rid of the hard-to-install libclang. Plus, pushing management of `mpi-sys` onto the user is less than ideal if you want to have users who are not Rust developers.
Luthaf commented on Mar 19, 2018
Just thought of another possible fix for the problem here: shipping an implementation of MPI (possibly MPICH/OpenMPI) with rsmpi itself.
This would be behind an optional feature, and the MPI implementation would be compiled before compiling rsmpi. This means that we can control and ship the bindgen output for this particular, blessed implementation of MPI.
I am not sure if this could work, or if I am just showing my complete lack of understanding of MPI, but it looks like one can install and use one's own MPI implementation on a cluster, without relying on the one provided with the cluster.
What do you think?
bsteinb commented on Mar 24, 2018
Well, shipping an open source implementation of MPI with the `mpi-sys` crate is possible. Although:

- I am not sure we would get the same `mpi.h` (and thus a stable output of `bindgen` that we can ship with `mpi-sys`) every time.
- Users of any other MPI implementation would still need `bindgen` and its dependencies working.

I will try to do some experiments regarding the first point by installing Open MPI or MPICH on different platforms and seeing whether the resulting `mpi.h` and `bindgen` output are compatible.

marmistrz commented on Mar 29, 2018
For student cluster competitions we always build OpenMPI 3.0 from source, because CentOS ships the ancient 1.x version (and the difference in performance is significant).
On production clusters you often have to build your own versions of compilers, etc., because admins don't want to install the newer version, even as a module. I have rebuilt GCC and binutils myself.

Luthaf commented on Feb 27, 2020
I have another idea to fix the issue here. My understanding of the problem is that some MPI types use a different ABI in different implementations, meaning that it is not possible to assume all `mpi.h` are equivalent.
A possible way to work around this would be to provide a small shim around MPI functions where rsmpi would be in control of the ABI, which would call into the local MPI installation.
Something like this:
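Purely for illustration (the `rsmpi_*` names, the handle representation, and the signatures below are all made up, just to show the shape of the idea):

```rust
// Rust side of the hypothetical shim. rsmpi controls these signatures,
// so the declarations can be shipped pre-generated. The matching C file
// would implement each function in terms of the local MPI library, e.g.
//
//     int rsmpi_comm_rank(uintptr_t comm, int *rank) {
//         return MPI_Comm_rank((MPI_Comm)comm, rank);
//     }
//
// and would be compiled by the user-provided mpicc.
use std::os::raw::c_int;

// Opaque handle with a layout chosen by rsmpi, independent of how the
// local MPI implementation represents MPI_Comm internally.
pub type RsmpiComm = usize;

extern "C" {
    pub fn rsmpi_init() -> c_int;
    pub fn rsmpi_comm_world() -> RsmpiComm;
    pub fn rsmpi_comm_rank(comm: RsmpiComm, rank: *mut c_int) -> c_int;
    pub fn rsmpi_finalize() -> c_int;
}
```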
The C side of this shim would then be compiled with the user-provided `mpicc`, taking care of all the specifics of the current MPI installation. We could also ship bindgen's generated `extern` function definitions, since we would control them.

The main drawbacks I can see with this approach are:
What do you think?
AndrewGaspar commented on Feb 28, 2020
I think there's some merit to the idea - I've thought about doing this in the past.

A couple things:

Though a possibility could be that you compile and run a program as part of `build.rs` that outputs all the handle type sizes; a sketch of that is below.
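Very roughly, and only as a sketch: the probe file name, its output format, and the assumption that `mpicc` is on the PATH are all invented here.

```rust
// build.rs (hypothetical): compile a tiny C probe with mpicc, run it,
// and capture the size and alignment of each MPI handle type.
use std::{env, fs, path::PathBuf, process::Command};

fn main() {
    let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());
    let probe_src = out_dir.join("probe.c");
    let probe_bin = out_dir.join("probe");

    // The probe never calls MPI_Init, so it can run without mpiexec.
    fs::write(
        &probe_src,
        r#"
        #include <stdalign.h>
        #include <stdio.h>
        #include <mpi.h>
        int main(void) {
            printf("MPI_Comm %zu %zu\n", sizeof(MPI_Comm), alignof(MPI_Comm));
            printf("MPI_Datatype %zu %zu\n", sizeof(MPI_Datatype), alignof(MPI_Datatype));
            return 0;
        }
        "#,
    )
    .unwrap();

    // Compiling with the mpicc wrapper guarantees we see the same mpi.h
    // that the user's MPI library ships.
    let status = Command::new("mpicc")
        .arg(&probe_src)
        .arg("-o")
        .arg(&probe_bin)
        .status()
        .expect("failed to run mpicc");
    assert!(status.success(), "mpicc could not build the probe");

    let output = Command::new(&probe_bin).output().expect("probe did not run");
    // From here one would parse output.stdout and emit Rust types (or cfg
    // values) whose size and alignment match what the probe reported.
    let _sizes = String::from_utf8_lossy(&output.stdout);
}
```

Luthaf commented on Feb 28, 2020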
This is another alternative, and it is how the Julia bindings to MPI do it. You may also want to get the alignment of the types right. I don't know how one would create a fully opaque type with a given size and alignment in safe Rust, though.
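The closest I can come up with is something like the following, where `build.rs` would have to paste the probed numbers in as literals when it generates the file (the 8s here are placeholders, not real measurements):

```rust
// Hypothetical generated code: an opaque communicator handle that is
// 8 bytes large and 8-byte aligned. `align(N)` only accepts an integer
// literal, so build.rs must write the measured values into the source.
#[repr(C, align(8))]
pub struct MpiComm {
    _opaque: [u8; 8], // private field keeps the contents inaccessible
}
```

Outside its defining module the struct is effectively opaque, since the field is private and no constructor is exposed.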
jedbrown commented on Feb 9, 2022
I think it's more appropriate for a wrapper layer to live outside rsmpi. This project, for example, has been around for a while, but is now nicely licensed.
https://github.com/cea-hpc/wi4mpi
At this point, I think avoiding bindgen has nontrivial maintenance costs, and a specialized `mpi-sys` is the way to go if it's important to you. If you do create a specialized `mpi-sys`, we can add it to CI. I'll close this issue now, but feel free to reopen if you think that's inappropriate or you would like to put some effort toward a different strategy.