Description
Overview
Let's get numerical integration into stdlib. As a baseline, here's what I'd want in a general-purpose integration module.
- Interfaces to QUADPACK routines, both simple wrappers and a more modern interface a la SciPy's
scipy.integrate.quad
. - Routines to integrate arrays as if they are tabulated function values. At a minimum, we'd want trapezoid rule and Simpson's rule.
- Some niche or less well-known rules that are still useful. Examples might be endpoint-corrected discrete Fourier transforms (see Numerical Recipes Sec. 13.9) and double-exponential rules.
The idea would be to provide easy-to-use and reasonably general numerical integration routines that cover 95% of what people need. Some things which are out of scope in my opinion:
- ODE integration
- Monte Carlo integration
- Basis representations of abstract integral operators
I envision having two modules: one being a light f90 wrapper for QUADPACK which expert users can use, the other being the main module with a nicer interface to QUADPACK's functionality, as well as array-integrating routines.
Integrating functions
I've already written wrappers and tests for "non-expert" QUADPACK routines (link). All these do is create explicit interfaces and give names to magic numbers, e.g., error codes and integrand weight selectors. It will need a little refactoring to port to stdlib, but I am pretty confident that my module quadpack_interface
uses all the right types, kinds, and intents to agree with the F77 routines. No need to duplicate that effort here.
From a modern programmer's perspective, QUADPACK has some unattractive qualities:
- Explicitly passed workspace arrays with sizes requirements that are documented but not enforced in code. These mattered in the days of tight memory constraints, but nowadays, we can relieve the caller of this burden.
- Out-parameters that are not usually useful should be made optional.
- Technical-detail in-parameters should be made optional with sensible default values.
- The integrand function must be a scalar function of one variable. This can make it clumsy to integrate functions with additional parameters, and doing multiple integrals by nested QUADPACK calls gets cumbersome.
- Routines for weighted integrands can be greatly simplified and clarified. For instance,
qaws
implements weight functions that remove algebraic/logarithmic singularities at the integration endpoints. The type of singularity is controlled by an integer argument, whose value affects the meaning of other arguments.
Solving these issues should be the motivating goal of a module that delivers high-level integration functions that call down to the appropriate QUADPACK subroutines, internally handling the magic numbers, workspace array allocation, and translating return codes to proper runtime errors if necessary. I have some design sketches that I'll talk about in a later post.
Integrating arrays
Integrating arrays is a simpler task. We just need to decide how featureful we want to be with this. For reference, NumPy/SciPy provide
numpy.trapz
: trapezoidal rule with either equidistant or arbitrary abscissas.scipy.simps
: Simpson's rule with either equidistant or arbitrary abscissas.scipy.romb
: Romberg integration, which is restricted to equidistant abscissas whose length is2**k + 1
, e.g., 1, 3, 5, 9, 17...scipy.cumtrapz
: cumulative trapezoidal rule with optional initial value
We should have all these, except maybe Romberg integration. I don't think I have ever used it "for real", but maybe other people can make a case for keeping it. To this basic set of routines, I'd also like to add companion functions to trapz
and simps
that return the weights for a particular array of abscissas. This is useful for integrating many different arrays that represent functions tabulated on a common grid. Once you have the weights from, e.g., w = trapz_weights(x)
, then computing the integral is as simple as integral = dot_product(y, w)
.
Details
- Are there any licensing show-stoppers with QUADPACK? You always wonder with these Netlib libraries.
- We need to decide (elsewhere) what conventions to adopt for wrapping F77 codes. Relevant aspects here are
- translating
real
anddouble precision
to appropriate real kinds - explicit interfaces for
external
procedure - managing workspace arrays (Module-level or subprogram-level? With or without
save
?)
- translating
Activity
certik commentedon Jan 19, 2020
I would suggest to concentrate on the API. We can use QUADPACK as an implementation, or we can use something else; either way I would consider that secondary. The API however, is essential.
Here is an implementation of Gaussian quadrature that I have been using in all my codes: https://github.com/certik/hfsolver/blob/b4c50c1979fb7e468b1852b144ba756f5a51788d/src/quadrature.f90, it returns the points
x
and weightsw
, and the user does the actual integration usingsum(f(x)*w)
.ivan-pi commentedon Jan 20, 2020
Scipy is using the QUADPACK routines and is itself under the BSD license, so it should be possible.
https://github.com/scipy/scipy/tree/v1.4.1/scipy/integrate/quadpack
There is a great Python package with tons of references for different quadratures: https://github.com/nschloe/quadpy
For generating classic Gaussian quadratures there are a lot of legacy codes available, unfortunately none of them have permissive licenses:
http://morespace.unimore.it/danielefunaro/routines/
https://dl.acm.org/doi/abs/10.1145/35078.214351
https://dl.acm.org/doi/10.1145/174603.174605
https://www.cs.purdue.edu/archives/2001/wxg/codes/ORTHPOL
https://www.tildentechnologies.com/Numerics/index.html (no license info)
Given the large palette of Gaussian quadratures perhaps they would fit better into a separate package, but simple rules like the trapezoidal and Simpson methods could be part of stdlib.
nshaffer commentedon Jan 20, 2020
@ivan-pi Thanks for the research! I am in the process of writing up my "vision" for this module, but I want to respond to one specific point. I think the huge variety of existing implementations should not stop us from defining canonical interfaces. For instance, we should not provide routines that are explicitly Gauss-Kronrod integrators or Clenshaw-Curtis integrators, or Bill-and-Ted integrators. We should have routines for "integrate f(x) from a to b, with both a and b finite" and "integrate w(x)*f(x) from a to infinity with w(x)=sin(c*x)". Then specific implementations can decide how to map the different cases to actual quadrature rules. Obviously, we will have to provide reference implementations, but that's OK.
On the other hand, I agree that fancy quadrature routines are a good candidate for splitting off into a separate package once stdlib matures. At that point, we can open the doors to specializations like "clenshaw_curtis_quadrature.f90" and "gauss_kronrod_quadrature.f90".
nshaffer commentedon Jan 26, 2020
Here's a first cut at some interfaces.
Trapezoidal rule for arrays
Integrate arrays with trapezoidal rule, either with implicit abscissas (assumed equally spaced) or explicit abscissas
Trapezoidal weights for given abscissas
Simpson's rule for arrays
Integrating arrays with Simpson's rule should look the same as the trapezoidal rule. The only catch is the need to do something for the case of even-length arrays. There are three approaches to choose from:
Option 1 is the simplest but also most restrictive. Option 2 is the most flexible, but a good API probably requires that all the
simps_*
routines take a third optional argument that gives the user some control over how the patching occurs. Option 3 is simple but may not count as "Simpson's rule" from a purist's perspective.Adaptive integration of functions
Adaptive integration of functions can be implemented in many, many ways. This is where it's most useful to specify a fairly general API and let specific implementations decide what algorithms to use. I think adaptive 1-D integration should let users request
An actual implementation does not need to honor all requests literally. For instance, if an integrator detects that the integrand decays rapidly as x->infinity, it should be allowed to truncate at a "big enough" x that it thinks will satisfy the requested tolerance. Or, if the user specifies a weight function, it should not be required that the implementation actually use that information in deciding what specific quadrature rule to use.
Here are some examples of what I think adaptive integration calls should look like (not being careful about real kinds for now)
In all cases,
f
is a function with the interfaceThis is the simplest thing to start with and can be changed depending on the findings in #117.
There are two further non-obvious things illustrated: weights and infinite integration limits.
Weighted integrands
I suggest that weights be specified as instances of
class(weight_t)
, whose extensions allow for parameterization of the weight. For instance, the definition ofsin_weight
might beand the definition of
pole_weight
might beThere are lots of other weight functions one could consider including, e.g., the weights corresponding to the classical orthogonal polynomials. Making each one a distinct type makes it easier to implement weight-specific quadrature rules.
Infinite integration bounds
"Infinity" is a non-standard concept outside the IEEE modules. I opened some discussion on different mechanisms for representing infinite values in #118. Some consensus needs to be reached there before we can decide what do to for (semi-)infinite integration domains here.
Non-adaptive integration of functions
The approach @certik shared for non-adaptive quadrature looks good to me. The strategy of returning the abscissas and weights make a lot of sense. Not only does it avoid recalculation for the case of doing many similar integrals, but it also allows more flexibility in the integrand function. I think we should also offer corresponding all-in-one functions that generate the weights and abscissas internally, akin to having both
trapz
andtrapz_weights
.One case that doesn't work well with the "explicit summation" approach is when the caller wants a rule for infinite or semi-infinite domains. This is because there is no standard way to refer to "infinity" (again, see #118).
certik commentedon Jan 27, 2020
I think generally this API looks good to me.
Regarding your last point, I agree we should do both --- returning the points and weights as well as have a higher level API that hides it from the user and instead requests the user to provide a callback.
Regarding returning points and weights for an infinite domain, I've used exactly the same API as for a finite domain. Here is how it works:
https://github.com/certik/hfsolver/blob/b4c50c1979fb7e468b1852b144ba756f5a51788d/src/quad_laguerre.f90#L11
I only implemented a few orders. It returns the points on an infinite interval
[0, oo)
.fiolj commentedon Jan 27, 2020
Hi, I've just come across this proposal. I've been working on trying to produce something similar with some of the methods I've been in contact. My idea is/was to make an interface similar to numpy/scipy in a modern fortran, with good documentation and examples. The idea is that the project is open and driven by a community, but at this point is just starting with my vision of it. So, If possible, I'd rather merge/contribute in this ongoing project with a shared vision.
Currently, for integration I implemented:
All integrators, work for real and complex functions, for functions f(x) and f(x,args) (a single interface)
The project is at https://github.com/numericfor/numfor with documentation at https://numericfor.github.io/numfor/
Of course, working alone, I've made some decision about some of the topics that are being discussed here (API, treatment of infinite, etc).
If something of it is useful, I'd be glad to help to integrate it to stdlib and work with the community to adapt it to a consensus API.
Regards,
Juan
certik commentedon Jan 27, 2020
Hi @fiolj, welcome! Yes, that would be awesome if you could contribute. Indeed, the main value of
stdlib
is that we agree on the API as a community. The actual implementation is straightforward, as we have all done that already or can relatively easily do.If you want to help, go ahead and help us reach an agreement regarding APIs and feel free to propose everything that you already implemented. You can follow the WORKFLOW.
fiolj commentedon Jan 28, 2020
Thanks @certik, I agree with all points by @nshaffer, the above API seems sensible.
I would also add that may be useful to integrate not only real but also complex functions
Other decisions (infinite limits, f(x, *args)) depend on other issues.
Regarding the implementation, besides wrapping, I've been doing some refactoring of the QUADPACK routines. Probably there are some things to modify but I could help to put them in the preferred way starting from the version already in
https://github.com/numericfor/numfor/tree/master/src/integrate/quadpack
certik commentedon Jan 28, 2020
@fiolj thank you. Looking forward!
@nshaffer it looks like you will get an agreement on the API, so it seems the next step is to create an implementation and then we can discuss the details once you send a PR.
nshaffer commentedon Jan 29, 2020
@fiolj That is an impressive effort! I don't have the time right now to look carefully through the repo, but it looks like there are good ideas in there I haven't considered yet. Is it the case that the real "meat" of your computational routines lives inside
cpp
include files? Do you think those include files will be easy to reuse with a different preprocessor, or are they written in a verycpp
-reliant way?@certik I will get the ball rolling with a PR that includes the simplest things. For features we're still working out (e.g., adaptive quadrature), is it OK to have stub functions that just illustrate the proposed API?
certik commentedon Jan 29, 2020
I think so. It's about getting the ball rolling.
fiolj commentedon Jan 29, 2020
@nshaffer I was thinking on starting a project just like this...
I think that we can use my reformatted quadpack routines as a starting point if we agree to it. I didn't want to depend on a different program and just used cpp but using fypp will make the implementation really much easier, and I think it will be quite straightforward the change. Though I haven't tried to use it yet.
It would be good if you can start with the proposal for the API
fiolj commentedon Feb 4, 2020
Sorry about the noise, but rethinking the API for trapz (and may be simps) may be it would make sense to implement those functions for arrays of rank larger than one, like Numpy trapz.
Looking to the implementation of stats_mean, I think we could do something similar.
@nshaffer is working in an implementation, thoughts on this?
I was thinking that may be we could use something like this for the interface:
and, similarly for the implementation:
nshaffer commentedon Feb 4, 2020
@fiolj I've been thinking about that too. I support
trapz
(andsimps
) being rank-generic. In turn,trapz_weights
should also be rank-generic, I would think. The use case is obvious.In general, though, I think we should be judicious about which routines we make rank-generic, at least for now. Implementing generic ranks makes the source very difficult to read. One also has a great many more tests to write (in principle).
We are still in early stages of developing, trying different APIs and such. I think it is not a good use of our time and effort to add generic ranks to all routines that could conceivably take in a multi-dimensional array (but which cannot be made elemental). Again, for
trapz
and friends, the use case is obvious.Even so, in my initial PR, I intend to only show the 1-d API. The code will be easy to read and review. With that in place, the subsequent PR (yours, if you like) to implement generic ranks will be easier to understand compared with doing it all in one shot.
fiolj commentedon Feb 5, 2020
I agree with that strategy. I just didn't think on doing trapz and simps rank-generic at the time we discussed the interface, and wanted to open it for discussion.
I still am not sure what is the best strategy for collaboration on medium-sizes branches like this. I don't know if it is possilbe/desirable to update two forks in parallell and make intermediate PRs between those, or just make several PRs with small pieces of code to the main repository.
nshaffer commentedon Feb 7, 2020
Thinking on it some more, there are some API questions we need to resolve before implementing generic kinds for
trapz
and such. Before getting into the details, I am thinking that the most general API for, say,trapz
would beresult = trapz(y, dx, dim [, mask])
result = trapz(y, dx [, mask])
result = trapz(y, x, dim [, mask])
result = trapz(y, x [, mask])
dim
is present, the rank of the result is one less than the rank ofy
.dx
is either a scalar or, ifrank(y) >= 2
, thendx
may instead be a 1-d array withsize(dx) == rank(y)
.x
has the same shape asy
.With that established, here are two questions I'd like opinions on:
dim
is absent what should be the result? Most intrisic reductions (sum
,maxval
, etc.) produce a scalar when nodim
is specified. In the context of integration, that means performing a multi-dimensional integral. To me this is the natural thing to do, but I am open to other ideas.mask
do? In the intrinsic reduction functions, it's obvious. Here, I can think of three reasonable approaches:y
where the mask is.false.
as if they were zero. This parallels howsum
behaves, but I do not think it would be useful.mask = [.true., .true., .false., .true., .true. true.]
, thentrapz(y, x, mask)
would be equivalent totrapz(y(1:2), x(1:2)) + trapz(y(4:6), x(4:6))
. This is more useful, I think, but it has the potential to get complicated in higher dimensions.fiolj commentedon Feb 20, 2020
I like your API and implementation as it is. Regarding your two questions:
When
dim
is absent, Numpy integrates along the last dimension. In this case I like your idea of multidimensional integration and returning a scalar better.On the use of
mask
:I think the combinations of option 2 are too many to make something universally useful and we, at least at this stage, keep it simple. I would not support masked integration now.
milancurcic commentedon Mar 11, 2021
Closing this issue as implemented, with follow up active threads being #277 and #313.