Description
When doing the "query-ification" of MIR, we handled MIR inlining by doing a "try-get", meaning that we attempt to inline all callees but ignore if a cycle arises. This is OK but leads to nondeterministic results where the order of queries in a SCC can affect what gets optimized into what. On #42633, I wrote:
MIR inlining uses try_get to handle cycles in the call graph. This was a clever idea but I think ultimately not the right way -- it makes things nondeterministic. Imagine that you have foo() which calls bar() which calls foo(). If we start by asking for the optimized MIR for foo, it can inline bar, but bar will not inline foo. However, if we start by asking for the optimized MIR for bar, you get the opposite result. I think you should get the same result whichever optimized MIR you ask for first.
The workaround that I think we should be doing is to have a separate query (let's call it MIR-call-graph or something) and then having the inlining passes all request this call-graph. The call-graph would just walk the MIR for everything. This isn't great for responsiveness, but this call-graph would only be used when we are doing inlining and MIR optimization, so that's probably not an IDE use case, nor is it a prime incremental use-case. (You can make incremental do better by projecting out the SCC for particular def-ids, and then red-green can observe that they have not changed.)
We could probably do better here but I think for now this would be fine, and it's not clear that we need to do better ever.
And @michaelwoerister later wrote:
As a side-note regarding inlining: With #43183, the trans-collector builds up a fairly accurate, global "call graph" between translation items. This could be useful for inlining too. It is post-monomorphization though, which is good for the quality of information it can provide, but which also means that it is a bit late in the pipeline.
This issue is intending to track the proposed refactoring of the MIR inlining pass to not use try-get.
Activity
nikomatsakis commentedon Jul 29, 2017
cc @matthewhammer, who may take this on
eddyb commentedon Jul 29, 2017
What I came up with previously was not caching the result of queries which observed a cycle via
try_get
, although I'm not sure how many recomputations that would result in or if it's correct.matthewhammer commentedon Aug 25, 2017
Above, @nikomatsakis outlines two problems (the way that I view it now, at least)
I've talked to @nikomatsakis a couple of times about this since he created this issue. In my view, the first problem and the second problem can be decoupled, where we solve the first problem without necessarily eschewing the query framework's cycle detection mechanism. (Either way would probably work, I think, and I don't have strong feelings about the choice).
In particular, I recently proposed a solution to the first problem that would require knowing some SCC information, and would use that to determine inline decisions at callsites, in place of (solely) using the cycle detection mechanism.
Restating the problem
I'd rephrase the first problem above as follows:
The current inlining logic is "asymmetric", in that as it makes inlining decisions, it favors one root for each SCC, rather than favoring each SCC root equally.
My proposal for inlining assumes that we want to specialize each SCC of "inline" nodes for each root of this SCC. This may or may not be what we actually want, since it will lead to lots of code specialization (maybe good, maybe not).
Tiny example
To illustrate the idea, suppose we have an SCC with two inline functions,
f
andg
, which mutually call each other, are each roots of the SCC. Specifically, they are SCC roots in the sense thatf
andg
are each called by non-inline functions that are outside of the SCC (f
andg
do not call these inter-SCC functions, directly or transitively).The current approach has the issue that either
f
org
will be processed first, making a single choice of "root" for the SCC, which will not reflect the fact thatf
andg
are both roots of the SCC. One may call this "non-deterministic" (as @nikomatsakis does above); I think term makes sense.I'd also offer the term "asymmetric", in the sense that every root of the SCC should get its own "view" of the SCC, rather than choosing a single root (as determined by the larger compilation context, with its various queries and their dependencies).
Specialize MIR for each SCC root
My proposal addresses this asymmetry issue by assuming that we want to specialize each SCC of "inline" nodes for each root of this SCC.
In our running example, we'd generate two versions of the single SCC involving
f
andg
, where the versions differ in that one usesf
as the root, and the other usesg
as the root. In the version wheref
is the root, the calls fromg
tof
would not be inlined, but the calls fromf
tog
would be inlined. In the version whereg
is the root, the situation is reversed.And in general, the per-SCC-root approach is as follows: For each SCC, for each root, we generate one inlined MIR; so during inlining any MIR, have a root function that gives the call edges of the SCC a "direction" (e.g., via a topological sort from this root), allowing us to distinguish the “backward” call edges which are not inlined from the (inlined) “forward” call edges.
In general, for an SCC of
n
nodes marked as inline, withm
roots, we'd generatenm
nodes of output from the inlining these nodes.Include SCC root among query key
My current thinking is that the inlining approach above (the "per-SCC-root approach") can be implemented in two different ways. In both ways, we key the queries for inlined output by at least two DefIDs:
That is to say, we compute inlined output in reference to a "root" inline function of the inlined code. This additional root parameter addresses the asymmetry issue, as explained above in the example. In sum, for each root we get a different view of the SCC.
Given this definition for the (rooted) queries, I see two implementation strategies:
matthewhammer commentedon Aug 25, 2017
My main questions for @nikomatsakis and others:
matthewhammer commentedon Aug 25, 2017
Example from above, as a figure:
hanna-kruppe commentedon Aug 25, 2017
This is an interesting strategy. It certainly avoids the non-determinism that is the main problem. I don't know if it's a good optimization strategy, though. If inlining potentially duplicates the functions in an SCC once per "root", then that should probably be reflected in the cost metric for inlining. However, I don't see how this can be done in the "go depth-first, use cycle recovery, but include the root in the query key" implementation (since the SCC is never explicitly computed).
Furthermore, even if the SCC is available, it seems somewhat tricky to predict what inlining decisions other, separate runs of the inlining pass (starting from a different root) will perform. (This information would presumably be needed for the cost metric.) Of course, each query could run the same analyses that those other invocations would already perform, but that would duplicate a ton of work. Perhaps this is just my lack of imagination speaking, but I fear that going through SCC multiple times, from different starting points, will just make it harder to make good inlining decisions.
(Besides, if the SCC is available, there's no particular need to innovate on this. There are other strategies for dealing with SCCs that are more battle-tested.)
Auto merge of #44071 - alexcrichton:no-cycles, r=nikomatsakis
nikomatsakis commentedon Sep 1, 2017
@rkruppe
The whole goal is to move away from that strategy -- it is nondeterministic, and cycle recovery is problematic anyhow (we would like the invariant that "cycle detected => compile fails eventually"). My current plan is to do the following:
I think that @matthewhammer's scheme fits into this by adding, essentially, another layering of inlining that happens afterwards. That is, what I described above yields a result that is effectively the sort of "bottom-up" inlining that we are trying to do now. If MIR0 is the pre-inlining representation (from which the SCC is computed), then we have now built MIR1. We can then imagine inlining within an SCC -- that would be inlining MIR1 into MIR1, to yield MIR2.
This, however, may well be true!
hanna-kruppe commentedon Sep 1, 2017
@nikomatsakis re: cycle recovery, I was referring to this part (one of two possible implementation strategies) of @matthewhammer's post:
This would be deterministic, but would still use cycle recovery (in a bening way). There might be other reasons to remove cycle recovery entirely, then this implementation strategy is undesirable anyway. I was just saying, even if this implementation strategy remains viable, it may not give very good results.
Aside to be clear: it is absolutely possible (and profitable) to inline within an SCC. LLVM does this sometimes. It may not be as easy (or at all possible) to achieve with the "query optimized MIR for other functions" model, and it may not be necessary or very useful for MIR optimizations, but in theory it's possible.
10 remaining items