change how MIR inlining handles cycles #43542

New issue

Closed

change how MIR inlining handles cycles#43542

Labels

A-MIRA-incr-compA-mir-optA-mir-opt-inliningT-compiler

nikomatsakis

Contributor

When doing the "query-ification" of MIR, we handled MIR inlining by doing a "try-get", meaning that we attempt to inline all callees but ignore if a cycle arises. This is OK but leads to nondeterministic results where the order of queries in a SCC can affect what gets optimized into what. On #42633, I wrote:

MIR inlining uses try_get to handle cycles in the call graph. This was a clever idea but I think ultimately not the right way -- it makes things nondeterministic. Imagine that you have foo() which calls bar() which calls foo(). If we start by asking for the optimized MIR for foo, it can inline bar, but bar will not inline foo. However, if we start by asking for the optimized MIR for bar, you get the opposite result. I think you should get the same result whichever optimized MIR you ask for first.

The workaround that I think we should be doing is to have a separate query (let's call it MIR-call-graph or something) and then having the inlining passes all request this call-graph. The call-graph would just walk the MIR for everything. This isn't great for responsiveness, but this call-graph would only be used when we are doing inlining and MIR optimization, so that's probably not an IDE use case, nor is it a prime incremental use-case. (You can make incremental do better by projecting out the SCC for particular def-ids, and then red-green can observe that they have not changed.)

We could probably do better here but I think for now this would be fine, and it's not clear that we need to do better ever.

And @michaelwoerister later wrote:

As a side-note regarding inlining: With #43183, the trans-collector builds up a fairly accurate, global "call graph" between translation items. This could be useful for inlining too. It is post-monomorphization though, which is good for the quality of information it can provide, but which also means that it is a bit late in the pipeline.

This issue is intending to track the proposed refactoring of the MIR inlining pass to not use try-get.

added

ContributorAuthor

cc @matthewhammer, who may take this on

eddyb

Member

What I came up with previously was not caching the result of queries which observed a cycle via try_get, although I'm not sure how many recomputations that would result in or if it's correct.

added

mentioned this

rustc: Start moving toward "try_get is a bug" for incremental #44071

matthewhammer

Above, @nikomatsakis outlines two problems (the way that I view it now, at least)

Inlining decisions are determined by incidental query-order details
Inlining decisions are, under the hood, using the query framework's cycle detection mechanism

I've talked to @nikomatsakis a couple of times about this since he created this issue. In my view, the first problem and the second problem can be decoupled, where we solve the first problem without necessarily eschewing the query framework's cycle detection mechanism. (Either way would probably work, I think, and I don't have strong feelings about the choice).

In particular, I recently proposed a solution to the first problem that would require knowing some SCC information, and would use that to determine inline decisions at callsites, in place of (solely) using the cycle detection mechanism.

Restating the problem

I'd rephrase the first problem above as follows:

The current inlining logic is "asymmetric", in that as it makes inlining decisions, it favors one root for each SCC, rather than favoring each SCC root equally.

My proposal for inlining assumes that we want to specialize each SCC of "inline" nodes for each root of this SCC. This may or may not be what we actually want, since it will lead to lots of code specialization (maybe good, maybe not).

Tiny example

To illustrate the idea, suppose we have an SCC with two inline functions, f and g, which mutually call each other, are each roots of the SCC. Specifically, they are SCC roots in the sense that f and g are each called by non-inline functions that are outside of the SCC (f and g do not call these inter-SCC functions, directly or transitively).

The current approach has the issue that either f or g will be processed first, making a single choice of "root" for the SCC, which will not reflect the fact that f and g are both roots of the SCC. One may call this "non-deterministic" (as @nikomatsakis does above); I think term makes sense.

I'd also offer the term "asymmetric", in the sense that every root of the SCC should get its own "view" of the SCC, rather than choosing a single root (as determined by the larger compilation context, with its various queries and their dependencies).

Specialize MIR for each SCC root

My proposal addresses this asymmetry issue by assuming that we want to specialize each SCC of "inline" nodes for each root of this SCC.

In our running example, we'd generate two versions of the single SCC involving f and g, where the versions differ in that one usesf as the root, and the other uses g as the root. In the version where f is the root, the calls from g to f would not be inlined, but the calls from f to g would be inlined. In the version where g is the root, the situation is reversed.

And in general, the per-SCC-root approach is as follows: For each SCC, for each root, we generate one inlined MIR; so during inlining any MIR, have a root function that gives the call edges of the SCC a "direction" (e.g., via a topological sort from this root), allowing us to distinguish the “backward” call edges which are not inlined from the (inlined) “forward” call edges.

In general, for an SCC of n nodes marked as inline, with m roots, we'd generate nm nodes of output from the inlining these nodes.

Include SCC root among query key

My current thinking is that the inlining approach above (the "per-SCC-root approach") can be implemented in two different ways. In both ways, we key the queries for inlined output by at least two DefIDs:

the “source” MIR being transformed (e.g., containing callsites to be inlined) and
the “root” MIR that is marked as “inline”, if any (could be None).

That is to say, we compute inlined output in reference to a "root" inline function of the inlined code. This additional root parameter addresses the asymmetry issue, as explained above in the example. In sum, for each root we get a different view of the SCC.

Given this definition for the (rooted) queries, I see two implementation strategies:

Use the SCC graph to determine the SCCs, their roots, and how to inline each call. No need to use cycle detection any longer to make decisions; all decisions stem from the SCC graph.
Do not use or compute an SCC graph; The additional "root" DefID of each query addresses the asymmetry issue, so it is okay to use cycle detection (not an SCC graph) to avoid inlining cyclic calls.

matthewhammer

My main questions for @nikomatsakis and others:

Does this per-SCC-root approach make sense?
If so, which implementation approach makes the most sense in the near term?

matthewhammer

Example from above, as a figure:

hanna-kruppe

Contributor

This is an interesting strategy. It certainly avoids the non-determinism that is the main problem. I don't know if it's a good optimization strategy, though. If inlining potentially duplicates the functions in an SCC once per "root", then that should probably be reflected in the cost metric for inlining. However, I don't see how this can be done in the "go depth-first, use cycle recovery, but include the root in the query key" implementation (since the SCC is never explicitly computed).

Furthermore, even if the SCC is available, it seems somewhat tricky to predict what inlining decisions other, separate runs of the inlining pass (starting from a different root) will perform. (This information would presumably be needed for the cost metric.) Of course, each query could run the same analyses that those other invocations would already perform, but that would duplicate a ton of work. Perhaps this is just my lack of imagination speaking, but I fear that going through SCC multiple times, from different starting points, will just make it harder to make good inlining decisions.

(Besides, if the SCC is available, there's no particular need to innovate on this. There are other strategies for dealing with SCCs that are more battle-tested.)

bors

added a commit that references this issue

on Aug 26, 2017

Auto merge of #44071 - alexcrichton:no-cycles, r=nikomatsakis

e7070dd

nikomatsakis

ContributorAuthor

@rkruppe

However, I don't see how this can be done in the "go depth-first, use cycle recovery, but include the root in the query key" implementation (since the SCC is never explicitly computed).

The whole goal is to move away from that strategy -- it is nondeterministic, and cycle recovery is problematic anyhow (we would like the invariant that "cycle detected => compile fails eventually"). My current plan is to do the following:

compute the SCC as a query, starting from the "pre-inlining" MIR (this will wind up forcing the MIR for every function in the crate, unfortunately, but I don't see much of an alternative -- we could possibly have queries for computing the "SCC as seen from X", but it'd be complicated to share results then);
when doing inlining, we check that the callee is not part of our SCC, and inline them only in that case (this gives a tree structure).

I think that @matthewhammer's scheme fits into this by adding, essentially, another layering of inlining that happens afterwards. That is, what I described above yields a result that is effectively the sort of "bottom-up" inlining that we are trying to do now. If MIR0 is the pre-inlining representation (from which the SCC is computed), then we have now built MIR1. We can then imagine inlining within an SCC -- that would be inlining MIR1 into MIR1, to yield MIR2.

Furthermore, even if the SCC is available, it seems somewhat tricky to predict what inlining decisions other, separate runs of the inlining pass (starting from a different root) will perform.

This, however, may well be true!

hanna-kruppe

Contributor

@nikomatsakis re: cycle recovery, I was referring to this part (one of two possible implementation strategies) of @matthewhammer's post:

Do not use or compute an SCC graph; The additional "root" DefID of each query addresses the asymmetry issue, so it is okay to use cycle detection (not an SCC graph) to avoid inlining cyclic calls.

This would be deterministic, but would still use cycle recovery (in a bening way). There might be other reasons to remove cycle recovery entirely, then this implementation strategy is undesirable anyway. I was just saying, even if this implementation strategy remains viable, it may not give very good results.

when doing inlining, we check that the callee is not part of our SCC, and inline them only in that case (this gives a tree structure).

Aside to be clear: it is absolutely possible (and profitable) to inline within an SCC. LLVM does this sometimes. It may not be as easy (or at all possible) to achieve with the "query optimized MIR for other functions" model, and it may not be necessary or very useful for MIR optimizations, but in theory it's possible.

10 remaining items

to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

Labels

A-MIRA-incr-compA-mir-optA-mir-opt-inliningT-compiler

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!