Skip to content

rewrite liveness analysis to be based on MIR #51003

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
nikomatsakis opened this issue May 23, 2018 · 20 comments · May be fixed by #142390
Open

rewrite liveness analysis to be based on MIR #51003

nikomatsakis opened this issue May 23, 2018 · 20 comments · May be fixed by #142390
Assignees
Labels
A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html C-cleanup Category: PRs that clean code up or issues documenting cleanup. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@nikomatsakis
Copy link
Contributor

The current liveness code does a simple liveness computation (actually a few such things) and tells you when e.g. assignments are dead and that sort of thing. It does this on the HIR. It would be better to do this on the MIR — in fact, the NLL computation is already computing liveness across all of MIR, so we ought to be able to piggy back on those results I imagine.

It may be a good idea to wait though until the MIR borrowck stuff "settles down" a bit before attempting this.

@nikomatsakis nikomatsakis added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-compiler-nll labels May 23, 2018
@nikomatsakis nikomatsakis added C-cleanup Category: PRs that clean code up or issues documenting cleanup. and removed WG-compiler-nll labels Jul 3, 2018
@nikomatsakis
Copy link
Contributor Author

I'm removing WG-compiler-nll as this isn't really related to NLL, it's just general cleanup. Here are some tips into the code for future reference:

The liveness code I am referring to, which ought to be ported:

pub fn check_crate<'a, 'tcx>(tcx: TyCtxt<'a, 'tcx, 'tcx>) {

The MIR-based liveness computation:

pub fn liveness_of_locals<'tcx>(mir: &Mir<'tcx>, mode: LivenessMode) -> LivenessResult {

@cjgillot
Copy link
Contributor

Is this issue still relevant? Can I pick it up?

@matthewjasper
Copy link
Contributor

It's still relevant. Feel free to ping me with questions.

@jonas-schievink jonas-schievink added the A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html label Apr 10, 2020
@ecstatic-morse
Copy link
Contributor

ecstatic-morse commented May 10, 2020

I looked into this this morning. A few roadblocks remain.

For one, the MIR has no record of statements like let _ = x even before SimplifyCfg. More complex expressions seem to be lowered even when they are assigned to _, so I think the solution is to look for an assignment of an ExprKind::VarRef specifically and lower it to a FakeRead of some kind.

There's also the issue of associating Locals with their HirIds for diagnostics. This was attempted in #51275, but it caused issues with incremental compilation. That PR worked around the issue by computing the spans needed for MIR borrowck errors ahead-of-time. I think a better solution would be to stop including VarBindingInfo in the StableHash of a MIR body. I don't fully understand the implications of ignoring VarBindingInfo for the purposes of incremental. I think it's okay as long as we don't use the HirId to determine whether to emit a lint/error, only to associate it with a Span. This suggests that we need to precompute the level for the "unused variables" lint. Alternatively, we could duplicate the approach in #51275, but that won't scale as more checking is moved to the MIR.

Finally, I was seeing spurious warnings for variables that were bound in a match arm and only used in a guard. Presumably this will also require some changes to MIR building, but I've not investigated this one yet.

Since I don't have a good intuition for incremental compilation or MIR building, I would appreciate any advice on solving either of these problems. @pnkfelix, this was a while ago, but did you consider ignoring VarBindingInfo during stable hashing in #51275?

@matthewjasper
Copy link
Contributor

The only reason Spans work while HirIds don't is because the incremental tests make stable hashing ignore spans. I think just using HirIds and clearing the VarBindingInfo after borrow checking so that it only affects unoptimized MIR would be fine.

@nikomatsakis
Copy link
Contributor Author

I think this agrees with what @matthewjasper said, but I want to encourage us not to do anything "clever" around incremental -- that is to say, we should not try to tweak hashing schemes or something (imo), we should hash all the data. If we don't want something to be hashed, the right way is to remove it from the struct itself (which should then prove we don't have a dependency on it).

That said, when it comes to spans, we've been talking for some time about a better scheme to handle spans and incremental compilation (see e.g. #47389), which seems related.

@ecstatic-morse
Copy link
Contributor

ecstatic-morse commented May 11, 2020

@nikomatsakis FWIW, my plan was to encapsulate this so that it would be clear to both contributors and reviewers when this HirId was used for something besides diagnostic spans or suggestions.

Regardless, switching VarBindingForm to be a single HirId seems to be somewhat contentious, and doing so is not necessary to fix this issue since we can encode the spans ahead of time like #51275 did. I'll open a separate issue.

@camsteffen
Copy link
Contributor

Is this something I could contribute to with some mentorship?

@ecstatic-morse
Copy link
Contributor

ecstatic-morse commented Jan 25, 2022

I can answer specific questions on anything MIR related, but for the incremental comp stuff we discussed above you'll need to look elsewhere. I never found a solution I was satisfied with.

The other big stumbling block I remember was async fns, since the MIR we generate for them is more disconnected from the HIR than for ordinary fns; some things that appear as used in the HIR appear as unused in the MIR and vice versa. I don't think I tried particularly hard to solve this in #72164, so it's possible you can do better with not much effort.

I was seeing spurious warnings for variables that were bound in a match arm and only used in a guard. Presumably this will also require some changes to MIR building, but I've not investigated this one yet.

the MIR has no record of statements like let _ = x even before SimplifyCfg. More complex expressions seem to be lowered even when they are assigned to _,

These two issues are solved in #72164. See this comment for a discussion of the first.

It's also worth mentioning that with #91032, there's another big HIR dataflow analysis in addition to liveness. Going from 2 big analyses to 1 might be a less attractive proposition than going from 1 to 0.

@nikomatsakis
Copy link
Contributor Author

I'm of mixed minds here. I sometimes think that we should do our safety analyses on "THIR" instead of "MIR". MIR currently serves two masters, analysis and optimization -- I think that produces more tension than I anticipated initially.

@camsteffen
Copy link
Contributor

For some context I was hoping to work towards #65467.

@camsteffen
Copy link
Contributor

I'm of mixed minds here. I sometimes think that we should do our safety analyses on "THIR" instead of "MIR". MIR currently serves two masters, analysis and optimization -- I think that produces more tension than I anticipated initially.

I would think that the decision between MIR or THIR (or HIR) for analyses should be based on which one would allow the simplest (or cheapest?) implementation, and that any problems caused with incremental can be solved architecturally. It seems like THIR would allow a simpler implementation than the HIR (especially for #65467), but I don't have enough experience with MIR to comment on how that would compare.

My understanding of the "tension" is this: The MIR has insufficient information about the original code to do some analyses/diagnostics. This is fixed by copying more data from HIR to MIR. But that added data tends to break incremental compilation. So then we need to separate "data needed for codegen" from "data needed for analyses" and not hash the latter. This seems resolvable to me in my limited understanding. Though I wonder if the mere size of the "analyses data" needed should be enough to deter us from using the MIR?

@nikomatsakis
Copy link
Contributor Author

I don't really think this is about incremental compilation, @camsteffen. It's more about what invariants the MIR code must maintain. The simplest example is stripping dead code: it's convenient for MIR to not contain unreachable code, but that also means that any analysis will disregard such code. How smart can MIR be in removing dead code? Can it do CSE? Constant propagation? etc

@camsteffen
Copy link
Contributor

Aren't those all optimizations that happen after lowering, and so analyses can happen in between? (thanks for the discussion)

@bjorn3
Copy link
Member

bjorn3 commented Jan 31, 2022

I believe something like let _ = *foo; results in no MIR at all even before optimizations. Doing the equivalent of let tmp = *foo; let _ = tmp; wouldn't be possible as that would move out of *foo, while let _ = *foo; doesn't.

@Nadrieril
Copy link
Member

let _ = *foo; results in a PlaceMention in MIR now.

@scottmcm
Copy link
Member

scottmcm commented Jun 6, 2025

Not speaking for the team, but this still makes sense to me. The discussions we had that led to unsafeck on THIR I think also clearly lead to saying that initialization checking ought to happen on MIR: initialization is flow-sensitive, not lexical. And that seems to always be clearest when done on the MIR analysis CFG (with the PlaceMentions and FalseEdges and such), especially as we add more complex control flow in the AST, like expanding conditionals in patterns and such.

@Nadrieril
Copy link
Member

@dianne was just telling me that liveness for guard patterns was tricky and that having it in MIR would be much easier.

@dianne
Copy link
Contributor

dianne commented Jun 9, 2025

It's an issue for normal match guards too; the HIR liveness analysis doesn't approximate match control flow correctly. e.g. (playground link)

let mut x = 0;
match () {
    #[expect(unused_assignments)]
    _ if { x = 1; false } => {}
    _ => assert_eq!(x, 1),
}

It treats each match arm separately, regarding control as flowing from the guard to only the arm body, without accounting for failure. To approximate it correctly on the HIR, I think we'd either need to model or-pattern expansion or treat match guards over or-patterns like loops and compute a fixed point; either way, it's substantial added complexity for marginal gain, which I imagine is why it's still wrong.

@cjgillot
Copy link
Contributor

cjgillot commented Jun 9, 2025

I have a prototype in #101500, if anyone wants to pick it up. The main blocker was handling of uninhabited types, and how to port the current behaviour.

rust-bors bot added a commit that referenced this issue Jun 13, 2025
Perform unused assignment and unused variables lints on MIR.

Rebase of #101500

Fixes #51003.

The first commit moves detection of uninhabited types from the current liveness pass to MIR building.

In order to keep the same level of diagnostics, I had to instrument MIR a little more:
- keep for which original local a guard local is created;
- store in the `VarBindingForm` the list of introducer places and whether this was a shorthand pattern.

I am not very proud of the handling of self-assignments. The proposed scheme is in two parts: first detect probable self-assignments, by pattern matching on MIR, and second treat them specially during dataflow analysis. I welcome ideas.

Please review carefully the changes in tests. There are many small changes to behaviour, and I'm not sure all of them are desirable.

<!-- homu-ignore:start -->
<!--
If this PR is related to an unstable feature or an otherwise tracked effort,
please link to the relevant tracking issue here. If you don't know of a related
tracking issue or there are none, feel free to ignore this.

This PR will get automatically assigned to a reviewer. In case you would like
a specific user to review your work, you can assign it to them by using

    r? <reviewer name>
-->
<!-- homu-ignore:end -->
bors added a commit that referenced this issue Jun 13, 2025
Perform unused assignment and unused variables lints on MIR.

Rebase of #101500

Fixes #51003.

The first commit moves detection of uninhabited types from the current liveness pass to MIR building.

In order to keep the same level of diagnostics, I had to instrument MIR a little more:
- keep for which original local a guard local is created;
- store in the `VarBindingForm` the list of introducer places and whether this was a shorthand pattern.

I am not very proud of the handling of self-assignments. The proposed scheme is in two parts: first detect probable self-assignments, by pattern matching on MIR, and second treat them specially during dataflow analysis. I welcome ideas.

Please review carefully the changes in tests. There are many small changes to behaviour, and I'm not sure all of them are desirable.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html C-cleanup Category: PRs that clean code up or issues documenting cleanup. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
10 participants