Crate local state for procedural macros? #44034

Open

Description

@LukasKalbertodt (Member)

I'm tinkering a bit with procedural macros and ran into a problem that could be solved by keeping state between proc macro invocations.

Example from my real application: assume my proc-macro crate exposes two macros: config! {} and do_it! {}. The user of my lib is supposed to call config! {} only once, but may call do_it! {} multiple times. However, do_it! {} needs data from the config! {} invocation.

Another example: we want to write a macro_unique_id!() macro that returns a u64 by counting invocations internally.


How am I supposed to solve those problems? I know that somewhat-global state is usually bad. But I do see applications for crate-local state for proc macros.
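For illustration, a minimal sketch (not from the original report) of the counter example as a function-like proc macro keeping a static counter; whether such state is reliable is exactly what the rest of this thread discusses:

// lib.rs of a proc-macro crate (Cargo.toml needs `proc-macro = true` under [lib])
use proc_macro::TokenStream;
use std::sync::atomic::{AtomicU64, Ordering};

// Crate-local state: lives only as long as the proc-macro library stays loaded.
static COUNTER: AtomicU64 = AtomicU64::new(0);

#[proc_macro]
pub fn macro_unique_id(_input: TokenStream) -> TokenStream {
    let id = COUNTER.fetch_add(1, Ordering::Relaxed);
    // Expand to a plain u64 literal, e.g. `3u64`.
    format!("{id}u64").parse().unwrap()
}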

Activity

added labels on Aug 25, 2017:
  A-macros (Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..))
  C-feature-request (Category: A feature request, i.e. not implemented / a PR)
  T-lang (Relevant to the language team)

abonander (Contributor) commented on Mar 8, 2018

Statics and thread-locals should both be safe to use as the proc-macro crate is loaded dynamically and remains resident for the duration of the macro-expansion pass for the current crate (each crate gets its own compiler invocation). This is not necessarily stable as eventually we want to load proc-macros as child processes instead of dynamic libraries, but I don't see why they wouldn't be kept alive for the duration of the crate's compilation run anyway.
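
For concreteness, a rough sketch of this approach for the config!/do_it! macros from the issue; the macro names come from the issue, everything else is hypothetical, and the caveats in the next comment apply:

use proc_macro::TokenStream;
use std::sync::Mutex;

// Process-wide state; only meaningful if the proc-macro library stays resident
// for the whole expansion pass, as described above.
static CONFIG: Mutex<Option<String>> = Mutex::new(None);

#[proc_macro]
pub fn config(input: TokenStream) -> TokenStream {
    // Remember the raw configuration tokens for later invocations.
    *CONFIG.lock().unwrap() = Some(input.to_string());
    TokenStream::new()
}

#[proc_macro]
pub fn do_it(_input: TokenStream) -> TokenStream {
    let guard = CONFIG.lock().unwrap();
    // If a do_it! happens to expand before config!, the state is still empty;
    // that is the ordering problem raised in the next comment.
    let cfg = guard.as_deref().unwrap_or("<config! not expanded yet>");
    // For illustration, just expand to a string literal containing the recorded config.
    format!("{cfg:?}").parse().unwrap()
}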

durka (Contributor) commented on Mar 8, 2018

@abonander I don't think this is reliable for two reasons:

  1. Proc macros may not be run on every compilation, for instance if incremental compilation is on and they are in a module that is clean

  2. There is no guarantee of ordering -- if do_it! needs data from all config! invocations, that's a problem.

matprec (Contributor) commented on May 28, 2018

Addressing ordering

Declare dependencies between macros to allow delaying macro execution. In practical terms, think of macros Foo and Bar: by declaring that Bar depends on Foo, all invocations of Foo must complete before any invocation of Bar.
E.g.

#[proc_macro_derive(Foo)]
pub fn foo(input: TokenStream) -> TokenStream {
    ...
}
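// Note: depends_on below is proposed syntax, not an existing attribute argument.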
#[proc_macro_derive(Bar, depends_on(Foo))]
pub fn bar(input: TokenStream) -> TokenStream {
    ...
}

Addressing incremental compilation

A persistent storage, maybe a web-like "local storage", per proc-macro crate? It would store and load a byte array, which the user could (de-)serialize with e.g. serde: fn set_state(Vec<u8>), fn get_state() -> Vec<u8>.
I don't know about access though: how would it be provided to the proc-macro crate? Global memory? Wrapped in a Mutex?

Emerging questions

  1. Could a storage system be implemented in a crate? Assuming a cargo project layout, store serde state in the target folder as files? (A rough sketch follows after this list.)
  2. How much state should macros have, if at all? Should the same macro have access to state from previous invocations?
    • Pro
      • error message if variable name already in use
      • shared counter
    • Contra
      • Invalidation of artefacts becomes harder: in the worst case, every macro of that kind has to be re-invoked to rebuild the state, even for semantically equivalent changes like reformatting
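
On question 1, a minimal sketch of how a crate could prototype such a store itself today, assuming a plain (non-workspace) cargo layout; set_state/get_state are hypothetical helper names, and the invalidation concerns from question 2 still apply:

use std::fs;
use std::path::PathBuf;

// CARGO_MANIFEST_DIR is set by cargo for the rustc invocation that the
// proc macro runs inside of, so it is readable at expansion time.
fn state_file() -> PathBuf {
    let dir = std::env::var("CARGO_MANIFEST_DIR").expect("not built via cargo");
    PathBuf::from(dir).join("target").join("my_macro_state.bin")
}

// Hypothetical helpers mirroring the set_state/get_state idea above.
fn set_state(bytes: &[u8]) {
    let path = state_file();
    fs::create_dir_all(path.parent().unwrap()).unwrap();
    fs::write(path, bytes).unwrap();
}

fn get_state() -> Option<Vec<u8>> {
    fs::read(state_file()).ok()
}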

Thermatix commented on Apr 25, 2019

Has there been any movement on this issue?

oli-obk (Contributor) commented on Apr 25, 2019

> A persistent storage, maybe a web-like "local storage", per proc-macro crate?

The problem with such a scheme is that invocations of the same macro are still not ordered, so you can easily end up in a situation where the order of invocation changes the result.

If an ordering scheme between proc macros is implemented, one could consider giving bar read access to foo's storage, though this would depend on foo only ever being called once per crate (since the output order of multiple foo calls is unspecified).

Thermatix commented on Aug 9, 2019

Perhaps I'm missing some nuance or information, but what if, when you defined local storage, you also had to declare all the files affected by it? Then, when you re-compiled, the compiler would scan those files for changes and re-expand as appropriate.

Whilst I'm all for automating where possible, the advantage is that you get a clear list of the files involved, and it would provide working functionality for what this issue is trying to solve. Even if it's not perfect, as long as it's reasonably ergonomic (despite having to list the files), it should be good enough.

Yes, you do have to list each file that gets affected, but no doubt there is a way to automate even that.

ZNackasha commented on Dec 20, 2019

I would love to have this feature to solve the PyO3 add_wrapped requirement.
https://github.com/PyO3/pyo3

andrewreds commented on Mar 20, 2020

A different approach. What do people think?

Stateful Macros

(I need better names, syntax etc. But the idea should be there)

Have a "stateful macro", which is a compile time struct.

Things run in 3x main steps:

  1. The stateful macro object is initialized with the new!() call
// (May need to be tagged for compiler parsing reasons)
const db: postgresql::DBMacro = postgresql::new!(version => 42, schema => "cats.sql");
  2. The stateful macro object can have procedural macros called off it
fn count_cats(con: DBConnection, cuteness: u64) -> u64 {
    // sql! is able to read data passed in from new!
    // eg, could type check the database query
    // if a field in db is a set, then it can be added to, but not read from
    // otherwise it can be read from, but not written to
    db.sql!(con, select count(*) from cats where cuteness >= $cuteness)
}
  3. The macro object can have "delayed symbols"
fn main() {
    let con = postgresql::Connection::new("10.42.42.42");

    // all_queries is generated from a function that runs in a later stage
    // of the compilation, and gets linked in like normal.
    // It must have a fixed type signature (can change based on params to new!)
    // The function is able to read from all of db's variables, but write to none
    con.prepare(db.all_queries!);
    // expanded to: con.prepare(magic_crate_db::all_queries);

    // use the preprocessed query
    println!("There are {} extra cute cats!", count_cats(con, 4242));
}

note: Changes to the values of delayed symbols don't require recompilation of the crate using it.

Another example: the sql! macro may want a unique id by which it can refer to this query in the db. The sql! macro could insert a symbol whose content is the position at which the query was inserted into the set (evaluated in stage 3). The sql! macro would not be able to see the contents of this symbol, but could inject it into the generated code.

Compiler-wise, crates are compiled independently as normal until just before they need to be linked. The compiler would then:

  • aggregate the sets together
  • grab a list of all delayed symbols
  • dynamically build a new crate by calling functions within the stateful macro
    • passing in the stateful macro's state to work out the content of the delayed symbols
  • compile & link this crate in like normal

Use cases

  • Handler subscriptions (web router, command line flags, event handler etc)
  • Plugin architecture
  • Preprocessing rust data-structures
  • End user configurable macros

Addressing points raised

  • 'somewhat-global state is usually bad'
    • This design is more like standard Rust objects than global state
    • If you want 'global' state, you call new! once, and import it to all of your crates
    • If you want fine control, you call new! multiple times (even within the same file)
  • When do you recompile things?
    • Stage 2 macros need to get recompiled if a change to new! changes any of the values within the object
      • Slow, but rare
    • The stage 3 magic crate also needs to get recompiled if any of the sets change
      • Fast (as it only recompiles a handful of functions)
    • Both of these could use hashes to detect whether recompilation is needed
      • Lazy solution: recompile stage 2 if the file containing new! is changed; recompile stage 3 on every compile.
  • Ordering issues
    • Data can only flow into a later stage
    • new! can only be called once per instantiation
    • Sets are order free

Other points of note

  • This could get split into two parts
  • I think this is a much more in-depth change than what other people were suggesting in this thread
  • I've only written down code from the macro user's perspective. I'm after people's thoughts before looking at the harder side

steveklabnik (Member) commented on Mar 22, 2020

To me, it feels like:

  1. properly supporting this feature means adding a new API
  2. a new API would have a large surface area
  3. this means it should go through the RFC process.

