Crate local state for procedural macros? #44034

Open

Description

@LukasKalbertodt (Member)

I'm tinkering a bit with procedural macros and ran into a problem that could be solved by keeping state between proc macro invocations.

Example from my real application: assume my proc-macro crate exposes two macros: config! {} and do_it! {}. The user of my lib is supposed to call config! {} only once, but may call do_it! {} multiple times. However, do_it! {} needs data from the config! {} invocation.

Another example: we want to write a macro_unique_id!() macro that returns a u64 by counting invocations internally.


How am I supposed to solve those problems? I know that somewhat-global state is usually bad. But I do see applications for crate-local state for proc macros.
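For illustration, a minimal sketch (not from the original report) of the counter example as a function-like proc macro keeping a static counter; whether such state is reliable is exactly what the rest of this thread discusses:

// lib.rs of a proc-macro crate (Cargo.toml needs `proc-macro = true` under [lib])
use proc_macro::TokenStream;
use std::sync::atomic::{AtomicU64, Ordering};

// Crate-local state: lives only as long as the proc-macro library stays loaded.
static COUNTER: AtomicU64 = AtomicU64::new(0);

#[proc_macro]
pub fn macro_unique_id(_input: TokenStream) -> TokenStream {
    let id = COUNTER.fetch_add(1, Ordering::Relaxed);
    // Expand to a plain u64 literal, e.g. `3u64`.
    format!("{id}u64").parse().unwrap()
}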

Activity

added labels on Aug 25, 2017:
  A-macros (Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..))
  C-feature-request (Category: A feature request, i.e. not implemented / a PR)
  T-lang (Relevant to the language team)

abonander (Contributor) commented on Mar 8, 2018

Statics and thread-locals should both be safe to use as the proc-macro crate is loaded dynamically and remains resident for the duration of the macro-expansion pass for the current crate (each crate gets its own compiler invocation). This is not necessarily stable as eventually we want to load proc-macros as child processes instead of dynamic libraries, but I don't see why they wouldn't be kept alive for the duration of the crate's compilation run anyway.
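
For concreteness, a rough sketch of this approach for the config!/do_it! macros from the issue; the macro names come from the issue, everything else is hypothetical, and the caveats in the next comment apply:

use proc_macro::TokenStream;
use std::sync::Mutex;

// Process-wide state; only meaningful if the proc-macro library stays resident
// for the whole expansion pass, as described above.
static CONFIG: Mutex<Option<String>> = Mutex::new(None);

#[proc_macro]
pub fn config(input: TokenStream) -> TokenStream {
    // Remember the raw configuration tokens for later invocations.
    *CONFIG.lock().unwrap() = Some(input.to_string());
    TokenStream::new()
}

#[proc_macro]
pub fn do_it(_input: TokenStream) -> TokenStream {
    let guard = CONFIG.lock().unwrap();
    // If a do_it! happens to expand before config!, the state is still empty;
    // that is the ordering problem raised in the next comment.
    let cfg = guard.as_deref().unwrap_or("<config! not expanded yet>");
    // For illustration, just expand to a string literal containing the recorded config.
    format!("{cfg:?}").parse().unwrap()
}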

durka (Contributor) commented on Mar 8, 2018

@abonander I don't think this is reliable for two reasons:

  1. Proc macros may not be run on every compilation, for instance if incremental compilation is on and they are in a module that is clean

  2. There is no guarantee of ordering -- if do_it! needs data from all config! invocations, that's a problem.

matprec (Contributor) commented on May 28, 2018

Addressing ordering

Declare dependencies between macros to allow delaying macro execution. In practical terms, think of macros Foo and Bar: by declaring that Bar depends on Foo, all invocations of Foo must complete before any invocation of Bar.
E.g.

#[proc_macro_derive(Foo)]
pub fn foo(input: TokenStream) -> TokenStream {
    ...
}
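// Note: depends_on below is proposed syntax, not an existing attribute argument.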
#[proc_macro_derive(Bar, depends_on(Foo))]
pub fn bar(input: TokenStream) -> TokenStream {
    ...
}

Addressing incremental compilation

A persistent storage, maybe a web-like "local storage", per proc-macro crate? It would store and load a byte array, which the user could (de-)serialize with e.g. serde: fn set_state(Vec<u8>), fn get_state() -> Vec<u8>.
I don't know about access though: how would it be provided to the proc-macro crate? Global memory? Wrapped in a Mutex?

Emerging questions

  1. Could a storage system be implemented in a crate? Assuming a cargo project layout, store serde state in the target folder as files? (A rough sketch follows after this list.)
  2. How much state should macros have, if at all? Should the same macro have access to state from previous invocations?
    • Pro
      • error message if variable name already in use
      • shared counter
    • Contra
      • Invalidation of artefacts becomes harder: in the worst case, every macro of that kind has to be re-invoked to rebuild the state, even for semantically equivalent changes like reformatting
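
On question 1, a minimal sketch of how a crate could prototype such a store itself today, assuming a plain (non-workspace) cargo layout; set_state/get_state are hypothetical helper names, and the invalidation concerns from question 2 still apply:

use std::fs;
use std::path::PathBuf;

// CARGO_MANIFEST_DIR is set by cargo for the rustc invocation that the
// proc macro runs inside of, so it is readable at expansion time.
fn state_file() -> PathBuf {
    let dir = std::env::var("CARGO_MANIFEST_DIR").expect("not built via cargo");
    PathBuf::from(dir).join("target").join("my_macro_state.bin")
}

// Hypothetical helpers mirroring the set_state/get_state idea above.
fn set_state(bytes: &[u8]) {
    let path = state_file();
    fs::create_dir_all(path.parent().unwrap()).unwrap();
    fs::write(path, bytes).unwrap();
}

fn get_state() -> Option<Vec<u8>> {
    fs::read(state_file()).ok()
}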

Thermatix commented on Apr 25, 2019

Has there been any movement on this issue?

oli-obk (Contributor) commented on Apr 25, 2019

> A persistent storage, maybe a web-like "local storage", per proc-macro crate?

The problem with such a scheme is that invocations of the same macro are still not ordered, so you can easily end up in a situation where the order of invocation changes the result.

If an ordering scheme between proc macros is implemented, one could consider giving bar read access to foo's storage, though this would depend on foo only ever being called once per crate (since the output order of multiple foo calls is unspecified).

Thermatix commented on Aug 9, 2019

Perhaps I'm missing some nuance or information, but what if, when you defined local storage, you also had to declare all the files affected by it? Then, when you re-compiled, the compiler would scan those files for changes and re-expand as appropriate.

Whilst I'm all for automating where possible, the advantage is that you get a clear list of the files involved, and it would provide working functionality for what this issue is trying to solve. Even if it's not perfect, as long as it's reasonably ergonomic (despite having to list the files), it should be good enough.

Yes, you do have to list each file that gets affected, but no doubt there is a way to automate even that.

ZNackasha commented on Dec 20, 2019

I would love to have this feature to solve the PyO3 add_wrapped requirement.
https://github.com/PyO3/pyo3

andrewreds commented on Mar 20, 2020

A different approach. What do people think?

Stateful Macros

(I need better names, syntax etc. But the idea should be there)

Have a "stateful macro", which is a compile time struct.

Things run in 3x main steps:

  1. The stateful macro object is initialized with the new!() call
// (May need to be tagged for compiler parsing reasons)
const db: postgresql::DBMacro = postgresql::new!(version => 42, schema => "cats.sql");
  2. The stateful macro object can have procedural macros called off it
fn count_cats(con: DBConnection, cuteness: u64) -> u64 {
    // sql! is able to read data passed in from new!
    // eg, could type check the database query
    // if a field in db is a set, then it can be added to, but not read from
    // otherwise it can be read from, but not written to
    db.sql!(con, select count(*) from cats where cuteness >= $cuteness)
}
  3. The macro object can have "delayed symbols"
fn main() {
    let con = postgresql::Connection::new("10.42.42.42");

    // all_queries is generated from a function that runs in a later stage
    // of the compilation, and gets linked in like normal.
    // It must have a fixed type signature (can change based on params to new!)
    // The function is able to read from all of db's variables, but write to none
    con.prepare(db.all_queries!);
    // expanded to: con.prepare(magic_crate_db::all_queries);

    // use the preprocessed query
    println!("There are {} extra cute cats!", count_cats(con, 4242));
}

note: Changes to the values of delayed symbols don't require recompilation of the crate using it.

Another example: the sql! macro may want a unique id by which it can refer to this query in the db. The sql! macro could insert a symbol whose content is the position at which the query was inserted into the set (evaluated in stage 3). The sql! macro would not be able to see the contents of this symbol, but could inject it into the generated code.

Compiler-wise, crates are compiled independently as normal until just before they need to be linked. The compiler would then:

  • aggregate the sets together
  • grab a list of all delayed symbols
  • dynamically build a new crate by calling functions within the stateful macro
    • passing in the stateful macro's state to work out the content of the delayed symbols
  • compile & link this crate in like normal

Use cases

  • Handler subscriptions (web router, command line flags, event handler etc)
  • Plugin architecture
  • Preprocessing rust data-structures
  • End user configurable macros

Addressing points raised

  • 'somewhat-global state is usually bad'
    • This design is more like standard Rust objects than global state
    • If you want 'global' state, you call new! once, and import it to all of your crates
    • If you want fine control, you call new! multiple times (even within the same file)
  • When do you recompile things?
    • Stage 2 macros need to get recompiled if a change to new! changes any of the values within the object
      • Slow, but rare
    • The stage 3 magic crate also needs to get recompiled if any of the sets change
      • Fast (as it only recompiles a handful of functions)
    • Both of these could use hashes to detect whether recompilation is needed
      • Lazy solution: recompile stage 2 if the file containing new! is changed; recompile stage 3 on every compile.
  • Ordering issues
    • Data can only flow into a later stage
    • new! can only be called once per instantiation
    • Sets are order free

Other points of note

  • This could get split into two parts
  • I think this is a much more in-depth change than what other people were suggesting in this thread
  • I've only written down code from the macro user's perspective. I'm after people's thoughts before looking at the harder side

steveklabnik (Member) commented on Mar 22, 2020

To me, it feels like:

  1. properly supporting this feature means adding a new API
  2. a new API would have a large surface area
  3. this means it should go through the RFC process.

