
What exactly token streams are passed to procedural macros 1.2 #50038

Closed
@petrochenkov

Description

@petrochenkov
Contributor

This is an issue that needs to be resolved before stabilization of "Macros 1.2".

Procedural macros that we are going to stabilize currently have two flavors - proc_macro and proc_macro_attribute.


proc_macro macros have signature fn(TokenStream) -> TokenStream and can be invoked with "bang" forms like this:

my::proc::macro!( TOKEN_STREAM )
my::proc::macro![ TOKEN_STREAM ]
my::proc::macro! { TOKEN_STREAM }

Only the TOKEN_STREAM part is passed to the macro as TokenStream, the delimiters (brackets) are NOT passed.
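
The same behaviour can be observed with a declarative macro, which is easier to run than a procedural one. The sketch below is only an analogy: it uses a hypothetical `macro_rules!` helper named `count_tts` rather than a real `proc_macro`, and it shows that the payload is identical for all three delimiter kinds, because the delimiters are never part of it.

```rust
// Hypothetical helper: counts the token trees in the macro's payload.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

fn main() {
    // The same payload `a + b` (three token trees) under each delimiter.
    let paren = count_tts!(a + b);
    let bracket = count_tts![a + b];
    let brace = count_tts! { a + b };

    // The macro cannot tell the three invocations apart.
    assert_eq!(paren, 3);
    assert_eq!(paren, bracket);
    assert_eq!(bracket, brace);
    println!("payload size: {paren} token trees in every case");
}
```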

Why this is bad:

  • The macro doesn't know what delimiters it was invoked with.
    It was part of the Macros 2.0 promise to give macros control over the delimiters in their invocations, so e.g. vec-like macros could require square brackets like vec![1, 2, 3] and reject other brackets.
    We should not prevent this kind of control being implemented in the future.

Why this is good:

  • Brackets are mostly not a part of the "useful payload" for the macro; they are there so that macro invocations can be parsed unambiguously in the many contexts in which they can appear: expressions, types, blocks, modules, etc.

proc_macro_attribute macros have signature fn(TokenStream, TokenStream) -> TokenStream and can be invoked with "attribute" forms like this:

#[my::proc::macro TOKEN_STREAM] TARGET
#![my::proc::macro TOKEN_STREAM] TARGET

TARGET is a trait/impl/foreign item, or a statement and it's passed to the macro as the second TokenStream argument, but we are not interested in it right now.

The TOKEN_STREAM part is passed to the macro as the first TokenStream argument, nothing is ignored.

Why this is bad:

  • It's not clear where the path ends and where the token stream starts.
    Something like #[a::b :: + -] seems to match the grammar, but is rejected right now because paths are always parsed greedily, so :: is interpreted as a path separator rather than a part of the token stream.
    Annoying questions arise with generic arguments in paths like #[a<>::b::c<u8>]. Technically this is a syntactically valid path: c having type arguments is a semantic error rather than a syntactic one, and the empty <> after the module a is not an error at all. But right now this attribute is interpreted as #[a /* <- PATH | TOKEN_STREAM -> */ <>::b::c<u8>].
    Ideally we'd like to avoid these questions completely and have an unambiguous delimiter.
  • It's not clear where the token stream ends.
    With plain #[attr TOKEN_STREAM] it's pretty clear - the stream ends before the ] (in this sense the situation is simpler than with bang macros), but things start breaking when other macros appear.
    macro m($meta1: meta, $meta2: meta) { ... }
    
    // No way to determine where the first attribute ends and the second attribute starts
    m!( a::b::c x , y , z , d::e::f u , v , w )
    So with this attribute syntax we can't support meta anymore!
  • It's not consistent with proc_macro macros. m!(a, b, c) does not include the parentheses in the token stream, but #[m(a, b, c)] does.
  • I'm not actually sure people intend to stabilize this attribute syntax suddenly expanded from traditional forms (#[attr], #[attr(list)], #[attr = literal]) to being nearly unlimited (i.e. something like #[a::b::c e f + c ,,, ;_:] being legal) right now.
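
For contrast with the undelimited case above, here is a small sketch of how the `meta` matcher behaves when attribute arguments are wrapped in delimiters. It uses a hypothetical `macro_rules!` helper named `meta_count`; the attribute paths are made up and are never resolved, only parsed. Commas inside the parentheses stay with their attribute, so the boundary between two attributes is unambiguous.

```rust
// Hypothetical helper: counts the comma-separated `meta` fragments it is given.
macro_rules! meta_count {
    ($($m:meta),* $(,)?) => {{
        let mut n = 0usize;
        // `stringify!` only inspects the tokens; nothing is resolved or expanded.
        $( let _ = stringify!($m); n += 1; )*
        n
    }};
}

fn main() {
    // Two attributes: the inner commas belong to the delimited argument lists.
    let delimited = meta_count!(a::b::c(x, y, z), d::e::f(u, v, w));
    assert_eq!(delimited, 2);

    // The traditional forms: bare path, delimited list, and `= literal`.
    let traditional = meta_count!(inline, derive(Debug), doc = "text");
    assert_eq!(traditional, 3);

    println!("delimited: {delimited}, traditional: {traditional}");
}
```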

Proposed solution:

  • Stabilize proc_macro as is for "Macros 1.2".

  • In the future extend the set of proc_macro plugin interfaces with one more signature fn(TokenStream, Delimiter) -> TokenStream that allows controlling delimiters used in macro invocations.

  • In the future possibly support bang macro invocations without delimiters for symmetry with attributes and because they may be legitimately useful (let x = MACRO_CONST!;, see https://internals.rust-lang.org/t/idea-elide-parens-brackets-on-unparametrized-macros/6527) (the Delimiter argument is Delimiter::None in this case).

  • Restrict attribute syntax accepted by proc_macro_attribute for "Macros 1.2" to

    // Symmetric with bang macro invocations
    #[my::proc::macro(TOKEN_STREAM)]
    #[my::proc::macro[TOKEN_STREAM]]
    #[my::proc::macro { TOKEN_STREAM }]
    // Additionally
    #[my::proc::macro]
    #[my::proc::macro = TOKEN_TREE]

    Or, more radically, do not stabilize the = syntax for procedural macros 1.2.
    This is not a fundamental restriction - arbitrary token streams still can be placed inside the brackets (#[a::b::c(e f + c ,,, ;_:)]).

  • The token stream passed to the macro DOES NOT include the delimiters.

  • In the future extend the set of proc_macro_attribute plugin interfaces with one more signature fn(TokenStream, TokenStream, Delimiter) -> TokenStream that allows controlling delimiters used in macro invocations (the delimiter is Delimiter::None for both #[attr] and #[attr = tt] forms but they are still discernable by the token stream being empty or not).
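
A minimal sketch of what the delimiter-aware signatures could enable, under loud assumptions: the real `proc_macro` types are only usable inside a proc-macro crate, so this model uses a local `Delimiter` enum (mirroring the variants of `proc_macro::Delimiter`) and a `String` in place of `TokenStream`. The functions `my_vec` and `describe_attr` are hypothetical, and `my_vec` returns a `Result` only to make the rejection visible; the proposed signature returns a `TokenStream`.

```rust
// Local stand-in mirroring the variants of `proc_macro::Delimiter`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Delimiter {
    Parenthesis,
    Bracket,
    Brace,
    None,
}

// Stand-in for `TokenStream`: the source text of the payload, delimiters excluded.
type Tokens = String;

// Hypothetical bang macro that insists on square brackets, like `vec![...]`.
fn my_vec(input: Tokens, delim: Delimiter) -> Result<Tokens, String> {
    match delim {
        Delimiter::Bracket => Ok(format!("vec![{}]", input)),
        other => Err(format!("expected square brackets, found {:?}", other)),
    }
}

// Hypothetical attribute helper: classifies the invocation form from the
// delimiter and from whether the (delimiter-free) argument stream is empty.
fn describe_attr(args: &str, delim: Delimiter) -> &'static str {
    match (delim, args.trim().is_empty()) {
        (Delimiter::None, true) => "#[attr]",
        (Delimiter::None, false) => "#[attr = tt]",
        (_, true) => "#[attr()] with an empty list",
        (_, false) => "#[attr(...)] with arguments",
    }
}

fn main() {
    assert!(my_vec("1, 2, 3".to_string(), Delimiter::Bracket).is_ok());
    assert!(my_vec("1, 2, 3".to_string(), Delimiter::Brace).is_err());
    println!("{}", describe_attr("", Delimiter::None));
    println!("{}", describe_attr("", Delimiter::Parenthesis));
}
```

Without the `Delimiter` argument, the first and third arms of `describe_attr` would be indistinguishable, which is the #[attr] versus #[attr()] case.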

Activity

petrochenkov

petrochenkov commented on Apr 18, 2018

@petrochenkov
ContributorAuthor

In the future extend the set of ... plugin interfaces with one more signature ... that allows controlling delimiters used in macro invocations

Alternatives:

  • Change the signatures for proc_macro and proc_macro_attribute to include Delimiter before stabilization.
  • Do not change signatures, include delimiters into the token stream for proc_macro before stabilization.

Labels added on Apr 18, 2018: T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.), A-decl-macros-2-0 (Area: Declarative macros 2.0 (#39412))

abonander

abonander commented on Apr 18, 2018

@abonander
Contributor

I don't know if this deserves its own issue, but I think there should be a way for a proc_macro_attribute to ask what kind of AST node (crate, item, statement, or expression) its input represents.

  • attributes that only accept one kind could assert equality and error immediately instead of attempting to parse their expected kind
  • attributes that accept multiple kinds won't have to guess at what node kind they should attempt to parse
abonander

abonander commented on Apr 18, 2018

@abonander
Contributor

Actually #47786 is probably a better place to suggest that one.

alexcrichton

alexcrichton commented on Apr 18, 2018

@alexcrichton
Member

Thanks for bringing these issues up @petrochenkov! Your proposed solutions sound pretty good to me, but I wanted to clarify a point or two as well.

For #[proc_macro] I'd be fine either requiring Delimiter today or adding a second signature down the road. I think I'd slightly prefer to have both options in the long run as most authors probably won't mind too much about what Delimiter is used, so I'd probably err on the side of leaving it as-is and possibly adding support for a new signature later on.

For #[proc_macro_attribute] I think it's a great idea to limit the syntax you can possibly work with today. The whitelisted syntaxes you proposed above sound good to me, and do you also think we should limit paths to just one element? (aka disallow #[foo::bar]).

I wanted to clarify, though, are you thinking the delimiter is dropped from the token stream going into #[proc_macro_attribute] as well? If we do that I think we would be required to stabilize and only support a signature that takes a Delimiter (to differentiate #[foo] and #[foo()]). I agree though that in these worlds removing the #[foo = bar] custom attribute is probably the best, and I don't think it'd be too hard to come up with alternate syntaxes for users today doing things like #[foo(baz = bar)].

abonander

abonander commented on Apr 18, 2018

@abonander
Contributor

@alexcrichton Absolute paths in attributes allow them to work at the crate root where they otherwise won't resolve due to scoping rules (#41430, attributes resolve in the parent module but the crate root has no parent). So unless we want to change the inner attribute form to resolve in the current module instead of the parent, absolute paths are the only way to call attributes at the crate root.

alexcrichton

alexcrichton commented on Apr 18, 2018

@alexcrichton
Member

@abonander ah true yeah, but the first pass of stabilization of Macros 1.2 won't stabilize attributes on modules (or crates), only bare items like functions, structs, impls, traits, etc.

abonander

abonander commented on Apr 18, 2018

@abonander
Contributor

@alexcrichton we're not currently feature gating attribute invocations on modules or at the crate root so that needs to be its own issue. It would be a bit more complex as we'd have to wait until the attribute resolves to a #[proc_macro_attribute] before emitting a feature gate error.

alexcrichton

alexcrichton commented on Apr 18, 2018

@alexcrichton
Member

Oh sure yeah when I say only allow one element that's just for now, we'd still, I'd imagine, allow absolute paths and more-than-one-element paths behind a feature gate.

abonander

abonander commented on Apr 18, 2018

@abonander
Contributor

Absolute paths in attributes are already feature gated, actually. Would #[feature(proc_macro)] just imply that feature gate like it does now with use_extern_macros?

alexcrichton

alexcrichton commented on Apr 18, 2018

@alexcrichton
Member

Perhaps yeah, I might be more of a fan of finer-grained feature gates after the next round of stabilization, but either way is fine.

petrochenkov

petrochenkov commented on Apr 19, 2018

@petrochenkov
ContributorAuthor

@alexcrichton

I think I'd slightly prefer to have both options in the long run as most authors probably won't mind too much about what Delimiter is used, so I'd probably err on the side of leaving it as-is and possibly adding support for a new signature later on.

Yeah, I'm not sure which is better either, and I lean toward leaving things as is for now and introducing a separate signature later.

do you also think we should limit paths to just one element? (aka disallow #[foo::bar]).

Yes (#35896 (comment)), but that falls more under the "macro modularisation" issue, so I didn't mention it again.
(If by limiting you mean not stabilizing multi-segment paths rather than "unimplementing" them).

I wanted to clarify, though, are you thinking the delimiter is dropped from the token stream going into #[proc_macro_attribute] as well?

Yes.

If we do that I think we would be required to stabilize and only support a signature that takes a Delimiter (to differentiate #[foo] and #[foo()]).

Differentiating between #[foo] and #[foo()] is equivalent to differentiating between foo!() and foo![], so I think we can certainly live without it and it's not required to introduce the signature with Delimiter immediately.
But if this differentiation is deemed sufficiently important, then we should implement/stabilize the Delimiter signature sooner rather than later for both proc_macro and proc_macro_attribute.

petrochenkov

petrochenkov commented on Apr 20, 2018

@petrochenkov
ContributorAuthor

a signature that takes a Delimiter

One more alternative is to keep the delimiter in CURRENT_SESS and extract it from there on demand like we do, for example, with Span::call_site.
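
To make this alternative concrete, here is a rough model of the idea using a thread-local. Everything in it is an assumption for illustration: `CURRENT_DELIMITER`, `with_delimiter`, `invocation_delimiter` and `my_macro` are hypothetical names, the `Delimiter` enum is a local stand-in for `proc_macro::Delimiter`, and the real compiler session state is not modelled. The point is only that the macro signature stays unchanged while the delimiter remains available on demand.

```rust
use std::cell::Cell;

// Local stand-in mirroring the variants of `proc_macro::Delimiter`.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Delimiter {
    Parenthesis,
    Bracket,
    Brace,
    None,
}

thread_local! {
    // Hypothetical per-expansion state, set by the "compiler" side.
    static CURRENT_DELIMITER: Cell<Delimiter> = Cell::new(Delimiter::None);
}

// "Compiler" side: record the delimiter, run the expansion, restore the old value.
fn with_delimiter<R>(delim: Delimiter, expand: impl FnOnce() -> R) -> R {
    let previous = CURRENT_DELIMITER.with(|d| d.replace(delim));
    let result = expand();
    CURRENT_DELIMITER.with(|d| d.set(previous));
    result
}

// "Macro" side: query the delimiter of the current invocation on demand.
fn invocation_delimiter() -> Delimiter {
    CURRENT_DELIMITER.with(|d| d.get())
}

// A macro body with the plain one-argument shape; a `String` stands in for `TokenStream`.
fn my_macro(input: &str) -> String {
    format!("{:?}:{}", invocation_delimiter(), input)
}

fn main() {
    let out = with_delimiter(Delimiter::Bracket, || my_macro("1, 2"));
    assert_eq!(out, "Bracket:1, 2");
    // Outside an expansion the state is back to its default.
    assert_eq!(invocation_delimiter(), Delimiter::None);
    println!("{out}");
}
```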

