
ast: Different FormattedValue expressions have same col_offset information #81639


Description

BPO 37458
Nosy @ericvsmith, @lysnikolaou, @isidentical

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = 'https://github.com/ericvsmith'
closed_at = None
created_at = <Date 2019-07-01.02:10:41.834>
labels = ['interpreter-core', 'type-bug', '3.10']
title = 'ast: Different FormattedValue expressions have same col_offset information'
updated_at = <Date 2020-07-04.23:02:28.764>
user = 'https://bugs.python.org/WeijarZ'

bugs.python.org fields:

activity = <Date 2020-07-04.23:02:28.764>
actor = 'eric.smith'
assignee = 'eric.smith'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core']
creation = <Date 2019-07-01.02:10:41.834>
creator = 'Weijar Z'
dependencies = []
files = []
hgrepos = []
issue_num = 37458
keywords = []
message_count = 5.0
messages = ['346950', '346981', '372989', '372991', '373002']
nosy_count = 4.0
nosy_names = ['eric.smith', 'lys.nikolaou', 'BTaskaya', 'Weijar Z']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue37458'
versions = ['Python 3.10']

Activity


WeijarZ commented on Jul 1, 2019


This expression

f"{x}{x}{y}"

produces the following AST:

{
  "$node": "Module",
  "body": [
    {
      "$node": "Expr",
      "value": {
        "$node": "JoinedStr",
        "values": [
          {
            "$node": "FormattedValue",
            "value": {
              "$node": "Name",
              "id": "x",
              "ctx": {
                "$node": "Load"
              },
              "lineno": 1,
              "col_offset": 3
            },
            "conversion": -1,
            "format_spec": null,
            "lineno": 1,
            "col_offset": 0
          },
          {
            "$node": "FormattedValue",
            "value": {
              "$node": "Name",
              "id": "x",
              "ctx": {
                "$node": "Load"
              },
              "lineno": 1,
              "col_offset": 3
            },
            "conversion": -1,
            "format_spec": null,
            "lineno": 1,
            "col_offset": 0
          },
          {
            "$node": "FormattedValue",
            "value": {
              "$node": "Name",
              "id": "y",
              "ctx": {
                "$node": "Load"
              },
              "lineno": 1,
              "col_offset": 9
            },
            "conversion": -1,
            "format_spec": null,
            "lineno": 1,
            "col_offset": 0
          }
        ],
        "lineno": 1,
        "col_offset": 0
      },
      "lineno": 1,
      "col_offset": 0
    }
  ]
}

These two 'x' variables have the same col_offset, 3, although the second one starts at column 6 of the source. Is this wrong?
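A minimal way to reproduce the check, assuming only the standard-library `ast` module: parse the same f-string and compare each inner `Name` node's `col_offset` with the column counted by hand. The helper `name_columns` is hypothetical, and the expected list `[3, 6, 9]` is derived from the source text itself (`f` is column 0, `"` is 1, the first `{` is 2, and so on), not quoted from the thread.

```python
import ast

SOURCE = 'f"{x}{x}{y}"'
# Columns of x, x, y counted by hand: f=0 "=1 {=2 x=3 }=4 {=5 x=6 }=7 {=8 y=9
EXPECTED = [3, 6, 9]


def name_columns(source: str) -> list[tuple[str, int]]:
    """Return (identifier, col_offset) for the expression inside each
    FormattedValue of the first statement, in source order."""
    joined = ast.parse(source).body[0].value  # Expr -> JoinedStr
    return [
        (value.value.id, value.value.col_offset)
        for value in joined.values
        if isinstance(value, ast.FormattedValue)
    ]


if __name__ == "__main__":
    found = name_columns(SOURCE)
    print(found)
    print("matches hand count:", [col for _, col in found] == EXPECTED)
```

In the report, the second `x` came back with offset 3 instead of 6. Checking `ast.get_source_segment(SOURCE, node) == node.id` would likely not catch this particular problem: an offset that points at the first, identical-looking `x` still slices out the text `x`.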

added labels stdlib (Python modules in the Lib dir) and type-bug (An unexpected behavior, bug, or error) on Jul 1, 2019

ericvsmith commented on Jul 1, 2019


I'm working on overhauling how these are calculated. But it's complex, and is taking a while.

self-assigned this on Jul 1, 2019

ericvsmith commented on Jul 4, 2020


I still see this problem with 3.10, which I thought might have fixed this.

@lys.nikolaou: any ideas on this?

added labels interpreter-core (Objects, Python, Grammar, and Parser dirs) and 3.10 (only security fixes), and removed label stdlib (Python modules in the Lib dir), on Jul 4, 2020

lysnikolaou commented on Jul 4, 2020


> I still see this problem with 3.10, which I thought might have fixed this.

Nope, that's still true in 3.10.

> I'm working on overhauling how these are calculated. But it's complex, and is taking a while.

In short, the FormattedValue nodes all have exactly the same linenos and col_offsets, which are also identical to those of the enclosing JoinedStr node. These values are equal to the lineno and col_offset of the first STRING token and the end_lineno and end_col_offset of the last STRING token (note that the grammar accepts a STRING+ node).

> any ideas on this?

Moving f-string parsing to the parser, which we've been talking about lately, would solve this. But this will probably take some time, since I currently do not have the time. It's probably going to be a good project for this coming fall.

Another alternative, in case we don't want to wait until then, would be for the handwritten f-string parser to keep its own lineno and col_offset, so that they can be used when the FormattedValue nodes are created. This would probably also require some effort though, so I'm not sure we want to do it before we really know whether we're going to proceed with the "moving f-string parsing to PEG" project.
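To make this alternative concrete, here is a toy sketch of an f-string scanner that carries its own column counter instead of inheriting the position of the enclosing string token. Everything here is hypothetical: `replacement_field_columns` is not CPython code (CPython's handwritten f-string parser was written in C), and the sketch only handles a single-line f-string, skips `{{`/`}}` escapes, and does not support nested braces, format specs containing fields, or a `}` inside string literals within an expression.

```python
def replacement_field_columns(literal: str, start_col: int = 0) -> list[int]:
    """Return the absolute column at which each replacement field's
    expression starts in a single-line f-string.

    `literal` is the f-string source including prefix and quotes, e.g.
    'f"{x}{x}{y}"'; `start_col` is the column where it begins on its line.
    Toy assumptions: no nested braces, no '}' inside the expression.
    """
    columns: list[int] = []
    i, n = 0, len(literal)
    while i < n:
        if literal.startswith("{{", i) or literal.startswith("}}", i):
            i += 2  # escaped brace: literal text, not a replacement field
        elif literal[i] == "{":
            j = i + 1
            while j < n and literal[j] in " \t":
                j += 1  # the expression starts after any leading whitespace
            columns.append(start_col + j)  # position tracked by the scanner itself
            while j < n and literal[j] != "}":
                j += 1  # toy: jump to the first closing brace
            i = j + 1
        else:
            i += 1
    return columns


if __name__ == "__main__":
    print(replacement_field_columns('f"{x}{x}{y}"'))  # → [3, 6, 9]
```

Because each column is computed from the scanner's own position rather than by looking up the expression text, two identical-looking expressions such as the two `x` fields get different offsets.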


ericvsmith commented on Jul 4, 2020


I think waiting until we decide what to do with the parser makes sense. This problem has been around for a while, and while it's unfortunate I don't think it's worth heroic measures to fix.

transferred this issue from … on Apr 10, 2022

alexmojaki commented on Oct 3, 2022


This was fixed in #27729, right?


lysnikolaou commented on Oct 3, 2022


I'm not sure whether it's been completely fixed, but there have definitely been improvements since the issue was opened here. We're working on moving f-string parsing into the PEG parser, which will certainly help close this once and for all.


alexmojaki commented on Oct 3, 2022


The PR I linked includes fixing FormattedValues which contain identical-looking expressions, which I think is the problem being described here. The specific example is fixed for me in 3.9.7 and 3.10.

Are there any remaining problems you know of relating to incorrect lineno/col_offset in AST nodes?


youknowone commented on May 19, 2023


It seems this is not fixed, at least up to 3.11:

Python 3.11.1 (main, Feb  4 2023, 11:11:18) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
>>> import ast
>>> expr = ast.parse('''\n\n\n\n\nf"Warning: '{fieldname}' should be a list, got type '{typename}'"''').body[0]
>>> expr.value.values
[<ast.Constant object at 0x1004e9c90>, <ast.FormattedValue object at 0x1004ebc70>, <ast.Constant object at 0x1004ea2c0>, <ast.FormattedValue object at 0x1004eb3a0>, <ast.Constant object at 0x1004eb190>]
>>> def l(n): return ((n.lineno, n.col_offset), (n.end_lineno, n.end_col_offset))
... 
>>> for v in expr.value.values: print(l(v))
... 
((6, 0), (6, 65))
((6, 0), (6, 65))
((6, 0), (6, 65))
((6, 0), (6, 65))
((6, 0), (6, 65))

lysnikolaou commented on May 22, 2023


This has been fixed in 3.12 after the PEP 701 implementation (#102855) was merged. Closing.
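Since the fix landed with the PEP 701 implementation in 3.12, a regression test along the following lines could guard it. This is a sketch, not the test CPython actually added: the class name `FormattedValueLocationTest` and the helper `span` are hypothetical, and the test only asserts properties stated in this thread (each FormattedValue gets its own span, distinct from the enclosing JoinedStr, and each inner Name points at its own text), using youknowone's example.

```python
import ast
import sys
import unittest

# youknowone's example: the f-string sits on line 6 after five blank lines.
SOURCE = "\n" * 5 + "f\"Warning: '{fieldname}' should be a list, got type '{typename}'\""


def span(node: ast.AST) -> tuple[int, int, int, int]:
    """Full source span of a node: start line/column and end line/column."""
    return (node.lineno, node.col_offset, node.end_lineno, node.end_col_offset)


@unittest.skipIf(sys.version_info < (3, 12), "fixed by the PEP 701 parser in 3.12")
class FormattedValueLocationTest(unittest.TestCase):  # hypothetical name
    def setUp(self) -> None:
        self.joined = ast.parse(SOURCE).body[0].value
        self.fields = [v for v in self.joined.values if isinstance(v, ast.FormattedValue)]

    def test_fields_do_not_reuse_parent_span(self) -> None:
        for field in self.fields:
            self.assertNotEqual(span(field), span(self.joined))

    def test_fields_have_distinct_spans(self) -> None:
        spans = [span(field) for field in self.fields]
        self.assertEqual(len(spans), len(set(spans)))

    def test_names_point_at_their_own_text(self) -> None:
        for field in self.fields:
            name = field.value
            self.assertEqual(ast.get_source_segment(SOURCE, name), name.id)


if __name__ == "__main__":
    unittest.main(argv=[sys.argv[0]], exit=False)
```

Under an interpreter older than 3.12 the class is skipped rather than reported as failing.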
