Skip to content

plotly with pandas>=3.0 fails to raise name conflict error #4837

Closed
@MarcoGorelli

Description

@MarcoGorelli
Contributor
import pandas as pd
import plotly.express as px
df = pd.DataFrame(dict(x=[0, 1], y=[1, 10], z=[0.1, 0.8], money=[100, 200]))
df2 = pd.DataFrame(dict(time=[23, 26], money=[100, 200]))
fig = px.scatter(df, x="z", y=df2.money, size=df.y)

With pandas 2.2.3:

Traceback (most recent call last):
  File "/home/marcogorelli/scratch/.venv/lib/python3.12/site-packages/marimo/_runtime/executor.py", line 157, in execute_cell
    exec(cell.body, glbls)
  Cell marimo://trying_plotly.py#cell=cell-0
, line 5, in <module>
    fig = px.scatter(df, x="z", y=df2.money, size=df.y)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcogorelli/scratch/.venv/lib/python3.12/site-packages/plotly/express/_chart_types.py", line 66, in scatter
    return make_figure(args=locals(), constructor=go.Scatter)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcogorelli/scratch/.venv/lib/python3.12/site-packages/plotly/express/_core.py", line 2117, in make_figure
    args = build_dataframe(args, constructor)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcogorelli/scratch/.venv/lib/python3.12/site-packages/plotly/express/_core.py", line 1513, in build_dataframe
    df_output, wide_id_vars = process_args_into_dataframe(
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcogorelli/scratch/.venv/lib/python3.12/site-packages/plotly/express/_core.py", line 1271, in process_args_into_dataframe
    col_name = _check_name_not_reserved(field, reserved_names)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcogorelli/scratch/.venv/lib/python3.12/site-packages/plotly/express/_core.py", line 1006, in _check_name_not_reserved
    raise NameError(
NameError: A name conflict was encountered for argument 'y'. A column or index with name 'y' is ambiguous.

With the latest pandas nightly (installable with pip install --pre --extra-index-url https://pypi.anaconda.org/scientific-python-nightly-wheels/simple pandas) it just plots, without raising

Image

The difference is due to pandas no longer caching __getitem__ for columns:

in pandas 3.0+

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6]})

In [3]: df['a'] is df['a']
Out[3]: False

in pandas 2.2.3

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6]})

In [3]: df['a'] is df['a']
Out[3]: True

Activity

added
bugsomething broken
P1needed for current cycle
on Oct 28, 2024
emilykl

emilykl commented on Oct 28, 2024

@emilykl
Contributor

@MarcoGorelli I can't tell whether size in the generated chart matches df2.money or df.y -- could you post a version where money=[200, 100] in order to clarify?

If the former, this is definitely a bug; if the latter, this seems fine -- I'm guessing it's the former though.

Do you happen to have a link to the Pandas PR containing this behavior change? I see a few related to __getitem__ in the changelog but can't find the exact one.

added
P2considered for next cycle
and removed
P1needed for current cycle
on Oct 29, 2024
MarcoGorelli

MarcoGorelli commented on Oct 29, 2024

@MarcoGorelli
ContributorAuthor

hey - sure, if I do that, the output does actually look correct:

Image

I reported this because there's a test which checks that this raises:

with pytest.raises(NameError) as err_msg:
fig = px.scatter(df, x="z", y=df2.money, size=df.y)
assert "A name conflict was encountered for argument 'y'" in str(err_msg.value)

and it would no longer raise with pandas 3.0. However, to be honest I can't see what's ambiguous here, it looks well-defined.

So, either this is a false positive and this check for reserved names can be removed, or it breaks something else?

Looks like this was introduced in #1768, and the is vs equals topics was brought up #1768 (comment) . The people involved there don't seem to be active in Plotly any more, reckon it's OK to ask them?


The pandas PR is pandas-dev/pandas#56614 (though the removal of the item cache doesn't seem mentioned - maybe it's just considered an internal thing which users weren't meant to be relying on in the first place)

MarcoGorelli

MarcoGorelli commented on Nov 13, 2024

@MarcoGorelli
ContributorAuthor

closed by #4790

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

Labels

P2considered for next cyclebugsomething broken

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

    Development

    No branches or pull requests

      Participants

      @gvwilson@emilykl@MarcoGorelli

      Issue actions

        plotly with pandas>=3.0 fails to raise name conflict error · Issue #4837 · plotly/plotly.py