Description
import pandas as pd
import plotly.express as px
df = pd.DataFrame(dict(x=[0, 1], y=[1, 10], z=[0.1, 0.8], money=[100, 200]))
df2 = pd.DataFrame(dict(time=[23, 26], money=[100, 200]))
fig = px.scatter(df, x="z", y=df2.money, size=df.y)
With pandas 2.2.3:
Traceback (most recent call last):
  File "/home/marcogorelli/scratch/.venv/lib/python3.12/site-packages/marimo/_runtime/executor.py", line 157, in execute_cell
    exec(cell.body, glbls)
  Cell marimo://trying_plotly.py#cell=cell-0, line 5, in <module>
    fig = px.scatter(df, x="z", y=df2.money, size=df.y)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcogorelli/scratch/.venv/lib/python3.12/site-packages/plotly/express/_chart_types.py", line 66, in scatter
    return make_figure(args=locals(), constructor=go.Scatter)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcogorelli/scratch/.venv/lib/python3.12/site-packages/plotly/express/_core.py", line 2117, in make_figure
    args = build_dataframe(args, constructor)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcogorelli/scratch/.venv/lib/python3.12/site-packages/plotly/express/_core.py", line 1513, in build_dataframe
    df_output, wide_id_vars = process_args_into_dataframe(
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcogorelli/scratch/.venv/lib/python3.12/site-packages/plotly/express/_core.py", line 1271, in process_args_into_dataframe
    col_name = _check_name_not_reserved(field, reserved_names)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcogorelli/scratch/.venv/lib/python3.12/site-packages/plotly/express/_core.py", line 1006, in _check_name_not_reserved
    raise NameError(
NameError: A name conflict was encountered for argument 'y'. A column or index with name 'y' is ambiguous.
With the latest pandas nightly (installable with pip install --pre --extra-index-url https://pypi.anaconda.org/scientific-python-nightly-wheels/simple pandas), it just plots without raising.
The difference is due to pandas no longer caching the Series returned by __getitem__ for columns (the "item cache"):
In pandas 3.0+:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6]})
In [3]: df['a'] is df['a']
Out[3]: False
In pandas 2.2.3:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6]})
In [3]: df['a'] is df['a']
Out[3]: True
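Why the cache matters can be sketched with a toy model in plain Python, so it runs without pandas or plotly. It assumes that plotly treats a Series argument as a column of data_frame only when it is the very same object as data_frame[name] (an `is` check), reserves such names, and stores any other array under its field name; df.y from the reproducer is modelled as df["y"], since pandas attribute access goes through the same column lookup. ToySeries, ToyFrame, is_frame_column, reserved_names, output_column_names and outcome are all hypothetical names, not pandas or plotly APIs.

```python
class ToySeries:
    """Hypothetical stand-in for a pandas Series: a name plus values."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)


class ToyFrame:
    """Hypothetical stand-in for a DataFrame. Column access builds a new
    ToySeries each time unless cache_columns=True, which mimics the
    pandas 2.x item cache."""

    def __init__(self, columns, cache_columns):
        self._columns = columns
        self._cache_columns = cache_columns
        self._cache = {}

    def __contains__(self, name):
        return name in self._columns

    def __getitem__(self, name):
        if name in self._cache:
            return self._cache[name]
        series = ToySeries(name, self._columns[name])
        if self._cache_columns:
            self._cache[name] = series
        return series


def is_frame_column(frame, value):
    # Assumed identity check: only the very same object counts as a column.
    return value.name in frame and value is frame[value.name]


def reserved_names(frame, args):
    """Column names of frame that some argument refers to."""
    reserved = set()
    for value in args.values():
        if isinstance(value, str):
            reserved.add(value)
        elif is_frame_column(frame, value):
            reserved.add(value.name)
    return reserved


def output_column_names(frame, args):
    """Name each field's column in a toy output frame, refusing clashes."""
    reserved = reserved_names(frame, args)
    names = {}
    for field, value in args.items():
        if isinstance(value, str):
            names[field] = value                 # column referenced by name
        elif is_frame_column(frame, value):
            names[field] = value.name            # recognised as a frame column
        elif field in reserved:
            raise NameError(
                f"A name conflict was encountered for argument '{field}'. "
                f"A column or index with name '{field}' is ambiguous."
            )
        else:
            names[field] = field                 # outside data: use field name
    return names


def outcome(cache_columns):
    """Mirror the reproducer: px.scatter(df, x="z", y=df2.money, size=df.y)."""
    df = ToyFrame({"x": [0, 1], "y": [1, 10], "z": [0.1, 0.8],
                   "money": [100, 200]}, cache_columns)
    df2 = ToyFrame({"time": [23, 26], "money": [100, 200]}, cache_columns)
    args = {"x": "z", "y": df2["money"], "size": df["y"]}
    try:
        return output_column_names(df, args)
    except NameError as exc:
        return f"NameError: {exc}"


print("cached (pandas 2.2.3-like):", outcome(cache_columns=True))
print("uncached (pandas 3.0-like):", outcome(cache_columns=False))
```

In the cached run, size=df["y"] is recognised as df's own 'y' column, so the outside Series passed as y cannot also be stored under the name 'y' and the toy raises the same NameError as the traceback. In the uncached run, size falls back to the field name 'size' and nothing clashes. If the assumption above holds, the check guards against a real name collision in plotly's intermediate frame, one that only arises when df.y is recognised as a column of df.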
emilykl commented on Oct 28, 2024
@MarcoGorelli I can't tell whether size in the generated chart matches df2.money or df.y. Could you post a version where money=[200, 100] in order to clarify? If the former, this is definitely a bug; if the latter, this seems fine. I'm guessing it's the former, though.
Do you happen to have a link to the pandas PR containing this behavior change? I see a few related to __getitem__ in the changelog but can't find the exact one.
MarcoGorelli commented on Oct 29, 2024
Hey, sure. If I do that, the output does actually look correct.
I reported this because there's a test which checks that this raises (plotly.py/packages/python/plotly/plotly/tests/test_optional/test_px/test_px_input.py, lines 96 to 98 in 9c5d112), and it would no longer raise with pandas 3.0. However, to be honest, I can't see what's ambiguous here; it looks well-defined.
So either this is a false positive and the check for reserved names can be removed, or removing it would break something else?
It looks like this was introduced in #1768, and the is vs. equals topic was brought up in #1768 (comment). The people involved there don't seem to be active in Plotly any more; reckon it's OK to ask them?
The pandas PR is pandas-dev/pandas#56614 (though the removal of the item cache doesn't seem to be mentioned; maybe it's just considered an internal detail that users weren't meant to rely on in the first place).
MarcoGorelli commented on Nov 13, 2024
closed by #4790