Commit 9011525

Merge remote-tracking branch 'upstream/main' into mamba_solver_issue

2 parents cc8b072 + 6a9d50c commit 9011525

25 files changed (+91 lines, -48 lines)

.github/workflows/wheels.yml

Lines changed: 1 addition & 0 deletions
@@ -95,6 +95,7 @@ jobs:
 - [ubuntu-24.04, manylinux_x86_64]
 - [ubuntu-24.04, musllinux_x86_64]
 - [ubuntu-24.04-arm, manylinux_aarch64]
+- [ubuntu-24.04-arm, musllinux_aarch64]
 - [macos-13, macosx_x86_64]
 # Note: M1 images on Github Actions start from macOS 14
 - [macos-14, macosx_arm64]

doc/source/user_guide/reshaping.rst

Lines changed: 1 addition & 1 deletion
@@ -395,7 +395,7 @@ variables and the values representing the presence of those variables per row.
     pd.get_dummies(df["key"])
     df["key"].str.get_dummies()

-``prefix`` adds a prefix to the the column names which is useful for merging the result
+``prefix`` adds a prefix to the column names which is useful for merging the result
 with the original :class:`DataFrame`:

 .. ipython:: python
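For reference, a minimal sketch of the ``prefix`` behaviour the corrected sentence describes; the frame below is illustrative and only assumes the public ``pd.get_dummies`` API:

import pandas as pd

df = pd.DataFrame({"key": list("bbacab"), "data1": range(6)})

# Without a prefix the dummy columns are named after the category labels;
# prefix="key" renames them key_a, key_b, key_c, so they do not clash with
# existing columns when joined back onto the original DataFrame.
dummies = pd.get_dummies(df["key"], prefix="key")
print(df.join(dummies))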

doc/source/user_guide/user_defined_functions.rst

Lines changed: 1 addition & 1 deletion
@@ -319,7 +319,7 @@ to the original data.

 In the example, the ``warm_up_all_days`` function computes the ``max`` like an aggregation, but instead
 of returning just the maximum value, it returns a ``DataFrame`` with the same shape as the original one
-with the values of each day replaced by the the maximum temperature of the city.
+with the values of each day replaced by the maximum temperature of the city.

 ``transform`` is also available for :meth:`SeriesGroupBy.transform`, :meth:`DataFrameGroupBy.transform` and
 :meth:`Resampler.transform`, where it's more common. You can read more about ``transform`` in groupby
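As a reminder of the ``transform`` pattern this paragraph describes, a small self-contained sketch (the data and column names here are illustrative, not the guide's ``warm_up_all_days`` example):

import pandas as pd

df = pd.DataFrame(
    {
        "city": ["Madrid", "Madrid", "Oslo", "Oslo"],
        "day": ["Mon", "Tue", "Mon", "Tue"],
        "temperature": [25, 28, 10, 12],
    }
)

# Unlike an aggregation, transform keeps the original shape: every row of a
# city receives that city's maximum temperature.
df["city_max"] = df.groupby("city")["temperature"].transform("max")
print(df)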

doc/source/whatsnew/v1.1.0.rst

Lines changed: 1 addition & 1 deletion
@@ -1039,7 +1039,7 @@ Missing
 ^^^^^^^
 - Calling :meth:`fillna` on an empty :class:`Series` now correctly returns a shallow copied object. The behaviour is now consistent with :class:`Index`, :class:`DataFrame` and a non-empty :class:`Series` (:issue:`32543`).
 - Bug in :meth:`Series.replace` when argument ``to_replace`` is of type dict/list and is used on a :class:`Series` containing ``<NA>`` was raising a ``TypeError``. The method now handles this by ignoring ``<NA>`` values when doing the comparison for the replacement (:issue:`32621`)
-- Bug in :meth:`~Series.any` and :meth:`~Series.all` incorrectly returning ``<NA>`` for all ``False`` or all ``True`` values using the nulllable Boolean dtype and with ``skipna=False`` (:issue:`33253`)
+- Bug in :meth:`~Series.any` and :meth:`~Series.all` incorrectly returning ``<NA>`` for all ``False`` or all ``True`` values using the nullable Boolean dtype and with ``skipna=False`` (:issue:`33253`)
 - Clarified documentation on interpolate with ``method=akima``. The ``der`` parameter must be scalar or ``None`` (:issue:`33426`)
 - :meth:`DataFrame.interpolate` uses the correct axis convention now. Previously interpolating along columns lead to interpolation along indices and vice versa. Furthermore interpolating with methods ``pad``, ``ffill``, ``bfill`` and ``backfill`` are identical to using these methods with :meth:`DataFrame.fillna` (:issue:`12918`, :issue:`29146`)
 - Bug in :meth:`DataFrame.interpolate` when called on a :class:`DataFrame` with column names of string type was throwing a ValueError. The method is now independent of the type of the column names (:issue:`33956`)

doc/source/whatsnew/v3.0.0.rst

Lines changed: 2 additions & 1 deletion
@@ -833,7 +833,7 @@ Groupby/resample/rolling
 - Bug in :meth:`DataFrameGroupby.transform` and :meth:`SeriesGroupby.transform` with a reducer and ``observed=False`` that coerces dtype to float when there are unobserved categories. (:issue:`55326`)
 - Bug in :meth:`Rolling.apply` for ``method="table"`` where column order was not being respected due to the columns getting sorted by default. (:issue:`59666`)
 - Bug in :meth:`Rolling.apply` where the applied function could be called on fewer than ``min_period`` periods if ``method="table"``. (:issue:`58868`)
-- Bug in :meth:`Series.resample` could raise when the the date range ended shortly before a non-existent time. (:issue:`58380`)
+- Bug in :meth:`Series.resample` could raise when the date range ended shortly before a non-existent time. (:issue:`58380`)

 Reshaping
 ^^^^^^^^^
@@ -851,6 +851,7 @@ Reshaping
 - Bug in :meth:`DataFrame.unstack` producing incorrect results when manipulating empty :class:`DataFrame` with an :class:`ExtentionDtype` (:issue:`59123`)
 - Bug in :meth:`concat` where concatenating DataFrame and Series with ``ignore_index = True`` drops the series name (:issue:`60723`, :issue:`56257`)
 - Bug in :func:`melt` where calling with duplicate column names in ``id_vars`` raised a misleading ``AttributeError`` (:issue:`61475`)
+- Bug in :meth:`DataFrame.merge` where user-provided suffixes could result in duplicate column names if the resulting names matched existing columns. Now raises a :class:`MergeError` in such cases. (:issue:`61402`)

 Sparse
 ^^^^^^
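A minimal illustration of the new :class:`MergeError` behaviour described in the entry above, mirroring the test added in this commit (the frame contents are illustrative):

import pandas as pd

left = pd.DataFrame({"col1": [1], "col2": [2]})
right = pd.DataFrame({"col1": [1], "col2": [2], "col2_dup": [3]})

# Suffixing the overlapping "col2" with "_dup" would collide with the
# pre-existing "col2_dup" column, so merge now raises instead of silently
# producing duplicate column names.
try:
    pd.merge(left, right, on="col1", suffixes=("", "_dup"))
except pd.errors.MergeError as err:
    print(err)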

meson.build

Lines changed: 1 addition & 0 deletions
@@ -57,6 +57,7 @@ if cy.version().version_compare('>=3.1.0')
     command: [
       cy,
       '-3',
+      '-Xfreethreading_compatible=true',
       '--fast-fail',
       '--generate-shared=' + meson.current_build_dir() / '_cyutility.c',
     ],

pandas/_libs/tslibs/conversion.pyx

Lines changed: 1 addition & 1 deletion
@@ -797,7 +797,7 @@ cdef int64_t parse_pydatetime(
     dts : *npy_datetimestruct
         Needed to use in pydatetime_to_dt64, which writes to it.
     creso : NPY_DATETIMEUNIT
-        Resolution to store the the result.
+        Resolution to store the result.

     Raises
     ------

pandas/core/arrays/categorical.py

Lines changed: 1 addition & 1 deletion
@@ -1666,7 +1666,7 @@ def __array__(
         Parameters
         ----------
         dtype : np.dtype or None
-            Specifies the the dtype for the array.
+            Specifies the dtype for the array.

         copy : bool or None, optional
             See :func:`numpy.asarray`.

pandas/core/reshape/merge.py

Lines changed: 5 additions & 2 deletions
@@ -3062,13 +3062,16 @@ def renamer(x, suffix: str | None):
     if not llabels.is_unique:
         # Only warn when duplicates are caused because of suffixes, already duplicated
         # columns in origin should not warn
-        dups = llabels[(llabels.duplicated()) & (~left.duplicated())].tolist()
+        dups.extend(llabels[(llabels.duplicated()) & (~left.duplicated())].tolist())
     if not rlabels.is_unique:
         dups.extend(rlabels[(rlabels.duplicated()) & (~right.duplicated())].tolist())
+    # Suffix addition creates duplicate to pre-existing column name
+    dups.extend(llabels.intersection(right.difference(to_rename)).tolist())
+    dups.extend(rlabels.intersection(left.difference(to_rename)).tolist())
     if dups:
         raise MergeError(
             f"Passing 'suffixes' which cause duplicate columns {set(dups)} is "
-            f"not allowed.",
+            "not allowed.",
         )

     return llabels, rlabels

pandas/io/formats/format.py

Lines changed: 1 addition & 1 deletion
@@ -114,7 +114,7 @@
 columns : array-like, optional, default None
     The subset of columns to write. Writes all columns by default.
 col_space : %(col_space_type)s, optional
-    %(col_space)s.
+    %(col_space)s
 header : %(header_type)s, optional
     %(header)s.
 index : bool, optional, default True

pandas/tests/io/xml/test_to_xml.py

Lines changed: 1 addition & 1 deletion
@@ -1345,7 +1345,7 @@ def test_ea_dtypes(any_numeric_ea_dtype, parser):
     assert equalize_decl(result).strip() == expected


-def test_unsuported_compression(parser, geom_df):
+def test_unsupported_compression(parser, geom_df):
     with pytest.raises(ValueError, match="Unrecognized compression type"):
         with tm.ensure_clean() as path:
             geom_df.to_xml(path, parser=parser, compression="7z")

pandas/tests/io/xml/test_xml.py

Lines changed: 1 addition & 1 deletion
@@ -1961,7 +1961,7 @@ def test_wrong_compression(parser, compression, compression_only):
     read_xml(path, parser=parser, compression=attempted_compression)


-def test_unsuported_compression(parser):
+def test_unsupported_compression(parser):
     with pytest.raises(ValueError, match="Unrecognized compression type"):
         with tm.ensure_clean() as path:
             read_xml(path, parser=parser, compression="7z")

pandas/tests/reshape/merge/test_merge.py

Lines changed: 9 additions & 0 deletions
@@ -3060,3 +3060,12 @@ def test_merge_on_all_nan_column():
         {"x": [1, 2, 3], "y": [np.nan, np.nan, np.nan], "z": [4, 5, 6], "zz": [4, 5, 6]}
     )
     tm.assert_frame_equal(result, expected)
+
+
+@pytest.mark.parametrize("suffixes", [("_dup", ""), ("", "_dup")])
+def test_merge_for_suffix_collisions(suffixes):
+    # GH#61402
+    df1 = DataFrame({"col1": [1], "col2": [2]})
+    df2 = DataFrame({"col1": [1], "col2": [2], "col2_dup": [3]})
+    with pytest.raises(MergeError, match="duplicate columns"):
+        merge(df1, df2, on="col1", suffixes=suffixes)

web/pandas/pdeps/0001-purpose-and-guidelines.md

Lines changed: 2 additions & 0 deletions
@@ -8,6 +8,8 @@
   [Noa Tamir](https://github.com/noatamir)
 - Revision: 3

+[TOC]
+
 ## PDEP definition, purpose and scope

 A PDEP (pandas enhancement proposal) is a proposal for a **major** change in
web/pandas/pdeps/0004-consistent-to-datetime-parsing.md

Lines changed: 2 additions & 0 deletions
@@ -6,6 +6,8 @@
 - Author: [Marco Gorelli](https://github.com/MarcoGorelli)
 - Revision: 2

+[TOC]
+
 ## Abstract

 The suggestion is that:

web/pandas/pdeps/0005-no-default-index-mode.md

Lines changed: 2 additions & 0 deletions
@@ -6,6 +6,8 @@
 - Author: [Marco Gorelli](https://github.com/MarcoGorelli)
 - Revision: 2

+[TOC]
+
 ## Abstract

 The suggestion is to add a ``NoRowIndex`` class. Internally, it would act a bit like

web/pandas/pdeps/0006-ban-upcasting.md

Lines changed: 2 additions & 0 deletions
@@ -6,6 +6,8 @@
 - Author: [Marco Gorelli](https://github.com/MarcoGorelli) ([original issue](https://github.com/pandas-dev/pandas/issues/39584) by [Joris Van den Bossche](https://github.com/jorisvandenbossche))
 - Revision: 1

+[TOC]
+
 ## Abstract

 The suggestion is that setitem-like operations would

web/pandas/pdeps/0007-copy-on-write.md

Lines changed: 3 additions & 1 deletion
@@ -6,6 +6,8 @@
 - Author: [Joris Van den Bossche](https://github.com/jorisvandenbossche)
 - Revision: 1

+[TOC]
+
 ## Abstract

 Short summary of the proposal:
@@ -525,7 +527,7 @@ following cases:
 * Selecting a single column (as a Series) out of a DataFrame is always a view
   (``df['a']``)
 * Slicing columns from a DataFrame creating a subset DataFrame (``df[['a':'b']]`` or
-  ``df.loc[:, 'a': 'b']``) is a view _if_ the the original DataFrame consists of a
+  ``df.loc[:, 'a': 'b']``) is a view _if_ the original DataFrame consists of a
   single block (single dtype, consolidated) and _if_ you are slicing (so not a list
   selection). In all other cases, getting a subset is always a copy.
 * Selecting rows _can_ return a view, when the row indexer is a `slice` object.

web/pandas/pdeps/0009-io-extensions.md

Lines changed: 2 additions & 0 deletions
@@ -7,6 +7,8 @@
 - Author: [Marc Garcia](https://github.com/datapythonista)
 - Revision: 1

+[TOC]
+
 ## PDEP Summary

 This document proposes that third-party projects implementing I/O or memory

web/pandas/pdeps/0010-required-pyarrow-dependency.md

Lines changed: 2 additions & 0 deletions
@@ -8,6 +8,8 @@
   [Patrick Hoefler](https://github.com/phofl)
 - Revision: 1

+[TOC]
+
 ## Abstract

 This PDEP proposes that:

web/pandas/pdeps/0012-compact-and-reversible-JSON-interface.md

Lines changed: 1 addition & 30 deletions
@@ -8,36 +8,7 @@
 - Author: [Philippe THOMY](https://github.com/loco-philippe)
 - Revision: 3

-##### Summary
-
-- [Abstract](./0012-compact-and-reversible-JSON-interface.md/#Abstract)
-- [Problem description](./0012-compact-and-reversible-JSON-interface.md/#Problem-description)
-- [Feature Description](./0012-compact-and-reversible-JSON-interface.md/#Feature-Description)
-- [Scope](./0012-compact-and-reversible-JSON-interface.md/#Scope)
-- [Motivation](./0012-compact-and-reversible-JSON-interface.md/#Motivation)
-- [Why is it important to have a compact and reversible JSON interface ?](./0012-compact-and-reversible-JSON-interface.md/#Why-is-it-important-to-have-a-compact-and-reversible-JSON-interface-?)
-- [Is it relevant to take an extended type into account ?](./0012-compact-and-reversible-JSON-interface.md/#Is-it-relevant-to-take-an-extended-type-into-account-?)
-- [Is this only useful for pandas ?](./0012-compact-and-reversible-JSON-interface.md/#Is-this-only-useful-for-pandas-?)
-- [Description](./0012-compact-and-reversible-JSON-interface.md/#Description)
-- [Data typing](./0012-compact-and-reversible-JSON-interface.md/#Data-typing)
-- [Correspondence between TableSchema and pandas](./panda0012-compact-and-reversible-JSON-interfaces_PDEP.md/#Correspondence-between-TableSchema-and-pandas)
-- [JSON format](./0012-compact-and-reversible-JSON-interface.md/#JSON-format)
-- [Conversion](./0012-compact-and-reversible-JSON-interface.md/#Conversion)
-- [Usage and impact](./0012-compact-and-reversible-JSON-interface.md/#Usage-and-impact)
-- [Usage](./0012-compact-and-reversible-JSON-interface.md/#Usage)
-- [Compatibility](./0012-compact-and-reversible-JSON-interface.md/#Compatibility)
-- [Impacts on the pandas framework](./0012-compact-and-reversible-JSON-interface.md/#Impacts-on-the-pandas-framework)
-- [Risk to do / risk not to do](./0012-compact-and-reversible-JSON-interface.md/#Risk-to-do-/-risk-not-to-do)
-- [Implementation](./0012-compact-and-reversible-JSON-interface.md/#Implementation)
-- [Modules](./0012-compact-and-reversible-JSON-interface.md/#Modules)
-- [Implementation options](./0012-compact-and-reversible-JSON-interface.md/#Implementation-options)
-- [F.A.Q.](./0012-compact-and-reversible-JSON-interface.md/#F.A.Q.)
-- [Synthesis](./0012-compact-and-reversible-JSON-interface.md/Synthesis)
-- [Core team decision](./0012-compact-and-reversible-JSON-interface.md/#Core-team-decision)
-- [Timeline](./0012-compact-and-reversible-JSON-interface.md/#Timeline)
-- [PDEP history](./0012-compact-and-reversible-JSON-interface.md/#PDEP-history)
-
--------------------------
+[TOC]

 ## Abstract

web/pandas/pdeps/0014-string-dtype.md

Lines changed: 2 additions & 2 deletions
@@ -220,8 +220,8 @@ in pandas 2.3 and removed in pandas 3.0.

 The `storage` keyword of `StringDtype` is kept to disambiguate the underlying
 storage of the string data (using pyarrow or python objects), but an additional
-`na_value` is introduced to disambiguate the the variants using NA semantics
-and NaN semantics.
+`na_value` is introduced to disambiguate the variants using NA semantics and
+NaN semantics.

 Overview of the different ways to specify a dtype and the resulting concrete
 dtype of the data:
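A short sketch of the ``storage`` / ``na_value`` distinction described above, assuming pandas 2.3 or newer where ``StringDtype`` accepts the ``na_value`` keyword as this PDEP describes:

import numpy as np
import pandas as pd

# Same pyarrow-backed storage, two missing-value semantics (assumption:
# pandas >= 2.3 exposes the na_value keyword described in this PDEP).
nan_dtype = pd.StringDtype(storage="pyarrow", na_value=np.nan)  # NaN semantics
na_dtype = pd.StringDtype(storage="pyarrow")  # default: pd.NA semantics

s_nan = pd.Series(["a", None], dtype=nan_dtype)
s_na = pd.Series(["a", None], dtype=na_dtype)
print(repr(s_nan[1]), repr(s_na[1]))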

web/pandas/static/css/pandas.css

Lines changed: 23 additions & 0 deletions
@@ -1,3 +1,6 @@
+html {
+  scroll-padding-top: 5rem;
+}
 body {
   padding-top: 5em;
   color: #444;
@@ -103,3 +106,23 @@ blockquote {
   color: #787878;
   font-size: 18px;
 }
+
+.toc {
+  background: #f0f0f0;
+  padding: 10px;
+  display: inline-block;
+}
+a.headerlink {
+  opacity: 0;
+}
+h2:hover a.headerlink {
+  opacity: 1;
+  transition: opacity 0.5s;
+}
+h3:hover a.headerlink {
+  opacity: 1;
+  transition: opacity 0.5s;
+}
+.container {
+  max-width: 100ch;
+}

web/pandas/versions.json

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@
     {
         "name": "2.2",
         "version": "2.2",
-        "url": "https://pandas.pydata.org/pandas-docs/version/2.2/",
+        "url": "https://pandas.pydata.org/pandas-docs/version/2.2/"
    },
    {
        "name": "2.1",

web/pandas_web.py

Lines changed: 23 additions & 3 deletions
@@ -466,9 +466,29 @@ def main(
         with open(os.path.join(source_path, fname), encoding="utf-8") as f:
             content = f.read()
         if extension == ".md":
-            body = markdown.markdown(
-                content, extensions=context["main"]["markdown_extensions"]
-            )
+            if "pdeps/" in fname:
+                from markdown.extensions.toc import TocExtension
+
+                body = markdown.markdown(
+                    content,
+                    extensions=[
+                        # Ignore the title of the PDEP in the table of contents
+                        TocExtension(
+                            title="Table of Contents",
+                            toc_depth="2-3",
+                            permalink=" #",
+                        ),
+                        "tables",
+                        "fenced_code",
+                        "meta",
+                        "footnotes",
+                        "codehilite",
+                    ],
+                )
+            else:
+                body = markdown.markdown(
+                    content, extensions=context["main"]["markdown_extensions"]
+                )
             # Apply Bootstrap's table formatting manually
             # Python-Markdown doesn't let us config table attributes by hand
             body = body.replace("<table>", '<table class="table table-bordered">')
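A standalone sketch of what the new ``pdeps/`` branch does with Python-Markdown's ``TocExtension``: the ``[TOC]`` marker added to the PDEP files above is replaced by a table of contents built from the h2/h3 headings, each of which also gets a " #" permalink anchor. The sample document below is illustrative:

import markdown
from markdown.extensions.toc import TocExtension

source = """\
# PDEP-X: Example title

[TOC]

## Abstract

Some text.

### Details

More text.
"""

# toc_depth="2-3" keeps the h1 PDEP title out of the generated table of
# contents, matching the comment in the diff above.
body = markdown.markdown(
    source,
    extensions=[
        TocExtension(title="Table of Contents", toc_depth="2-3", permalink=" #"),
    ],
)
print(body)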
