Description
Pandas version checks
-
I have checked that this issue has not already been reported.
-
I have confirmed this bug exists on the latest version of pandas.
-
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
left = pd.DataFrame(
{
"digits": [1, 2, 3, 1],
"lowercase": ["a", "v", "f", "g"],
"uppercase": ["A", "B", "B", "A"],
"left": [100_000, 100_000, 100_000, 30_000],
},
).set_index(["digits", "lowercase", "uppercase"])
right = pd.DataFrame(
{
"uppercase": ["A", "B", "B", "D"],
"digits": [1, 2, 3, 5],
"right": [20_000, 70_000, 100_000, 500],
},
).set_index(["uppercase", "digits"]).squeeze()
result = pd.merge(left, right, how="inner", left_index=True, right_index=True)
pd.testing.assert_index_equal(left.index, result.index)
Output:
> AssertionError: MultiIndex level [0] are different
MultiIndex level [0] values are different (75.0 %)
[left]: Index([1, 2, 3, 1], dtype='int64', name='digits')
[right]: Index([1, 1, 2, 3], dtype='int64', name='digits')
Issue Description
As the documentation states:
inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys.
Performing an inner merge should preserve the order of the left keys. In the above example, the left multi-index should be preserved - it is instead sorted.
In a similar fashion, even though align states that the left key order is preserved when doing inner join, it is not. However, aligning twice will result in the correct order, e.g.
_, aligned_right_once = left.align(right, join="inner", axis=0)
_, aligned_right_twice = left.align(aligned_right_once, join="inner", axis=0)
pd.testing.assert_index_equal(left.index, aligned_right_twice.index) # passes
pd.testing.assert_index_equal(left.index, aligned_right_once.index) # raises
which seems fairly inconsistent -
Potentially relatable issues/PR
- Index order of joined frame does not respect the order of left frame when right contains duplicates #16304
- Merge with how='inner' does not always preserve the order of the left keys #18776 (duplicate, but is closed).
- BUG: merge not always following documented sort behavior #54611
Expected Behavior
The key order of the result should the the same as the left dataframe in the example.
Installed Versions
INSTALLED VERSIONS
commit : ba1cccd
python : 3.10.12.final.0
python-bits : 64
OS : Linux
OS-release : 6.2.0-35-generic
Version : #35~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Oct 6 10:23:26 UTC 2
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.1.0
numpy : 1.23.5
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 67.4.0
pip : 23.3.1
Cython : None
pytest : 7.4.0
hypothesis : 6.81.1
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.15.0
pandas_datareader : None
bs4 : 4.11.1
bottleneck : None
dataframe-api-compat: None
fastparquet : 2023.8.0
fsspec : 2023.9.1
gcsfs : 2023.9.1
matplotlib : 3.6.2
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.9.3
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None