BUG: in Python3 MultiIndex.from_tuples cannot take "zipped" tuples #18434

Closed

Description

Xbar (Contributor, Author)

Code Sample, a copy-pastable example if possible

# Fails on Python 3 (pandas 0.21.0): zip() returns an iterator, not a list
import pandas as pd
my_index = pd.MultiIndex.from_tuples(zip(['a', 'b'], ['c', 'd'], ['e', 'f']), names=['A', 'B', 'C'])

Problem description

The code above raises an exception:

TypeError: object of type 'zip' has no len()

This happens because in Python 3, unlike Python 2, zip returns an iterator rather than a list, and an iterator has no length.
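The difference can be reproduced without pandas. The following is a minimal sketch, assuming a Python 3 interpreter; the variable names are illustrative only.

```python
# In Python 3, zip() returns a lazy iterator, so len() is not defined on it.
pairs = zip(['a', 'b'], ['c', 'd'], ['e', 'f'])

try:
    len(pairs)
    has_len = True
except TypeError as exc:
    has_len = False
    message = str(exc)  # the same kind of TypeError as in the report above

# Materializing the iterator into a list restores len().
rows = list(pairs)
print(has_len, len(rows))
```

Note that the iterator is consumed by list(), so it cannot be iterated a second time.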

In pandas, there are several places in MultiIndex and related classes where the code calls len() on its arguments. Those arguments are valid input, but in Python 3 they no longer support len().

Expected Output

Same as in Python 2. For the example above, the result should be:

MultiIndex(levels=[['a', 'b'], ['c', 'd'], ['e', 'f']],
           labels=[[0, 1], [0, 1], [0, 1]],
           names=['A', 'B', 'C'])

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.0
pytest: None
pip: 9.0.1
setuptools: 36.5.0
Cython: None
numpy: 1.13.3
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.0.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Activity

bashtage (Contributor) commented on Nov 22, 2017

I'm not sure this is a bug. An iterator is not a list/sequence of tuples.

Xbar (Contributor, Author) commented on Nov 22, 2017

Pandas is supposed to behave consistently between Python 2 and Python 3. If the code works in Python 2 but not in Python 3, then it is a bug.

bashtage (Contributor) commented on Nov 22, 2017

zip in Python 2 returns a list. Pandas will work with list(zip(a,b)).
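As a sketch of the workaround described above, the reporter's example can be rewritten with an explicit list(). This assumes pandas is installed; the assertions only check names and values, since the printed repr of MultiIndex differs between pandas versions.

```python
import pandas as pd

# Materialize the zip iterator first, so from_tuples receives a sequence.
tuples = list(zip(['a', 'b'], ['c', 'd'], ['e', 'f']))
my_index = pd.MultiIndex.from_tuples(tuples, names=['A', 'B', 'C'])

print(list(my_index.names))
print(my_index.tolist())
```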

bashtage (Contributor) commented on Nov 22, 2017

To put it another way, the "bug" is in zip, which changed its behavior. Pandas from_tuples requires a list or other sequence on either Python 2 or 3.

Xbar (Contributor, Author) commented on Nov 22, 2017

Granted, the issue is due to different behavior of zip; but should iterators be supported here?

Python 3 changed the behavior of a number of functions, so iterators and similar objects are encountered more often than before. Should they be accepted as valid arguments?

bashtage (Contributor) commented on Nov 22, 2017

It is hard/impossible to allocate contiguous data blocks like NumPy arrays unless you know the data size (hence the need for len). This happens in both NumPy and Pandas with iterators.
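The allocation point can be illustrated with NumPy directly. This is a minimal sketch, assuming NumPy is installed: np.array does not consume a generator at all, while np.fromiter does, and its optional count argument lets the output be allocated up front.

```python
import numpy as np

# np.array does not iterate a generator; it wraps the generator object
# itself in a zero-dimensional object array.
wrapped = np.array(x * x for x in range(5))
print(wrapped.shape, wrapped.dtype)

# np.fromiter consumes an iterator. Passing count tells NumPy how many
# items to expect, so the result can be allocated once.
arr = np.fromiter((x * x for x in range(5)), dtype=np.int64, count=5)
print(arr.tolist())
```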

jreback (Contributor) commented on Nov 22, 2017

We generally support list-likes / iterables for most operations (for certain types of indexing a tuple is distinct). For .from_product / .from_tuples / .from_arrays the docstring says we accept a list / sequence, so I would be OK with accepting an Iterable here.

@Xbar would you do a PR for this?

added this to the Next Major Release milestone on Nov 22, 2017

Xbar (Contributor, Author) commented on Nov 22, 2017

An easy fix might be to test, at the beginning of from_tuples, whether the argument is an iterator and, if so, convert it to a list. I can make the fix and test it.
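The idea proposed above could look roughly like the following. This is a hedged sketch only: materialize_tuples is a hypothetical helper name, not a pandas function, and it is not necessarily how the referenced commit implements the fix.

```python
from collections.abc import Iterator


def materialize_tuples(tuples):
    """Hypothetical helper: turn a one-shot iterator into a list.

    Sequences such as lists are returned unchanged, so len() works on the
    result in either case.
    """
    if isinstance(tuples, Iterator):
        return list(tuples)
    return tuples


# A zip object is an iterator in Python 3, so it gets converted.
rows = materialize_tuples(zip(['a', 'b'], ['c', 'd'], ['e', 'f']))
print(len(rows), rows)

# A list is passed through untouched.
same = materialize_tuples([('a', 'c'), ('b', 'd')])
print(same)
```

Checking against Iterator rather than Iterable leaves lists, tuples, and arrays alone, which avoids an unnecessary copy of input that already has a length.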

added a commit that references this issue on Nov 24, 2017
c2bbe5c
modified the milestones: Next Major Release, 0.22.0 on Nov 24, 2017

Metadata

Assignees: No one assigned
Labels: Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
Participants: @jreback, @bashtage, @Xbar