Description
Code Sample, a copy-pastable example if possible
# This code fails in Python 3
import pandas as pd
my_index = pd.MultiIndex.from_tuples(zip(['a', 'b'], ['c', 'd'], ['e', 'f']), names=['A','B', 'C'])
Problem description
The code above raises an exception:
TypeError: object of type 'zip' has no len()
In Python 3, unlike Python 2, zip returns an iterator rather than a list, and an iterator has no length.
Several places in MultiIndex and related classes call len() on their arguments. Those arguments are valid input but no longer support len() in Python 3.
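The difference can be reproduced without pandas. The snippet below is a minimal sketch using only the standard library; the variable names `has_len` and `tuples` are illustrative and not part of the report.

```python
# Python 3: zip() returns a lazy iterator, not a list.
pairs = zip(['a', 'b'], ['c', 'd'], ['e', 'f'])

try:
    len(pairs)
    has_len = True
except TypeError:
    # Iterators do not implement __len__, so len() raises TypeError.
    has_len = False

# Materializing the iterator produces a list of tuples, which has a length.
tuples = list(zip(['a', 'b'], ['c', 'd'], ['e', 'f']))
```

Passing a materialized list such as `tuples` to `MultiIndex.from_tuples` avoids the length check failing on a bare iterator.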
Expected Output
Same as in Python 2.
For the case above, it should be:
MultiIndex(levels=[['a', 'b'], ['c', 'd'], ['e', 'f']],
           labels=[[0, 1], [0, 1], [0, 1]],
           names=['A', 'B', 'C'])
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.21.0
pytest: None
pip: 9.0.1
setuptools: 36.5.0
Cython: None
numpy: 1.13.3
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.0.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
Activity
bashtage commented on Nov 22, 2017
I'm not sure this is a bug. An iterator is not a list/sequence of tuples.
Xbar commented on Nov 22, 2017
Pandas is supposed to behave consistently between Python 2 and Python 3. If the code works in Python 2 but fails in Python 3, then it is a bug.
bashtage commented on Nov 22, 2017
zip in Python 2 returns a list. Pandas will work with list(zip(a, b)).
bashtage commented on Nov 22, 2017
To put it another way, the "bug" is in zip, which changed its behavior. Pandas from_tuples requires a list or other sequence on either Python 2 or 3.
Xbar commented on Nov 22, 2017
Granted, the issue is due to the different behavior of zip, but should iterators be supported here?
Python 3 changed the behavior of a series of functions, so iterators and the like are encountered more often than before. Should they be accepted as valid arguments?
bashtage commented on Nov 22, 2017
It is hard/impossible to allocate contiguous data blocks like NumPy arrays unless you know the data size (hence the need for len). This happens in both NumPy and Pandas with iterators.
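As a rough illustration of this point, the sketch below uses the standard-library `array` module as a stand-in for a NumPy array. The function `to_contiguous` is hypothetical and is not how NumPy or pandas allocate memory; it only shows why a preallocating routine needs `len()` up front.

```python
from array import array


def to_contiguous(values):
    """Hypothetical helper: copy integers into a preallocated contiguous buffer."""
    n = len(values)              # raises TypeError for a bare iterator
    buf = array('q', [0]) * n    # allocate n zeroed 64-bit integer slots
    for i, v in enumerate(values):
        buf[i] = v
    return buf


from_list = to_contiguous([1, 2, 3])

try:
    to_contiguous(iter([1, 2, 3]))
    iterator_accepted = True
except TypeError:
    # The size is unknown until the iterator has been consumed.
    iterator_accepted = False
```

An iterator has to be consumed (for example with `list()`) before its size is known, which is why the length check comes first.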
jreback commented on Nov 22, 2017
we generally support list-likes / iterables for most operations (for certain types of indexing a tuple is distinct). so for the .from_product/tuple/array we say in the doc-string that we accept list / sequence, so I would be ok accepting an Iterable here.
@Xbar would you do a PR for this?
Xbar commented on Nov 22, 2017
An easy fix might be to test, at the beginning of from_tuples, whether the argument is an iterator and, if so, convert it to a list. I can make the fix and test it.
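The proposed check could look roughly like the sketch below. The helper name `ensure_list` is hypothetical and this is not the actual pandas patch; it only illustrates the idea of materializing one-shot iterators while leaving lists and other sequences untouched.

```python
from collections.abc import Iterator


def ensure_list(tuples):
    """Hypothetical helper: turn a one-shot iterator into a list.

    Lists, tuples, and other sized sequences are returned unchanged.
    """
    if isinstance(tuples, Iterator):
        return list(tuples)
    return tuples


# A zip object is an iterator in Python 3, so it gets materialized.
from_zip = ensure_list(zip(['a', 'b'], ['c', 'd'], ['e', 'f']))

# An existing list is passed through as the same object.
original = [('a', 'c', 'e'), ('b', 'd', 'f')]
from_sequence = ensure_list(original)
```

Checking for `Iterator` rather than `Iterable` keeps the conversion limited to objects that can only be traversed once, so existing list input is not copied.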
Fix pandas-dev#18434