DataFrame.to_pickle() fails for .zip format on MacOS and pandas 0.20.3 #17778

Closed
@normanius

Description

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(3, 4), index=['A', 'B', 'C'])
df.to_pickle('out.zip')
#pd.read_pickle('out.zip')

Problem description

The exception below occurs. I do have write permissions in the working directory. The code worked with pandas 0.19.0.

No problems observed for bz2 and gzip compression (I haven't tested xz).

  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.py", line 1378, in to_pickle
    df.to_pickle('tmp.zip')
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/pickle.py", line 27, in to_pickle
    is_text=False)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/common.py", line 352, in _get_handle
    zip_file = zipfile.ZipFile(path_or_buf)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/zipfile.py", line 756, in __init__
    self.fp = open(file, modeDict[mode])
IOError: [Errno 2] No such file or directory: 'out.zip'

Expected Output

A zip file that one can re-read with pandas.read_pickle().

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 36.2.7
Cython: 0.26
numpy: 1.14.0.dev0+029863e
scipy: 0.18.1
xarray: None
IPython: 5.4.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Activity

changed the title from "DataFrame.to_pickle() fails for .zip format on MacOS and pandas 20.3" to "DataFrame.to_pickle() fails for .zip format on MacOS and pandas 0.20.3" on Oct 4, 2017

normanius (Author) commented on Oct 4, 2017

The problem is located in _get_handle() of module pandas.io.common:

# ZIP Compression
elif compression == 'zip':
    import zipfile
    zip_file = zipfile.ZipFile(path_or_buf)
    zip_names = zip_file.namelist()
    if len(zip_names) == 1:
        f = zip_file.open(zip_names.pop())
    elif len(zip_names) == 0:
        raise ValueError('Zero files found in ZIP file {}'
                         .format(path_or_buf))
    else:
        raise ValueError('Multiple files found in ZIP file.'
                         ' Only one file per ZIP: {}'
                         .format(zip_names))

With this code, the zip file is opened only for reading, never for writing, because zipfile.ZipFile(path_or_buf) is called without a mode and defaults to 'r'. The mode argument should certainly be used somewhere.
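
The read-only behaviour described above can be reproduced with the standard library alone. The following is a minimal sketch, not the pandas code: try_open is a hypothetical helper introduced for illustration, and the temporary path stands in for 'out.zip'.

```python
import os
import tempfile
import zipfile


def try_open(path, mode=None):
    """Hypothetical helper: open `path` as a ZIP archive.

    Returns None on success, or the exception that was raised.
    """
    try:
        zf = zipfile.ZipFile(path) if mode is None else zipfile.ZipFile(path, mode)
        zf.close()
        return None
    except (IOError, OSError) as exc:
        return exc


tmp_dir = tempfile.mkdtemp()
target = os.path.join(tmp_dir, "out.zip")  # does not exist yet

# Without a mode, ZipFile defaults to 'r' and fails on a missing file.
read_error = try_open(target)

# With mode 'w', the archive is created instead.
write_error = try_open(target, "w")

print(type(read_error).__name__)
print(write_error, os.path.exists(target))
```

The first call fails for the same reason as the traceback in the description: the file is expected to exist already.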

chris-b1 (Contributor) commented on Oct 4, 2017

Yep, the problem does seem to be that the correct mode is not passed. A PR to fix this is welcome!

added this to the Next Major Release milestone on Oct 4, 2017
masongallo (Contributor) commented on Oct 4, 2017

It looks like the code for zip was written only for reading? Why not use gzip to write a single zip file?
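
For comparison, here is a minimal sketch of why gzip is easier to handle here: gzip.open() returns an ordinary file-like object, so pickle can write to it directly. The dictionary is only a stand-in for a DataFrame, and the file name is an assumption chosen for illustration.

```python
import gzip
import os
import pickle
import tempfile

obj = {"A": [1.0, 2.0], "B": [3.0, 4.0]}  # stand-in for a DataFrame
path = os.path.join(tempfile.mkdtemp(), "out.pkl.gz")  # illustrative name

# gzip.open gives a handle with write(), which is all pickle.dump needs.
with gzip.open(path, "wb") as f:
    pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)

# Reading back works the same way.
with gzip.open(path, "rb") as f:
    restored = pickle.load(f)
```

Note that this produces a gzip stream, not a ZIP archive, so on its own it does not yield a .zip file.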

added a commit that references this issue on Oct 5, 2017
s4chin commented on Oct 12, 2017

Can I try this? I'm looking for a first issue as an entry point.

chris-b1 (Contributor) commented on Oct 12, 2017

Yes, go ahead!

s4chin commented on Oct 13, 2017

mode is 'wb' when writing to the zip file. zipfile.ZipFile only accepts 'a', 'r', 'w' as modes, hence 'wb' needs to be converted to 'w'.
After doing this, it gives me

File "pandas/io/common.py", line 369, in _get_handle
    .format(path_or_buf))
ValueError: Zero files found in ZIP file out.zip
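
A minimal sketch of the two observations above, using only the standard library. zip_mode is a hypothetical helper, not a pandas function. It shows the 'wb' to 'w' conversion and why the read-oriented length check then reports zero files: an archive that was just created has no members yet.

```python
import os
import tempfile
import zipfile


def zip_mode(mode):
    """Hypothetical helper: drop the 'b'/'t' flags so that 'wb' becomes 'w'."""
    return mode.replace("b", "").replace("t", "")


path = os.path.join(tempfile.mkdtemp(), "out.zip")

# Opening with the converted mode creates a new, empty archive.
with zipfile.ZipFile(path, zip_mode("wb")) as zf:
    names = zf.namelist()

print(zip_mode("wb"), names)
```

Any check on the number of members therefore only makes sense when reading.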

So I took the if ... elif ... else part out and did f = zipfile.ZipFile(path_or_buf, 'w'), which results in

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/generic.py", line 1611, in to_pickle
    protocol=protocol)
  File "pandas/io/pickle.py", line 45, in to_pickle
    pkl.dump(obj, f, protocol=protocol)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/zipfile.py", line 1123, in write
    st = os.stat(filename)
TypeError: must be encoded string without NULL bytes, not str

Any pointers on how to move ahead? As @masongallo said, the code looks like it was meant only for reading.
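
The failure above comes from the fact that ZipFile.write() expects the name of a file on disk, whereas pickle calls write() with the serialized bytes. A minimal sketch of the difference, assuming an in-memory archive and an illustrative member name 'data.pkl':

```python
import io
import pickle
import zipfile

payload = pickle.dumps({"A": 1}, protocol=2)

buffer = io.BytesIO()  # in-memory archive, so nothing touches the disk
with zipfile.ZipFile(buffer, "w") as zf:
    try:
        # ZipFile.write() treats its argument as a path and tries to stat it.
        zf.write(payload)
        write_failed = False
    except (OSError, ValueError, TypeError):
        write_failed = True

    # ZipFile.writestr() takes a member name plus the data itself.
    zf.writestr("data.pkl", payload)
    names = zf.namelist()

# Read the member back and unpickle it.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as zf:
    restored = pickle.loads(zf.read("data.pkl"))
```

The exact exception type raised by zf.write() depends on the Python version and the payload, which is why the sketch catches several.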

normanius (Author) commented on Oct 13, 2017

When I looked at it, I didn't find a straightforward way of doing it. The problem is that io.common._get_handle() needs to create an object with a file-like interface (read, write, open) to which you can later write strings/bytes. zipfile.ZipFile is more a container for files than a container for strings, so I'm not sure it can be used like a normal file handle.

Maybe one can construct something around ZipFile.writestr(), which takes bytes instead of files to write into the zip file. This won't give you a file handle directly, but maybe you can build one using functools or StringIO. For this, one needs to understand where the file handle is used.

Alternatively, follow up on @masongallo's comment regarding gzip?
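
The ZipFile.writestr() idea could be sketched roughly as follows. This is an illustration under stated assumptions, not the fix that was merged into pandas: ZipWriteHandle is a hypothetical class, it targets Python 3, and it simply buffers everything in memory and stores it as a single archive member when closed.

```python
import io
import os
import pickle
import tempfile
import zipfile


class ZipWriteHandle(io.BytesIO):
    """Hypothetical file-like handle that buffers writes in memory and
    stores them as one member of a ZIP archive on close()."""

    def __init__(self, path, arcname):
        super().__init__()
        self._path = path
        self._arcname = arcname

    def close(self):
        # Flush the buffered bytes into the archive exactly once.
        if not self.closed:
            with zipfile.ZipFile(self._path, "w", zipfile.ZIP_DEFLATED) as zf:
                zf.writestr(self._arcname, self.getvalue())
        super().close()


path = os.path.join(tempfile.mkdtemp(), "out.zip")
obj = {"A": [1.0, 2.0]}  # stand-in for a DataFrame

handle = ZipWriteHandle(path, "out.pkl")
pickle.dump(obj, handle, protocol=2)  # pickle only needs write()
handle.close()

# The archive now holds exactly one member, as the reading code expects.
with zipfile.ZipFile(path) as zf:
    names = zf.namelist()
    restored = pickle.loads(zf.read(names[0]))
```

The trade-off of this design is that the whole pickle is held in memory before it is compressed, which may matter for large frames.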

ghost added a commit that references this issue on Nov 6, 2017
ghost added a commit that references this issue on Nov 6, 2017
ghost added a commit that references this issue on Nov 7, 2017
ghost added a commit that references this issue on Nov 7, 2017
minggli (Contributor) commented on Mar 17, 2018

Hi @jreback ,

I will try to fix this issue if it hasn't been fixed since the last conversation. Reverting.

Thanks,

Ming

modified the milestones: Next Major Release, 0.23.0 on Mar 20, 2018
        DataFrame.to_pickle() fails for .zip format on MacOS and pandas 0.20.3 · Issue #17778 · pandas-dev/pandas