HDF file compression not working #29310

Open
@WuraolaOyewusi

Description

@WuraolaOyewusi
Contributor

xref #28890 (review)

While updating the performance comparison section of the IO docs, we found that the compressed .hdf files were exactly the same size as the uncompressed .hdf files.

    24009288 Oct 10 06:43 test_fixed.hdf
    24009288 Oct 10 06:43 test_fixed_compress.hdf
    24458940 Oct 10 06:44 test_table.hdf
    24458940 Oct 10 06:44 test_table_compress.hdf

This seems to be caused by the next lines saving the same files:

df.to_hdf('test.hdf', 'test', mode='w')
df.to_hdf('test.hdf', 'test', mode='w', complib='blosc')

We need to see why the complib parameter is being ignored, and fix it so the hdf5 file is saved compressed when used.
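One way to narrow this down is to compare file sizes for three calls: no compression arguments, `complib` alone, and `complib` together with `complevel`. The sketch below is only an illustration: the helper `hdf_sizes`, the variant names, and the all-zeros frame are hypothetical and not from the docs. It assumes PyTables is installed, and it rests on the documented behaviour that a `complevel` of 0 or `None` disables compression, which may be why `complib` on its own appears to be ignored.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def hdf_sizes(df, directory):
    """Write df with three argument combinations; return file sizes in bytes."""
    variants = {
        "plain": {},
        "complib_only": {"complib": "blosc"},
        "complib_and_level": {"complib": "blosc", "complevel": 9},
    }
    sizes = {}
    for name, kwargs in variants.items():
        path = os.path.join(directory, f"test_{name}.hdf")
        # mode="w" recreates the file on every call, as in the docs example.
        df.to_hdf(path, key="test", mode="w", **kwargs)
        sizes[name] = os.path.getsize(path)
    return sizes


# Highly compressible data, so any active compression is easy to spot.
df = pd.DataFrame(np.zeros((100_000, 5)), columns=list("ABCDE"))

with tempfile.TemporaryDirectory() as tmp:
    sizes = hdf_sizes(df, tmp)

print(sizes)
```

If the `complib_only` size matches `plain` while `complib_and_level` is clearly smaller, that would point at the missing `complevel` rather than at the `complib` argument itself.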

CC @datapythonista

Activity

kylekeppler

kylekeppler commented on Nov 1, 2019

@kylekeppler
Contributor

yp1996

yp1996 commented on Nov 4, 2019

@yp1996

Hi @WuraolaOyewusi, are you currently working on this issue? If not, do you mind if I give it a shot?

WuraolaOyewusi

WuraolaOyewusi commented on Nov 4, 2019

@WuraolaOyewusi
Contributor, Author

@yp1996, please go ahead.

joybh98

joybh98 commented on Dec 24, 2019

@joybh98
Contributor

@yp1996, are you still working on this? If not, I'd like to work on it!

MarcoGorelli

MarcoGorelli commented on Feb 6, 2020

@MarcoGorelli
Member

@joybhallaa we haven't heard from them in over 2 weeks, so if this is something you'd like to work on, I think you're OK to go ahead. Please comment 'take' so the issue will be assigned to you.

joybh98

joybh98 commented on Feb 7, 2020

@joybh98
Contributor

take

MarcoGorelli

MarcoGorelli commented on Feb 7, 2020

@MarcoGorelli
Member

Sure - you can see #29404 for a starting point if you like

joybh98

joybh98 commented on Feb 7, 2020

@joybh98
Contributor

@MarcoGorelli, sure, will do!

joybh98

joybh98 commented on Feb 25, 2020

@joybh98
Contributor

Hey all,
I was going through the tests in test_store.py, and they contain the lines

assert node.filters.complevel == 0
assert node.filters.complib is None

These asserts only check a condition and pass or fail; they do not change the values of these 2 parameters.
Can I use pytest to set the values when a check fails, and use it to set a "default" value, or should I do that in pytables.py?

I may be wrong about some things, as I am still fairly new to pytest.
Open to any suggestions.
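For reference, a test in this style normally passes the values in through the call under test and then uses the asserts only to verify what was written; the asserts themselves never set anything. The following is a minimal sketch under that assumption, not the actual test from test_store.py. The helper `leaf_filters`, the test name, and the small all-zeros frame are hypothetical, and PyTables is assumed to be installed.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import tables


def leaf_filters(path, key):
    """Return (complevel, complib) for every leaf node stored under key."""
    with tables.open_file(path, mode="r") as h5file:
        return [
            (node.filters.complevel, node.filters.complib)
            for node in h5file.walk_nodes(where="/" + key, classname="Leaf")
        ]


def test_complevel_and_complib_are_stored():
    df = pd.DataFrame(np.zeros((100, 3)), columns=list("ABC"))
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.hdf")
        # The test itself chooses the compression settings ...
        df.to_hdf(path, key="df", mode="w", complevel=9, complib="blosc")
        # ... and then reads back what actually ended up in the file.
        filters = leaf_filters(path, "df")
    # The asserts only verify; they do not modify the stored filters.
    assert filters
    assert all(f == (9, "blosc") for f in filters)
```

Any change to defaults would belong in the writing code (pytables.py), with a test like this one confirming the result.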
